What You'll Do
Design and implement robust data architectures that support large-scale processing and analytics. Develop optimized data workflows using PySpark and SQL, ensuring high performance and reliability across distributed systems. Collaborate with data scientists and analysts to understand requirements and deliver scalable solutions on cloud platforms, primarily within Azure and Databricks environments.
Take ownership of end-to-end data pipeline development, from ingestion to transformation and storage. Use Python to build reusable components and automate complex data operations. Continuously refine existing systems for efficiency, maintainability, and scalability.
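To give a concrete sense of the ingest-transform-store work described above, here is a minimal PySpark sketch; the paths, column names, and aggregation logic are illustrative assumptions, not an actual pipeline from our stack.

```python
# Minimal sketch of an ingest -> transform -> store pipeline.
# All paths and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-daily").getOrCreate()

# Ingest: read raw CSV files landed by an upstream process.
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/mnt/raw/orders/")
)

# Transform: deduplicate, normalize timestamps, aggregate per customer/day.
daily = (
    raw.dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("customer_id", "order_date")
    .agg(
        F.count("order_id").alias("order_count"),
        F.sum("amount").alias("total_amount"),
    )
)

# Store: write partitioned output for downstream analytics.
(
    daily.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("/mnt/curated/orders_daily/")
)
```

In a Databricks environment the write target would more likely be a Delta table than raw Parquet, but the overall shape of the pipeline is the same.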
Requirements
- Minimum of six years of professional experience in data engineering roles
- Proven expertise with PySpark for large-volume data processing
- Strong proficiency in writing and tuning SQL queries
- Hands-on background in Python for data pipeline development
- Direct experience building solutions in Databricks
- Familiarity with Hadoop and Hive for batch processing workloads
- Working knowledge of version control systems such as Git
- Experience orchestrating workflows using tools like Airflow or Dagster (see the sketch after this list)
- Solid track record working in Azure cloud environments
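For the orchestration requirement above, a minimal Airflow DAG sketch follows (assuming Airflow 2.4+ for the `schedule` argument); the DAG id, schedule, and task bodies are hypothetical placeholders, not a production workflow.

```python
# Minimal Airflow DAG sketch: two dependent tasks on a daily schedule.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    # Placeholder: pull raw files into the landing zone.
    print("ingest step")

def transform():
    # Placeholder: trigger the PySpark job shown earlier.
    print("transform step")

with DAG(
    dag_id="orders_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # Run ingestion before transformation.
    ingest_task >> transform_task
```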
Technical Stack
PySpark, SQL, Python, Databricks, Hadoop, Hive, Git, Airflow, Dagster, Azure
Work Mode
This is a fully remote position, open to candidates worldwide. The role supports flexible scheduling with asynchronous collaboration across time zones.
