What You'll Do
Design and implement robust data architectures that support large-scale processing and analytics. Develop optimized data workflows using PySpark and SQL, ensuring high performance and reliability across distributed systems. Collaborate with data scientists and analysts to understand requirements and deliver scalable solutions on cloud platforms, primarily within Azure and Databricks environments.
Take ownership of end-to-end data pipeline development, from ingestion to transformation and storage. Use Python to build reusable components and automate complex data operations. Continuously refine existing systems for efficiency, maintainability, and scalability.
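To give a concrete sense of the ingest-transform-store work described above, here is a minimal PySpark sketch; the paths, column names, and aggregation logic are illustrative assumptions, not an actual pipeline from our stack.

```python
# Minimal sketch of an ingest -> transform -> store pipeline.
# All paths and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-daily").getOrCreate()

# Ingest: read raw CSV files landed by an upstream process.
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/mnt/raw/orders/")
)

# Transform: deduplicate, normalize timestamps, aggregate per customer/day.
daily = (
    raw.dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("customer_id", "order_date")
    .agg(
        F.count("order_id").alias("order_count"),
        F.sum("amount").alias("total_amount"),
    )
)

# Store: write partitioned output for downstream analytics.
(
    daily.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("/mnt/curated/orders_daily/")
)
```

In a Databricks environment the write target would more likely be a Delta table than raw Parquet, but the overall shape of the pipeline is the same.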
Requirements
- Minimum of six years of professional experience in data engineering roles
- Proven expertise with PySpark for large-volume data processing
- Strong proficiency in writing and tuning SQL queries
- Hands-on background in Python for data pipeline development
- Direct experience building solutions in Databricks
- Familiarity with Hadoop and Hive for batch processing workloads
- Working knowledge of version control systems such as Git
- Experience orchestrating workflows using tools like Airflow or Dagster (see the sketch after this list)
- Solid track record working in Azure cloud environments
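For the orchestration requirement above, a minimal Airflow DAG sketch follows (assuming Airflow 2.4+ for the `schedule` argument); the DAG id, schedule, and task bodies are hypothetical placeholders, not a production workflow.

```python
# Minimal Airflow DAG sketch: two dependent tasks on a daily schedule.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    # Placeholder: pull raw files into the landing zone.
    print("ingest step")

def transform():
    # Placeholder: trigger the PySpark job shown earlier.
    print("transform step")

with DAG(
    dag_id="orders_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # Run ingestion before transformation.
    ingest_task >> transform_task
```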
Technical Stack
PySpark, SQL, Python, Databricks, Hadoop, Hive, Git, Airflow, Dagster, Azure
Work Mode
This is a fully remote position, open to candidates worldwide. The role supports flexible scheduling with asynchronous collaboration across time zones.
