About the Role
The Data Engineer will develop and maintain data integration pipelines using Dremio, Spark, and Airflow in a fully remote setup. The role emphasizes building reliable data workflows, optimizing performance, and supporting scalable data solutions over a six-month contract period.
Responsibilities
- Design and implement data pipelines using Spark and Airflow
- Configure and manage Dremio for data acceleration and access
- Ensure data accuracy and reliability across systems
- Collaborate with analysts and engineers on data needs
- Monitor pipeline performance and resolve failures
- Document data workflows and system architecture
- Support integration of new data sources
- Optimize data storage and retrieval processes
- Maintain data lineage and transformation logic
- Participate in code reviews and technical planning
Requirements
- Proficiency in Dremio for data lakehouse operations
- Hands-on experience with Apache Spark for distributed data processing
- Strong background in Airflow for workflow orchestration
- Experience building and maintaining ETL pipelines
- Solid understanding of data warehousing concepts
- Familiarity with cloud platforms such as AWS or GCP
- Knowledge of SQL and schema design principles
- Ability to optimize query performance
- Experience with version control systems like Git
- Understanding of data governance and security practices
- Skill in troubleshooting data pipeline issues
- Experience with containerization tools such as Docker
- Familiarity with CI/CD pipelines for data applications
- Ability to work independently in a remote setting
- Strong communication skills for technical collaboration
Nice to Have
- Prior work with Dremio in production environments
- Experience with data cataloging and metadata management
- Knowledge of Python for data engineering tasks
- Background in big data platforms like Databricks
- Exposure to real-time data streaming technologies
Compensation
Market rate for the six-month contract; details provided upon qualification
Work Arrangement
100 percent remote
Team
Collaborative environment, working closely with data teams focused on scalable solutions
Contract Duration
Six months with potential for extension based on project needs
Technology Stack
- Primary tools: Dremio, Apache Spark, Apache Airflow
- Supporting technologies: SQL, Git, cloud infrastructure


