About the Role
This position involves developing and managing robust data infrastructure to support analytics and machine learning initiatives, ensuring high data quality and system reliability across platforms.
Responsibilities
- Design and implement data pipelines for large-scale data processing
- Build and maintain ETL workflows to ensure data accuracy and timeliness
- Collaborate with analytics and machine learning teams to define data requirements
- Optimize data storage and retrieval for performance and cost efficiency
- Monitor data systems for errors, latency, and failures
- Troubleshoot and resolve data pipeline issues
- Ensure data governance and compliance standards are met
- Work with cloud-based data platforms and services
- Support data warehouse architecture and modeling
- Develop automated testing for data workflows
- Document data processes and system designs
- Participate in code reviews and technical design discussions
- Contribute to data security and access control practices
- Evaluate and integrate new data technologies
- Support production deployments and incident response
- Improve data observability and monitoring tools
- Collaborate with cross-functional teams on data needs
- Maintain metadata management systems
- Ensure scalability of data solutions as data volumes grow
- Assist in capacity planning for data infrastructure
- Drive best practices in data engineering across teams
- Participate in agile development cycles
- Mentor junior engineers on data design and implementation
- Stay current with data engineering trends and tools
- Contribute to technical documentation and knowledge sharing
Compensation
Competitive salary and benefits package
Work Arrangement
Remote with South Africa time zone alignment
About the Team
- A collaborative data engineering team that builds and maintains the data infrastructure powering analytics and machine learning across the organization.
- Engineers work closely with data scientists, analysts, and product teams to deliver trusted, scalable data solutions.
Technology Stack
- Primary tools include Python, SQL, Apache Airflow, Spark, and Google Cloud Platform.
- The team uses modern DevOps practices with Git, CI/CD, and infrastructure as code.