Responsibilities
- Design, develop, and operate scalable and maintainable data pipelines in the Azure Databricks environment
- Develop all technical artifacts as code, implemented in professional IDEs, with full version control and CI/CD automation
- Enable data-driven decision-making in Manufacturing & Engineering (M&E) and Quality by ensuring high data availability, quality, and reliability
- Implement data products and analytical assets using software engineering principles in close alignment with business domains and functional IT
- Apply rigorous software engineering practices such as modular design, test-driven development, and artifact reuse in all implementations
- Provide cross-functional data engineering support across Manufacturing & Engineering domains with a global delivery footprint
- Collaborate with business stakeholders, functional IT partners, product owners, architects, ML/AI engineers, and Power BI developers
- Operate within an agile product-team structure embedded in an enterprise-scale Azure environment
Requirements
- Design scalable batch and streaming pipelines in Azure Databricks using PySpark and/or Scala
- Implement ingestion from structured and semi-structured sources (e.g., SAP, APIs, flat files)
- Build bronze/silver/gold data layers following the defined lakehouse layering architecture and governance standards
- Implement use-case-driven dimensional models (star/snowflake schema) tailored to Manufacturing & Engineering (M&E) and Quality needs
- Ensure compatibility with reporting tools (e.g., Power BI) via curated data marts and semantic models
- Implement enterprise-level data warehouse models (domain-driven 3NF models) for Manufacturing & Engineering (M&E) and Quality data, in close alignment with data engineers from other business domains
- Develop and apply master data management strategies (e.g., Slowly Changing Dimensions)
- Develop automated data validation tests using established data validation frameworks
- Monitor pipeline health, identify anomalies, and implement quality thresholds
- Establish data quality transparency by defining meaningful data quality rules together with source system and business stakeholders, implementing those rules, and building the related reports
- Develop and structure pipelines using modular, reusable code in a professional IDE
- Apply test-driven development (TDD) principles with automated unit, integration, and validation tests
- Integrate tests into CI/CD pipelines to enable fail-fast deployment strategies
- Commit all artifacts to version control with peer review and CI/CD integration
- Work closely with Product Owners to refine user stories and define acceptance criteria
- Translate business requirements into data contracts and technical specifications
- Participate in agile events such as sprint planning, reviews, and retrospectives
- Document pipeline logic, data contracts, and technical decisions in Markdown or in documentation auto-generated from code
- Align designs with governance and metadata standards (e.g., Unity Catalog)
- Track lineage and audit trails through integrated tooling
- Profile and tune data transformation performance
- Reduce job execution times and optimize cluster resource usage
- Refactor legacy pipelines or inefficient transformations to improve scalability
Work Arrangement
Remote (Worldwide)