Responsibilities
- Lead the creation and deployment of internal SDKs and self-service tools that empower distributed engineering teams to independently manage data ingestion and transformation.
- Transition focus from building individual data pipelines to developing scalable platform solutions, establishing reusable methods for handling both batch and real-time event data.
- Own the cost efficiency of the Databricks environment by optimizing Spark execution plans, tuning shuffle partitions, and implementing auto-scaling to control DBU usage.
- Maintain platform performance as data volumes increase, balancing latency, throughput, and cloud infrastructure costs.
- Enforce Schema-on-Write validation and implement Data Contracts so that data from numerous internal services meets high quality standards before it enters the Bronze layer.
- Collaborate with Data Architects and Data Stewards to uphold data privacy (including PII handling), security protocols, and end-to-end metadata traceability across the global data ecosystem.
- Promote adoption of AI-powered development tools such as GitHub Copilot and Cursor to speed up development cycles and enhance code quality.
- Guide engineers in best practices for distributed computing through in-depth code reviews emphasizing scalability and long-term maintainability.
Work Arrangement
Remote (Worldwide)