Responsibilities
- Design, build, and optimize high-throughput, low-latency data pipelines that process billions of events per day.
- Create scalable frameworks for ingesting and transforming data using technologies such as Kafka, Flink, Spark, Airflow, Beam, or GCP-based tools.
- Ensure fault tolerance, exactly-once processing semantics, and replayability across all event processing flows.
- Define reusable templates and tools to accelerate pipeline development, deployment, and monitoring.
- Design data models for streaming, analytics, and serving layers using patterns like Bronze/Silver/Gold, event sourcing, CDC, and dimensional modeling.
- Collaborate with data architects to refine domain-specific data models with clear semantics and traceability from origin to usage.
- Optimize performance and reduce costs through effective partitioning, clustering, and indexing in Databricks, BigQuery, and similar systems.
- Integrate observability, data lineage tracking, and quality checks into all pipeline stages.
- Define and enforce SLAs and SLOs for data delivery, freshness, and accuracy.
- Work with platform engineering to implement monitoring, alerting, and automated recovery for essential data pathways.
- Lead technical initiatives within the data engineering team, including design reviews, architecture spikes, and mentoring of engineers.
- Partner with AI, platform, and product teams to build real-time, event-driven features for intelligent applications.
- Advocate for engineering excellence, efficiency, and simplicity in all technical deliverables.
Compensation
Competitive salary and benefits package
Work Arrangement
Full-time, remote or hybrid options available
Team
Part of a high-performing data engineering team focused on real-time data systems and event-driven architectures