Design and manage ETL and ELT workflows using Databricks with PySpark and Delta Lake.
Enhance data pipeline efficiency, cost-effectiveness, and scalability on Google Cloud Platform.
Develop both batch and real-time data processing systems using Spark Streaming and related tools.
Build data solutions leveraging BigQuery, Cloud Storage, Dataflow, Cloud Composer, and Vertex AI.
Follow cloud security standards, including IAM policies, monitoring, and cost controls.
Create and maintain data models such as dimensional schemas and data vault architectures.
Establish data quality processes, validation checks, and automated testing frameworks.
Handle data versioning, governance, and lineage tracking using Unity Catalog or GCP Data Catalog.
Work with diverse teams to convert business needs into technical data designs.
Offer technical direction and promote engineering best practices across projects.
Support the creation of documentation, system diagrams, and internal knowledge resources.

Capgemini is hiring an Associate Data Engineer
