Capgemini is looking for an Associate Data Engineer to design, build, and optimize scalable data pipelines and analytics solutions on Databricks and Google Cloud Platform. You will work closely with data analysts, data scientists, and business stakeholders to deliver reliable, high-quality data products.
What You'll Do
- Design, build, and maintain ETL/ELT pipelines using Databricks (PySpark, Delta Lake).
- Optimize pipelines for performance, cost, and scalability within GCP.
- Develop batch and streaming data processes using Spark Structured Streaming.
- Implement data solutions using BigQuery, Cloud Storage, Dataflow, Cloud Composer, and Vertex AI.
- Apply best practices for cloud security, IAM configuration, monitoring, and cost optimization.
- Build and maintain data models including dimensional and data vault structures.
- Implement data quality frameworks, validation rules, and automated testing.
- Manage data versioning, governance, and lineage using tools like Unity Catalog or GCP Data Catalog.
- Collaborate with cross-functional teams to translate business requirements into technical designs.
- Provide technical guidance and support engineering best practices.
- Contribute to documentation, architectural diagrams, and knowledge-sharing materials.
What We're Looking For
- 3+ years of experience as a Data Engineer or similar role.
- Strong hands-on experience with Databricks, including PySpark / Spark, Delta Lake, Databricks Workflows / Jobs.
- Proficiency with GCP, including BigQuery, Cloud Storage, Dataflow or Dataproc.
- Strong coding skills in Python and SQL.
- Strong understanding of distributed systems, data warehousing, and data architecture principles.
- Experience with CI/CD tools such as GitHub Actions, GitLab CI/CD, or Azure DevOps.
Nice to Have
- Databricks or GCP certifications (e.g., Data Engineer, Architect).
- Experience with Terraform or other Infrastructure-as-Code tools.
- Knowledge of ML workflows or MLOps frameworks.
- Familiarity with data governance and quality tools such as Unity Catalog, Great Expectations, or dbt.
- Excellent problem-solving and analytical abilities.
- Strong communication skills with the ability to collaborate effectively across technical and non-technical teams.
- A growth mindset with a passion for continuous learning, innovation, and staying current with emerging technologies.
Technical Stack
- Databricks, PySpark, Spark, Delta Lake
- Google Cloud Platform (GCP), BigQuery, Cloud Storage, Dataflow, Dataproc, Cloud Composer, Vertex AI
- Python, SQL
- Terraform, Unity Catalog, Great Expectations, dbt
- GitHub, GitLab, Azure DevOps
Benefits & Compensation
- Compensation: $46,000 to $111,000 annually
- Paid time off (Vacation: 12-25 days, Company paid holidays, Personal Days, Sick Leave)
- Medical, dental, and vision coverage
- Retirement savings plans (e.g., 401(k) in the U.S., RRSP in Canada)
- Life and disability insurance
- Employee assistance programs
Capgemini is an Equal Opportunity Employer committed to an inclusive workplace. All qualified applicants will receive consideration for employment without regard to race, national origin, gender identity/expression, age, religion, disability, sexual orientation, genetics, veteran status, marital status, or any other characteristic protected by law.