Mastercard is looking for a Lead Data Engineer to join our Foundry R&D team. You will help shape our innovation roadmap by exploring new technologies and building scalable, data-driven prototypes and products. At Mastercard, we power economies and empower people in 200+ countries and territories worldwide through secure, smart, and accessible technology.

What You'll Do

Drive Data Architecture: Own the data architecture and modeling strategy for AI projects. Define how data is stored, organized, and accessed. Select technologies, design schemas, and ensure systems support scalable AI and analytics workloads.
Build Scalable Data Pipelines: Lead development of robust ETL/ELT workflows and data models. Build pipelines that move large datasets with high reliability and low latency to support training and inference for AI and generative AI systems.
Ensure Data Quality & Governance: Oversee data governance and compliance with internal standards and external regulations. Implement data anonymization, quality checks, lineage tracking, and controls for handling sensitive information.
Provide Technical Leadership: Offer hands-on leadership across data engineering projects. Conduct code reviews, enforce best practices, and promote clean, well-tested code. Introduce improvements in development processes and tooling.
Cross-Functional Collaboration: Work closely with engineers, scientists, and product stakeholders. Scope work, manage data deliverables in agile sprints, and ensure timely delivery of data components aligned with project milestones.

What We're Looking For

Bachelor’s degree in Computer Science, Engineering, or a related field.
8–12+ years of experience in data engineering or backend engineering, including senior or lead roles.
Experience designing end-to-end data systems, solving scale and performance challenges, integrating diverse sources, and operating pipelines in production.
Strong skills in Python and/or Java/Scala.
Deep experience with Spark, Hadoop, Hive/Impala, and Airflow.
Hands-on experience with AWS, Azure, or GCP, using cloud-native processing and storage services such as S3, Glue, EMR, or Data Factory.
Expertise in ETL/ELT design and implementation across diverse data sources, transformations, and targets.
Strong experience scheduling and orchestrating pipelines using Airflow or similar tools.
Advanced Python and/or Scala/Java skills and strong software engineering fundamentals including version control, CI, and code reviews.
Excellent SQL abilities, including performance tuning on large datasets.
Hands-on Spark experience with RDDs, DataFrames, and performance optimization.
Familiarity with Hadoop components (HDFS, YARN), Hive/Impala, and streaming systems such as Kafka or Kinesis.
Experience deploying data systems on AWS, Azure, or GCP.
Familiarity with cloud data lakes; data warehouses such as Redshift, BigQuery, or Snowflake; and cloud-based processing engines such as EMR, Dataproc, Glue, or Synapse.
Comfortable with Linux and shell scripting.
Knowledge of data privacy regulations, PII handling, access controls, encryption/masking, and data quality validation.
Strong communication skills and experience working with cross-functional teams.
Ability to document designs clearly and deliver iteratively using agile practices.

Nice to Have

Experience with AWS data engineering services, Databricks, and Lakehouse/Delta Lake architectures, including bronze/silver/gold (medallion) layers.
Familiarity with dbt, Great Expectations, containerization and orchestration with Docker/Kubernetes, and monitoring tools such as Grafana or cloud-native monitoring services.
Experience implementing CI/CD pipelines for data workflows and using infrastructure-as-code (IaC) tools such as Terraform or CloudFormation.
Knowledge of data versioning (e.g., Delta Lake time travel) and of supporting continuous delivery for ML systems.
Motivation to explore emerging technologies, especially in AI and generative AI data workflows.
Understanding of data needs for machine learning, including dataset preparation, feature/label management, and supporting real-time or batch training pipelines.
Experience with feature stores or streaming data.

Technical Stack

Languages: Python, Java, Scala
Processing & Orchestration: Spark, Hadoop, Hive, Impala, Airflow
Cloud Platforms: AWS, Azure, GCP
Cloud Services: S3, Glue, EMR, Data Factory, Dataproc, Synapse, Redshift, BigQuery, Snowflake
Streaming: Kafka, Kinesis
Data Tools: Databricks, Delta Lake, dbt, Great Expectations
Infrastructure & Ops: Linux, Docker, Kubernetes, Grafana, Terraform, CloudFormation

Team & Environment

You will be joining the Mastercard Foundry R&D team, a group dedicated to exploring new technologies and building innovative, data-driven solutions.

Mastercard is an equal opportunity employer.