Bengaluru, Karnataka, India

Western Digital is hiring a Data Quality Engineer

The Data Quality Engineer is responsible for establishing and maintaining data trust, governance, and certification across the enterprise Data Lakehouse platform. The role involves developing automated data quality frameworks, semantic models, and governance processes that enable reliable data consumption for business intelligence and machine learning applications. The platform is built on technologies including Databricks, Iceberg, AWS, Dremio, Atlan, and Power BI.

Responsibilities

  • Design and manage automated data validation systems across data pipeline layers from ingestion to curated datasets.
  • Create and execute tests for schema changes, data anomalies, record reconciliation, timeliness, and referential integrity.
  • Integrate data quality checks into Databricks environments using Delta Lake, Delta Live Tables, and Unity Catalog.
  • Implement Iceberg-based pipeline validations with support for schema evolution and time travel capabilities.
  • Define and manage data certification processes to ensure only approved datasets are used for analytics and AI.
  • Use metadata tools like Atlan and AWS Glue Catalog for managing data lineage, business glossaries, and access policies.
  • Develop a governed semantic layer on high-quality data to serve BI and AI/ML workloads.
  • Support Power BI reporting with certified metrics and self-service data access.
  • Work with data stewards to align data models with business-defined terminology in Atlan.
  • Certify datasets used in conversational analytics and natural language query systems.
  • Collaborate with AI teams to connect LLM-based query interfaces with Dremio, Databricks SQL, and Power BI.
  • Ensure LLM-generated insights are based on verified, high-integrity datasets to prevent inaccurate outputs.
  • Produce and maintain feature-ready datasets for machine learning training and inference in SageMaker Studio.
  • Partner with ML engineers to validate that input data meets all nine data quality dimensions.
  • Monitor for data drift and maintain model performance reliability over time.
  • Enforce continuous compliance with the nine data quality dimensions: accuracy, completeness, consistency, timeliness, validity, uniqueness, integrity, conformity, and reliability.

Requirements

  • Extensive experience in data engineering, data quality, or data governance roles.
  • Proficiency in Python, PySpark, and SQL for data processing and validation.
  • Hands-on experience with Databricks, including Delta Lake, Unity Catalog, and Delta Live Tables.
  • Practical knowledge of Apache Iceberg and its integration into data pipelines.
  • Strong familiarity with AWS data services such as S3, Glue ETL, Glue Catalog, Athena, EMR, Redshift, and SageMaker Studio.
  • Experience with Power BI, including semantic model design, DAX, and dataset certification.
  • Working knowledge of query engines like Trino or Presto.
  • Experience using data quality frameworks such as Great Expectations, Deequ, or Soda.

Nice to Have

  • Exposure to conversational analytics or natural language query systems over data lakehouses or Power BI.
  • Experience integrating LLM pipelines using tools like LangChain, OpenAI, or AWS Bedrock with enterprise data platforms.
  • Familiarity with data observability platforms such as Monte Carlo, Bigeye, Datadog, or Grafana.
  • Understanding of data compliance standards including GDPR, CCPA, and HIPAA.
  • Cloud certifications such as AWS Data Analytics Specialty or Databricks Certified Data Engineer.

Tech Stack

Databricks, Apache Iceberg, Amazon S3, AWS Glue ETL, AWS Glue Catalog, Amazon Athena, Amazon EMR, Amazon Redshift, SageMaker Studio, Dremio, Atlan, Power BI, Delta Lake, Delta Live Tables, Unity Catalog, Python, PySpark, SQL, Trino, Presto, Great Expectations, Deequ, Soda, Monte Carlo, Bigeye, Datadog, Grafana, LangChain, OpenAI, AWS Bedrock

Benefits

  • Equal employment opportunity and non-discrimination policy
  • Inclusive workplace culture that values diversity, belonging, respect, and individual contribution
  • Accommodation support for applicants with disabilities
  • Commitment to global innovation and technology development
  • A problem-solving approach at the center of how the organization works
  • A focus on advancing social impact through technology

Additional Information

  • The application deadline is expected to be October 25, 2024, though the position may close earlier if a suitable candidate is identified.
  • The organization does not require payment as a condition of applying or receiving a job offer.
  • Accommodation requests can be submitted to jobs.accommodations@wdc.com with a description of the need and the relevant job title or requisition number.
  • Harassment and discrimination based on legally protected characteristics are strictly prohibited.
  • Compliance with Equal Employment Opportunity laws and regulations is required.
  • Candidates are encouraged to report unethical recruitment practices to the WD Ethics Helpline or compliance@wdc.com.

Required Skills

Databricks, Apache Iceberg, AWS (S3, Glue ETL, Glue Catalog, Athena, EMR, Redshift, SageMaker Studio), Dremio, Atlan, Power BI, Python, PySpark, SQL, Delta Lake, Unity Catalog, Delta Live Tables, Data Quality, Trino, Presto, Great Expectations

About the Company

Western Digital builds, sells, and provides data storage solutions including hard drives, solid-state drives, and flash memory products.

Job Details

  • Department: Data and Analytics
  • Category: Data
  • Posted: 2 months ago