The University of Chicago is hiring a Data Quality Engineer to ensure that high-quality data and metadata are distributed to the cancer research community via the Genomic Data Commons. You will focus on data integrity and testing: designing QA infrastructure, building automated testing frameworks, and collaborating across teams to validate pipelines and integrated datasets.
What You'll Do
- Drive the design of the data QA infrastructure and execution of testing protocols to validate pipelines, integrated datasets, and data products.
- Use a combination of exploratory, regression, and automated testing to ensure that data meet quality standards.
- Assist in the evaluation and development of data dictionaries, and use data specifications and code to validate data.
- Assist in data release planning and implementation based on stakeholder requirements and data availability.
- Proactively identify potential data issues and their downstream impact. Investigate existing data issues and perform root cause analyses to determine resolutions.
- Establish and maintain processes and standards to improve data quality assurance and implement efficiencies in data management.
- Define measurements and metrics, and prepare and present routine data reports to the project team and stakeholders.
- Participate in data acquisition and integration planning efforts including data modeling, data dictionary definitions, and data harmonization pipeline development.
- Develop a deep understanding of multiple genomic datasets and the technical data management software and processes of the underlying system.
- Define data quality and integrity criteria and develop a comprehensive data quality management plan.
- Contribute written knowledge and expertise to system documentation, user documentation, scientific manuscripts, reporting, grant proposals and reports, and presentation materials.
- Use a deep understanding of the data, scientific goals and methodology, and underlying biological and translational concepts to provide user support in high-profile and difficult cases.
- Coordinate with functional teams on user management and issue resolution.
- Investigate, analyze and resolve day-to-day technical problems using standard procedures.
- Work with stakeholders to gather and analyze requirements for developmental programs.
- Perform code testing on components and work to ensure that appropriate implementation standards are met.
- Support and maintain existing applications. Work with web developers and respond to requests from users.
What We're Looking For
- College or university degree in a related field.
- Knowledge and skills developed through 2-5 years of work experience in a related job discipline.
Nice to Have
- Bachelor's degree in Computer Science, Informatics, Bioinformatics, Biological Sciences, or a related field.
- Minimum two (2) years of experience working in data quality and integrity engineering or testing.
- Experience with data modeling, analysis, design, development, testing, and documentation.
- Experience with data quality standards and practices.
- Experience writing and executing data-centric test cases to validate data.
- Experience writing, reading, and interpreting database queries, and working with other database artifacts.
- Experience with Python.
- Experience working with Linux/Unix systems and basic shell scripting.
- Experience with biospecimen and clinical data curation.
- Experience with advanced high-throughput genomic technologies.
- Experience providing bioinformatics services or support.
- Experience using NCI datasets (TCGA, TARGET, and CGCI).
- Experience with graph and NoSQL databases.
Technical Stack
- Python
- Linux/Unix
- Shell Scripting
- Graph Databases
- NoSQL Databases
Team & Environment
Works across multiple teams including software engineers, bioinformaticians, and stakeholders.
Benefits & Compensation
- Salary range: $80,000.00 - $120,000.00
- Health insurance
- Retirement plans
- Paid time off
Work Mode
This is a hybrid position based in Chicago, IL.
The University of Chicago is an equal opportunity employer and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender, gender identity or expression, national or ethnic origin, shared ancestry, age, status as an individual with a disability, military or veteran status, genetic information, or other protected classes under the law.