Responsibilities
- Evolve and maintain pipelines for transforming raw trace data into ML-ready datasets.
- Clean, normalize, and enrich data while preserving semantic meaning and consistency.
- Prepare and format datasets for human labeling, and integrate results into ML datasets.
- Develop and maintain scalable ETL pipelines using Airflow, dbt, Go, and Python running on GCP.
- Implement automated tests and validation to detect data drift or labeling inconsistencies.
- Collaborate with AI engineers, platform developers, and product teams to define data strategies that continuously improve the quality of Khan’s AI-based tutoring.
- Contribute to shared tools and documentation for dataset management and AI evaluation.
- Inform our data governance strategies for proper data retention, PII controls and scrubbing, and isolation of particularly sensitive data such as offensive test imagery.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, related field, or equivalent professional experience.
- 5 years of software engineering experience, including significant time working with large ML datasets.
- Strong programming skills in Go, Python, SQL, and at least one data pipeline framework (e.g., Airflow, Dagster, Prefect).
- Experience with data versioning tools (e.g., DVC, LakeFS) and cloud storage systems.
- Familiarity with machine learning workflows, from training data preparation to evaluation.
- Familiarity with the architecture and operation of large language models, and a nuanced understanding of their capabilities and limitations.
- Attention to detail and an obsession with data quality and reproducibility.
Nice to Have
- Experience with labeling platforms (e.g., Label Studio, Scale AI, Toloka) or human-in-the-loop systems.
- Understanding of ML evaluation techniques, including prompt-based and generative model metrics.
- Exposure to MLOps practices such as model registry, feature store, or continuous evaluation.
- Background in education technology or other human-centered AI applications.
Benefits
- Competitive salaries
- Ample paid time off as needed – Your well-being is a priority
- 8 pre-scheduled Wellness Days in 2026 occurring on a Monday or a Friday for a 3-day weekend boost
- Remote-first culture that caters to your time zone, with flexibility as needed
- Generous parental leave
- An exceptional team that trusts you and gives you the freedom to do your best
- The chance to put your talents towards a deeply meaningful mission and the opportunity to work on high-impact products that are already defining the future of education
- Opportunities to connect through affinity, ally, and social groups
- 401(k) with 4% matching, plus comprehensive insurance, including medical, dental, vision, and life
Additional Information
- As part of our hiring process, we use a secure identity verification service through CLEAR® (in partnership with Greenhouse) to confirm that each applicant is who they claim to be. CLEAR® provides a safe, consistent way to confirm identity, helping protect both applicants and the company from impersonation or fraud.


