Khan Academy is seeking a Senior Platform Engineer I, AI Evaluation for a 24-month fixed-term engagement. You will drive the evolution and extension of our internal evaluation framework for assessing the quality of AI-driven learner experiences. This is a software development role requiring deep domain experience with AI evaluation to support hill-climbing and data science workflows.
What You'll Do
- Gather internal requirements and secure buy-in for changes to the evaluation framework.
- Develop documentation and training materials for the evaluation framework.
- Work closely with ML data engineers and platform developers to help internal teams adopt an eval-driven development process.
- Support offline benchmark tests and online experiments.
What We're Looking For
- A Bachelor’s or Master’s degree in Computer Science, Data Engineering, a related field, or equivalent professional experience.
- 5 years of software engineering experience, including significant time evaluating generative AI systems or assessing ML model quality.
- Strong programming skills in Go, Python, SQL, and at least one data pipeline framework (e.g., Airflow, Dagster, Prefect).
- Familiarity with the architecture of large language models and their industry-standard APIs.
- Fluency in offline and online evaluation strategies and when to apply them.
- Intuition for specifying eval pipelines succinctly using declarative syntax.
- Understanding of stratified datasets and ground truth labeling.
- Appreciation of eval scoring schemes, from human raters to automated LLM-as-judge approaches.
Nice to Have
- Experience with labeling platforms (e.g., Label Studio, Scale AI, Toloka) and human-in-the-loop concerns such as rubric development and inter-rater agreement.
- Exposure to MLOps practices such as model registry, feature store, or continuous evaluation.
- Background in education technology or other human-centered AI applications.
Technical Stack
- Go, GraphQL, JavaScript, React, React Native, Redux, Python, SQL, LLMs
Team & Environment
You will work closely with ML data engineers and platform developers.
Benefits & Compensation
- Competitive salaries
- Ample paid time off as needed
- 8 pre-scheduled Wellness Days in 2026
- Remote-first culture
- Generous parental leave
- 401(k) + 4% matching
- Comprehensive insurance including medical, dental, vision, and life
- Opportunities to connect through affinity, ally, and social groups
- Compensation range: $137,871 - $172,339 USD / $186,306 - $232,883 CAD
Work Mode
This is a remote position open to candidates based in Mountain View, CA, or elsewhere in the continental US, Hawaii, or Canada.
We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, gender, gender identity or expression, national origin, sexual orientation, age, citizenship, marital status, disability, or Veteran status.






