Khan Academy is hiring a Senior Platform Engineer I, AI Evaluation (24 months fixed-term)

About the Role

Khan Academy is seeking a Senior Platform Engineer I, AI Evaluation for a 24-month fixed-term engagement. You will drive the evolution and extension of our internal evaluation framework for assessing the quality of AI-driven learner experiences. This is a software development role requiring deep domain experience with AI evaluation to support hill-climbing and data science workflows.

What You'll Do

  • Gather internal requirements and secure buy-in for changes to the evaluation framework.
  • Develop documentation and training materials for the evaluation framework.
  • Work closely with ML data engineers and platform developers to help internal teams adopt an eval-driven development process.
  • Support offline benchmark tests and online experiments.

What We're Looking For

  • A Bachelor’s or Master’s degree in Computer Science, Data Engineering, a related field, or equivalent professional experience.
  • 5 years of Software Engineering experience, including significant time evaluating generative AI systems or assessing ML model quality.
  • Strong programming skills in Go, Python, SQL, and at least one data pipeline framework (e.g., Airflow, Dagster, Prefect).
  • Familiarity with the architecture of large language models and their industry-standard APIs.
  • Fluency in offline and online evaluation strategies and when to apply them.
  • Intuitions about how to specify eval pipelines succinctly using declarative syntax.
  • Understanding of stratified datasets and ground truth labeling.
  • Appreciation of eval scoring schemes, ranging from human raters to automated LLM-as-judge approaches.

Nice to Have

  • Experience with labeling platforms (e.g., Label Studio, Scale AI, Toloka) and human-in-the-loop concerns such as rubric development and inter-rater agreement.
  • Exposure to MLOps practices such as model registry, feature store, or continuous evaluation.
  • Background in education technology or other human-centered AI applications.

Technical Stack

  • Go, GraphQL, JavaScript, React, React Native, Redux, Python, SQL, LLMs

Team & Environment

You will work closely with ML data engineers and platform developers.

Benefits & Compensation

  • Competitive salaries
  • Ample paid time off as needed
  • 8 pre-scheduled Wellness Days in 2026
  • Remote-first culture
  • Generous parental leave
  • 401(k) + 4% matching
  • Comprehensive insurance including medical, dental, vision, and life
  • Opportunities to connect through affinity, ally, and social groups
  • Compensation range: $137,871 - $172,339 USD / $186,306 - $232,883 CAD

Work Mode

This position is open to candidates based in Mountain View, CA, or working remotely from within the continental US, Hawaii, or Canada.

We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, gender, gender identity or expression, national origin, sexual orientation, age, citizenship, marital status, disability, or Veteran status.

Required Skills
Go, GraphQL, JavaScript, React, React Native, Redux, Python, SQL, LLMs, AI Evaluation, Platform Engineering, API Design, System Architecture, Distributed Systems

About company
Khan Academy

Khan Academy is a nonprofit with the mission to deliver a free, world-class education to anyone, anywhere. Our proven learning platform offers free, high-quality supplemental learning content and practice that cover Pre-K - 12th grade and early college core academic subjects, focusing on math and science.

Job Details
Category: infrastructure
Posted: 3 months ago