Responsibilities
- Be fluent in the range of offline and online evaluation strategies, and know when to apply each technique over the development lifecycle
- Have intuitions about how to specify eval pipelines succinctly using declarative syntax
- Understand the role of stratified datasets and ground truth labeling
- Appreciate the range of eval scoring schemes, from human raters to automated LLM-as-a-judge scoring
Requirements
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field, or equivalent professional experience
- 5 years of software engineering experience, including significant time evaluating generative AI systems or otherwise assessing ML model quality
- Strong programming skills in Go, Python, SQL, and at least one data pipeline framework (e.g., Airflow, Dagster, Prefect)
- Familiarity with the architecture of large language models and their industry-standard APIs
Nice to Have
- Experience with labeling platforms (e.g., Label Studio, Scale AI, Toloka) and human-in-the-loop concerns such as rubric development and inter-rater agreement
- Exposure to MLOps practices such as model registries, feature stores, or continuous evaluation
- Background in education technology or other human-centered AI applications
Benefits
- Competitive salaries
- Ample paid time off as needed
- 8 pre-scheduled Wellness Days in 2026
- Remote-first culture
- Generous parental leave
- An exceptional team that trusts you and gives you the freedom to do your best
- The chance to put your talents towards a deeply meaningful mission and the opportunity to work on high-impact products that are already defining the future of education
- Opportunities to connect through affinity, ally, and social groups
- 401(k) + 4% matching & comprehensive insurance, including medical, dental, vision, and life
Work Arrangement
Hybrid
Additional Information
- As part of our hiring process, we use a secure identity verification service through CLEAR® (in partnership with Greenhouse) to confirm that each applicant is who they claim to be. CLEAR® provides a safe, consistent way to confirm identity, helping protect both applicants and the company from impersonation or fraud.
- 24-month fixed-term role


