Austin, TX or Remote (Global)
Salary: $175k - $275k

Driver is hiring an Applied Data Scientist, LLM Evaluation

About the Role

The role involves building robust evaluation systems for large language models, analyzing model outputs, and translating findings into actionable insights to guide development and deployment.

Responsibilities

  • Develop evaluation methodologies tailored to large language model behaviors
  • Design experiments to assess model accuracy, coherence, and safety
  • Analyze model responses across diverse input conditions
  • Collaborate with engineering teams to integrate feedback loops
  • Identify edge cases and failure modes in model outputs
  • Create metrics to quantify model performance over time
  • Produce clear reports summarizing evaluation results
  • Work closely with research teams to refine model training objectives
  • Ensure evaluations align with ethical and safety standards
  • Iterate on testing frameworks based on new model capabilities

Nice to Have

  • Master’s or PhD in a relevant technical discipline
  • Direct experience evaluating large language models
  • Background in cognitive science or linguistics
  • Experience with A/B testing frameworks
  • Familiarity with model alignment research
  • Contributions to open-source NLP projects
  • Published work in machine learning or AI conferences

Compensation

Competitive salary and benefits package

Work Arrangement

Hybrid work model with flexibility

Team

Collaborative team focused on advancing large language model performance

About the Team

  • The team focuses on improving the real-world performance of large language models through rigorous testing and analysis.
  • Members come from diverse technical and research backgrounds with shared interest in AI reliability.

What We Value

  • Rigorous empirical methods
  • Clear communication of technical findings
  • Commitment to ethical AI development

Available for qualified candidates

About the Company
Driver

Driver is the best compiler for codebase context. Agentic development fails without context — Driver provides it.

Driver compiles codebase context ahead of time using a compiler-inspired architecture, delivering symbol-complete, deterministic, and structured documentation optimized for AI agents. This enables agents to start every session with comprehensive understanding of codebases, eliminating ad hoc exploration and manual context management.

The platform supports all programming languages and integrates with source code management systems like GitHub, GitLab, and Bitbucket. It automatically keeps context up to date across branches and repositories, providing accurate, scalable, and cross-codebase understanding for AI-powered development workflows.

Job Details
Department Product & Engineering
Category data
Posted 2 hours ago