About the Role
The role involves building robust evaluation systems for large language models, analyzing model outputs, and translating findings into actionable insights to guide development and deployment.
Responsibilities
- Develop evaluation methodologies tailored to large language model behaviors
- Design experiments to assess model accuracy, coherence, and safety
- Analyze model responses across diverse input conditions
- Collaborate with engineering teams to integrate feedback loops
- Identify edge cases and failure modes in model outputs
- Create metrics to quantify model performance over time
- Produce clear reports summarizing evaluation results
- Work closely with research teams to refine model training objectives
- Ensure evaluations align with ethical and safety standards
- Iterate on testing frameworks based on new model capabilities
Nice to Have
- Master’s or PhD in a relevant technical discipline
- Direct experience evaluating large language models
- Background in cognitive science or linguistics
- Experience with A/B testing frameworks
- Familiarity with model alignment research
- Contributions to open-source NLP projects
- Published work in machine learning or AI conferences
Compensation
Competitive salary and benefits package
Work Arrangement
Hybrid work model with flexibility
About the Team
- The team focuses on improving the real-world performance of large language models through rigorous testing and analysis.
- It is a collaborative group of members from diverse technical and research backgrounds with a shared interest in AI reliability.
What We Value
- Rigorous empirical methods
- Clear communication of technical findings
- Commitment to ethical AI development