Responsibilities
- Plan and execute experiments assessing AI model performance in reasoning, stylistic consistency, robustness, and alignment with user preferences
- Create novel evaluation frameworks, metrics, and testing protocols that extend beyond standard benchmarking approaches
- Analyze large datasets of human feedback and interaction data to identify trends in model behavior and user expectations
- Work closely with engineering teams to integrate research outcomes into scalable production environments
- Quickly build and assess experimental prototypes, maintaining scientific rigor while enabling fast iteration
- Produce technical documentation, internal analyses, and peer-reviewed publications to share insights with the machine learning community
- Engage with external model developers to define meaningful evaluation criteria and support ethical testing practices
- Uphold the accuracy, credibility, and transparency of public leaderboards and evaluation tools
Compensation
The cash compensation for this position has not yet been finalized. Actual compensation will depend on job-related knowledge, skills, experience, and candidate location.
Work Arrangement
Hybrid