Responsibilities
- Improve and execute our fine-tuning strategies for adapting Claude to new domains and tasks
- Manage technical relationships with external data vendors, including evaluation of data quality and reward design
- Collaborate with domain experts to design data pipelines and evaluations
- Explore novel ways of creating RL environments for high-value tasks
- Develop and improve QA frameworks to catch reward hacking and ensure environment quality
- Partner with other RL research teams and product teams to translate capability goals into training environments and evals
Requirements
- Have experience with fine-tuning large language models for specific domains or real-world use cases and/or domain expertise in an area where we would like to make our models more useful
- Have experience with reinforcement learning, reward design, or training data curation for LLMs
- Are comfortable managing technical vendor relationships and iterating quickly on feedback
- Find value in reading through datasets to understand them and spot issues
- Have strong project management and interpersonal skills
- Are passionate about making AI more useful and accessible across different industries
- Are excited about a role that includes a combination of ML research, data operations, and project management
Nice to Have
- Have experience training production ML systems
- Are familiar with distributed systems and cloud infrastructure
- Have domain expertise in an area where we would like to make our models more useful
- Have experience working with external vendors or technical partners