About the Role
The role involves designing and implementing machine learning models that detect, analyze, and reduce risks in AI systems. The work combines empirical research and engineering, with a focus on interpretability, alignment, and robustness.
Responsibilities
- Design and train models to detect harmful behaviors in AI systems
- Collaborate on developing techniques for model interpretability
- Evaluate AI systems for potential misuse or unintended outputs
- Implement algorithms that improve model alignment with human intent
- Conduct experiments to measure model robustness under adversarial conditions
- Analyze model internals to identify safety-relevant patterns
- Build tools for automated risk detection in language models
- Work with researchers to prototype new safeguard methods
- Scale experimental safety techniques to production-level models
- Document findings and contribute to internal safety benchmarks
- Refine training pipelines to reduce harmful output generation
- Assist in creating datasets for safety evaluation
- Monitor model behavior across diverse input distributions
- Improve detection of deceptive or manipulative model responses
- Support red teaming exercises with technical tooling
- Integrate feedback mechanisms into model training loops
- Develop metrics for tracking safety improvements over time
- Collaborate on cross-team initiatives to standardize safeguards
- Stay current with advancements in AI safety research
- Contribute to reproducible research workflows
Compensation
Competitive salary and equity offered based on experience and location.
Work Arrangement
Full-time position with flexible work options depending on team and role requirements.
About the Team
This research-focused team develops technical solutions to ensure AI systems behave safely and reliably, collaborating closely with scientists and engineers to implement and evaluate safeguard techniques. Work includes probing model vulnerabilities, designing detection mechanisms, and creating scalable interventions to prevent harmful behavior.
What We Value
We prioritize rigorous empirical work, intellectual honesty, and a proactive approach to identifying risks. Candidates should demonstrate curiosity about model behavior and a commitment to improving system safety through engineering.
Visa Sponsorship
Visa sponsorship is available for qualified candidates who require relocation.