San Francisco, CA | New York City, NY · Hybrid · $350,000 – $500,000 USD

Anthropic is hiring an ML/Research Engineer, Safeguards

About the Role

This role involves designing and implementing machine learning models to detect, analyze, and reduce risks in AI systems, with a focus on interpretability, alignment, and robustness through empirical research and engineering.

Responsibilities

  • Design and train models to detect harmful behaviors in AI systems
  • Collaborate on developing techniques for model interpretability
  • Evaluate AI systems for potential misuse or unintended outputs
  • Implement algorithms that improve model alignment with human intent
  • Conduct experiments to measure model robustness under adversarial conditions
  • Analyze model internals to identify safety-relevant patterns
  • Build tools for automated risk detection in language models
  • Work with researchers to prototype new safeguard methods
  • Scale experimental safety techniques to production-level models
  • Document findings and contribute to internal safety benchmarks
  • Refine training pipelines to reduce harmful output generation
  • Assist in creating datasets for safety evaluation
  • Monitor model behavior across diverse input distributions
  • Improve detection of deceptive or manipulative model responses
  • Support red teaming exercises with technical tooling
  • Integrate feedback mechanisms into model training loops
  • Develop metrics for tracking safety improvements over time
  • Collaborate on cross-team initiatives to standardize safeguards
  • Stay current with advancements in AI safety research
  • Contribute to reproducible research workflows

Compensation

Competitive salary and equity offered based on experience and location.

Work Arrangement

Full-time position with flexible work options depending on team and role requirements.

About the Team

This research-focused team develops technical solutions to ensure AI systems behave safely and reliably. Engineers collaborate closely with scientists to implement and evaluate safeguard techniques; the work includes probing model vulnerabilities, designing detection mechanisms, and creating scalable interventions to prevent harmful behavior.

What We Value

We prioritize rigorous empirical work, intellectual honesty, and a proactive approach to identifying risks. Candidates should demonstrate curiosity about model behavior and a commitment to improving system safety through engineering.

Visa sponsorship available for qualified candidates requiring relocation.

About Anthropic

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole.
Job Details

Department: Safeguards ML
Posted: 2 hours ago