San Francisco, CA | New York City, NY · Hybrid · $350,000 – $500,000 USD

Anthropic is hiring an ML/Research Engineer, Safeguards

About the Role

This role involves designing and implementing machine learning models to detect, analyze, and reduce risks in AI systems, with a focus on interpretability, alignment, and robustness through empirical research and engineering.

Responsibilities

  • Design and train models to detect harmful behaviors in AI systems
  • Collaborate on developing techniques for model interpretability
  • Evaluate AI systems for potential misuse or unintended outputs
  • Implement algorithms that improve model alignment with human intent
  • Conduct experiments to measure model robustness under adversarial conditions
  • Analyze model internals to identify safety-relevant patterns
  • Build tools for automated risk detection in language models
  • Work with researchers to prototype new safeguard methods
  • Scale experimental safety techniques to production-level models
  • Document findings and contribute to internal safety benchmarks
  • Refine training pipelines to reduce harmful output generation
  • Assist in creating datasets for safety evaluation
  • Monitor model behavior across diverse input distributions
  • Improve detection of deceptive or manipulative model responses
  • Support red teaming exercises with technical tooling
  • Integrate feedback mechanisms into model training loops
  • Develop metrics for tracking safety improvements over time
  • Collaborate on cross-team initiatives to standardize safeguards
  • Stay current with advancements in AI safety research
  • Contribute to reproducible research workflows

Compensation

Competitive salary and equity offered based on experience and location.

Work Arrangement

Full-time position with flexible work options depending on team and role requirements.

About the Team

This research-focused team develops technical solutions to ensure AI systems behave safely and reliably. Engineers collaborate closely with scientists to implement and evaluate safeguard techniques; the work includes probing model vulnerabilities, designing detection mechanisms, and creating scalable interventions to prevent harmful behavior.

What We Value

We prioritize rigorous empirical work, intellectual honesty, and a proactive approach to identifying risks. Candidates should demonstrate curiosity about model behavior and a commitment to improving system safety through engineering.

Visa sponsorship available for qualified candidates requiring relocation.

About Anthropic

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole.
Job Details

Department: Safeguards ML
Posted: 2 hours ago