San Francisco, CA | Hybrid Employment | $320,000 - $405,000 USD

Anthropic is hiring a Research Engineer, AI Observability

About the Role

This role involves building techniques to analyze and interpret the inner workings of AI models, with a focus on systems that provide actionable insights into model behavior and decision-making.

Responsibilities

  • Design and implement methods to interpret AI model internals
  • Create scalable tools for monitoring model behavior during training and deployment
  • Collaborate with research teams to identify key observability challenges
  • Develop metrics to track model reasoning patterns over time
  • Build visualizations that help diagnose model decisions
  • Investigate anomalies in model outputs using internal representations
  • Prototype new interpretability techniques based on current research
  • Translate theoretical insights into practical monitoring systems
  • Work with safety teams to identify risky model behaviors
  • Optimize instrumentation for large-scale AI systems
  • Maintain documentation for observability frameworks
  • Evaluate effectiveness of existing interpretability approaches
  • Support model debugging using activation analysis
  • Integrate observability tools into training pipelines
  • Contribute to open problems in model transparency
  • Ensure tools are usable by interdisciplinary teams
  • Refine methods based on empirical feedback
  • Assess performance impact of monitoring systems
  • Collaborate on benchmarking interpretability techniques
  • Stay current with advancements in AI transparency research

Nice to Have

  • Prior work on interpretability or mechanistic analysis of neural networks
  • Publications or contributions in AI safety or transparency
  • Experience with transformer-based models
  • Knowledge of causal analysis in machine learning
  • Familiarity with reinforcement learning from human feedback
  • Background in cognitive science or neuroscience
  • Experience with large language model internals
  • Contributions to open-source research tools

Compensation

Competitive salary and equity package aligned with experience

Work Arrangement

Hybrid or remote options available based on role and location

Team

Interdisciplinary research and engineering team focused on AI safety and transparency

Research Focus

The position emphasizes developing practical tools to understand how AI systems process information and make decisions, with a focus on real-world deployment scenarios.

Impact

This work will directly inform the development of safer and more reliable AI systems by giving deeper visibility into model behavior.

Available for qualified candidates in select locations

About company
Anthropic
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole.
Job Details
Department: AI Observability
Category: Other
Posted: 2 hours ago