London, On-site, Employment, 100k - 200k GBP

Apollo Research is hiring a Research Scientist/Engineer (Science of Scheming)

About the Role

This role involves conducting independent and collaborative research into the mechanisms by which AI systems may develop and execute strategic, goal-directed behaviors, particularly those involving deception or manipulation. The work aims to anticipate and mitigate risks associated with advanced AI capabilities.

Responsibilities

Investigate how artificial intelligence models form and execute long-term plans
Design experiments to detect and analyze deceptive behaviors in AI systems
Develop theoretical frameworks for understanding strategic reasoning in machine learning models
Collaborate with interdisciplinary teams on AI safety and alignment challenges
Publish findings in academic venues and technical reports
Identify early warning signs of scheming or manipulative tendencies in model outputs
Build simulations to test hypotheses about AI goal pursuit
Evaluate model behavior under varying incentive structures
Contribute to the design of training protocols that discourage undesirable strategic behaviors
Engage in peer review of internal and external research
Translate theoretical insights into practical safety interventions
Maintain detailed documentation of experimental methods and results
Participate in workshops and discussions on AI ethics and governance
Assess the scalability of detection methods across model sizes
Explore connections between cognitive science and AI decision-making

Compensation

Competitive salary with performance-based incentives and research impact bonuses

Work Arrangement

Hybrid work model with primary location in San Francisco; remote options available for exceptional candidates

Team

Small, autonomous research team focused on foundational AI safety questions with regular collaboration across disciplines

Research Focus

Primary research area: emergent scheming behaviors in large language models
Emphasis on identifying precursors to manipulation and long-term planning in AI systems

Publication Policy

Support for publishing in top-tier conferences and journals
Flexible timeline for dissemination to accommodate responsible disclosure

Equipment

Access to high-performance computing clusters
Provision of local development machines with high-end GPUs

Collaboration

Regular research retreats with external collaborators
Weekly internal seminars and reading groups

Sponsorship available for qualified international applicants

About the Company

The rapid rise in AI capabilities offers tremendous opportunities, but also presents significant risks. At Apollo Research, we’re primarily concerned with risks from Loss of Control, that is, risks coming from the model itself rather than from, for example, humans misusing the AI. We’re particularly concerned with deceptive alignment / scheming, a phenomenon where a model appears to be aligned but is, in fact, misaligned and capable of evading human oversight. We work on the detection of scheming (e.g., building evaluations and novel evaluation techniques), the science of scheming (e.g., model organisms and the study of scaling trends), and scheming mitigations (e.g., control). We work closely with multiple frontier AI companies, for example to test their models before deployment and to collaborate on fundamental research. At Apollo, we aim for a culture that emphasizes truth-seeking, being goal-oriented, giving and receiving constructive feedback, and being friendly and helpful. If you’re interested in more details about what it’s like working at Apollo, you can find more information here.

Job Details

Department: Evals Team

Category: Other

Posted: 3 months ago