About the Role
This role involves independent and collaborative research into how AI systems may develop and carry out strategic, goal-directed behavior, particularly deception and manipulation. The aim is to anticipate and mitigate the risks posed by advanced AI capabilities.
Responsibilities
- Investigate how AI models form and execute long-term plans
- Design experiments to detect and analyze deceptive behaviors in AI systems
- Develop theoretical frameworks for understanding strategic reasoning in machine learning models
- Collaborate with interdisciplinary teams on AI safety and alignment challenges
- Publish findings in academic venues and technical reports
- Identify early warning signs of scheming or manipulative tendencies in model outputs
- Build simulations to test hypotheses about AI goal pursuit
- Evaluate model behavior under varying incentive structures
- Help design training protocols that discourage undesirable strategic behaviors
- Peer-review internal and external research
- Translate theoretical insights into practical safety interventions
- Maintain detailed documentation of experimental methods and results
- Participate in workshops and discussions on AI ethics and governance
- Assess how well detection methods scale across model sizes
- Explore connections between cognitive science and AI decision-making
Compensation
Competitive salary with performance-based incentives and research impact bonuses
Work Arrangement
Hybrid work model with primary location in San Francisco; remote options available for exceptional candidates
Team
Small, autonomous research team focused on foundational AI safety questions with regular collaboration across disciplines
Research Focus
- Primary research area: emergent scheming behaviors in large language models
- Emphasis on identifying precursors to manipulation and long-term planning in AI systems
Publication Policy
- Support for publishing in top-tier conferences and journals
- Flexible timeline for dissemination to accommodate responsible disclosure
Equipment
- Access to high-performance computing clusters
- Local development machines with high-end GPUs
Collaboration
- Regular research retreats with external collaborators
- Weekly internal seminars and reading groups
Visa sponsorship available for qualified international applicants