Responsibilities
- Designing, building, running, and evaluating methods that automatically attack and evaluate control protocols, such as LLM-automated attack and optimisation approaches.
- Building and maintaining infrastructure and benchmarks for AI control experiments, including tools for evaluating the robustness of control measures across diverse threat models.
- Performing adversarial testing of control protocols for frontier AI systems, and producing reports that are impactful and action-guiding for deployers.
Requirements
- Hands-on research experience with large language models (LLMs), such as training, fine-tuning, evaluation, or safety research.
- A demonstrated track record of peer-reviewed publications in top-tier ML conferences or journals.
- Ability and experience writing clean, documented research code for machine learning experiments, including experience with ML frameworks like PyTorch or evaluation frameworks like Inspect.
- A sense of mission, urgency, and responsibility for success.
- An ability to bring your own research ideas and work in a self-directed way, while also collaborating effectively and prioritising team efforts over extensive solo work.
Nice to Have
- Experience working on AI alignment or AI control.
- Experience working on adversarial robustness, other areas of AI security, or red teaming against any kind of system.
- Extensive experience writing production-quality code.
- A desire to improve our team through mentoring and feedback, and experience doing so.
- Experience designing, shipping, and maintaining complex technical products.
Team
Structure: The Control Red Team partners with leading frontier AI companies to stress-test control measures. The team uses techniques from adversarial ML to develop algorithms that find a range of failures in control measures, which are then used to assess and strengthen those measures. These partnerships allow us to directly influence vital control measures, while our position in government lets us bring our understanding of the state of control to the broader government as it makes critical deployment, research, and policy decisions.

The Control Red Team grew out of our previous work on control, including a library for running AI control experiments, stress-testing asynchronous monitors, chain-of-thought monitorability, evaluating control for LLM agents, practical challenges of control monitoring, and AI control safety cases. The team additionally draws on the expertise of our broader Red Team, which is world-leading in human-led attacks against AI systems.