Responsibilities
- Designing, building, running, and evaluating methods that automatically attack and evaluate control protocols, such as LLM-automated attack and optimisation approaches.
- Building and maintaining infrastructure and benchmarks for AI control experiments, including tools for evaluating the robustness of control measures across diverse threat models.
- Performing adversarial testing of control protocols for frontier AI systems, and producing reports that are impactful and action-guiding for deployers.
Requirements
- Hands-on research experience with large language models (LLMs), such as training, fine-tuning, evaluation, or safety research.
- A demonstrated track record of peer-reviewed publications in top-tier ML conferences or journals.
- Ability and experience writing clean, documented research code for machine learning experiments, including experience with ML frameworks like PyTorch or evaluation frameworks like Inspect.
- A sense of mission, urgency, and responsibility for success.
- An ability to bring your own research ideas and work in a self-directed way, while also collaborating effectively and prioritising team efforts over extensive solo work.
Nice to Have
- Experience working on AI alignment or AI control.
- Experience working on adversarial robustness, other areas of AI security, or red teaming against any kind of system.
- Extensive experience writing production-quality code.
- A desire to improve our team through mentoring and feedback, and experience doing so.
- Experience designing, shipping, and maintaining complex technical products.
Team
Structure: The Control Red Team partners with leading frontier AI companies to stress-test control measures. The team uses techniques from adversarial ML to develop algorithms that find a range of failures in control measures, which are then used to assess and strengthen those measures. These partnerships allow us to directly influence vital control measures, while our position in government lets us bring our understanding of the state of control to the broader government as it makes critical deployment, research, and policy decisions.

The Control Red Team grew out of our previous work on control, including a library for running AI control experiments, stress-testing asynchronous monitors, chain-of-thought monitorability, evaluating control for LLM agents, practical challenges of control monitoring, and AI control safety cases. The team additionally draws on the expertise of our broader Red Team, which is world-leading in human-led attacks against AI systems.