Responsibilities
- Developing a chain-of-thought monitorability benchmark and using it to compare monitorability across frontier AI systems, leveraging AISI’s unique access to reasoning traces from multiple labs.
- Designing and running experiments on open-weight models to study alignment and oversight-relevant phenomena – such as reproducing emergent misalignment from reward hacking, or red-teaming techniques like inoculation prompting and character training.
- Using white-box and interpretability methods – such as activation oracles, sparse autoencoders, or probes – to detect misalignment that isn’t visible through behavioural evaluation alone.
- Building tooling and infrastructure for our research – including agent orchestration, large-scale RL pipelines, mechanistic interpretability workflows, and auditing agents.
- Reviewing frontier lab risk assessments and safety cases, providing independent analysis of alignment claims before deployment decisions.
- Conducting literature reviews and expert interviews to map the state of model transparency risks and inform AISI’s strategic priorities.
- Translating technical findings into actionable insights for AISI evaluation teams, UK government officials, and international partners.
Requirements
- A get-things-done mindset – you take ownership, move fast, and care about shipping work that matters.
- A combination of self-sufficiency and enthusiasm for teamwork – you’re equally happy defining your own agenda and contributing to shared goals.
- An enthusiasm for growing, for giving and receiving feedback, and for building something together.
- An ability to build, supervise, and orchestrate AI agents to complete tasks effectively, while verifying and maintaining the quality of their output.
- A demonstrated track record of relevant, high-quality work – whether technical publications, blog posts, or other publicly visible contributions.
Team
Structure: The Model Transparency team is a research team within AISI focused on ensuring that evaluations, assessments, and monitoring of frontier AI systems remain reliable as models become less transparent.
Additional Information
- The deadline for applying to this role is Sunday 24th May 2026, end of day, anywhere on Earth.