TaskUs is looking for a Data Scientist, AI Safety Services to bring statistical rigor and analytical firepower to AI safety benchmarks, red-teaming, and client rollouts. In this role, you will transform evaluation outputs into clear metrics and risk scores to guide client decisions and accelerate research, supporting our mission to ensure AI models behave safely and fairly.
What You'll Do
- Own analytics pipelines end to end, from ETL through presentation, for jailbreak detection rates, toxicity scores, drift signals, and other safety KPIs.
- Design A/B and sequential tests to measure the impact of prompt tweaks, fine-tuning, or policy changes on model behavior.
- Develop risk models and dashboards in Python backed by scalable storage.
- Collaborate with Researchers to choose statistical methods, validate assumptions, and publish reproducible notebooks.
- Partner with Solutions Engineering to embed metrics in client reports and scope data requirements for new engagements.
- Automate reporting by building CI hooks or Airflow jobs so safety scores refresh with each new model release or data batch.
- Stay current by evaluating new safety benchmarks, open-source metric libraries, and MLOps best practices.
What We're Looking For
- 3–5 years in data science, analytics engineering, or applied statistics.
- Proven track record of turning messy, high-volume data into actionable insights and visualizations for stakeholders.
- Solid understanding of machine-learning evaluation workflows.
- Hands-on experience with the Python data stack (Pandas, NumPy, SciPy, scikit-learn) and SQL, or equivalents in Spark/BigQuery.
- Familiarity with experiment design, hypothesis testing, and causal inference techniques.
- Statistical rigor: power analysis, significance testing, bootstrap or Bayesian methods.
- Data engineering basics: ETL, data validation, versioning, and scalable storage.
- Visualization and storytelling: interactive dashboards and clear narratives for technical and executive audiences.
- Scripting and automation: write maintainable code, schedule pipelines, and integrate with CI/CD.
- Collaboration: work fluidly with researchers, engineers, and client-facing teams under tight timelines.
Nice to Have
- Exposure to large language model outputs.
- Experience with MLflow or Weights & Biases.
- Exposure to privacy-preserving analytics.
- Familiarity with NIST RMF or EU AI Act metrics.
Technical Stack
- Python, Pandas, NumPy, Plotly, Streamlit, SciPy, scikit-learn
- SQL, BigQuery, Redshift, Spark
- Airflow
Team & Environment
You will collaborate closely with Researchers and Solutions Engineering teams in a dynamic, cross-functional environment.
Benefits & Compensation
- Competitive industry salaries
- Comprehensive benefits packages
- Inclusive environment
- Internal mobility and professional growth opportunities
Work Mode
This is a remote-first position.
TaskUs is committed to providing equal access to opportunities. If you need reasonable accommodations in any part of the hiring process, please let us know.

