US - Remote (Country) - Full-time

Sully.ai is hiring an Applied Research Scientist

Responsibilities

Build and scale automated evaluation pipelines (LLM-as-judge + human review) with clinical-grade benchmarks.

Requirements

Proven experience designing agentic processes and LLM evaluation/benchmarking frameworks.
Strong Python and ML background (PyTorch/TensorFlow, Hugging Face, LangChain/LlamaIndex).
Demonstrated ability to design rigorous experiments and translate findings into production.
Track record of published research or deep applied work in LLMs and agent evaluation.
Strong communication and technical writing skills to articulate complex findings clearly.

About company

Sully.ai is transforming healthcare access by integrating AI into medical workflows and automating administrative tasks throughout the patient visit cycle, enhancing efficiency, reducing errors, and supporting real-time decision-making.


Job Details

Department Engineering

Category Other

Posted 4 months ago