Cambridge, Massachusetts, United States Employment

Third Way Health is hiring a Senior QA Engineer

About the Role

Third Way Health is hiring a Senior QA Engineer to own and evolve the quality and evaluation infrastructure for our AI-powered patient engagement platform. This is a high-impact individual contributor role where you will define how quality is measured across automated and manual workflows and build the tooling to make that definition actionable.

What You'll Do

  • Own and extend our multi-layered evaluation pipeline and verification portfolio, including deterministic quality checks, risk-factor heuristics, and LLM-graded transcript evaluation.
  • Advance our capabilities to evaluate end-to-end system performance across orchestrated agents, RAG-supported responses, and multi-party voice conversations.
  • Drive improvements to our observability stack to surface evaluation metrics, detect regressions, and enable data-driven quality decisions.
  • Build real-time monitoring and verification loops that catch issues in production interactions and feed findings back into system refinement.
  • Partner with ML engineers, product managers, and operations leads to translate real-world failure modes into automated checks.
  • Build and maintain adversarial and edge-case test suites for prompt injection resistance, guardrail robustness, and graceful degradation.
  • Champion “shift-left” quality practices by embedding evaluation criteria into prompt engineering workflows and defining acceptance criteria for new agent behaviors.
  • Contribute to the design of our QA pipeline orchestration to improve throughput, reliability, and developer experience.

What We're Looking For

  • 5+ years of software or test engineering experience, with 3+ years focused on quality infrastructure for AI/ML or data-intensive systems.
  • Strong proficiency in Python for building test frameworks, evaluation pipelines, and API-level integration tests.
  • Demonstrated experience designing evaluation systems for LLM-based applications, with a clear understanding of the model as a generation layer, not the quality layer.
  • Familiarity with the architectural tradeoffs of relying on LLM outputs in production, including variance across model versions and prompt sensitivity.
  • Experience building extensible, rule-based validation systems that scale across a growing surface area of features.
  • Solid understanding of voice AI or conversational AI systems, including tool-calling patterns, transcript analysis, and interaction-level quality metrics.
  • Hands-on experience with observability and metrics instrumentation in production environments.
  • Excellent communication skills and the ability to collaborate effectively across engineering, product, and non-technical stakeholders.
  • Strong interest in healthcare innovation and building AI systems that meaningfully improve health outcomes.

Nice to Have

  • Experience building QA or evaluation systems in healthcare or regulated environments, and familiarity with standards such as HIPAA, GDPR, or FDA guidance.
  • Proven experience leading complex technical initiatives and mentoring junior engineers.
  • Experience building systems where quality guarantees live in the verification infrastructure rather than in any single model.
  • Familiarity with risk-scoring systems, anomaly detection, or production safety nets for autonomous AI agents.
  • Experience with AI safety testing, including adversarial evaluation, jailbreak testing, and bias detection in LLM outputs.
  • Hands-on experience with CI/CD pipelines for evaluation automation and infrastructure-as-code deployment patterns.
  • Experience with voice UI testing tools and platforms focused on evaluating speech generation and response quality.
  • Knowledge of accessibility testing and inclusive design principles.

Technical Stack

  • Python, pytest, FastAPI TestClient, Pydantic
  • LLM-based evaluation systems
  • Observability and metrics tooling
  • CI/CD pipelines (CircleCI, GitHub Actions)
  • Voice AI/conversational AI systems

Team & Environment

You will partner closely with ML engineers, product managers, and operations leads to translate real-world needs into robust quality infrastructure.

Required Skills
Python, pytest, FastAPI TestClient, Pydantic, LLM-based evaluation systems, Observability, CI/CD, CircleCI, GitHub Actions, Voice AI, conversational AI, test frameworks, API testing, quality infrastructure
Job Details
Department: Quality Assurance
Category: qa_testing
Posted: 14 days ago