United States Employment

UP.Labs is hiring a Sr. AI Quality Engineer

About the Role

UP.Labs is seeking a Sr. AI Quality Engineer to own end-to-end quality for our AI-powered inference system. This hybrid AI QA and Product Analyst role sits at the intersection of LLM inference, event-driven backend state machines, and freight domain logic. You will define what "correct" means, build the systems that measure and enforce it, and lead deep-dive investigations into edge cases and failures.

What You'll Do

  • Own end-to-end system quality for our AI-powered freight audit platform.
  • Develop and maintain a quality rubric for key use cases and exception types.
  • Build and curate golden datasets, including customer-specific variations.
  • Own ongoing quality review in development and production: inspect high-volume outputs, diagnose failures, and convert discoveries into roadmap items.
  • Define and execute regression tests for new model changes, backend logic changes, or customer-specific use cases.
  • Investigate and diagnose issues across the full product stack, from email ingestion to final reporting.
  • Triage quality incidents by tracing through logs, event histories, and data queries to isolate root cause.
  • Produce high-signal findings reports with minimal reproduction steps, evidence, and recommended fixes.
  • Build scalable quality operations, including a repeatable triage playbook and classification system.
  • Define monitoring and dashboards for key quality signals such as volume anomalies and exception drift.
  • Partner with engineering and AI teams to improve system observability and traceability.
  • Act as a product and domain translator, understanding freight billing workflows and converting customer requirements into testable rules.
  • Identify systemic gaps where real-world data doesn't fit our schema and propose product changes.

What We're Looking For

  • Experience in roles that blend quality assurance, investigation, and systems thinking.
  • Demonstrated experience evaluating AI/LLM output quality for tasks like extraction, classification, and structured outputs.
  • Strong ability to debug production issues using log and trace tools such as Datadog, ELK, or Honeycomb.
  • Proficiency in SQL and/or Python for data analysis and issue reproduction.
  • Experience debugging issues within event-driven architectures and workflow state machines.
  • Ability to write crisp requirements and acceptance criteria, translating ambiguity into concrete test cases.
  • Comfort operating in messy, high-volume, edge-case-heavy environments.

Nice to Have

  • Freight, logistics, audit, or billing domain experience.
  • Experience designing evaluation metrics like precision/recall, drift detection, and per-customer scorecards.
  • Familiarity with workflow engines, state machines, and distributed systems failure modes.
  • Experience with annotation workflows, taxonomy design, and building human-in-the-loop QA processes.

Technical Stack

  • SQL, Python
  • Datadog, ELK, Honeycomb, OpenTelemetry/Jaeger

Team & Environment

Our culture values high ownership: you don't stop at identifying a problem; you drive it to root cause and resolution. We operate comfortably with ambiguity and edge cases, building clarity systematically. You'll need to communicate effectively across product, engineering, machine learning, and operations teams.

Required Skills
SQL, Python, Datadog, ELK, Honeycomb, OpenTelemetry, Jaeger, AI/LLM Evaluation, Distributed Systems, Event-Driven Architecture, Debugging, Quality Assurance, Incident Triage
About company
UP.Labs

UP.Labs builds high-growth technology startups that enable faster, cleaner, and safer movement of people and goods. This role is with one of its stealth startups, which is building an AI-powered platform focused on billing, revenue integrity, and cash-flow automation for enterprise logistics operators.

Job Details
Department: Quality Assurance
Category: qa_testing
Posted: 14 days ago