Build and ship AI agents that serve real users: tool-calling LLM systems with structured output, parallel API orchestration, and streaming responses.
Design evaluation harnesses and quality scoring: we use Langfuse and rubrics to measure safety, effectiveness, and personalization.
Own the full loop: prototype a new agent capability, validate it with evals, deploy it to staging and production, monitor traces, and iterate.
Improve reliability, latency, and cost through prompt caching strategies, token budgets, retry logic, and observability.
Write the tools agents use: API integrations with Pydantic validation, exercise search over local databases, structured workout submission.

Strong Python skills: you've built and deployed services on large production systems.
Experience with LangChain/LangGraph or similar agent frameworks.
Hands-on experience with LLMs in production: prompt engineering, tool/function calling, structured output, evaluation.
Comfort with async Python, HTTP APIs, and streaming and event-driven protocols (SSE, webhooks).
Experience with data validation and schema design (Pydantic, JSON Schema).
Ability to debug across layers: from a broken LLM tool call to a misconfigured Terraform resource.
Clear communication: you'll work directly with product, mobile, and backend engineers.

Familiarity with AWS (Bedrock, ECR, CloudFront, S3, Cognito) or another cloud platform for hosting agents.
Experience with observability and tracing tools (Langfuse, OpenTelemetry, Datadog).
Exposure to evaluation frameworks: LLM-as-a-judge, automated scoring, dataset management.
Experience with infrastructure-as-code (Terraform, CDK).

Remote (continental US)

Remote-first employment: open to anyone located in the continental US. No travel required.
Flexible PTO: rest, recharge, and take care of life outside of work.
Future membership: enjoy our platform for free!

Future is hiring an Applied AI Engineer