Responsibilities
- Design and build autonomous agents using state-of-the-art LLMs
- Implement tool use, retrieval pipelines, memory systems, and multi-step reasoning flows
- Engineer prompts and system instructions for robustness, reliability, and speed
- Optimize latency, cost, and throughput in production
- Build evaluation frameworks to measure agent accuracy, tool correctness, and failure modes
- Create high-quality datasets for training, fine-tuning, and benchmarking
- Develop introspection tooling to debug reasoning chains, hallucinations, and tool misuse
- Run structured experiments to improve agent performance through iterative testing
Requirements
- Strong experimental mindset with a scientific approach to evaluation and iteration
- Experience working with modern LLMs, RAG pipelines, tool calling, and agent frameworks
- Deep understanding of failure modes in LLM systems and how to mitigate them
- Experience building production systems in Python, Go, or TypeScript
- Familiarity with distributed systems, APIs, and real-time infrastructure
- Comfort shipping systems that must be reliable, observable, and measurable
- BS, MS, or PhD in Computer Science, Engineering, Machine Learning, or a related technical field from a top university
- 2+ years of experience building software systems (experience working with LLMs, AI agents, or ML systems highly preferred)
- Strong programming ability in Python, with experience in Go or TypeScript a plus
- Experience working with modern LLM APIs (OpenAI, Anthropic, etc.) and building applications powered by foundation models
- Experience building or contributing to production systems that must be reliable, observable, and scalable
- Ability to diagnose and mitigate LLM failure modes such as hallucinations, tool misuse, and reasoning errors
- Strong experimental mindset with a data-driven approach to improving system performance
- Excellent communication skills (written and verbal) in English
- Passion for building cutting-edge AI systems at the speed of a fast-growing startup
- Resilient and adaptable in challenging, fast-paced environments
- Ability to work onsite; we move faster when we're in the same room
Nice to Have
- Experience building evaluation harnesses or LLM benchmarking systems
- Background in machine learning, applied research, or systems performance optimization
- Experience optimizing inference latency and cost at scale
- Experience debugging complex agent behaviors in real-world environments
Benefits
- Competitive compensation package
- 100% employer-paid medical, dental, vision, and base life insurance
- Flexible paid time off and 9 paid holidays
- 401(k) with both Traditional and Roth options
- Equity in a rapidly growing company
- Referral bonuses
- Daily team dinners and regular team off-sites to build connection and momentum
- The latest Apple tech and unlimited tools so you can win
- Unlimited Cursor and Claude Code credits
- Direct exposure to our AI-native GTM machinery


