Design and build autonomous agents using state-of-the-art LLMs
Implement tool use, retrieval pipelines, memory systems, and multi-step reasoning flows
Engineer prompts and system instructions for robustness, reliability, and speed
Optimize latency, cost, and throughput in production
Build evaluation frameworks to measure agent accuracy, tool correctness, and failure modes
Create high-quality datasets for training, fine-tuning, and benchmarking
Develop introspection tooling to debug reasoning chains, hallucinations, and tool misuse
Run structured experiments to improve agent performance through iterative testing

Strong experimental mindset with a scientific approach to evaluation and iteration
Experience working with modern LLMs, RAG pipelines, tool calling, and agent frameworks
Deep understanding of failure modes in LLM systems and how to mitigate them
Experience building production systems in Python, Go, or TypeScript
Familiarity with distributed systems, APIs, and real-time infrastructure
Comfort shipping systems that must be reliable, observable, and measurable
BS, MS, or PhD in Computer Science, Engineering, Machine Learning, or a related technical field from a top university
2+ years of experience building software systems (experience working with LLMs, AI agents, or ML systems highly preferred)
Strong programming ability in Python, with experience in Go or TypeScript a plus
Experience working with modern LLM APIs (OpenAI, Anthropic, etc.) and building applications powered by foundation models
Experience building or contributing to production systems that must be reliable, observable, and scalable
Ability to diagnose and mitigate LLM failure modes such as hallucinations, tool misuse, and reasoning errors
Strong experimental mindset with a data-driven approach to improving system performance
Excellent communication skills (written and verbal) in English
Passion for building cutting-edge AI systems at the speed of a fast-growing startup
Resilient and adaptable in challenging, fast-paced environments
Ability to work onsite; we move faster when we're in the same room

Experience building evaluation harnesses or LLM benchmarking systems
Background in machine learning, applied research, or systems performance optimization
Experience optimizing inference latency and cost at scale
Experience debugging complex agent behaviors in real-world environments

Coram AI is hiring an AI Research Engineer
