About the Role
Responsibilities
- Build and evolve backend services that power AI features: agent orchestration, tool execution, retrieval/RAG pipelines, and model serving integrations.
- Design APIs and control plane workflows for AI platform components (tenant-aware, secure by default, observable).
- Implement MCP-style tool discovery and integration patterns so agents can safely call tools, connectors, and internal services.
- Work closely with product managers, designers, customers, and partner engineering teams to deliver high quality AI experiences.
- Engineer for reliability and scale: latency, cost controls, rate limiting, fallbacks, rollouts, and incident response readiness.
- Establish best practices around evaluation: offline test sets, regression detection, prompt/model/version tracking, and quality gates.
- Contribute to secure-by-design AI approaches: permissions, data access boundaries, prompt injection defenses, and auditability.
- Mentor junior engineers and contribute to a welcoming, high-ownership team environment.
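To illustrate the tool-discovery and dispatch pattern mentioned above, here is a minimal sketch in Python. All names and schemas here are illustrative assumptions, not part of any specific framework or the team's actual codebase: agents discover tools as JSON-schema-style specs, and calls are dispatched by name, failing closed on unknown tools.

```python
import json

# Illustrative tool registry: maps a tool name to its spec and implementation.
TOOLS = {}

def tool(name, description, parameters):
    """Register a function as a callable tool with a JSON-schema-style spec."""
    def wrap(fn):
        TOOLS[name] = {"description": description, "parameters": parameters, "fn": fn}
        return fn
    return wrap

@tool("get_ticket_status", "Look up a support ticket's status",
      {"type": "object",
       "properties": {"ticket_id": {"type": "string"}},
       "required": ["ticket_id"]})
def get_ticket_status(ticket_id: str) -> dict:
    # Stub for a real backend call.
    return {"ticket_id": ticket_id, "status": "open"}

def list_tools() -> str:
    """What an agent sees at discovery time: specs only, not implementations."""
    return json.dumps({name: {k: v for k, v in t.items() if k != "fn"}
                       for name, t in TOOLS.items()}, indent=2)

def dispatch(name: str, arguments: dict):
    """Execute a tool call requested by a model; unknown tools are rejected."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name]["fn"](**arguments)

print(list_tools())
print(dispatch("get_ticket_status", {"ticket_id": "T-123"}))
```

In production this pattern grows permission checks, argument validation against the schema, and audit logging around `dispatch`, which is where the security and observability responsibilities above come in.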
Requirements
- Strong software engineering skills with experience in distributed systems (Go, Python, or similar).
- Experience building cloud-native services: Kubernetes, containers, service-to-service APIs, and CI/CD.
- 4+ years of experience working on a SaaS product or production platform.
- Solid understanding of AI/ML fundamentals (you don’t need to be a researcher, but you should understand concepts well enough to build correct systems):
  - Supervised learning basics (training vs. inference, overfitting, evaluation metrics, classification, anomaly detection, forecasting, regression, etc.)
  - LLM basics (tokens, context windows, prompting, tool/function calling concepts)
  - Embeddings and vector search fundamentals (similarity, indexing tradeoffs, retrieval pitfalls)
- Strong debugging and problem-solving skills, including incident-style troubleshooting across services and infrastructure.
- Intellectual curiosity about investigating issues that impact product quality, reliability, latency, and business metrics.
- Passion for building robust, maintainable systems in a fast-paced, team-oriented environment.
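As a concrete example of the embeddings and vector search fundamentals listed above, most retrieval reduces to ranking documents by cosine similarity to a query vector. The vectors below are toy values chosen for illustration, not real embedding-model outputs:

```python
import math

def cosine_similarity(a, b):
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings"; in practice these come from an embedding model.
query = [0.9, 0.1, 0.0]
docs = {"doc_a": [0.8, 0.2, 0.1], "doc_b": [0.0, 0.1, 0.9]}

# Brute-force retrieval: rank documents by similarity to the query.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # doc_a ranks above doc_b: its vector points the same way as the query
```

Brute-force scan is exact but linear in corpus size; the "indexing tradeoffs" bullet is about approximate-nearest-neighbor indexes that trade a little recall for sublinear query time.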
Nice to Have
- Hands-on experience with AI agents and orchestration frameworks (tool calling, workflows, planners/executors).
- Practical experience with RAG systems, reranking, grounding, and evaluation strategies.
- Experience with model serving patterns (batch/online inference, caching, streaming responses).
- Knowledge of security considerations for AI systems (data isolation, RBAC, prompt injection threats, audit logs).
- Familiarity with vector databases or vector capabilities in modern data platforms.
- Experience with observability stacks (structured logging, metrics, tracing) and SLO-driven engineering.