Flock Safety seeks a Senior AI Systems Engineer to join the Machine Learning team as an early technical contributor to Night Shift, an AI copilot that automates case work to amplify investigators. You will own the AI evaluation framework and work closely with Engineering partners to architect and scale the agentic AI system.
What You'll Do
- Immerse yourself in the current system design and agent/tooling landscape.
- Support the team by shipping quick wins like refining tool APIs, prompt engineering, and fixing bugs.
- Stand up foundational evaluation and observability scaffolding.
- Propose a technical architecture and implementation plan for an agent evaluation framework.
- Deliver an MVP evaluation harness to produce metrics, enable debugging, and perform regression testing.
- Take on a system feature that demonstrably improves results on the MVP evaluation suite.
- Productionize the evaluation and observability platform as the source of truth for quality and safety.
- Own the roadmap for evolving the agent evaluation platform.
- Lead deeper R&D threads to improve system performance on core metrics.
What We're Looking For
- 5+ years building and shipping ML/LLM systems to production.
- Experience with ML inference (PyTorch, TensorRT, NVIDIA Triton), ideally in multimodal domains.
- Experience with LLM inference (LangChain/LangGraph, vLLM, OpenAI/Gemini/Anthropic APIs).
- Experience with compute orchestration (Kubernetes, Prefect, Ray).
- Experience with cloud infrastructure (AWS, Terraform, VPC, networking).
- Experience with observability (Prometheus, Grafana, OpenTelemetry, LangSmith/Langfuse).
- Experience with data systems (ClickHouse, Postgres, Redis).
- Experience with web services (Express/FastAPI, REST, SSE, JWTs).
- Familiarity with backend JavaScript (e.g., Node.js) is required.
- Hands-on experience with LLM agents including tool use, retrieval, memory, grounding, and guardrails.
- Experience with architectural patterns for multi-agent systems and context management.
- Experience with RAG: vector/hybrid search and re-rankers.
- Experience building offline/online evaluation harnesses for LLMs.
- Familiarity with methodologies for measuring search, retrieval, and recommendation performance; agentic task success; safety and robustness; and cost/performance/latency trade-offs.
Nice to Have
- TypeScript and Python familiarity.
Technical Stack
- PyTorch, TensorRT, NVIDIA Triton, LangChain, LangGraph, vLLM, OpenAI APIs, Gemini APIs, Anthropic APIs, Kubernetes, Prefect, Ray, AWS, Terraform, Prometheus, Grafana, OpenTelemetry, LangSmith, Langfuse, ClickHouse, Postgres, Redis, Express, FastAPI, Node.js, TypeScript, Python, pgvector, turbopuffer, Chroma, Cohere, Jina AI
Team & Environment
This role sits within the Machine Learning team and works closely with partners in Engineering (Backend, Frontend, and Design).
Benefits & Compensation
- Compensation: $170,000-$210,000, plus equity in the form of Flock Safety stock options
- Flexible PTO
- 11 company holidays
- Fully-paid health benefits plan including Medical, Dental, and Vision
- HSA match
- 12 weeks of 100% paid parental leave
- Additional 6-8 weeks of physical recovery time for birthing parents
- $50,000 lifetime maximum benefit for eligible adoption, surrogacy, or fertility expenses via Maven
- Mental health benefits via Spring Health
- Caregiver support via Cariloop
- 1:1 sessions with Equity Tax Advisors via Carta
- Employee Resource Groups (ERGs)
- $150 per month WFH stipend
- $300 per year productivity stipend
- One-time $750 home office stipend
Work Mode
This is a hybrid role open to candidates in Atlanta, Boston, Chicago, Denver, Los Angeles, New York City, San Francisco, Austin, or remotely within the United States.
Flock Safety is an equal opportunity employer. We celebrate diverse backgrounds and thoughts and welcome everyone to apply for employment with us.



