At Inflection AI, our mission is to harness AI to improve human well-being and productivity. We are hiring a Senior Machine Learning Engineer to be a key technical leader on our AI Engineering team. You will design and scale the systems that bring our models from research into reliable, production-grade deployments, directly impacting how intelligence is delivered to millions of users.
What You'll Do
- Design and implement scalable, low-latency model-serving infrastructure for large language models and multimodal systems.
- Build and maintain robust APIs and services to support real-time conversational workloads.
- Optimize inference systems for throughput, latency, cost-efficiency, and reliability.
- Architect and improve end-to-end ML pipelines spanning training, evaluation, deployment, monitoring, and rollback.
- Develop model lifecycle management systems with strong observability and performance tracking.
- Partner with infrastructure teams to scale compute resources efficiently across distributed environments.
- Improve CI/CD workflows and automation for model releases and infrastructure updates.
- Collaborate with ML researchers to productionize new model architectures and capabilities.
- Design abstractions that enable rapid experimentation while preserving safety, quality, and reliability.
- Implement evaluation frameworks and guardrails to ensure models meet performance and safety standards before deployment.
- Define data requirements and feedback loops to enable continuous model improvement.
- Partner with product and safety teams to integrate telemetry, evaluation signals, and user feedback into training pipelines.
- Ensure high-quality data ingestion and metadata tracking for ML readiness.
- Lead architectural decisions that balance performance, scalability, safety, and maintainability.
- Contribute to code reviews and engineering best practices across the team.
- Mentor engineers and raise the bar for production ML excellence.
- Help shape long-term technical strategy for deploying AI systems at global scale.
What We're Looking For
- 1-4 years of experience in machine learning engineering, backend systems, or distributed infrastructure.
- Proven experience deploying and operating ML models in production environments.
- Strong programming skills in Python and/or C++ (or an equivalent systems language).
- Experience with large-scale model serving (LLMs, transformers, or similar architectures).
- Deep understanding of distributed systems, API design, and cloud infrastructure.
- Experience with MLOps tools and workflows (CI/CD, model monitoring, experiment tracking).
Nice to Have
- Experience scaling high-throughput, low-latency inference systems.
- Familiarity with GPU acceleration, model optimization (quantization, batching, caching), and performance tuning.
- Experience working with conversational AI systems or real-time user-facing AI products.
- Knowledge of ML evaluation methodologies, safety systems, and guardrail design.
- Background collaborating closely with research teams in fast-paced AI environments.
Technical Stack
- Python
- C++
- LLMs
- Transformers
Team & Environment
You will be a key technical leader on the AI Engineering team.
Benefits & Compensation
- Diverse medical, dental and vision options
- 401k matching program
- Unlimited paid time off
- Parental leave and flexibility for all parents and caregivers
- Support for country-specific visa needs for international employees living in the Bay Area
- Compensation: $172,000 to $250,000, plus a meaningful equity component
Inflection AI values and supports our team’s mental and physical health. We are focused on building a positive, safe, inclusive and inspiring place to work.