About the Role
This role builds and maintains scalable MLOps infrastructure for generative AI workloads and supports the full model lifecycle, from development to production.
Responsibilities
- Design and implement CI/CD pipelines for machine learning models
- Develop automated workflows for model training and evaluation
- Integrate monitoring solutions for model performance and system health
- Optimize deployment strategies for low-latency inference services
- Collaborate with research teams to transition prototypes into production
- Ensure reproducibility and versioning across ML pipelines
- Support infrastructure for large-scale distributed training
- Implement security and access controls for ML systems
- Work with containerization and orchestration tools such as Docker and Kubernetes
- Maintain documentation for ML infrastructure components
- Troubleshoot issues in staging and production environments
- Improve scalability and reliability of model serving platforms
- Partner with data engineering teams to streamline data pipelines
- Apply software engineering best practices to ML codebases
- Evaluate new tools and frameworks for MLOps efficiency
- Contribute to internal developer tooling for ML teams
- Manage configuration and provisioning of cloud resources
- Enforce compliance with data governance policies
- Support A/B testing and canary rollout strategies
- Develop metrics dashboards for operational visibility
- Participate in incident response for critical system outages
- Drive automation of repetitive operational tasks
- Ensure efficient resource utilization in compute environments
- Collaborate on cross-functional initiatives involving AI ethics and safety
- Stay current with advancements in generative AI and MLOps
Nice to Have
- Master’s degree in computer science or a related field
- Experience with generative AI models such as LLMs or diffusion models
- Contributions to open-source MLOps projects
- Deep knowledge of PyTorch or TensorFlow ecosystems
- Experience with model quantization and optimization techniques
- Familiarity with data labeling and annotation pipelines
- Background in high-performance computing environments
- Knowledge of model explainability and fairness tools
- Experience with multi-cloud or hybrid cloud deployments
- Prior work in AI research or product teams
Compensation
Competitive salary and benefits package
Work Arrangement
Hybrid work model with flexibility based on location and team needs
Team
Part of the AI infrastructure team focused on generative AI systems
About the Team
This group builds foundational infrastructure for generative AI, enabling researchers and engineers to develop, train, and deploy models efficiently. The focus is on creating robust, scalable platforms that support rapid innovation while maintaining operational excellence.
Why This Role Matters
As generative AI models grow in complexity, the systems supporting them must evolve. This role directly impacts the speed and reliability of AI development by creating automated, maintainable pipelines that bridge research and production.
Available for qualified candidates


