Responsibilities
- Design and build scalable ML infrastructure to support classifier and safety evaluations, both real-time and batch, across our model ecosystem
- Build monitoring and observability tools to track model performance, data quality, and system health for safety-critical applications
- Collaborate with research teams to productionize safety research, translating experimental safety techniques into robust, scalable systems
- Optimize inference latency and throughput for real-time safety evaluations while maintaining high reliability standards
- Implement automated testing, deployment, and rollback systems for ML models in production safety applications
- Partner with Safeguards, Security, and Alignment teams to understand requirements and deliver infrastructure that meets safety and production needs
- Contribute to the development of internal tools and frameworks that accelerate safety research and deployment