About the Role
This role leads the development of AI and machine learning platforms: setting technical direction and ensuring scalable, maintainable implementations across research and product teams.
Responsibilities
- Lead the architecture and deployment of machine learning systems
- Collaborate with research teams to transition prototypes into production
- Design scalable data pipelines for training and inference
- Optimize model performance and latency across workloads
- Mentor engineers on best practices in ML engineering
- Define standards for model monitoring and observability
- Evaluate and integrate new AI frameworks and tools
- Ensure reproducibility and versioning of ML experiments
- Work across teams to align infrastructure with research goals
- Drive automation in model training and deployment workflows
- Troubleshoot production issues in distributed systems
- Contribute to technical roadmap planning
- Implement security and access controls for ML assets
- Support efficient resource utilization in cloud environments
- Maintain documentation for ML pipelines and APIs
- Promote testing and validation practices for models
- Coordinate with product teams on feature integration
- Assess performance metrics and model drift
- Guide selection of hardware accelerators for workloads
- Foster collaboration between engineering and data science
- Ensure compliance with data governance policies
- Optimize cost-efficiency in training and serving
- Lead code reviews and system design discussions
- Stay current with advancements in AI/ML research
- Contribute to open-source projects when applicable
Nice to Have
- Master’s or PhD in computer science or a related field
- Experience leading ML teams or projects
- Contributions to open-source ML tools
- Publications or presentations in AI/ML venues
- Experience with federated learning systems
- Knowledge of edge ML deployment
- Background in natural language processing
- Experience with reinforcement learning
- Familiarity with differential privacy techniques
- Experience with low-latency inference systems
- Prior work in startup environments
- Experience with regulatory compliance in AI
- Understanding of ethical AI principles
- Experience with multimodal AI systems
- Experience leading cross-functional initiatives
Compensation
Competitive salary with equity and benefits
Work Arrangement
Remote-friendly with flexible hours
Team
Small, fast-moving team focused on AI infrastructure
Tech Stack
Python, PyTorch, Kubernetes, Docker, AWS, Ray, MLflow, Prometheus, Git, Kafka
Impact
- Your work will directly shape the reliability and scalability of AI systems used by researchers and developers worldwide
- You will influence the direction of core infrastructure that accelerates machine learning innovation
Available for exceptional candidates