About the Role
In this role, you will build and improve core components of an AI platform, working closely with machine learning models, distributed systems, and high-performance infrastructure to deliver reliable, scalable solutions.
Responsibilities
- Develop and maintain key parts of the AI platform infrastructure
- Collaborate on designing scalable backend systems for machine learning workflows
- Optimize system performance and reliability for AI model deployment
- Work with data pipelines to support training and inference processes
- Implement robust APIs for internal and external service integration
- Ensure platform security and compliance with best practices
- Diagnose and resolve technical issues across distributed environments
- Contribute to architectural decisions for long-term scalability
- Write clean, maintainable, and well-tested code
- Partner with research and engineering teams to operationalize AI models
- Monitor system behavior and respond to production incidents
- Improve tooling for automated testing and deployment
- Support version control and CI/CD workflows
- Document technical designs and system changes
- Stay current with advancements in AI and distributed computing
Nice to Have
- Master’s degree in computer science or related field
- Experience with large-scale AI or ML platforms
- Contributions to open-source projects
- Familiarity with MLOps practices
- Knowledge of monitoring and observability tools
- Background in security-first development
- Experience working in global, distributed teams
Compensation
Competitive salary based on experience and location
Work Arrangement
Remote
Team
Distributed engineering team focused on AI and platform infrastructure
Why This Role Matters
You will help shape the foundation of an AI platform used by engineering and research teams to deploy intelligent systems at scale.
Technology Stack
Python, Go, Kubernetes, Docker, AWS, TensorFlow, PyTorch, PostgreSQL, Kafka