About the Role
The role involves leading development of core AI infrastructure, improving its scalability and reliability, and working closely with data scientists and engineers to deploy and optimize machine learning models in production environments.
Responsibilities
- Design and maintain scalable backend systems for AI workloads
- Collaborate with research teams to transition prototypes into production
- Optimize model inference pipelines for low latency and high throughput
- Implement monitoring and observability for AI services
- Ensure platform reliability and performance under heavy load
- Develop APIs for model serving and data integration
- Improve deployment automation and CI/CD workflows
- Support security and compliance requirements for AI systems
- Troubleshoot and resolve production issues promptly
- Contribute to architectural decisions for distributed systems
- Work with large-scale data pipelines and storage solutions
- Integrate third-party tools and services into the AI stack
- Mentor junior engineers and promote best practices
- Evaluate new technologies for performance and fit
- Participate in code reviews and system design discussions
- Ensure efficient resource utilization in cloud environments
- Maintain documentation for systems and processes
- Collaborate on incident response and post-mortems
- Support model versioning and lifecycle management
- Help define testing strategies for AI components
- Drive improvements in system observability
- Contribute to capacity planning and scaling efforts
- Work with containerization and orchestration tools
- Ensure consistency across development, staging, and production
- Assist in defining service level objectives
Nice to Have
- Master’s degree in Computer Science or related field
- Experience with large-scale AI model deployment
- Contributions to open-source projects
- Prior work with real-time inference systems
- Familiarity with MLOps tools
- Experience with model monitoring and drift detection
- Knowledge of GPU-accelerated computing
- Background in high-performance computing
- Exposure to formal software architecture reviews
- Published work in systems or AI conferences
Compensation
Competitive salary based on experience and location
Work Arrangement
Hybrid with flexible remote options
Team
Cross-functional team focused on AI infrastructure and product delivery
Tech Stack
Python, Go, Kubernetes, Docker, AWS, Prometheus, Grafana, TensorFlow, PyTorch, Kafka, PostgreSQL, Redis
Growth Opportunities
- Access to conference travel, dedicated learning budget, and internal tech talks
- Opportunities to lead projects and mentor team members
Inclusion Statement
We value diverse perspectives and encourage applications from all backgrounds