Responsibilities
- Architect, build, and launch scalable, efficient backend systems that handle high load for large-scale model inference.
- Play a key role in shaping the technical direction and system design of the inference platform with an emphasis on high throughput and minimal latency.
- Maintain system reliability, performance, and scalability in live environments using observability tools such as Prometheus and Grafana.
- Collaborate across data science, product, and engineering functions to ensure platform development supports long-term business objectives.
- Administer and tune cloud infrastructure on Google Cloud Platform, managing workloads with Kubernetes.
- Champion robust engineering practices across the development lifecycle, including testing, deployment, and monitoring, guided by DevOps and SRE principles.
Benefits
- Full coverage for health, life, and disability insurance plans
- Financial support for commuting expenses
- Eligibility for company stock participation
- Attractive retirement and pension savings options
- Ample paid time off and personal leave
- Comprehensive parental leave and family care support
- Meals and snacks provided in the office
- Programs focused on mental health and overall well-being
- Employee-led affinity and support groups
- Worldwide Employee Assistance Program for personal support
- Access to professional growth and learning initiatives
- Corporate social responsibility through volunteer time and donation matching