Responsibilities
- Orchestrate the deployment of clusters containing over 1,000 GPUs using proprietary automation scripts, adapting tools as needed to meet specific customer requirements.
- Verify the accuracy and efficiency of compute, storage, and network systems, and collaborate with vendors to enhance performance.
- Transfer massive datasets, measured in petabytes, from public cloud environments to on-premises storage with optimal speed and cost efficiency.
- Diagnose and resolve technical problems across the full technology stack, ranging from physical hardware issues to optimizing data retrieval across distributed storage regions.
- Develop internal tools to streamline deployment processes and improve system reliability, prioritizing automation when benefits clearly exceed development costs.
Benefits
- Comprehensive compensation package including salary and equity.
- Retirement or pension plan aligned with regional standards.
- Coverage for health, dental, and vision care.
- Generous paid time off policy consistent with local practices.
Compensation
Competitive total compensation package (salary + equity)
Work Arrangement
Not specified
Team
Not specified
Other
- This position includes participation in an on-call rotation, typically up to one week per month.
- Candidates must demonstrate a customer-first mindset, personal accountability, and a proactive approach to problem-solving.
- Proven experience delivering clean, well-documented code in technically demanding environments is required.
- Ability to establish order in ambiguous situations, adapt quickly, and operate effectively within the fast-evolving AI landscape.
- Strong communication skills—both technical and interpersonal—combined with humility and a constructive attitude—are essential.
Not specified