Orchestrate the deployment of clusters containing over 1,000 GPUs using proprietary automation scripts, adapting tools as needed to meet specific customer requirements.
Verify the accuracy and efficiency of compute, storage, and network systems, and collaborate with vendors to enhance performance.
Transfer massive datasets, measured in petabytes, from public cloud environments to on-premises storage with optimal speed and cost efficiency.
Diagnose and resolve technical problems across the full technology stack, from physical hardware faults to slow data retrieval across distributed storage regions.
Develop internal tools to streamline deployment processes and improve system reliability, prioritizing automation when benefits clearly exceed development costs.

Competitive total compensation package (salary + equity)

This position includes participation in an on-call rotation, typically up to one week per month.
Candidates must demonstrate a customer-first mindset, personal accountability, and a proactive approach to problem-solving.
Proven experience delivering clean, well-documented code in technically demanding environments is required.
Candidates must be able to establish order in ambiguous situations, adapt quickly, and operate effectively within the fast-evolving AI landscape.
Strong communication skills, both technical and interpersonal, are essential, as are humility and a constructive attitude.

Fluidstack is hiring a Site Reliability Engineer