Manage full lifecycle deployment of inference workloads, including setup, optimization, SLA adherence, and incident resolution.
Deliver quantifiable gains in token generation speed, latency, and cost efficiency across various model types and usage patterns.
Develop and maintain key infrastructure for KV cache management and request scheduling to improve system throughput.
Design and validate split prefill/decode processing pipelines along with scalable Kubernetes-based orchestration.
Identify and eliminate performance constraints across compute, memory, and inter-process communication layers; implement comprehensive monitoring.
Collaborate with clients to align deployment strategies and platform enhancements with their model designs and performance needs.
Influence platform evolution by contributing to architectural decisions focused on simplifying deployments, boosting hardware efficiency, and enabling new model support.
Join a rotating on-call schedule, covering up to one week per month, to ensure system stability and meet service level objectives.

$165,000 – $350,000 base salary annually, with potential equity through stock options.

Base salary range is $165,000 – $350,000 per year, based on experience, skills, qualifications, and location.
Total compensation may include equity in the form of stock options.
Fluidstack is an Equal Employment Opportunity employer.
Applicants with arrest and conviction records will be considered in accordance with applicable laws.
A confirmation email will be sent upon successful application submission.
If you do not receive a confirmation, email careers@fluidstack.io with your resume/CV, the role you applied for, and the date you submitted your application.

Fluidstack is hiring a Software Engineer, Inference Platform.