Crusoe is looking for a Senior Staff Software Engineer to join our Model LifeCycle team. You will play a crucial role in building a comprehensive managed platform for the entire application development lifecycle, with a specific focus on leveraging Machine Learning models, including Large Language Models (LLMs). This role offers significant 0 → 1 ownership, designing and building core systems from first principles to accelerate the abundance of energy and intelligence.
What You'll Do
- Manage fine-tuning systems for large foundation models, including multi-node orchestration, checkpointing, failure recovery, and cost-efficient scaling.
- Implement and maintain end-to-end training pipelines for Large Language Models.
- Build distillation and reinforcement learning pipelines.
- Build agent execution infrastructure.
- Manage datasets, models, and experiments: versioning, lineage, evaluation, and reproducible fine-tuning at scale.
- Work closely with product, business, and platform teams to shape the core abstractions and APIs of the system.
- Influence long-term architectural decisions around training runtimes, scheduling, storage, and model lifecycle management.
- Contribute to and engage with the open-source LLM ecosystem.
What We're Looking For
- Advanced degree in Computer Science, Engineering, or a related field.
- 8-12+ years of industry experience driving impactful projects in the AI space.
- Proven track record of delivering early-stage projects under tight deadlines.
- Expertise in using cloud-based services, such as elastic compute, object storage, virtual private networks, and managed databases.
- Experience in Generative AI, including Large Language Models and multimodal models.
- Deep experience with AI infrastructure, including training and inference.
- Proactive and collaborative approach with the ability to work autonomously.
- Strong communication and interpersonal skills.
- Passion for building AI products and solving challenging technical problems.
Nice to Have
- Proficiency in Golang or Python for large-scale, production-level services.
- Contributions to open-source AI projects such as vLLM or similar frameworks.
- Performance optimizations on GPU systems and inference frameworks.
- Experience working with PyTorch.
- Experience with training and fine-tuning LLMs.
Technical Stack
- Golang
- Python
- PyTorch
- vLLM
- Cloud-based services: elastic compute, object storage, virtual private networks, managed databases
Team & Environment
You will be joining the Model LifeCycle team at Crusoe.
Benefits & Compensation
- Compensation: $237,600 - $288,000 + bonus + equity. Restricted Stock Units are included in all offers.
- Industry-competitive pay
- Restricted Stock Units
- Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
- Employer contributions to HSAs
- Paid Parental Leave
- Paid life insurance, short-term and long-term disability
- Teladoc
- 401(k) with a 100% match up to 4% of salary
- Generous paid time off and holiday schedule
- Cell phone reimbursement
- Tuition reimbursement
- Subscription to the Calm app
- MetLife Legal
- Company-paid commuter benefit: $300/month
Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.






