Locations: San Francisco Office (Fremont St); Bellevue Office; San Jose Office (Zanker)
Work arrangement: Hybrid
Employment type: Full-time
Salary: USD 203,000 – 300,000 / year

Lambda is hiring a Software Engineer - Fleet

Responsibilities

  • Develop and Maintain Production Systems: Design, implement, and improve software that powers GPU fleet lifecycle management and machine configuration at scale.
  • Automate Infrastructure: Build and enhance automation frameworks for machine provisioning, configuration management, and deployment.
  • Support New Hardware Introduction (NPI): Enable bring-up, validation, and production readiness for new server and accelerator platforms.
  • Enhance Machine Lifecycle Processes: Improve and refine workflows for bare metal provisioning, firmware updates, and system health monitoring.
  • Debug Hardware and Firmware Issues: Investigate failures across BIOS, BMC, firmware, networking, storage, and boot flows.
  • Collaborate Across Teams: Work closely with infrastructure, security, and product engineering teams to develop scalable and maintainable solutions.

Requirements

  • 2+ years of experience working with Go (Golang) or Python in production environments.
  • 2+ years of experience with configuration management tools and practices.
  • Comfortable working in Linux environments and debugging issues at the OS, hardware, and networking layers.
  • Able to independently troubleshoot complex systems and communicate effectively across software, infrastructure, and vendor teams.

Nice to Have

  • Experience with Go in infrastructure, systems, or backend development.
  • Hands-on experience with bare metal provisioning and lifecycle management, including technologies such as Redfish, BMC, IPMI, DHCP, and PXE.
  • Experience diagnosing issues involving drivers, firmware, and hardware compatibility across GPU servers.
  • Experience incorporating AI-assisted development tools into engineering workflows, including code generation, debugging, test development, and documentation.
  • Experience building Linux distributions or managing OS customization and imaging.
  • Familiarity with Ansible for system configuration and automation.
  • Exposure to Kubernetes and container orchestration concepts.

Required Skills

  • Backend Development

About Lambda

Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serving tens of thousands of customers. The company builds and scales AI cloud infrastructure, including high-performance storage, networking, and compute systems for AI training and inference. Lambda's mission is to make compute as ubiquitous as electricity and give everyone the power of superintelligence.

Job Details

  • Department: Data Center Business
  • Category: Other
  • Posted: a month ago