Role Overview

We're looking for a Senior Hardware Support Engineer to safeguard the reliability of our production hardware infrastructure at scale. In this role, you'll serve as a technical leader for resolving complex hardware and firmware issues, bridging operations, engineering, and vendor teams to maintain fleet stability and system uptime.

Key Responsibilities

Lead in-depth investigations into hardware and firmware failures, tracing issues from symptom to root cause
Identify systemic reliability trends by analyzing recurring errors and field data
Serve as the primary technical contact during critical hardware incidents affecting performance or availability
Collaborate with hardware vendors to expedite diagnostics, replacements, and corrective actions
Work closely with internal engineering groups to test and validate fixes before broad deployment
Conduct pre-deployment validation of server hardware and firmware updates
Apply structured problem-solving frameworks to manage and document incident resolution
Provide technical guidance to on-site teams during urgent hardware interventions
Enhance tools and processes for monitoring, tracking, and reporting hardware failures
Help shape long-term strategies to improve fleet-wide hardware resilience

Required Qualifications

Proven experience supporting server hardware in large-scale or data center environments
Track record of diagnosing and resolving complex hardware and firmware issues
Solid knowledge of server subsystems including CPU, memory, storage, power, networking, and BMC
Experience engaging with hardware vendors to resolve production escalations
Proficiency in structured troubleshooting using formal incident or problem management practices
Strong analytical skills with the ability to interpret system logs, telemetry, and error patterns
Experience coordinating with field operations teams during hardware events
Ability to manage multiple high-priority investigations simultaneously
Clear and effective communication skills in cross-team settings

Preferred Qualifications

Background in GPU-intensive, AI, or high-performance computing environments
Experience managing firmware lifecycle and validating large-scale rollouts
Familiarity with Linux-based infrastructure and operational tooling
History of improving hardware reliability metrics across large server fleets

Benefits

Comprehensive medical, dental, and vision insurance
401(k) plan with company match
Flexible paid time off policy
Paid parental leave
Support for professional development and learning

Compensation

Annual salary range: $125,000 – $180,000. Additional compensation includes an annual performance-based bonus.

Work Mode

This position is remote within the United States. Occasional travel may be required for on-site coordination or critical hardware events. The role includes participation in incident escalation coverage for production-impacting issues.

Company Culture

Our environment encourages initiative and innovation, with a strong emphasis on collaboration and technical excellence. We support professional growth and work at the forefront of AI cloud infrastructure, building systems that power next-generation applications.

Nebius is hiring a Hardware Support Engineer

Role Overview

Key Responsibilities

Required Qualifications

Preferred Qualifications

Benefits

Compensation

Work Mode

Company Culture

Similar Jobs

Network Support Engineer

DC Change Manager

IT Incident & Technical Support Engineer

Technical Support Engineer

IT Incident & Technical Support Engineer

S&S System Engineer

Nebius is hiring a Hardware Support Engineer

Role Overview

Key Responsibilities

Required Qualifications

Preferred Qualifications

Benefits

Compensation

Work Mode

Company Culture

Similar Jobs

Network Support Engineer

DC Change Manager

IT Incident & Technical Support Engineer

Technical Support Engineer

IT Incident &amp; Technical Support Engineer

S&S System Engineer

IT Incident & Technical Support Engineer