On-site Full-time

Dell Technologies is hiring a Senior System Development Engineer – AI Technologies

About the Role

You will design and implement systems that meet complex requirements, with a focus on AI technologies. This role involves leading the bring-up, validation, and debugging of system platforms that support demanding AI workloads, including AI clusters and rack-level operations.

What You'll Do

  • Lead bring‑up, configuration, and validation of system platforms supporting AI workloads, including servers, GPU racks, accelerators, and networking fabrics.
  • Work with BIOS/UEFI, BMC, firmware, drivers, and kernel subsystems to ensure system readiness for large‑scale AI deployments.
  • Perform hardware–software co-validation of CPUs, GPUs, DPUs, NICs, accelerators, and memory subsystems under AI‑heavy workloads.
  • Validate PCIe fabric behavior, NUMA topology, and data‑path efficiency for model training and inference.
  • Diagnose complex issues across BIOS, firmware, OS, driver stack, container runtime, orchestration layer, and AI frameworks.
  • Analyze system logs, kernel traces, hardware event telemetry, GPU health signals, and fabric diagnostics.
  • Conduct root‑cause analysis of performance bottlenecks, training failures, model divergence, and hardware stability issues.
  • Collaborate with silicon, firmware, OS, and AI software teams to resolve issues rapidly.
  • Deploy and manage AI clusters: GPU servers, accelerators, high‑speed networking (InfiniBand, RoCE), and storage systems.
  • Validate cluster readiness for distributed training, including bandwidth, latency, topology checks, and gradient‑sync performance.
  • Work with orchestration and container tools such as Kubernetes, Slurm, Ray, Docker, and Singularity to run and optimize AI pipelines.
  • Partner with data center teams for rack integration, power/thermal analysis, and capacity planning.
  • Execute and analyze standard AI benchmarks like MLPerf Training, MLPerf Inference, and SPEC AI Benchmarks.
  • Build custom benchmarks for transformer models, LLMs, computer vision, multimodal models, and recommendation systems.
  • Interpret results to provide optimization recommendations at the hardware, OS, driver, and framework levels.
  • Document findings and drive improvements across the platform and AI software ecosystem.

What We're Looking For

  • Bachelor’s or Master’s degree in Computer Engineering, Computer Science, Electrical Engineering, or a related field.
  • 5+ years of experience in system engineering, platform development, or hardware–software validation.
  • Strong understanding of system architecture, CPU/GPU/accelerator internals, memory systems, and I/O subsystems.

Technical Stack

  • BIOS/UEFI, BMC, firmware, drivers, kernel subsystems
  • Kubernetes, Slurm, Ray, Docker, Singularity
  • InfiniBand, RoCE

Team & Environment

You will be joining the Systems Development Engineering Team.

Benefits & Compensation

  • Health and wellness benefits detailed at MyWellatDell.com
  • Compensation range: $123k - $170k

Work Mode

This is an on-site role located in Austin, Texas.

Dell Technologies is committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment.

Required Skills

  • BIOS/UEFI, BMC, firmware, drivers, kernel subsystems
  • Kubernetes, Slurm, Ray, Docker, Singularity
  • System architecture, CPU/GPU/accelerator internals, memory systems, I/O subsystems
About company
Dell Technologies

Dell Technologies is a unique family of businesses that helps individuals and organizations transform how they work, live, and play. It has delivered HPC solutions for 25+ years and is NVIDIA’s preferred partner for GenAI Factory systems.

Job Details
Category: embedded
Posted: 11 days ago