Dell Technologies is hiring a Senior System Development Engineer – AI Technologies. In this role, you will design and implement solutions to complex system requirements, with a focus on AI technologies. You will lead the bring-up, validation, and debugging of system platforms that support demanding AI workloads, including AI clusters and rack-level operations.
What You'll Do
- Lead bring‑up, configuration, and validation of system platforms supporting AI workloads, including servers, GPU racks, accelerators, and networking fabrics.
- Work with BIOS/UEFI, BMC, firmware, drivers, and kernel subsystems to ensure system readiness for large‑scale AI deployments.
- Perform hardware–software co-validation of CPUs, GPUs, DPUs, NICs, accelerators, and memory subsystems under AI‑heavy workloads.
- Validate PCIe fabric behavior, NUMA topology, and data‑path efficiency for model training and inference.
- Diagnose complex issues across BIOS, firmware, OS, driver stack, container runtime, orchestration layer, and AI frameworks.
- Analyze system logs, kernel traces, hardware event telemetry, GPU health signals, and fabric diagnostics.
- Conduct root‑cause analysis of performance bottlenecks, training failures, model divergence, and hardware stability issues.
- Collaborate with silicon, firmware, OS, and AI software teams to resolve issues rapidly.
- Deploy and manage AI clusters: GPU servers, accelerators, high‑speed networking (InfiniBand, RoCE), and storage systems.
- Validate cluster readiness for distributed training, including bandwidth, latency, topology checks, and gradient‑sync performance.
- Work with orchestration systems and container runtimes such as Kubernetes, Slurm, Ray, Docker, and Singularity to run and optimize AI pipelines.
- Partner with data center teams for rack integration, power/thermal analysis, and capacity planning.
- Run standard AI benchmarks such as MLPerf Training, MLPerf Inference, and SPEC AI Benchmarks, and analyze the results.
- Build custom benchmarks for transformer models, LLMs, computer vision, multimodal models, and recommendation systems.
- Interpret results to provide optimization recommendations at the hardware, OS, driver, and framework levels.
- Document findings and drive improvements across the platform and AI software ecosystem.
What We're Looking For
- Bachelor’s or Master’s degree in Computer Engineering, Computer Science, Electrical Engineering, or a related field.
- 5+ years of experience in system engineering, platform development, or hardware–software validation.
- Strong understanding of system architecture, CPU/GPU/accelerator internals, memory systems, and I/O subsystems.
Technical Stack
- BIOS/UEFI, BMC, firmware, drivers, kernel subsystems
- Kubernetes, Slurm, Ray, Docker, Singularity
- InfiniBand, RoCE
Team & Environment
You will be joining the Systems Development Engineering Team.
Benefits & Compensation
- Health and wellness benefits detailed at MyWellatDell.com
- Compensation range: $123k - $170k
Work Mode
This is an onsite role located in Austin, Texas.
Dell Technologies is committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment.