Palo Alto, California, United States | Employment | USD 180,000 - 440,000 Yearly

xAI is hiring a Software Engineer

About the Role

xAI is seeking a Software Engineer for the ML and Data Infrastructure team. This team builds the foundational infrastructure for frontier AI models and truth-seeking agents. You will collaborate with pre-training, multimodal, reasoning, and product teams to tackle ambiguous, high-stakes problems in a fast-paced, meritocratic environment.

What You'll Do

  • Design, build, and operate petabyte-to-exabyte scale distributed systems for data acquisition, web crawling, preprocessing, filtering, classification, and multimodal pipelines.
  • Architect high-performance search and retrieval engines at trillion-document scale, integrated with LLMs and agents for truth-seeking, low-hallucination reasoning, and real-time knowledge access.
  • Develop reliable inference serving infrastructure: load balancing, autoscaling, KV cache, batching, fault-tolerance, monitoring, CI/CD, and benchmarking for 100% uptime and optimal tail latency.
  • Optimize low-level performance: CUDA kernels, Triton and CUTLASS extensions, quantization, distillation, speculative decoding, GPU memory hierarchy, and model-hardware co-design for next-generation architectures.
  • Innovate on compilers, runtimes, distributed profiling and debugging tools, and interconnect fabrics.
  • Manage complex workloads across clouds and clusters: orchestration, data bookkeeping and verifiability, high-speed interconnect validation, failure analysis, and telemetry and automation for production reliability.

What We're Looking For

  • Strong systems engineering skills with a proven impact on large-scale distributed infrastructure.
  • Proficiency in Python and at least one compiled language (Rust, C++, Go, or Java); experience building bespoke libraries, optimizing performance, and debugging complex systems.
  • Hands-on experience in at least one key area: petabyte-scale data pipelines and crawling, web-scale search and retrieval, inference optimization, compiler features, or high-speed interconnects.
  • Deep understanding of distributed systems challenges: sustaining high operations-per-second rates, latency and throughput tradeoffs, fault tolerance, monitoring, and scaling production systems to billions of users or clusters of 100,000+ GPUs.
  • Passion for AI infrastructure: keeping up with state-of-the-art techniques, first-principles problem-solving, meticulous organization and bookkeeping, and delivering rigorous, high-quality results.

Nice to Have

  • Experience with multimodal data, epistemics and truth-seeking in retrieval, or agentic systems.
  • Low-level optimizations: CUDA kernel development, GPU profiling, low-precision numerics, or interconnect pathfinding.
  • Production expertise in inference reliability, CI/CD for ML, or cluster networking.
  • A track record of owning end-to-end projects in hyperscale environments, with strong debugging skills, vendor management experience, or open-source contributions.

Technical Stack

  • Languages: Python, Rust, C++, Go, Java
  • Infrastructure: Spark, Ray, Kubernetes
  • ML/Performance: CUDA, Triton, CUTLASS, JAX, XLA, MLIR
  • Ops & Observability: Prometheus, Grafana, Buildkite, ArgoCD

Team & Environment

You will join a small team within a flat organizational structure. The company culture is highly motivated and focused on engineering excellence. All employees are expected to be hands-on and contribute directly to the company’s mission to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.

Benefits & Compensation

  • Total compensation range: $180,000 - $440,000 USD
  • Equity
  • Comprehensive medical, vision, and dental coverage
  • 401(k) retirement plan
  • Short-term and long-term disability insurance
  • Life insurance
  • Various other discounts and perks

xAI is an equal opportunity employer.

Required Skills
Python, Rust, C++, Go, Java, Spark, Ray, Kubernetes, CUDA, Triton, Distributed Systems, Performance Optimization, Data Pipelines, Inference Optimization, High-Speed Interconnects
About company
xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.

Job Details
Department: Software Development
Category: Infrastructure
Posted: 14 days ago