NVIDIA is hiring a Senior Solutions Architect, HPC and AI

Responsibilities

  • Partner with internal development and product teams to maintain deep knowledge of emerging features and support external adoption.
  • Support the deployment and optimization of AI workloads across large-scale GPU platforms.
  • Diagnose and resolve performance challenges in complex computing environments.
  • Evaluate new capabilities in training frameworks through rigorous benchmarking and performance analysis.
  • Deliver data-driven recommendations to clients and internal stakeholders based on technical findings.
  • Provide expert guidance to customers scaling AI and HPC applications on current-generation GPU infrastructure.
  • Collaborate directly with clients to troubleshoot cluster-level performance and reliability issues.
  • Identify system bottlenecks and implement scalable fixes in production environments.
  • Enable efficient workload execution by aligning customer implementations with best practices.
  • Share insights on framework improvements derived from real-world testing and client feedback.
  • Drive adoption of optimized configurations for AI training and inference workloads.
  • Support the integration of advanced resilience mechanisms in customer AI pipelines.
  • Act as a technical liaison between customer teams and product development units.
  • Contribute to performance tuning strategies tailored to specific application needs.
  • Ensure smooth onboarding of partners onto new platform capabilities.
  • Promote effective use of tools and libraries for maximizing GPU utilization.
  • Assist in validating framework updates in customer-relevant scenarios.
  • Facilitate knowledge transfer through technical workshops and documentation.
  • Analyze system-level metrics to guide infrastructure improvements.
  • Support the development of robust, high-throughput computing environments.
  • Guide customers in leveraging hardware advancements for AI scalability.
  • Collaborate on resolving low-level system integration challenges.
  • Improve training pipeline efficiency through targeted optimizations.
  • Support cross-functional efforts to enhance platform reliability.
  • Contribute technical expertise to Europe’s Sovereign AI program.

Work Arrangement

Remote (Country)

Required Skills
C, C++, Python, CUDA, SLURM, Nsight Systems, Nsight Compute, NCCL, MPI, PyTorch, HPC, AI, Solutions Architecture, Parallel Computing, Performance Analysis
About the company
NVIDIA
NVIDIA builds accelerated computing platforms and AI technologies that power advancements in areas such as generative AI, data centers, robotics, and digital twins.
Job Details
Category: Other
Posted 7 months ago