Responsibilities
- Partner with internal development and product teams to maintain deep knowledge of emerging features and support external adoption.
- Support the deployment and optimization of AI workloads across large-scale GPU platforms.
- Diagnose and resolve performance challenges in complex computing environments.
- Evaluate new capabilities in training frameworks through rigorous benchmarking and performance analysis.
- Deliver data-driven recommendations to clients and internal stakeholders based on technical findings.
- Provide expert guidance to customers scaling AI and HPC applications on current-generation GPU infrastructure.
- Collaborate directly with clients to troubleshoot cluster-level performance and reliability issues.
- Identify system bottlenecks and implement scalable fixes in production environments.
- Enable efficient workload execution by aligning customer implementations with best practices.
- Share insights on framework improvements derived from real-world testing and client feedback.
- Drive adoption of optimized configurations for AI training and inference workloads.
- Support the integration of advanced resilience mechanisms in customer AI pipelines.
- Act as a technical liaison between customer teams and product development units.
- Contribute to performance tuning strategies tailored to specific application needs.
- Ensure smooth onboarding of partners onto new platform capabilities.
- Promote effective use of tools and libraries for maximizing GPU utilization.
- Assist in validating framework updates in customer-relevant scenarios.
- Facilitate knowledge transfer through technical workshops and documentation.
- Analyze system-level metrics to guide infrastructure improvements.
- Support the development of robust, high-throughput computing environments.
- Guide customers in leveraging hardware advancements for AI scalability.
- Collaborate on resolving low-level system integration challenges.
- Improve training pipeline efficiency through targeted optimizations.
- Support cross-functional efforts to enhance platform reliability.
- Contribute technical expertise to Europe’s Sovereign AI program
Work Arrangement
Remote (Country)


