About the Role
The role involves designing and improving software for large-scale language model inference, with a focus on performance, correctness, and integration with the broader AI software ecosystem.
Responsibilities
- Develop and optimize core components for large language model inference
- Improve runtime performance and memory efficiency on GPU architectures
- Collaborate on building scalable inference solutions for production use
- Implement new features in the inference engine for advanced model support
- Work closely with research and framework teams to integrate new technologies
- Diagnose and resolve performance bottlenecks in distributed systems
- Contribute to kernel optimization for tensor operations
- Support model compatibility across different AI frameworks
- Ensure correctness and numerical accuracy in inference outputs
- Participate in code reviews and maintain high code quality standards
- Document technical designs and system behavior for team reference
- Drive improvements in latency, throughput, and scalability
- Assist in debugging complex system-level issues
- Contribute to open-source projects related to inference
- Stay current with advancements in AI model architectures
- Collaborate on benchmarking and profiling tools
- Optimize for real-time and batch inference scenarios
- Support deployment in cloud and data center environments
- Work on model quantization and compression techniques
- Integrate security and safety features into inference pipelines
Nice to Have
- Advanced degree in computer science or related technical field
- Direct experience with large language model inference
- Contributions to open-source deep learning projects
- Experience with model quantization techniques
- Knowledge of safety and alignment in AI systems
- Familiarity with CI/CD pipelines for AI software
- Background in compiler optimization for AI workloads
- Experience with kernel fusion and graph optimization
- Published work in systems or AI conferences
- Experience leading technical design and architecture work
Compensation
Competitive salary and benefits package
Work Arrangement
Hybrid work model
Team
Part of the accelerated computing and AI software team
About the Team
- The team focuses on building high-performance software for AI inference, enabling faster and more efficient deployment of large language models across industries.
- Work is centered on pushing the boundaries of what's possible with GPU-accelerated computing in AI applications.
Why Join Us
- Opportunity to work on cutting-edge AI technologies at scale
- Collaborative environment with experts in systems, machine learning, and compilers
- Impact on real-world AI deployment across cloud, enterprise, and research domains
Available for qualified candidates
