About the Role
In this role you will design, implement, and optimize CUDA kernels for peak performance across GPU architectures, with a strong emphasis on low-level code efficiency and scalability.
Responsibilities
- Design and write efficient CUDA kernels for parallel computing tasks
- Optimize existing GPU code for improved throughput and reduced latency
- Collaborate with performance engineers to profile and benchmark kernel execution
- Debug and resolve issues in GPU-accelerated applications
- Analyze hardware limitations and adapt algorithms accordingly
- Implement memory access patterns that maximize bandwidth utilization
- Work with minimal supervision in a fast-paced technical environment
- Contribute to code reviews and technical design discussions
- Ensure kernel compatibility across different GPU architectures
- Translate algorithmic specifications into high-performance GPU code
- Maintain documentation for kernel functionality and performance characteristics
- Respond to performance regressions with targeted optimizations
- Stay current with advancements in GPU computing and CUDA capabilities
- Support integration of CUDA modules into larger software systems
- Assist in defining best practices for GPU programming within the team
Nice to Have
- Experience with HPC workloads or scientific computing
- Background in algorithm acceleration on GPUs
- Familiarity with modern CUDA features such as cooperative groups
- Knowledge of GPU memory hierarchy and caching behavior
- Experience with cross-platform GPU development
- Exposure to machine learning or data-intensive applications
- Contributions to open-source GPU computing projects
- Understanding of power and thermal constraints in GPU systems
Compensation
Competitive salary based on experience
Work Arrangement
Remote within the United States
Team
Small, focused engineering team working on high-performance computing solutions
What You’ll Do
- Develop and refine CUDA kernels to meet strict performance targets
- Work closely with system architects to align kernel design with hardware capabilities
- Use profiling data to guide optimization efforts and validate improvements
What We Look For
- Deep technical expertise in GPU programming models
- A methodical approach to performance analysis and tuning
- Commitment to writing clean, maintainable, and efficient code