Red Hat is seeking a Principal Software Engineer to work on advanced AI/ML applications and agent systems, leveraging modern inference platforms to build production-ready prototypes. This role involves deep technical contributions to open source communities and providing leadership across engineering teams in a globally distributed environment.
What You'll Do
- Build high-quality, high-performing AI/ML applications and agent systems using modern inference platforms for multi-modal and distributed model serving
- Apply and optimize inference techniques including KV cache management, model quantization, and distributed serving to production workloads
- Contribute to upstream inference runtime communities such as vLLM, TGI, PyTorch, OpenVINO, and related projects
- Build multi-modal AI applications integrating vision, language, and other modalities
- Provide technical leadership and coordination across multiple stakeholders and engineering teams
- Apply a growth mindset by staying current with rapid advancements in AI/ML inference technologies
- Benchmark and analyze inference performance at scale, driving data-driven optimization decisions
- Share innovations through blog posts, conference presentations, and other technical venues
What We're Looking For
- Bachelor's degree in Computer Science or Engineering, or equivalent experience
- 5+ years of experience in AI/ML engineering with focus on production inference systems
- Deep expertise in PyTorch and modern deep learning frameworks
- Hands-on experience with inference runtime optimization (model serving, batching, KV cache management)
- Advanced programming skills in Python and C++
- Proven ability to contribute to and lead open source projects
- Strong self-motivation and organizational skills
- Ability to work concurrently on multiple projects, both independently and as part of a team
- Excellent English written and verbal communication skills
- Collaborative attitude and willingness to share ideas openly
Nice to Have
- Experience with vLLM, TGI (Text Generation Inference), or similar inference runtimes
- Contributions to PyTorch, OpenVINO, or other inference frameworks
- Experience with distributed model serving and GPU optimization
- Familiarity with Kubernetes and cloud-native AI/ML deployments
- Knowledge of model quantization techniques (GPTQ, AWQ, FP8, etc.)
- Experience with CUDA, Triton, or other GPU programming frameworks
- Experience with diffusion models and diffusion transformers
- Experience building AI agents and agentic systems
Technical Stack
- vLLM
- TGI
- PyTorch
- OpenVINO
- Python
- C++
- Kubernetes
- CUDA
- Triton
- Model quantization (GPTQ, AWQ, FP8)
- Distributed model serving
- GPU optimization
- Cloud-native AI/ML deployments
- Diffusion models
- Diffusion transformers
- Multi-modal AI
Benefits & Compensation
- Flexible work environments (in-office, office-flex, fully remote depending on role)
- Opportunity to work across 40+ countries
- Inclusive and open culture based on open source principles
- Encouragement to bring your best ideas regardless of title or tenure
- Support for individuals with disabilities including reasonable accommodations
- Equal opportunity and affirmative action employment policy
Work Mode
Work environments vary by role: in-office, office-flex, or fully remote. This position is based in Ireland.
Red Hat is proud to be an equal opportunity workplace and an affirmative action employer. We review applications for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law.
