As a Senior AI Infrastructure Engineer, you will be responsible for building and maintaining the core systems that enable AI models to operate efficiently in production. Your work will directly support high-throughput, low-latency inference for large language models, real-time computer vision, and voice-powered AI agents that handle healthcare interactions at scale.
What You'll Do
You'll architect and manage Kubernetes environments tailored for GPU-intensive workloads, implementing autoscaling, efficient resource scheduling, and multi-model serving strategies. You'll own the deployment pipeline for diverse AI models, ensuring seamless integration from development to production.
You'll optimize inference performance using techniques like speculative decoding, continuous batching, and model parallelism, balancing speed against cost. You'll also build and maintain infrastructure for real-time communication, including WebRTC clusters and low-latency speech-to-text and text-to-speech services.
Using Infrastructure as Code with Terra


