What You'll Do
- Own the end-to-end deployment of neural text-to-speech (TTS) models, ensuring they operate reliably at scale with minimal latency.
- Refine inference pipelines using advanced post-training methods to improve both audio fidelity and system throughput.
- Work closely with engineering and research teams to integrate new capabilities, run controlled experiments, and continuously improve live systems.
- Design robust, scalable infrastructure that supports expressive, multi-speaker, and controllable voice synthesis.
- Establish clear standards for monitoring, reliability, and performance optimization across production environments.
Requirements
- Proven experience deploying large neural TTS models in production, either on cloud platforms or on-premises.
- Deep technical knowledge of inference optimization techniques, including quantization, kernel tuning, and efficient batching.
- Familiarity with real-time audio processing constraints and strategies to maintain quality under low-latency demands.
- Strong grasp of distributed systems, GPU utilization, and scalable backend architectures.
- Ability to troubleshoot and resolve issues affecting voice quality, system performance, or uptime.
- Adaptability to fast-moving environments with a hands-on approach to system ownership.
Preferred Qualifications
- Contributions to open-source TTS or audio processing frameworks.
- Background in telephony, live communication systems, or enterprise voice applications.
Benefits
- Comprehensive healthcare coverage, including dental and vision.
- Meaningful equity in a rapidly growing company.
- Access to all necessary tools and equipment for effective work.
- Hybrid work model with options for remote work within the U.S. or in-office collaboration in San Francisco.
- Modern office space located in Jackson Square, SF, featuring rooftop views.