Responsibilities
- Work across the full stack of audio ML: developing audio codecs and representations, sourcing and synthesizing high-quality audio data, training large-scale speech language models and large audio diffusion models, and developing novel architectures for incorporating continuous signals into LLMs
- Focus primarily, but not exclusively, on speech, building advanced steerable systems spanning end-to-end conversational systems, speech and audio understanding models, and speech synthesis capabilities
- Work closely with many collaborators across pretraining, finetuning, reinforcement learning, production inference, and product to take advanced audio technologies from early research to high-impact real-world deployments
Requirements
- Have hands-on experience with training audio models, whether that's conversational speech-to-speech, speech translation, speech recognition, text-to-speech, diarization, codecs, or generative audio models
- Genuinely enjoy both research and engineering work, and would describe your ideal split as roughly 50/50 rather than heavily weighted toward one or the other
- Are comfortable working across abstraction levels, from signal processing fundamentals to large-scale model training and inference optimization
- Have deep expertise with JAX, PyTorch, or large-scale distributed training, and can debug performance issues across the full stack
- Communicate clearly and collaborate effectively; audio touches many parts of our systems, so you'll work closely with teams across the company
- Are passionate about building conversational AI that feels natural, steerable, and safe
- Care about the societal impacts of voice AI and want to help shape how these systems are developed responsibly
Nice to Have
- Large language model pretraining and finetuning
- Training diffusion models for image and audio generation
- Reinforcement learning for large language models and diffusion models
- End-to-end system optimization, from performance benchmarking to kernel optimization
- GPUs, Kubernetes, PyTorch, or distributed training infrastructure
Team
Structure: Audio team