About the Role

Develop high-performance inference solutions for audio models with a focus on efficiency, speed, and scalability across diverse deployment environments.

Responsibilities

Optimize inference pipelines for audio-based machine learning models
Improve model latency and throughput without sacrificing accuracy
Collaborate with research teams to implement efficient model architectures
Profile and benchmark inference performance across hardware platforms
Develop compression and quantization techniques for audio models
Support deployment of models in production environments
Troubleshoot performance bottlenecks in real-time audio processing
Design low-latency audio preprocessing and feature extraction modules
Ensure scalability of inference systems under high load
Integrate models with backend serving infrastructure
Maintain high code quality and documentation standards
Work closely with ML researchers to adapt models for efficient inference
Evaluate trade-offs between model size, speed, and accuracy
Implement hardware-aware optimizations for CPUs and accelerators
Contribute to model versioning and deployment workflows
Analyze memory usage and reduce footprint of audio models
Support cross-platform compatibility for inference systems
Develop automated testing for inference correctness and performance
Stay current with advancements in model compression and inference
Collaborate on defining best practices for efficient model deployment

Nice to Have

Master’s or PhD in a relevant technical field
Experience with speech recognition or audio generation models
Contributions to open-source machine learning projects
Prior work optimizing transformer models for inference
Familiarity with WebAssembly or JavaScript for inference
Experience with on-device audio model deployment
Knowledge of acoustic modeling techniques
Background in distributed inference systems

Compensation

Competitive salary and benefits package

Work Arrangement

Hybrid or remote options available

Team

Part of the core machine learning systems team focused on optimizing inference performance

What We Value

Technical excellence paired with practical problem-solving
Ownership of system performance and reliability
Clear communication across technical and non-technical stakeholders
Continuous learning and adaptation to new research

Impact

Your work will directly influence the speed and efficiency of audio AI systems
Your optimizations will enable broader deployment of these systems across devices and platforms

Available for qualified candidates

Cohere is hiring an Audio Inference Engineer, Model Efficiency
