New York; Montreal; San Francisco; Toronto; Remote (Global)

Cohere is hiring an Audio Inference Engineer, Model Efficiency

About the Role

Develop high-performance inference solutions for audio models with a focus on efficiency, speed, and scalability across diverse deployment environments.

Responsibilities

  • Optimize inference pipelines for audio-based machine learning models
  • Improve model latency and throughput without sacrificing accuracy
  • Collaborate with research teams to implement efficient model architectures
  • Profile and benchmark inference performance across hardware platforms
  • Develop compression and quantization techniques for audio models
  • Support deployment of models in production environments
  • Troubleshoot performance bottlenecks in real-time audio processing
  • Design low-latency audio preprocessing and feature extraction modules
  • Ensure scalability of inference systems under high load
  • Integrate models with backend serving infrastructure
  • Maintain high code quality and documentation standards
  • Work closely with ML researchers to adapt models for efficient inference
  • Evaluate trade-offs between model size, speed, and accuracy
  • Implement hardware-aware optimizations for CPUs and accelerators
  • Contribute to model versioning and deployment workflows
  • Analyze memory usage and reduce footprint of audio models
  • Support cross-platform compatibility for inference systems
  • Develop automated testing for inference correctness and performance
  • Stay current with advancements in model compression and inference
  • Collaborate on defining best practices for efficient model deployment

Nice to Have

  • Master’s or PhD in a relevant technical field
  • Experience with speech recognition or audio generation models
  • Contributions to open-source machine learning projects
  • Prior work optimizing transformer models for inference
  • Familiarity with WebAssembly or JavaScript for inference
  • Experience with on-device audio model deployment
  • Knowledge of acoustic modeling techniques
  • Background in distributed inference systems

Compensation

Competitive salary and benefits package

Work Arrangement

Hybrid or remote options available

Team

Part of the core machine learning systems team focused on optimizing inference performance

What We Value

  • Technical excellence paired with practical problem-solving
  • Ownership of system performance and reliability
  • Clear communication across technical and non-technical stakeholders
  • Continuous learning and adaptation to new research

Impact

  • Your work will directly influence the speed and efficiency of audio AI systems
  • Optimizations will enable broader deployment across devices and platforms

About the Company

Cohere trains and deploys frontier AI models for developers and enterprises building systems for content generation, semantic search, RAG, and agents.

Job Details

  • Department: Modeling
  • Category: Other
  • Posted 6 months ago