San Francisco, Singapore, Amsterdam | Hybrid | $160,000 - $230,000

Together AI is hiring an LLM Inference Frameworks and Optimization Engineer

Responsibilities

  • Build and maintain distributed inference systems capable of handling high request volumes and supporting text, image, and multimodal models reliably.
  • Develop and refine distributed inference methods such as Mixture of Experts (MoE) parallelism, tensor parallelism, and pipeline parallelism to maximize serving performance.
  • Improve inference speed and resource efficiency using CUDA graphs, TensorRT/TRT-LLM optimizations, PyTorch compilation, and speculative decoding techniques.
  • Partner with hardware teams to identify performance bottlenecks and jointly optimize inference workloads across GPUs, TPUs, and specialized accelerators.
  • Collaborate with AI researchers and infrastructure teams to create optimized execution strategies and streamline end-to-end model serving workflows.

Benefits

  • competitive compensation
  • startup equity
  • health insurance
  • other competitive benefits

About the company
Together AI
Together AI is a research-driven artificial intelligence company that believes open and transparent AI systems will drive innovation. They are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models, and have contributed to leading open-source research, models, and datasets.
Job Details
Category: Other
Posted: 8 days ago