Remote (Global) Full-time

EverAI is hiring a Mid/Senior LLM Engineer (Remote - Worldwide)

About the Role

As a Mid/Senior LLM Engineer at EverAI, you will be at the forefront of developing AI companionship technology that serves 30 million users and processes 5 million messages daily. You will fine-tune and optimize large language models so they scale globally while keeping interactions personalized.

What You'll Do

  • Interact with stakeholders including Co-founders, Web Engineers, and DevOps Engineers to bring projects to life.
  • Lead the design and optimization of methods for adjusting LLM behavior through fine-tuning and prompt engineering.
  • Build features that enrich the product, such as multi-character chats and gamification.
  • Collaborate with team members managing other modalities like audio, image, and video.
  • Adapt and fine-tune base models for multilingual support.
  • Manage the creation and maintenance of diverse datasets critical for training and improving LLM performance.
  • Evaluate and choose the best technical approach for each problem, whether that means classifiers, fine-tuning, or other methods.

What We're Looking For

  • 5+ years building production-grade, modular, and maintainable Python codebases.
  • Deep expertise in LLM architecture, including transformers, attention mechanisms, positional encodings, samplers, tokenizers, and post-training.
  • Expert-level experience with inference optimization at scale using vLLM or TensorRT-LLM, and a proven record of reducing latency and memory via quantization or distillation.
  • Hands-on experience with distributed training using FSDP, DeepSpeed, or accelerate on multi-GPU/multi-node setups, including mixed-precision training and gradient checkpointing.
  • Skilled at performance profiling and optimization, identifying compute or memory bottlenecks across CPU/GPU pipelines.

Nice to Have

  • Strong concurrency and runtime engineering skills with asyncio or multiprocessing.
  • Practical low-level systems experience with CUDA or Triton, including writing or debugging custom kernels.
  • Contributions to open-source LLM tooling such as vLLM, Hugging Face Transformers, or Triton.
  • Experience building or maintaining latency-critical, multi-user LLM services like RAG, streaming, agents, or chatbots.
  • Exposure to specialized generation use cases like multi-turn instruction tuning or non-English quality alignment.

Technical Stack

  • Python, vLLM, TensorRT-LLM, CUDA, Triton, FSDP, DeepSpeed, accelerate
  • GPT-4, Mistral, Hugging Face

Team & Environment

You will join a team of 55+ people, interacting directly with stakeholders including Co-founders, Web Engineers, and DevOps Engineers.

Benefits & Compensation

  • 4 weeks of PTO.
  • Annual company gathering.
  • A wellbeing budget of up to $200.
  • Learning budget.
  • Company laptop.
  • Access to GPT-4, Mistral, and a Hugging Face Pro plan.

Work Mode

This role is fully remote and open to candidates worldwide.

EverAI is an equal opportunity employer.

Required Skills
Python, vLLM, TensorRT-LLM, CUDA, Triton, FSDP, DeepSpeed, accelerate, GPT-4, Mistral, LLM, NLP, Distributed Training, Model Optimization
About EverAI

EverAI is building the future of AI companionship and is one of the Top 15 Largest & Fastest-Growing AI Companies in the World. Its flagship product is the world’s largest AI companionship platform, redefining relationships for millions of users. The company has 50 million users, and its platform is governed by EverGuard, a proprietary internal AI moderation system designed to ensure everything EverAI builds is safe, ethical, and human-first.
