As a Mid/Senior LLM Engineer at EverAI, you will be at the forefront of developing AI companionship technology that serves 30 million users and processes 5 million messages daily. You will fine-tune and optimize large language models to scale globally while maintaining personalized interactions.
What You'll Do
- Interact with stakeholders including Co-founders, Web Engineers, and DevOps Engineers to bring projects to life.
- Oversee the creation and optimization of algorithms that adjust LLM behavior through fine-tuning and prompt engineering.
- Develop features to improve product richness, such as multi-character chats and gamification.
- Collaborate with team members managing other modalities like audio, image, and video.
- Adapt and fine-tune base models for multilingual support.
- Manage the creation and maintenance of diverse datasets critical for training and improving LLM performance.
- Assess and determine the best technical approach for each problem, choosing among classifiers, fine-tuning, and other methods.
What We're Looking For
- 5+ years of experience building production-grade, modular, and maintainable Python codebases.
- Deep expertise in LLM architecture, including transformers, attention mechanisms, positional encodings, samplers, tokenizers, and post-training.
- Expert-level experience with inference optimization at scale using vLLM or TensorRT-LLM, and a proven record of reducing latency and memory usage via quantization or distillation.
- Hands-on experience with distributed training using FSDP, DeepSpeed, or accelerate on multi-GPU/multi-node setups, including mixed-precision training and gradient checkpointing.
- Skill in performance profiling and optimization, including identifying compute or memory bottlenecks across CPU/GPU pipelines.
Nice to Have
- Strong concurrency and runtime engineering skills with asyncio or multiprocessing.
- Practical low-level systems experience with CUDA or Triton, including writing or debugging custom kernels.
- Contributions to open-source LLM tooling such as vLLM, Hugging Face Transformers, or Triton.
- Experience building or maintaining latency-critical, multi-user LLM services such as RAG pipelines, streaming endpoints, agents, or chatbots.
- Exposure to specialized generation use cases such as multi-turn instruction tuning or non-English quality alignment.
Technical Stack
- Python, vLLM, TensorRT-LLM, CUDA, Triton, FSDP, DeepSpeed, accelerate
- GPT-4, Mistral, Hugging Face
Team & Environment
You will join a team of 55+ people and work directly with stakeholders including Co-founders, Web Engineers, and DevOps Engineers.
Benefits & Compensation
- 4 weeks of PTO.
- Annual company gathering.
- A wellbeing budget of up to $200.
- Learning budget.
- Company laptop.
- Access to GPT-4, Mistral, and a Hugging Face Pro plan.
Work Mode
This role is fully remote and open to candidates worldwide.
EverAI is an equal opportunity employer.



