Design, implement, and maintain SFT and RL post-training pipelines for multi-step coding agents.
Train and adapt LLMs for agent workflows, including planning, tool use, and multi-step interactions inside JetBrains IDEs.
Build evaluation and simulation environments where coding agents can act and be measured and compared on realistic developer tasks.
Design evaluation frameworks and metrics for agent behavior, analyze traces and logs, and close the loop from evaluation back into training, data, and reward design.
Analyze training and evaluation results to propose and implement improvements to model architectures, training recipes, and datasets.
Work with large-scale infrastructure, including distributed training on GPU clusters and large MapReduce-style data processing for pre-training and fine-tuning datasets.
Collaborate closely with research, product, and infrastructure teams to turn high-level product visions into concrete models, experiments, and shipped features.

Extensive hands-on experience training LLMs (pre-training, fine-tuning, or post-training) in a research or production setting.
Deep expertise in modern deep learning frameworks such as PyTorch and in specialized LLM training stacks (e.g. Megatron, NeMo, or verl).
Strong theoretical and practical understanding of LLM fundamentals: architectures, tokenization, data pipelines, batching, mixed precision, distributed training, and debugging unstable runs.
The ability to own projects end to end, starting from a high-level problem or product pain point and carrying them through design, experimentation, implementation, and iteration.
A product-aware mindset – you care about how developers actually use agents and can translate product needs and failure modes into modeling and evaluation work.
At least 3 years of experience writing clean, maintainable Python code in modern ML codebases.

ML orchestrators and workflow tools such as Kubeflow, Dagster, Airflow, or ZenML, and/or job schedulers like Kubernetes or SLURM.
Large-scale data and training pipelines, e.g. MapReduce-style clusters, multi-node GPU training, or workloads on the order of 1M+ CPU/GPU hours.
Designing and maintaining evaluation pipelines for LLMs or agents, including metrics, dashboards, experiment tracking, and automated regression checks.
AI agent development, such as tool-using agents, planners, or multi-step coding workflows, and familiarity with agentic frameworks or patterns.
Experiment tracking and observability using tools like Weights & Biases, MLflow, Langfuse, or similar.
Inference optimization and serving optimized models in production.

JetBrains is hiring a Research Engineer (Agentic Models)