Responsibilities
- Design and build end-to-end infrastructure for training, evaluation, and productionization of ML models, working closely with our HPC engineers who manage our on-prem compute cluster
- Influence foundational choices around data access, compute orchestration, experiment tracking, model versioning, and deployment pipelines
- Partner with quant researchers to accelerate iteration cycles, tighten feedback loops, and bring models from prototype to live trading
- Work with researchers to adapt modern architectures (transformers, state-space models, temporal convolutions, graph neural networks) to noisy, high-frequency financial data and deploy them. Explore techniques like self-supervised pretraining, representation learning, and cross-sectional modelling where they offer a genuine edge
- Shape our approach to reproducibility, continual learning, and production monitoring across a petabyte-scale data environment
- Define standards that create consistency across teams and geographies; mentor engineers and influence technical culture beyond your immediate work
- Keep pace with developments in deep learning research and ML infrastructure; bring ideas from academia and industry into how we work — whether that's new architectures, training techniques, or tooling
Requirements
- 8+ years of experience building ML platforms or infrastructure at a leading tech company, research lab, or quantitative firm
- A track record of designing and owning large-scale training and inference systems — not just contributing, but architecting
- Deep proficiency in Python, with strong experience in either CUDA or C++
- Hands-on expertise with modern deep learning frameworks (PyTorch, TensorFlow, or JAX) and practical experience implementing architectures like transformers, attention mechanisms, or sequence models
- Strong foundation in deep learning fundamentals: optimization, regularization, loss design, and the trade-offs that matter when training at scale
- Experience with distributed training at scale (Horovod, NCCL) and GPU optimization (cuDNN, TensorRT)
- History of deploying models to production with strong observability, reproducibility, and monitoring practices
- Comfort working across the ML stack from data pipelines to training infrastructure to serving systems
Benefits
- Build, don't inherit — You'll make foundational technology choices in a platform that's still being defined, not maintain someone else's legacy
- Real investment, real backing — This is a strategic priority with resources behind it, not a side experiment
- Direct impact on trading — Your infrastructure will power models that make real trading decisions in competitive global markets
- Global scope — Work with teams across New York, Chicago, Amsterdam, London, Sydney, Hong Kong and beyond; define practices that can scale worldwide
- Ideas over titles — IMC's culture values clarity, rigor, and collaboration. The best ideas win, regardless of where they come from
- Tight coupling with research — You won't be building in isolation. Researchers and engineers work side-by-side, iterating together