Responsibilities
- Lead full lifecycle execution of high-impact initiatives, from concept through deployment, including defining scope, estimating timelines, designing system architecture, and evaluating emerging technologies.
- Develop and sustain robust data and machine learning pipelines that adapt to changing business and modeling demands, ensuring reliability and efficiency.
- Enhance training pipeline performance by optimizing for speed, memory usage, and cost, including leveraging spot instances, efficient data loading, and reuse of preprocessed data.
- Empower data scientists by delivering reusable, tested components such as transformers, data loaders, and training tools, while guiding contributions to shared codebases.
- Expand data pipeline capabilities by incorporating new data sources, extending feature time windows, and scaling training datasets within existing infrastructure limits.
- Support deep learning workflows by managing GPU-based training, implementing custom training loops in PyTorch, and assisting with model architecture design.
- Ensure consistency and reproducibility across experimentation and production using version-controlled configurations, experiment logging, and alignment between offline and online systems.
- Work with infrastructure teams to improve system scalability through resource management, monitoring, and CI/CD modernization, while participating in on-call rotations to address pipeline alerts.
Benefits
- Comprehensive benefits package tailored to the employee's country of residence
Work Arrangement
Hybrid — Paris, Helsinki