London, Hybrid, Full-time

Perplexity is hiring a Member of Technical Staff (AI Infrastructure Engineer)

Responsibilities

  • Architect, launch, and manage scalable Kubernetes clusters supporting AI model training and inference workloads
  • Operate and enhance Slurm-based high-performance computing environments used for distributed training of large language models
  • Create reliable APIs and workflow orchestration systems for training pipelines and inference platforms
  • Develop job scheduling and resource allocation systems across diverse computing infrastructures
  • Evaluate system performance, identify bottlenecks, and apply optimizations across training and inference environments
  • Design monitoring, alerting, and observability frameworks specific to machine learning workloads on Kubernetes and Slurm
  • Respond promptly to infrastructure failures and coordinate with cross-functional teams to ensure continuous operation of critical AI systems
  • Improve cluster efficiency and implement dynamic autoscaling to meet fluctuating workload demands

About the company
Perplexity
Perplexity is a free AI-powered answer engine that provides accurate, trusted, and real-time answers to any question.
Job Details
Department: AI
Category: Infrastructure
Posted 2 months ago