San Francisco, California · Hybrid · Full-time

TRM Labs is hiring a Machine Learning Infrastructure Engineer

About the Role

As a Senior Software Engineer, ML Infrastructure at TRM Labs, you will design and operate scalable GPU-backed infrastructure that powers TRM’s AI systems. You will work at the intersection of distributed systems, cloud infrastructure, GPU performance engineering, and applied machine learning to enable high-throughput, production-grade ML workloads for blockchain intelligence and AI platforms.

What You'll Do

  • Design and operate GPU cluster infrastructure. Build and manage GPU-backed environments in cloud settings, including orchestration, autoscaling, resource isolation, and workload management across multiple concurrent models and users.
  • Optimize high-throughput inference. Implement and tune serving systems that maximize token throughput, batching efficiency, GPU occupancy, and cost effectiveness across interactive and batch workloads.
  • Enable distributed inference strategies. Support and operationalize model parallelism, tensor parallelism, and other distributed serving patterns for large-scale models.
  • Implement model optimization and compilation workflows. Integrate and optimize acceleration stacks such as TensorRT, ONNX Runtime, vLLM, FlashAttention, and related tooling to improve performance and reduce inference cost.
  • Schedule heterogeneous workloads. Design systems that manage multiple models, multiple users, and mixed workload types across heterogeneous accelerators (e.g., NVIDIA GPUs, Inferentia), ensuring predictable performance under varying demand.
  • Build observability into ML infrastructure. Instrument systems to measure GPU load, memory utilization, batching efficiency, queue depth, and token throughput, and use that data to continuously improve performance and reliability.
  • Partner across engineering teams. Work closely with infrastructure, ML, and product teams to ensure models transition smoothly from experimentation to production-grade, highly available services.
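To make the batching-efficiency responsibility above concrete, here is a minimal, illustrative sketch of dynamic request batching: queued requests are grouped into batches, trading a small queueing delay for larger batches and higher GPU occupancy. This is a toy model, not TRM's stack; production serving systems such as vLLM or Triton Inference Server implement far more sophisticated continuous batching.

```python
import time
from collections import deque


class DynamicBatcher:
    """Toy dynamic batcher: drains queued requests in batches,
    waiting briefly for the queue to fill so batches stay large."""

    def __init__(self, max_batch_size=8, max_wait_s=0.01):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s  # latency budget spent waiting for a fuller batch
        self.queue = deque()

    def submit(self, request):
        """Enqueue a single inference request."""
        self.queue.append(request)

    def next_batch(self):
        """Return up to max_batch_size requests, waiting until the
        batch is full or the wait deadline expires."""
        deadline = time.monotonic() + self.max_wait_s
        while len(self.queue) < self.max_batch_size and time.monotonic() < deadline:
            time.sleep(0.001)
        batch = []
        while self.queue and len(batch) < self.max_batch_size:
            batch.append(self.queue.popleft())
        return batch
```

Raising `max_wait_s` grows average batch size (better throughput per GPU) at the cost of per-request latency, which is exactly the interactive-vs-batch trade-off the role description mentions.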

What We're Looking For

  • Bachelor’s degree (or equivalent) in Computer Science or related field.
  • 5+ years of experience building and operating distributed systems or infrastructure in production environments.
  • Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP).
  • Deep understanding of high-throughput inference systems, including batching strategies, token throughput optimization, and the trade-offs between latency, throughput, and cost.
  • Experience with one or more ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or HuggingFace Optimum.
  • Experience optimizing GPU load, memory efficiency, and performance bottlenecks in production systems.
  • Familiarity with distributed inference strategies including model parallelism and tensor parallelism.
  • Experience working with Kubernetes or equivalent orchestration systems in cloud environments.
  • Adaptable. Goals can change fast. You anticipate and react quickly.
  • Autonomous. You own what you work on. You move fast and get things done.
  • Excellent communication. You communicate complex ideas effectively to both technical and non-technical audiences, verbally and in writing.
  • Collaborative. You work effectively in a cross-functional team and with people at all levels in an organization.
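The latency/throughput/cost trade-off named in the requirements can be sketched with back-of-the-envelope arithmetic. The formulas below are standard accounting, but all concrete numbers in the usage example (step latencies, GPU price) are hypothetical, chosen only to illustrate the calculation.

```python
def tokens_per_second(batch_size, step_latency_s):
    """Aggregate decode throughput: each decode step emits one token
    per sequence in the batch."""
    return batch_size / step_latency_s


def cost_per_million_tokens(gpu_hourly_usd, tps):
    """Dollar cost to generate one million tokens at a given
    aggregate throughput on a single GPU."""
    tokens_per_hour = tps * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000
```

For example, with hypothetical numbers: a batch of 1 at a 20 ms decode step gives 50 tokens/s, while a batch of 32 at a 35 ms step gives roughly 914 tokens/s, so modest extra per-step latency buys a large drop in cost per token.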

Nice to Have

  • Familiarity with heterogeneous accelerators (e.g., Inferentia) is a plus.
  • CUDA familiarity and experience debugging GPU-related issues is a plus.

Technical Stack

  • TensorRT, ONNX Runtime, vLLM, FlashAttention, Triton Inference Server, Ray Serve, HuggingFace Optimum, Kubernetes, AWS, GCP, NVIDIA GPUs, Inferentia

Team & Environment

  • Cross-functional team including data scientists, engineers, and product managers.

Benefits & Compensation

  • Opportunity to work on meaningful problems at the intersection of AI, national security, and fighting financial crime.
  • High-velocity, high-ownership environment with clarity, follow-through, and impact.
  • Frequent, high-touch communication and close collaboration across teams and functions.
  • Creative problem solving and out-of-the-box thinking encouraged.
  • Work environment that rewards urgency, adaptability, and outcomes.
  • Distributed-first company with hubs in multiple global locations.
  • Company values emphasize impact, craftsmanship, and collaboration.

Work Mode

TRM is a distributed-first company with hubs in multiple locations, offering flexible remote work with optional office presence. Hub locations include San Francisco, Los Angeles, New York, Washington D.C., London, and Singapore.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via a provided form.

Required Skills
TensorRT, ONNX Runtime, vLLM, FlashAttention, Triton Inference Server, Ray Serve, HuggingFace Optimum, Kubernetes, AWS, GCP, ML Inference Optimization, GPU Cluster Management, Distributed Systems, High-Throughput Inference, Cloud Infrastructure
About TRM Labs
TRM Labs provides blockchain analytics and AI solutions to help law enforcement and national security agencies, financial institutions, and cryptocurrency businesses detect, investigate, and disrupt crypto-related fraud and financial crime. TRM’s platforms enable tracing of funds, identification of illicit activity, case building, and threat visualization.
Job Details

  • Category: Infrastructure
  • Posted: 2 months ago