San Francisco, CA On-site Full-time USD 180,000 – 250,000 / year

fal is hiring a Software Engineer, Distributed Systems

Responsibilities

  • Build our core Python/Rust platform: request routing, AI workload orchestration, scheduling, GPU autoscaling, large-scale file storage, queueing, etc.
  • Produce forward-looking designs for the platform's evolution as we scale to 100x current traffic while providing low latency worldwide
  • Use AI aggressively to automate the mundane parts of building complex yet reliable systems
  • Profile and tune low-level CPU and memory performance

Requirements

  • 3+ years of experience building distributed compute and orchestration platforms in Python or Rust
  • Strong understanding of distributed systems fundamentals: consensus, scheduling, fault tolerance, capacity planning
  • Deep understanding of computational complexity and memory allocation
  • Track record of designing systems that scale under real production load
  • Experience building and using observability to drive performance and reliability decisions
  • Excellent communication skills and the ability to drive technical decisions across teams
  • Self-starter who executes quickly, takes ownership, and constantly seeks improvement

Nice to Have

  • Experience with AI/ML inference or training infrastructure
  • Experience with high-performance systems programming (async runtimes, zero-copy, memory-safe concurrency)
  • Background in building multi-tenant compute platforms
  • Understanding of networking fundamentals and performance characteristics
  • Familiarity with GPU workload characteristics and scheduling constraints

Benefits

  • Health, dental, and vision insurance (US)
  • Regular team events and offsites

Compensation

$180,000–$250,000 plus equity and benefits (this range spans all three levels: Mid, Senior, and Staff)

Work Arrangement

On-site — San Francisco, CA

Additional Information

  • We offer relocation assistance to San Francisco.
  • We are willing to consider remote work for Senior and Staff levels.
About the company
fal
fal is the generative media ecosystem powering the next generation of AI products. We build the infrastructure, tools, and model access that teams need to move from idea to production, and do it at scale without compromise. For developers and enterprises, fal is the foundation that makes generative media not just possible, but practical: a unified platform where high-performance inference, orchestration, and observability come together to unlock new categories of AI-native products.
Job Details
Department Engineering
Category other
Posted a year ago