webAI is hiring a Staff Software Engineer - Runtime

About the Role

This position involves leading the development and refinement of runtime infrastructure that powers scalable AI execution, ensuring efficiency, reliability, and performance across distributed environments.

Responsibilities

  • Design and implement core components of runtime systems
  • Optimize execution performance for AI and machine learning workloads
  • Collaborate with cross-functional teams to define system requirements
  • Diagnose and resolve complex performance bottlenecks
  • Contribute to architectural decisions for scalable infrastructure
  • Ensure runtime compatibility across diverse hardware environments
  • Develop tooling to monitor and improve system behavior
  • Lead code reviews and set engineering best practices
  • Mentor junior engineers in systems programming and design
  • Work closely with research teams to integrate new AI models
  • Improve fault tolerance and system resilience
  • Drive automation in testing and deployment pipelines
  • Maintain detailed technical documentation
  • Evaluate emerging technologies for runtime improvements
  • Support security and compliance requirements in execution layers
  • Participate in incident response and on-call rotations
  • Refactor legacy components for better maintainability
  • Contribute to open-source projects when applicable
  • Ensure backward compatibility during system upgrades
  • Collaborate on debugging low-level system issues
  • Improve startup and execution latency
  • Work with containerization and orchestration technologies
  • Integrate observability into runtime components
  • Support deployment across cloud and edge environments
  • Balance feature development with technical debt reduction

Compensation

Competitive salary and equity package

Work Arrangement

Hybrid work model with flexibility for remote or on-site collaboration

Team

You will join the core engineering team responsible for runtime systems and performance optimization

About the Team

The Runtime team builds the foundational execution layer that powers AI inference and training workflows. We focus on speed, efficiency, and scalability across heterogeneous environments.

Tech Stack

  • Primary languages: C++, Rust
  • Infrastructure: Kubernetes, Docker, Prometheus
  • Cloud platforms: AWS, GCP
  • Monitoring: Grafana, OpenTelemetry
  • CI/CD: GitHub Actions, ArgoCD

Growth Opportunities

  • Lead major system redesigns
  • Present technical work to broader engineering groups
  • Contribute to strategic planning for runtime evolution
  • Mentor engineers across multiple teams

Sponsorship available for qualified candidates requiring work authorization

Required Skills
Go, MQTT, Kafka, Distributed Systems, Microservices, Cloud Infrastructure, API Design, Performance Optimization, CI/CD, Monitoring, Security
About the Company
webAI
We are establishing the first distributed AI infrastructure dedicated to personalized AI. We are building a future where a company's data and IP remain private and large models can be brought directly to consumer hardware without removing information from the model.
Job Details
Category other
Posted a year ago