About the Role
The role involves improving system uptime, managing production environments, and bridging development and operations through automation and proactive monitoring.
Responsibilities
- Monitor system performance and respond to incidents with urgency and precision
- Design and maintain automated deployment pipelines for consistent delivery
- Collaborate with development teams to improve code deployability and resilience
- Implement and manage monitoring and alerting systems across services
- Troubleshoot complex production issues across distributed systems
- Optimize infrastructure for reliability, scalability, and cost-efficiency
- Develop tools to reduce manual operational work for engineering teams
- Support incident response and lead post-mortem analyses
- Enforce security best practices within infrastructure and deployment workflows
- Manage configuration and orchestration of cloud-based services
- Keep services within their service level objectives and manage error budgets
- Participate in on-call rotations with clear escalation paths
- Improve system observability through logging, metrics, and tracing
- Contribute to capacity planning and performance testing
- Maintain documentation for systems and operational procedures
- Evaluate new technologies for improving platform stability
- Work cross-functionally to align SRE practices with product goals
- Drive adoption of infrastructure-as-code principles
- Support disaster recovery planning and testing
- Assist in database reliability and performance tuning
- Help onboard services onto standardized platform tooling
- Promote blameless culture during incident reviews
- Scale systems to handle growing user demand
- Reduce technical debt in operational systems
- Ensure changes are rolled out safely using canary and staged releases
Nice to Have
- Experience supporting real-time audio or communication platforms
- Background in game development or interactive media infrastructure
- Advanced knowledge of Kubernetes in production environments
- Experience with large-scale data pipelines
- Contributions to open-source infrastructure projects
- Certifications in cloud or DevOps platforms
- Prior work in fast-growing startup environments
Compensation
Competitive salary and equity package
Work Arrangement
Hybrid work model with flexibility for remote and office-based collaboration
Team
Collaborative engineering team focused on scalable infrastructure and real-time systems
Why This Role Matters
The platform handles sensitive real-time audio interactions, so reliability and uptime are critical to user trust and safety. This role directly affects the stability and scalability of systems that millions depend on daily.
Growth Opportunities
Engineers are encouraged to take ownership of projects, propose infrastructure improvements, and grow into leadership roles. Mentorship and skill development are prioritized.
Available for qualified candidates