Responsibilities
- Find solutions to Sphere's toughest scaling, performance, and latency problems
- Work closely with the engineering team to define tooling that helps us ship even faster
- Participate in an on-call rotation to resolve critical production incidents
- Work directly with customers like Eleven Labs, Replit, and Windsurf, and partners like Stripe and Chargebee, on their latency and availability requirements
- Influence and implement the next generation of Sphere's database, real-time queue, and container orchestration infrastructure
- Work across our engineering organization to introduce and scale best practices with cloud-native technologies like Amazon ALB, ECS/EKS, Temporal, Amazon SQS, Amazon Aurora PostgreSQL, ElastiCache for Redis, and S3
- Build abstractions within Terraform to simplify the architecture and increase velocity and ownership
Requirements
- Experience managing Kubernetes (k8s) clusters on AWS, GCP, or Azure at scale
- Extensive experience shipping high-quality architectures for mission-critical systems (focus on high availability, high load, and low latency)
- Experience with Postgres at scale
Nice to Have
- Experience working with large volumes of transaction data. You’ll be getting very familiar with it!
- Strong experience in Python. Our core application backend and data pipeline services are built with Python and Django.
- Passion for developer experience
- Very strong attention to detail. When you work with numbers, this is non-negotiable: it’s not enough to be 99% right.