Underdog is hiring a founding Senior Site Reliability Engineer - Infrastructure to define and build reliability, scalability, and operational excellence from the ground up. This is a high-impact role with real ownership from day one, where you'll partner with platform, infrastructure, and product teams to ensure Underdog scales seamlessly through peak traffic and game-day spikes.
What You'll Do
- Own and maintain the incident response process, including defining procedures, tools, and best practices.
- Guide teams in establishing and monitoring Service Level Objectives (SLOs), including setting up alerts and reporting systems.
- Lead capacity planning initiatives, focusing on both short- and long-term scalability while optimizing costs.
- Develop and implement disaster recovery plans, including regular testing and regulatory compliance.
- Collaborate with teams on architecture decisions to ensure high availability and scalability.
- Manage launch and event planning for high-traffic occasions, focusing on infrastructure preparation and capacity management.
- Act as an internal expert and consultant on monitoring tools like Datadog and PagerDuty and on infrastructure like AWS and Kubernetes.
- Prioritize automation and tooling so our operations scale with our workload.
- Contribute across codebases in Ruby, Python, Go, TypeScript, Swift, and Kotlin as needed to support initiatives.
What We're Looking For
- You are a strong written and verbal communicator.
- You are collaborative by nature.
- You enjoy using research, data, and experiments to make decisions.
- You enjoy working directly with customers (generally engineers or other people inside the company).
- You think long-term about what is best for the business and its customers.
- You are excited to take ownership.
- You are very comfortable in an IDE and working across multiple languages, web application frameworks, AWS services, Kubernetes, and PostgreSQL.
- You can work independently to learn new languages/technologies as needed.
- You enjoy deploying changes to production quickly, multiple times a week if necessary.
Nice to Have
- Experience with PostgreSQL query optimization, autovacuum tuning, table statistics, different index types, etc.
- Experience with Redis/Valkey optimization.
- Experience with Datadog or similar observability tools.
- Experience working as a web application developer, frontend or backend, especially in React and Ruby on Rails.
- Experience with AWS cost optimization.
- Have read the Google SRE books or similar, or have other SRE training.
- Actively use AI to augment your abilities and build knowledge in domains of interest.
Technical Stack
- Languages: Ruby, Python, Go, TypeScript, Swift, Kotlin
- Infrastructure: AWS, Kubernetes
- Data & Observability: PostgreSQL, Datadog, PagerDuty, Redis, Valkey
- Frameworks: React, Ruby on Rails
Team & Environment
You will partner closely with platform, infrastructure, and product teams.
Benefits & Compensation
- Compensation: $160,000-$240,000 + pre-IPO equity.
- Unlimited PTO (extremely flexible, except during the few weeks leading up to and into the start of the NFL season).
- 16 weeks of fully paid parental leave.
- Home office stipend.
- A connected, virtual-first culture with a highly engaged distributed workforce.
- 5% 401(k) match, FSA, and company-paid health, dental, and vision plan options for employees and dependents.
Work Mode
This is a fully remote position.
Underdog is an equal opportunity employer and doesn't discriminate on the basis of creed, race, sexual orientation, gender, age, disability status, or any other defining characteristic.