As a Site Reliability Engineer, you will play a critical role in maintaining and improving the reliability, scalability, and performance of our cloud-native infrastructure. You will work closely with engineering teams to ensure systems are resilient, observable, and cost-efficient. This role combines deep technical expertise with operational discipline, requiring proactive problem-solving and a strong understanding of distributed systems. You will lead cost optimization initiatives, enhance monitoring and incident response, and contribute to platform-level decisions that impact the entire organization.
Responsibilities
- Lead initiatives to enhance cost efficiency, such as minimizing network egress expenses by eliminating redundant data transfers.
- Ensure data storage aligns with access patterns by using appropriate storage classes, including cold storage for infrequently accessed data.
- Optimize autoscaling configurations for databases and compute resources to balance performance and cost.
- Improve cost attribution systems so engineering teams have transparent and accurate insights into their cloud spending.
- Respond to platform incidents as part of an on-call rotation and provide timely resolution support.
- Assist engineers with infrastructure-related challenges and troubleshooting efforts.
- Review and approve pull requests requiring platform-level oversight.
- Collaborate within a small, high-performing team of SREs focused on scalable and reliable systems.
Requirements
- Proven experience in site reliability engineering, DevOps, software engineering, or systems engineering.
- Strong troubleshooting abilities in complex distributed systems.
- Solid understanding of system design and strong analytical reasoning.
- Effective communication skills for cross-team collaboration.
- Familiarity with major cloud platforms, with a preference for Google Cloud.
- Proficiency in SQL for data analysis and querying.
- Hands-on experience with containers, Kubernetes, and configuration tools like Kustomize and Helm.
- Knowledge of service mesh technologies, particularly Istio.
- Understanding of networking concepts including DNS, TLS, certificates, and ingress routing.
- Experience with observability tools such as Datadog for logs, metrics, and APM.
- Working knowledge of security practices including IAM, RBAC, and network security.
- Familiarity with authentication and authorization mechanisms.
- Experience with CI/CD pipelines and automation.
- Knowledge of database systems and their operational requirements.
- Proficiency in scripting with Bash, Python, or similar languages.
Tech Stack
Google Cloud, Kubernetes, Kustomize, Helm, Istio, Datadog, SQL, Bash, Python, CI/CD, DNS, TLS, IAM, RBAC, APM, Containers
Benefits
- Well-funded startup with significant growth ambitions.
- Competitive compensation package.
- Pre-IPO equity participation.
- Unlimited paid time off.
- Travel stipend through Carrot Cash.
- On-demand access to co-working spaces via FlexDesk.
- Work-from-home financial support.
- Generous parental leave policy exceeding industry norms.
- Direct access to company leadership and open communication channels.
- High-impact roles within small, agile teams.
- 100% employer-covered Medical, Dental, and Vision insurance.
- Disability and Life insurance coverage.
- Health Reimbursement Account (HRA) availability.
- Access to Dependent Care Assistance (DCA/FSA) and 401(k) plans.
Compensation
Competitive salary plus pre-IPO equity packages. Additional benefits include unlimited PTO, a travel stipend, a work-from-home stipend, parental leave, HRA, DCA/FSA, and 401(k).
Work Arrangement
Global (America and Europe). The team is spread across America and Europe, so you can sleep at night.
Team
A small and highly efficient team of SREs, spread across America and Europe.
- Entrepreneurial culture where pushing limits and taking risks is ev
Additional Information
- This is a fully remote position with team members distributed across North and South America and Europe.
- Candidates must be self-motivated and capable of working independently in an asynchronous environment.
- Occasional travel to team meetups may be encouraged but is not required.


