Responsibilities
- Partner closely with our engineers to debug production issues, improve performance, and design systems that scale reliably
- Own and evolve Socket’s infrastructure, with a focus on reliability, performance, and cost as we scale
- Help define and evolve SLIs and SLOs for new and existing systems, turning reliability into something that can be measured and improved
- Debug, maintain, and improve our deployment pipeline, including addressing failures in production and driving meaningful improvements over time
- Build and maintain observability across our systems (metrics, logs, traces) to support faster detection and resolution of issues
- Participate in an on-call rotation and drive incident reviews with an emphasis on concrete follow-ups and system improvements
Requirements
- 5+ years of software development experience, including 1+ year in a DevOps or SRE role
- Comfortable working on a distributed, cross-functional team where priorities shift and the problems change day to day
- Experience scaling and operating production web applications, preferably in a TypeScript / Node.js environment
- Strong knowledge of relational databases, with Postgres preferred
- Hands-on experience building and using observability systems (Prometheus/Mimir, OpenTelemetry, Grafana)
- Experience with container orchestration (Docker, Kubernetes)
- Practical experience managing infrastructure-as-code with Terraform
- Experience running systems in a cloud environment, with GCP preferred
- Experience building and maintaining CI/CD pipelines (e.g. GitHub Actions)
Benefits
- Market-competitive salary bands
- Meaningful equity program
- Comprehensive health benefits for you and your family (99% coverage)
- Flexible time off, holidays, and a winter shutdown to rest and recharge
- Paid parental leave
- Remote-first, with quarterly team off-sites
Work Arrangement
Remote (Worldwide)
Team
Structure: Early member of the team who will help shape the company's culture and future team