The Wikimedia Foundation is hiring a Senior Site Reliability Engineer to support and evolve the platform powering Wikipedia and its sister projects. This globally distributed SRE team is responsible for the health and continuous improvement of the underlying infrastructure for one of the world's top-10 websites, directly advancing Wikimedia's mission of free knowledge.
What You'll Do
- Perform day-to-day operational and DevOps tasks on Wikimedia's public-facing infrastructure, including deployment, maintenance, configuration, and troubleshooting.
- Implement and use configuration management and deployment tools such as Puppet and Kubernetes.
- Lead continuous improvement by automating the installation, configuration, and maintenance of services across the platform.
- Work closely with product teams on architectural design to bring new functionality to users and ensure services operate at scale.
- Participate in a 24/7 on-call rotation shared across the broader SRE team, handling incident response, diagnosis, and follow-up on system outages.
- Collaborate effectively with a global, cross-functional team in an asynchronous communication environment.
- Mentor peers in your areas of technical and operational strength.
- Travel 1-2 times a year for in-person events and team meetings.
What We're Looking For
- 6+ years of experience in an SRE, Operations, or DevOps role as part of a team.
- Experience with shell, scripting, and programming languages such as Python, Go, Bash, or Ruby (we primarily use Python) and with configuration management tools such as Puppet or Ansible.
- Experience with distributed caching systems, including their underlying algorithms and performance optimization.
- A thorough, protocol-level understanding of TCP/IP, HTTP, TLS, and DNS.
- Experience with package management on Linux systems (we use Debian).
- Strong Linux system-level troubleshooting skills.
- A track record of identifying gaps and automation opportunities, and of automating tasks and processes.
- Strong English language skills and the ability to work independently as part of a globally distributed team across multiple time zones.
- Experience leading and participating in incident response and post-incident reviews, conducting root cause analysis, and implementing preventive measures.
Nice to Have
- Experience with Linux kernel tuning for high-traffic loads.
- Experience with high-performance HTTP(S) caching-proxy software like HAProxy, Varnish, Apache Traffic Server, Envoy, or Nginx.
- Experience using, maintaining, and configuring monitoring, metrics, and logging infrastructure (Prometheus, Grafana, etc.).
- Experience developing or contributing to Free and Open Source software, or being part of an open-source community.
- Experience with LAMP stack technologies (PHP/HHVM, memcached/Redis); MediaWiki experience is a definite plus.
- Experience defining and implementing cross-team SLOs.
Technical Stack
- Configuration/Deployment: Puppet, Kubernetes, Ansible
- Languages/Scripting: Python, Go, Bash, Ruby, PHP, HHVM
- Systems: Linux, Debian
- Networking/Protocols: TCP/IP, HTTP, TLS, DNS
- Proxies/Caching: HAProxy, Varnish, Apache Traffic Server, Envoy, Nginx, memcached, Redis
- Monitoring/Observability: Prometheus, Grafana
- Application: MediaWiki
Team & Environment
You will join a globally distributed and diverse SRE team, working in an asynchronous, open-source environment where all documentation, code, and configuration are published openly.
Benefits & Compensation
- The compensation range for US-based applicants is US$113,082 to US$175,725; for other locations, the range is adjusted to the country of hire.
Work Mode
This is a global role. Candidates are welcome from the following US states and territories (Arizona, California, Colorado, Connecticut, District of Columbia, Florida, Georgia, Idaho, Illinois, Indiana, Iowa, Maryland, Massachusetts, Michigan, Minnesota, Missouri, New Jersey, New Mexico, New York, North Carolina, Ohio, Oklahoma, Oregon, Pennsylvania, Puerto Rico, Rhode Island, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, Wyoming) and countries (Brazil, Canada, Colombia, France, Germany, Ghana, India, Indonesia, Italy, Kenya, Mexico, Morocco, Netherlands, Poland, Singapore, South Africa, Spain, Switzerland, United Kingdom).
As an equal opportunity employer, the Wikimedia Foundation values having a diverse workforce and continuously strives to maintain an inclusive and equitable workplace. We encourage people with a diverse range of backgrounds to apply.