Responsibilities
- Design, develop, and maintain reliable, scalable systems
- Implement monitoring and alerting solutions
- Troubleshoot and resolve complex system issues
- Collaborate with development teams to improve system reliability
- Participate in on-call rotations to ensure 24/7 system availability
- Conduct root cause analysis for system outages
- Implement and maintain CI/CD pipelines
- Develop and maintain infrastructure as code
- Participate in system capacity planning and performance tuning
- Ensure compliance with security and regulatory requirements
- Document system architecture and processes
- Provide mentorship to junior team members
- Stay up to date with emerging technologies and industry trends
- Contribute to the development of best practices and standards
- Participate in code reviews and pair programming
- Collaborate with product management to define system requirements
- Implement and maintain disaster recovery solutions
- Participate in incident management and post-mortem analysis
- Develop and maintain automated testing frameworks
- Implement and maintain logging and monitoring solutions
- Participate in system architecture and design reviews
- Collaborate with operations teams to ensure system stability
Nice to Have
- Master's degree in Computer Science or related field
- Certification in a cloud platform (e.g., AWS Certified Solutions Architect or Google Cloud Professional Cloud Architect)
- Experience with site reliability engineering in a global organization
- Experience with large-scale distributed systems in a cloud environment
- Proficiency in multiple programming languages
- Experience with containerization and orchestration tools in a production environment
- Strong knowledge of monitoring and logging tools in a cloud environment
- Experience with CI/CD pipelines in a cloud environment
- Experience with incident management and post-mortem analysis in a global organization
- Strong communication and collaboration skills in a global team
- Experience with security and compliance best practices in a cloud environment
- Knowledge of networking and system administration in a cloud environment
- Experience with automated testing frameworks in a cloud environment
- Strong understanding of system architecture and design in a cloud environment
- Experience with disaster recovery and business continuity planning in a cloud environment
- Knowledge of Agile development methodologies
- Experience with performance tuning and capacity planning in a cloud environment
- Knowledge of cloud-native architectures and microservices
- Experience with infrastructure automation and management in a cloud environment
Compensation
Competitive salary and benefits package
Work Arrangement
Hybrid
Team
Collaborate with cross-functional teams to ensure system reliability and performance.
About the Team
- Join a dynamic team of site reliability engineers focused on building and maintaining reliable, scalable systems.
- Work in a fast-paced environment with a strong emphasis on innovation and continuous improvement.
What We Offer
- Competitive salary and benefits package
- Opportunities for professional development and growth
- A dynamic and collaborative work environment
- The chance to work on cutting-edge technologies and projects
- A strong emphasis on work-life balance
- The opportunity to make a significant impact on the company's success
- A supportive and inclusive team culture
- The chance to work with a diverse and talented group of professionals
- A focus on continuous learning and development
- The opportunity to work on large-scale, distributed systems
Our Values
- Innovation: We encourage creativity and continuous improvement.
- Collaboration: We work together to achieve our goals.
- Integrity: We act with honesty and transparency.
- Customer Focus: We prioritize the needs of our customers.
- Excellence: We strive for the highest standards in everything we do.
- Respect: We value diversity and inclusion.
- Accountability: We take responsibility for our actions and decisions.
- Teamwork: We support and collaborate with our colleagues.
- Continuous Learning: We seek opportunities for growth and development.
- Adaptability: We embrace change and adapt to new challenges.
How to Apply
- Submit your resume and cover letter through our careers portal.
- Include a brief description of your relevant experience and skills.
- Highlight any certifications or training related to site reliability engineering.
- Provide examples of your problem-solving and troubleshooting abilities.
- Describe your experience with cloud platforms and infrastructure as code tools.
- Include any relevant projects or contributions to open-source communities.
- Explain your experience with incident management and post-mortem analysis.
- Describe your communication and collaboration skills in a team environment.
- Highlight your experience with automated testing frameworks and CI/CD pipelines.
- Provide any additional information that demonstrates your qualifications for the role.