About the Role
The Staff Site Reliability & DevOps Engineer - Observability designs, implements, and maintains the observability systems that keep the infrastructure reliable, performant, and scalable. The role works with other teams to identify and resolve issues, and develops and rolls out best practices for observability.
Responsibilities
- Design and implement observability solutions to enhance system reliability and performance.
- Collaborate with cross-functional teams to identify and resolve issues in observability systems.
- Develop and maintain monitoring and alerting systems to ensure proactive issue detection.
- Implement best practices for observability, including logging, metrics, and tracing.
- Conduct regular performance reviews and optimize observability systems for efficiency.
- Ensure data integrity and security in observability systems.
- Provide technical leadership and mentorship to junior engineers.
- Participate in on-call rotations to ensure 24/7 support for observability systems.
- Document observability processes and procedures for reference and training.
- Stay current with the latest trends and technologies in observability and DevOps.
- Work closely with development and operations teams to integrate observability into the CI/CD pipeline.
- Implement automated testing and validation for observability systems.
- Conduct root cause analysis for incidents and outages.
- Develop and maintain dashboards and reports for observability metrics.
- Ensure compliance with industry standards and best practices for observability.
- Collaborate with security teams to implement security measures in observability systems.
- Provide technical support and troubleshooting for observability issues.
- Participate in incident response and resolution processes.
- Conduct regular audits and assessments of observability systems.
- Develop and implement disaster recovery plans for observability systems.
- Ensure high availability and fault tolerance in observability systems.
- Collaborate with vendors and third-party service providers for observability tools and services.
Nice to Have
- Experience with large-scale observability systems.
- Knowledge of machine learning and AI for observability.
- Experience with open-source observability tools.
- Proficiency in multiple programming languages.
- Experience with multi-cloud environments.
- Knowledge of DevSecOps practices.
- Experience with container orchestration platforms.
- Proficiency in network and system administration.
- Experience with performance tuning and optimization.
- Knowledge of compliance and regulatory requirements for observability.
- Experience with data governance and management.
- Proficiency in scripting and automation frameworks.
- Experience with log management and analysis tools.
- Knowledge of cloud-native architectures and microservices.
- Experience with infrastructure as code (IaC) tools.
- Proficiency in monitoring and alerting frameworks.
- Experience with incident management and response tools.
- Knowledge of security information and event management (SIEM) systems.
- Experience with observability dashboards and reporting tools.
- Proficiency in data visualization and analytics tools.
Compensation
Competitive salary and benefits package.
Work Arrangement
Hybrid work arrangement with flexible hours.
Team
Collaborative and innovative team environment.
What You'll Bring
- A passion for observability and a strong desire to learn and grow in the field.
- Excellent problem-solving skills and a proactive approach to issue resolution.
- Strong communication and collaboration skills to work effectively with cross-functional teams.
- A commitment to continuous improvement and staying updated with the latest trends and technologies in observability.
- Experience with large-scale observability systems and a deep understanding of observability best practices.
- A strong background in site reliability engineering and DevOps, with a focus on observability.
- Proficiency in scripting and automation tools for building observability solutions.
- Experience with observability on cloud platforms and containerization technologies.
- A strong understanding of security best practices for observability systems, including data integrity.
- Experience with incident management and response for observability systems.
What You'll Get
- A competitive salary and benefits package, including health insurance, retirement plans, and paid time off.
- A hybrid work arrangement with flexible hours, allowing for a better work-life balance.
- A collaborative and innovative team environment, with opportunities for professional growth and development.
- Access to the latest tools and technologies in observability, with a focus on continuous improvement and innovation.
- A supportive and inclusive work environment, with a commitment to diversity, equity, and inclusion.
- A commitment to employee well-being, with a focus on mental health and work-life integration.
- A commitment to continuous learning, with opportunities for training.
- A focus on employee engagement and satisfaction.
Visa sponsorship is available for eligible candidates.