About the Role
Role details below.
Responsibilities
- Design, develop, and maintain software components, frameworks, and tooling that support platform reliability, integration, and resilience
- Contribute to validation strategies for real-world scenarios, including multi-product interactions, upgrades, and operational or failure conditions
- Leverage AI-assisted tools and techniques to improve test development, failure analysis, and productivity across validation and reliability workflows
- Implement automation and integrate it into CI pipelines, such as OpenShift CI, to improve early detection of regressions and platform risks
- Collaborate closely with engineering, QE, SRE, and product teams in a geographically distributed environment to identify gaps, troubleshoot issues, and improve overall system behavior
- Analyze failures uncovered through testing or automation, identify root causes, and contribute fixes or improvements in collaboration with owning teams
- Develop and maintain dashboards, reports, or data pipelines to improve visibility into system health, test results, and trends
- Write clear documentation for tooling, workflows, and operational processes to support broader adoption and maintainability
- Stay current with industry trends, tools, and best practices related to distributed systems, cloud platforms, and software reliability
- Participate in code reviews, design discussions, and agile ceremonies to continuously improve engineering quality and team effectiveness
Work Arrangement
Remote (Country)