About the Role

This role involves designing and maintaining highly available systems, improving operational workflows, and implementing automation to support large-scale infrastructure.

Responsibilities

Design and manage scalable infrastructure across distributed environments
Monitor system performance and respond to incidents with urgency
Develop automation tools to reduce manual intervention
Collaborate with development teams to improve service reliability
Implement and maintain CI/CD pipelines
Troubleshoot complex production issues across multiple layers
Optimize system performance and resource utilization
Enforce security and compliance standards in infrastructure
Lead post-mortem analyses after critical incidents
Drive reliability improvements through proactive monitoring
Maintain documentation for systems and procedures
Support capacity planning and system forecasting
Ensure high availability and disaster recovery readiness
Integrate observability into services and platforms
Promote best practices in configuration management
Work closely with product teams during major releases
Evaluate and adopt new technologies for operational efficiency
Contribute to on-call rotation with rapid response protocols
Improve deployment safety through automated checks
Reduce technical debt in legacy systems
Implement scalable logging and alerting frameworks
Support cloud infrastructure management and optimization
Ensure infrastructure-as-code principles are followed
Drive incident response improvements through data analysis
Foster a culture of operational excellence

Compensation

Competitive salary and benefits package

Work Arrangement

Remote with flexible hours

Team

Collaborative engineering team focused on scalable systems

Why This Role Matters

The systems you build and maintain directly impact the reliability of core services used by thousands of users daily.
You will play a key role in shaping how engineering teams approach scalability, resilience, and operational rigor.
Your work ensures that failures are minimized and recovery is fast, reducing business impact during outages.

What You’ll Build

Automated recovery systems that reduce downtime without human intervention.
Monitoring dashboards that provide actionable insights across services.
Self-service tools that empower developers to deploy safely and efficiently.

Available for qualified candidates

Invisible Technologies is hiring a Senior Site Reliability Engineer

