We are hiring a Senior Systems Reliability Engineer to ensure the stability, scalability, and performance of the mission-critical systems that support innovative entertainment experiences. You will work within Walt Disney Imagineering, bridging traditional IT and industrial control systems to provide reliable infrastructure for SCADA, HMI, and PLC programming. As a senior technical leader, you will architect resilient solutions, champion reliability best practices, and drive continuous improvement.
What You'll Do
- Administer Windows and Linux servers supporting automation and industrial applications.
- Collaborate with engineering and project teams to implement CI pipeline automation to streamline PLC testing.
- Develop tools or scripts to automate documentation generation.
- Define, measure, and monitor service-level indicators/objectives (SLIs/SLOs) and manage error budgets for critical services.
- Manage Kubernetes clusters and Helm chart deployments for automation and monitoring applications.
- Identify and automate manual operational processes ('toil') within project teams to improve reliability.
- Ensure high availability, scalability, and disaster recovery readiness for operational technology (OT) systems.
What We're Looking For
- A minimum of 5 years in production system reliability (web, cloud, OT, or embedded), including at least 2 years with industrial or embedded control systems.
- Hands-on experience managing Kubernetes clusters and Helm-based deployments.
- Expertise in installing and configuring Linux and Windows Server operating systems.
- Continuous integration (CI) expertise for software development, using GitLab CI or a similar tool.
- Experience with Source Control Management systems (Git).
- Experience with AWS or another cloud platform.
- Advanced skills in at least one programming language, such as Python, PHP, Ruby, Java, Go, Swift, or C++, and the ability to build unit test suites.
- Excellent verbal and written communication skills with all levels of the organization.
Nice to Have
- Experience supporting industrial automation platforms (e.g., Ignition, FactoryTalk, Copia).
- Experience with multiple public cloud platforms (AWS, Azure, GCP).
- Full stack web development experience.
- A demonstrated curiosity for continuous learning and self-improvement.
- Ability to influence architectural decisions and advocate for reliability best practices.
- Experience with Datadog monitoring and alerting, and with OpenTelemetry instrumentation.
- Contributions to reliability-related open-source projects or technical communities.
Technical Stack
- Operating Systems: Windows Server, Linux
- Orchestration & CI/CD: Kubernetes, Helm, GitLab CI, Git
- Cloud: AWS
- Languages: Python, PHP, Ruby, Java, Go, Swift, C++
- Industrial Platforms: Ignition, FactoryTalk, Copia, Coverity
- Monitoring: Datadog, OpenTelemetry
Team & Environment
This role is embedded within Walt Disney Imagineering as part of the Enterprise Technology segment.
Benefits & Compensation
- Compensation range: $141,900.00 to $190,300.00 per year.
- Medical, financial, and/or other benefits.
- Bonus and/or long-term incentive units may be provided.
Work Mode
This is an onsite position located in Glendale, CA, USA.
We are an equal opportunity employer.



