On-site, Full-time

The Walt Disney Company is hiring a Senior Systems Reliability Engineer

About the Role

We are hiring a Senior Systems Reliability Engineer to ensure the stability, scalability, and performance of the mission-critical systems that support innovative entertainment experiences. You will work within Walt Disney Imagineering, bridging traditional IT and industrial control systems to provide reliable infrastructure for SCADA, HMI, and PLC programming. As a senior technical leader, you will architect resilient solutions, champion reliability best practices, and drive continuous improvement.

What You'll Do

  • Administer Windows and Linux servers supporting automation and industrial applications.
  • Collaborate with engineering and project teams to implement CI pipeline automation to streamline PLC testing.
  • Develop tools or scripts to automate documentation generation.
  • Define, measure, and monitor service-level indicators/objectives (SLIs/SLOs) and manage error budgets for critical services.
  • Manage Kubernetes clusters and Helm chart deployments for automation and monitoring applications.
  • Identify and automate manual operational processes ('toil') within project teams to improve reliability.
  • Ensure high availability, scalability, and disaster recovery readiness for OT (Operational Technology) related systems.

What We're Looking For

  • A minimum of 5 years in production system reliability (web, cloud, OT, or embedded), including at least 2 years with industrial or embedded control systems.
  • Hands-on experience managing Kubernetes clusters and Helm-based deployments.
  • Expertise in installing and configuring Linux and Windows Server operating systems.
  • Continuous integration (CI) expertise for software development, using GitLab CI or a similar tool.
  • Experience with Source Control Management systems (Git).
  • Experience with AWS or another cloud platform.
  • Advanced skills in at least one programming language, such as Python, PHP, Ruby, Java, Go, Swift, or C++, and the ability to build unit test suites.
  • Excellent verbal and written communication skills with all levels of the organization.

Nice to Have

  • Experience supporting industrial automation platforms (e.g., Ignition, FactoryTalk, Copia).
  • Experience with multiple public cloud platforms (AWS, Azure, GCP).
  • Full stack web development experience.
  • A demonstrated curiosity for continuous learning and self-improvement.
  • Ability to influence architectural decisions and advocate for reliability best practices.
  • Skills in Datadog monitoring and alerting, and in instrumentation with OpenTelemetry.
  • Contributions to reliability-related open-source projects or technical communities.

Technical Stack

  • Operating Systems: Windows Server, Linux
  • Orchestration & CI/CD: Kubernetes, Helm, GitLab CI, Git
  • Cloud: AWS
  • Languages: Python, PHP, Ruby, Java, Go, Swift, C++
  • Industrial Platforms: Ignition, FactoryTalk, Copia, Coverity
  • Monitoring: Datadog, OpenTelemetry

Team & Environment

This role is embedded within Walt Disney Imagineering as part of the Enterprise Technology segment.

Benefits & Compensation

  • Compensation range: $141,900.00 to $190,300.00 per year.
  • Medical, financial, and/or other benefits.
  • Bonus and/or long-term incentive units may be provided.

Work Mode

This is an onsite position located in Glendale, CA, USA.

We are an equal opportunity employer.

Required Skills
Windows Server, Linux, Kubernetes, Helm, GitLab CI, Git, AWS, Python, PHP, Ruby, Systems Reliability, CI/CD, Cloud Infrastructure
About company
The Walt Disney Company

The Walt Disney Company creates world-class experiences and entertainment.

Job Details
Category: Infrastructure
Posted 3 months ago