Ottawa, Ontario, Canada · Remote (City) · CAD 129,500 - 170,100 Yearly

Ericsson is hiring a Senior Site Reliability Engineer

Ericsson is looking for a Senior Site Reliability Engineer to champion the reliability, availability, performance, and scalability of our mission-critical services. In this senior role, you will partner with development and operations teams to guide system design and provide leadership in incident response.

What You'll Do

Serve as a technical leader ensuring production service reliability, scalability, and performance.
Collaborate with development teams to embed operability and automation into system architecture.
Lead high-severity incident response, driving resolution and coordinating stakeholder communications.
Champion root cause analysis and postmortems; ensure remediation is implemented and verified.
Design and maintain sophisticated monitoring, alerting, deployment, and infrastructure automation systems.
Oversee creation and regular review of operational runbooks/playbooks; lead resilience and chaos testing exercises.
Drive service lifecycle processes, including operational readiness, onboarding, and decommissioning.

What We're Looking For

B.Sc. or M.Sc. degree in a relevant area, or equivalent experience.
7–10+ years in systems engineering, DevOps, or SRE roles, with at least 3 years in a senior/lead capacity driving reliability initiatives.
Expert knowledge of SRE principles: SLIs, SLOs, error budgets, and reliability engineering methodologies.
Advanced Linux systems administration and troubleshooting skills, spanning cloud (AWS/Azure/GCP) and on-premises environments.
Extensive production experience with Kubernetes and container ecosystems (Docker, CRI).
Proficiency with Infrastructure as Code (Terraform, CloudFormation, Ansible) and automation scripting (Python, Go, Bash).
Strong background in designing/operating CI/CD pipelines, automated deployments, and rollout strategies (canary, blue-green).
Expertise with observability tools such as Prometheus, Grafana, ELK/EFK, Splunk, plus distributed tracing frameworks (Jaeger, Zipkin, OpenTelemetry).
Solid networking skills (TCP/IP, routing, load balancing) and security best practices (TLS, identity, secrets management).
Demonstrated thought leadership in designing and operating complex distributed systems.
Proven ability in capacity planning, performance tuning, profiling, and cost optimization at scale.
Understanding of telecom architectures (IMS, 4G/5G core concepts) and carrier-grade availability standards.
Ability to take command during incidents, coordinating cross-team responses in high-pressure situations.
Structured problem-solving skills for deep root cause analysis, with actionable follow-through.
Experience establishing operational standards, best practices, and governance for reliability engineering across teams.
Exceptional communication skills that bridge technical and business contexts and influence senior stakeholders.
Mentorship and coaching for junior and mid-level engineers; fostering a culture of reliability-first thinking.
Strategic decision-making under pressure, balancing innovation with risk management.
Initiative to identify systemic risks and champion enterprise-grade improvements.

Nice to Have

Experience with OSS/BSS, network management tooling, and telecom protocols.
Knowledge of regulatory/compliance constraints in telecom deployments.
Reliability-first, automation-first, and risk-aware approach; skilled at balancing speed and safety in delivery.
Advanced cloud or Kubernetes certifications (AWS Professional, Azure Expert, GCP Professional, CKA/CKAD).
SRE leadership training, incident response, or chaos engineering certifications preferred.

Technical Stack

Operating Systems: Linux
Cloud: AWS, Azure, GCP
Containers & Orchestration: Kubernetes, Docker, CRI
Infrastructure as Code: Terraform, CloudFormation, Ansible
Scripting & Languages: Python, Go, Bash
Observability: Prometheus, Grafana, ELK/EFK, Splunk, Jaeger, Zipkin, OpenTelemetry

Work Mode

This is a local position based in Ottawa, Canada.

Ericsson is proud to be an Equal Opportunity employer.

Required Skills

Linux, AWS, Azure, GCP, Kubernetes, Docker, CRI, Terraform, CloudFormation, Ansible, SRE, SLIs, SLOs, DevOps, Systems Engineering

About company

Ericsson builds advanced telecommunications solutions and networks, enabling connectivity and innovation across industries. The company focuses on developing next-generation technologies including 5G, cloud infrastructure, and AI-driven network services.

Job Details

Department: Information Technology

Category: Infrastructure

Posted: 2 months ago