Ottawa, Ontario, Canada | Remote (City) | Employment | CAD 129,500 - 170,100 Yearly

Ericsson is hiring a Site Reliability Engineer

About the Role

Ericsson is looking for a Senior Site Reliability Engineer to champion the reliability, availability, performance, and scalability of our mission-critical services. In this senior role, you will partner with development and operations teams to guide system design and provide leadership in incident response.

What You'll Do

  • Serve as a technical leader ensuring production service reliability, scalability, and performance.
  • Collaborate with development teams to embed operability and automation into system architecture.
  • Lead high-severity incident response, driving resolution and coordinating stakeholder communications.
  • Champion root cause analysis and postmortems; ensure remediation is implemented and verified.
  • Design and maintain sophisticated monitoring, alerting, deployment, and infrastructure automation systems.
  • Oversee creation and regular review of operational runbooks/playbooks; lead resilience and chaos testing exercises.
  • Drive service lifecycle processes, including operational readiness, onboarding, and decommissioning.

What We're Looking For

  • B.Sc. or M.Sc. degree in a relevant area, or equivalent experience.
  • 7–10+ years in systems engineering, DevOps, or SRE roles, with at least 3 years in a senior/lead capacity driving reliability initiatives.
  • Expert knowledge of SRE principles: SLIs, SLOs, error budgets, and reliability engineering methodologies.
  • Advanced Linux systems administration and troubleshooting skills, spanning cloud (AWS/Azure/GCP) and on-premises environments.
  • Extensive production experience with Kubernetes and container ecosystems (Docker, CRI).
  • Proficiency with Infrastructure as Code (Terraform, CloudFormation, Ansible) and automation scripting (Python, Go, Bash).
  • Strong background in designing/operating CI/CD pipelines, automated deployments, and rollout strategies (canary, blue-green).
  • Expertise with observability tools such as Prometheus, Grafana, ELK/EFK, Splunk, plus distributed tracing frameworks (Jaeger, Zipkin, OpenTelemetry).
  • Solid networking skills (TCP/IP, routing, load balancing) and security best practices (TLS, identity, secrets management).
  • Demonstrated thought leadership in designing and operating complex distributed systems.
  • Proven ability in capacity planning, performance tuning, profiling, and cost optimization at scale.
  • Understanding of telecom architectures (IMS, 4G/5G core concepts) and carrier-grade availability standards.
  • Ability to command incidents with operational excellence, coordinating cross-team responses in high-pressure situations.
  • Ability to lead structured problem-solving for deep root cause analysis with actionable follow-through.
  • Experience establishing operational standards, best practices, and governance for reliability engineering across teams.
  • Exceptional communication to bridge technical and business contexts, influencing senior stakeholders.
  • Mentorship and coaching for junior and mid-level engineers; fostering a culture of reliability-first thinking.
  • Strategic decision-making under pressure, balancing innovation with risk management.
  • Initiative to identify systemic risks and champion enterprise-grade improvements.

Nice to Have

  • Experience with OSS/BSS, network management tooling, and telecom protocols.
  • Knowledge of regulatory/compliance constraints in telecom deployments.
  • Reliability-first, automation-first, and risk-aware approach; skilled at balancing speed and safety in delivery.
  • Advanced cloud or Kubernetes certifications (AWS Professional, Azure Expert, GCP Professional, CKA/CKAD).
  • SRE leadership training, or incident response or chaos engineering certifications.

Technical Stack

  • Operating Systems: Linux
  • Cloud: AWS, Azure, GCP
  • Containers & Orchestration: Kubernetes, Docker, CRI
  • Infrastructure as Code: Terraform, CloudFormation, Ansible
  • Scripting & Languages: Python, Go, Bash
  • Observability: Prometheus, Grafana, ELK/EFK, Splunk, Jaeger, Zipkin, OpenTelemetry

Work Mode

This is a local position based in Ottawa, Canada.

Ericsson is proud to be an Equal Opportunity employer.

Required Skills
Linux, AWS, Azure, GCP, Kubernetes, Docker, CRI, Terraform, CloudFormation, Ansible, SRE, SLIs, SLOs, DevOps, Systems Engineering
Job Details
Department: Information Technology
Category: Infrastructure
Posted 14 days ago