London or United Kingdom Employment

Xceptor is hiring a Site Reliability Engineer

About the Role

Xceptor is hiring a Site Reliability Engineer to join a cross-cutting function that partners with tribes across the company to make services reliable, performant, secure, and operable in production. This is an AI-first role: you will routinely use AI to accelerate investigation, diagnostics, runbook creation, and automation, while embedding reliability into the delivery process from the start.

What You'll Do

  • Contribute at the tribe level to service reliability, performance, and operability.
  • Help build and run the reliability system: observability standards, incident response practices, runbooks, and automation.
  • Partner closely with Software Engineering, QA, Platform Engineering, and Senior/Lead SREs.
  • Own well-scoped operational improvements end-to-end, from design and implementation through testing, rollout, and measurement.
  • Contribute to defining and improving SLIs/SLOs and service health signals, aligned to customer outcomes.
  • Implement reliability improvements within established patterns such as timeouts, retries, graceful degradation, and safe failure modes.
  • Support capacity and performance work, including basic baselining, load investigation, and scaling hygiene.
  • Help maintain operational quality across production and staging environments and improve environment consistency.
  • Participate in incident response and on-call rotations, contributing to triage, mitigation, and recovery.
  • Produce clear post-incident notes and support root cause analysis, focusing on actions that prevent recurrence.
  • Create and improve runbooks and playbooks so incidents are resolved faster and more consistently.
  • Help improve change safety through practical release/readiness checks and operational guardrails.
  • Implement and improve observability for services: logs, metrics, traces, dashboards, and alerting aligned to standards.
  • Tune alerts to reduce noise and improve actionability; help manage flakiness and false positives.
  • Build and maintain service health dashboards that support quick diagnosis and release confidence.
  • Work with QA and Engineering to align operational signals with end-to-end journey health.
  • Automate repetitive operational tasks and reduce toil through scripts, tooling, and pipeline improvements.
  • Contribute to deployment automation and reliability guardrails in CI/CD, working with Platform Engineering.

Team & Environment

You will be part of a cross-cutting function that partners with tribes across Xceptor, embedding reliability practices directly into their workflows and systems.

Xceptor fosters a company culture built on Client Centricity, One Team, and Impactful work.

Required Skills
AWS, Azure, Kubernetes, Docker, Terraform, CI/CD, GitLab, GitHub Actions, Prometheus, Grafana, Python, Bash, Linux, Networking, Security
About the Company
Xceptor

Xceptor specialises in data manipulation: it sources data from wherever it flows, then curates, normalises, validates, repairs, and enriches that data so it reaches its destination in a reliable and consistent format. It focuses on the Financial Services vertical, enabling business users to solve their data challenges themselves.

Job Details
Department: Information Technology
Category: Infrastructure
Posted 14 days ago