Prague, Czech Republic (Remote)

Barclays is hiring a Site Reliability Engineer

Responsibilities

  • Ensure the availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning.
  • Respond to, analyse, and resolve system outages and disruptions, and implement measures to prevent similar incidents from recurring.
  • Develop tools and scripts that automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience.
  • Monitor and optimise system performance and resource usage, identify and address bottlenecks, and implement best practices for performance tuning.
  • Collaborate with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle, and work closely with other teams to ensure smooth and efficient operations.
  • Stay informed of industry technology trends and innovations, and actively contribute to the organisation's technology communities to foster a culture of technical excellence and growth.

Requirements

  • Hands-on experience with Elastic Stack (Elasticsearch, Kibana, Logstash/Beats).
  • Strong understanding of observability & monitoring (metrics, logs, traces, APM).
  • Experience defining and configuring dashboards, alerts, and SLIs/SLOs.
  • Basic infrastructure-management exposure (capacity planning, performance insights, scaling, monitoring).

Nice to Have

  • Experience with DevOps tools: GitLab, TeamCity, CI/CD pipelines.
  • Scripting/programming in Python, Java, or C#.
  • Basic Linux experience.
  • Exposure to additional monitoring tools (Grafana, Prometheus, Splunk, etc.).

Job Details

Department: Information Technology
Category: Infrastructure
Posted: 4 months ago