Barclays is looking for a Site Reliability Engineer to be part of the Identity Access Management team. This role is central to bringing a new digital platform capability to life and modernizing our digital estate. You will partner with business-aligned engineering and product teams to foster a collaborative team culture.
What You'll Do
- Ensure the availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning.
- Respond to, analyze, and resolve system outages and disruptions, and implement measures to prevent recurrence.
- Develop tools and scripts to automate operational processes, reducing manual workload and improving system resilience.
- Monitor and optimize system performance and resource usage, identify bottlenecks, and implement performance tuning best practices.
- Collaborate with development teams to integrate reliability, scalability, and performance best practices into the software development lifecycle.
- Stay informed of industry technology trends and innovations, and contribute to the organization's technology communities.
- Contribute to or set strategy, drive requirements, and make recommendations for change.
- Plan resources, budgets, and policies; manage and maintain policies and processes; and deliver continuous improvements.
- Advise key stakeholders, including functional leadership teams and senior management, on functional and cross-functional areas.
- Manage and mitigate risks through assessment in support of the control and governance agenda.
- Demonstrate leadership and accountability for managing risk and strengthening controls.
- Create solutions based on sophisticated analytical thinking, comparing and selecting among complex alternatives.
- Seek out, build, and maintain trusting relationships with internal and external stakeholders to accomplish key business objectives.
What We're Looking For
- Experience in designing, implementing, deploying, and running highly available, fault-tolerant, auto-scaling and auto-healing systems.
- Strong expertise in AWS (essential).
- Strong experience running disaster recovery and zero-downtime solutions.
- Strong experience in designing and implementing continuous delivery across large-scale, distributed, cloud-based microservice and API service solutions with 99.9%+ uptime.
- Hands-on experience coding in Python and Bash, and working with JSON/YAML (configuration as code).
- Ability to drive reliability best practices across engineering teams, embed SRE principles into the DevSecOps lifecycle, and partner with engineering, security and product teams.
Nice to Have
- Experience with Azure and GCP (Google Cloud Platform).
- Experience with Kubernetes (ECS is essential; Fargate and GCE are a plus) and serverless architectures.
- Hands-on experience configuring, deploying, and operating ForgeRock COTS-based IAM solutions (PingGateway, PingAM, PingIDM, PingDS) or open-source alternatives, with embedded security gates, HTTP header signing, access-token and data-at-rest encryption, and PKI-based self-sovereign identity.
Technical Stack
- AWS
- Azure
- GCP
- Kubernetes
- ECS
- Fargate
- GCE
- Python
- Bash
- JSON/YAML
- ForgeRock
- PingGateway
- PingAM
- PingIDM
- PingDS
Team & Environment
This role may involve managing a team or working as an individual contributor and subject matter expert.
Work Mode
This position is based in Pune.
All colleagues at Barclays are expected to demonstrate the Barclays Values of Respect, Integrity, Service, Excellence and Stewardship and the Barclays Mindset – to Empower, Challenge and Drive.