OCBC Bank is hiring a Site Reliability Engineer to ensure the stability, scalability, and performance of our core network infrastructure. This role bridges traditional network engineering with modern SRE practices, driving the adoption of automation, AI-assisted observability, and reliability principles.
What You'll Do
- Lead incident response for complex network disruptions, and conduct blameless post-mortems and root cause analysis (RCA) to prevent recurrence.
- Develop and maintain software tools (Python, Go) that automate lifecycle management, including automated provisioning, testing, and compliance checks.
- Architect and manage AI-first monitoring systems (Grafana, ELK) to capture deep telemetry for predictive failure detection.
- Define and measure network-specific SLIs and SLOs (e.g., latency, jitter, packet loss) and manage error budgets to balance the pace of network change against stability.
- Adopt and maintain declarative network configurations (e.g., Terraform, Ansible) to ensure consistency and speed across multi-cloud and data center environments.
What We're Looking For
- A degree in Computer Science, Information Technology, or an Engineering-related field.
- At least 5 years of relevant experience.
- Deep expertise in networking protocols: BGP, OSPF, MPLS, VXLAN, and IPv6.
- High proficiency in Python and Java for developing network management platforms and automation scripts.
- Hands-on experience with observability tools: Grafana, Elasticsearch.
- Experience building automated CI/CD pipelines (Jenkins, Bitbucket, Jira) for validating network changes before production deployment.
Technical Stack
- Languages: Python, Go, Java
- Observability: Grafana, ELK
- Infrastructure as Code: Terraform, Ansible
- CI/CD: Jenkins, Bitbucket, Jira
Benefits & Compensation
- Competitive base salary
- A suite of holistic, flexible benefits to suit every lifestyle
- Community initiatives
- Industry-leading learning and professional development opportunities
OCBC Bank is an equal opportunity employer.