This role focuses on enhancing system reliability, performance, and operational efficiency by supporting internal teams and the Payment Platform. The engineer will engage in incident management, automation, and close collaboration with development teams to maintain scalable and secure systems.
Responsibilities
- Analyze code, networking, operating system, and storage layers to resolve complex technical challenges
- Build automated solutions and use monitoring tools to maintain system stability and uptime
- Respond to system incidents and assist in root cause analysis and resolution
- Engage in on-call rotations and escalation procedures as part of a 24x7 reliability model
- Detect and resolve performance issues through code improvements, configuration adjustments, or infrastructure upgrades
- Work closely with development teams to align software design with operational needs
- Refine operational processes to enhance system efficiency and reliability
- Maintain awareness of emerging technologies and industry advancements
- Develop and debug applications while supporting CI/CD pipelines, infrastructure automation, and scalability requirements
Requirements
- Proven experience in Site Reliability Engineering or a similar role combining software development and operations
- Programming background with proficiency in at least one of: C#, Java, GoLang, or Python
- Hands-on experience with cloud platforms such as AWS, Azure, or GCP
- Ability to perform effectively in a high-velocity environment combining development and operations
- Strong interpersonal and teamwork skills for cross-functional collaboration
- Familiarity with monitoring and observability tools including Grafana and Splunk
- Operational knowledge of both relational and NoSQL databases
- Experience using containerization tools like Docker and orchestration systems such as Kubernetes
- Bachelor’s degree in Computer Science or a related technical field, or equivalent professional experience
Nice to Have
- Experience implementing infrastructure as code, particularly with Terraform
- Understanding of RESTful API design and development
- Knowledge of Agile development principles and practices
- Experience working with GitOps workflows
- Exposure to Apache Kafka and event-driven architectures
Tech Stack
C#, Java, GoLang, Python, AWS, Azure, GCP, Grafana, Splunk, RDBMS, NoSQL, Docker, Kubernetes, Terraform, RESTful APIs, Apache Kafka
Team
Part of the Site Reliability Engineering team, providing operational support to internal stakeholders and Payment Platform engineering groups.
Additional Information
- Participation in 24x7 on-call rotations and escalation processes is required.


