WEX Inc is hiring a Site Reliability Engineer for our SRE team. You will develop software and solutions focused on observability, incident response, reliability, performance, and operational excellence. This role supports internal stakeholders and our Payment Platform teams by tackling complex challenges and enhancing the experience for our engineering teams and customers.
What You'll Do
- Dig deep into code, networking, operating systems, and storage solutions to solve complex issues.
- Develop automation and use monitoring tools to ensure system reliability.
- Participate in incident response, troubleshooting, and 24x7 Site Reliability rotations.
- Identify and address performance bottlenecks through code optimization, configuration changes, or infrastructure upgrade recommendations.
- Collaborate with development teams to ensure software design meets operational requirements.
- Continuously improve processes and procedures to increase system reliability and efficiency.
- Stay up to date with the latest industry trends and technologies.
- Design, code, and debug applications while assisting with CI/CD pipelines, automating infrastructure tasks, and ensuring system scalability and security.
What We're Looking For
- Hands-on experience as a Site Reliability Engineer or in an equivalent role.
- Development experience with, or solid working knowledge of, at least one major programming language: C#, Java, GoLang, or Python.
- Experience with Cloud Computing platforms (AWS, Azure, or GCP).
- Ability to thrive in a fast-paced development and operations environment.
- Strong communication and collaboration skills.
- Experience with Grafana and Splunk.
- Experience with at least one major RDBMS and at least one NoSQL data store.
- Experience with container and orchestration technologies such as Docker or Kubernetes.
- A BA/BS degree in Computer Science or a related technical field, or equivalent job experience.
Nice to Have
- Experience with infrastructure as code, preferably Terraform.
- Working knowledge of designing and building RESTful APIs.
- Familiarity with Agile methodologies and practices.
- Experience with GitOps.
- Experience with Apache Kafka and other event-driven technologies.
Technical Stack
- Languages: C#, Java, GoLang, Python
- Cloud: AWS, Azure, GCP
- Monitoring/Observability: Grafana, Splunk
- Data Stores: RDBMS, NoSQL
- Containers & Orchestration: Docker, Kubernetes
- Infrastructure, APIs & Messaging: Terraform, RESTful APIs, Apache Kafka
Team & Environment
You will be part of the Site Reliability Engineering organization, supporting internal stakeholders and Payment Platform teams.




