Atlanta or San Francisco Employment

Saviynt is hiring a Site Reliability Engineer

About the Role

Saviynt is looking for a Staff Platform Engineer to ensure our complex, distributed, cloud-native SaaS platform remains highly available, scalable, and secure. In this hands-on engineering and technical leadership role, you will own reliability for major platform domains, design scalable solutions on Kubernetes and AWS, and drive automation and reliability improvements across multiple teams.

What You'll Do

  • Design, build, and maintain core platform components and shared infrastructure services for product and application teams.
  • Create reusable, reliable, and scalable solutions that abstract away complexity in a multi-cloud environment.
  • Architect, implement, and manage highly available and scalable Kubernetes platforms as a service for internal consumers.
  • Develop robust, internal-facing tools and automation for infrastructure provisioning and management, primarily using Go (Golang).
  • Architect and optimize foundational solutions within Cloud environments (AWS, Azure, etc.), creating reusable patterns and modules.
  • Design and implement shared Event-Driven Architecture components and messaging platforms using technologies like Kafka or Google Pub/Sub.
  • Develop and maintain robust CI/CD pipelines (e.g., GitLab CI and ArgoCD) as a service.
  • Design and build resilient Distributed Systems components focusing on reliability, fault tolerance, and performance.
  • Manage and optimize shared infrastructure across Multi-Region Cloud Environments.
  • Establish and enhance centralized Observability and Monitoring platforms and tools.
  • Define and implement clear, well-documented RESTful API designs for infrastructure services.
  • Implement and manage Service Mesh (e.g., Envoy, Istio) capabilities as a shared platform.
  • Design, implement, and optimize highly available Relational Database services or shared data platforms.
  • Collaborate closely with product development teams to understand their infrastructure needs and provide technical guidance.
  • Participate in on-call rotations to support critical shared infrastructure.

What We're Looking For

  • 6+ years of experience in Infrastructure Development, Platform Engineering, or Site Reliability Engineering, with a focus on building tools/services for other engineers.
  • Deep expertise with Kubernetes in production environments, particularly in providing it as a platform (single-tenant and multi-tenant deployment architectures).
  • Strong programming skills in Go (Golang) and Python, with experience building robust backend services and automation.
  • Extensive hands-on experience with at least one major Cloud Provider (AWS, GCP, or Azure).
  • Proven experience designing and implementing Event-Driven Architecture and message queuing systems (e.g., Kafka, RMQ, NATS) as shared services.
  • Solid understanding and practical experience with CI/CD pipeline tools (especially GitLab CI) and establishing automated delivery processes.
  • Demonstrable experience designing and operating Distributed Systems, with an understanding of patterns for creating reliable, shared components.
  • Familiarity with Multi-Region Cloud Environments and strategies for building globally distributed and highly available platforms.
  • Proficiency in establishing and utilizing comprehensive Observability and Monitoring platforms (e.g., Prometheus, Grafana, ELK stack, Datadog).
  • Strong experience with RESTful API design principles and building well-documented, consumable APIs.
  • Knowledge of Service Mesh concepts and practical experience with solutions like Istio in a platform context.
  • Hands-on experience with Relational Databases (e.g., MySQL, PostgreSQL), ideally in managing them as a service.
  • Excellent communication skills and ability to articulate complex technical concepts to technical and non-technical audiences.
  • A strong customer-centric mindset, treating internal development teams as primary customers.
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical or military experience.

Nice to Have

  • Multi-cloud experience, especially in building abstractions across cloud providers.

Technical Stack

  • Container Orchestration: Kubernetes
  • Languages: Go (Golang), Python
  • Cloud Providers: AWS, Azure, GCP
  • Messaging: Kafka, Google Pub/Sub, RMQ, NATS
  • CI/CD: GitLab CI, ArgoCD
  • Service Mesh: Envoy, Istio
  • Observability: Prometheus, Grafana, ELK stack, Datadog
  • Databases: MySQL, PostgreSQL

Benefits & Compensation

  • Work on a large-scale, cloud-native SaaS platform.
  • Solve complex reliability challenges at scale.
  • Influence platform architecture and engineering practices.
  • Competitive compensation, benefits, and career growth.

Saviynt is an equal opportunity employer.

Required Skills
Kubernetes, Go (Golang), Python, AWS, Azure, GCP, Kafka, Google Pub/Sub, RMQ, NATS, Infrastructure Development, Platform Engineering, Event-Driven Architecture, Backend Services, Automation
About company
Saviynt

Saviynt is a technology company specializing in Identity and Access Management (IAM) and Identity Governance and Administration (IGA) solutions.

Job Details
Department: Engineering
Category: Infrastructure
Posted 14 days ago