Atlanta or San Francisco

Saviynt is hiring a Site Reliability Engineer

Responsibilities

  • Design, build, and maintain the shared infrastructure services and platforms that our product and application teams will depend on
  • Focus on creating reusable, reliable, and scalable solutions that abstract away complexity, enabling other teams to focus on their core business logic and deliver features faster in a multi-cloud environment
  • Design and build core platform components and shared infrastructure services that other development teams will integrate with and leverage to deploy and operate their applications
  • Architect, implement, and manage highly available and scalable Kubernetes platforms as a service for internal consumers
  • Develop robust, internal-facing tools and automation for infrastructure provisioning and management primarily using Go (Golang)
  • Architect and optimize foundational solutions within Cloud environments (AWS, Azure, etc.), focusing on creating reusable patterns and modules for other teams
  • Design and implement shared Event-Driven Architecture components and messaging platforms using technologies like Kafka or Google Pub/Sub that product teams can easily utilize
  • Develop and maintain robust CI/CD pipelines (e.g., GitLab CI and ArgoCD) as a service, providing standardized and automated deployment workflows for various development teams
  • Design and build resilient Distributed Systems components that serve as building blocks for other applications, focusing on reliability, fault tolerance, and performance
  • Manage and optimize our shared infrastructure across Multi-Region Cloud Environments, ensuring that platform services are globally available and performant for all consumers
  • Establish and enhance centralized Observability and Monitoring platforms and tools that provide self-service insights for consuming teams
  • Define and implement clear, well-documented RESTful API designs for the infrastructure services you build, ensuring ease of integration for internal clients
  • Implement and manage Service Mesh (e.g., Envoy, Istio) capabilities, providing traffic management, security, and policy enforcement as a shared platform for services
  • Design, implement, and optimize highly available Relational Database services or shared data platforms for broad organizational use
  • Collaborate closely with product development teams to understand their infrastructure needs and pain points, providing technical guidance and support
  • Participate in on-call rotations to support the critical shared infrastructure you build

Requirements

  • 6+ years of experience in an Infrastructure Development, Platform Engineering, or Site Reliability Engineering role, with a strong focus on building tools and services for other engineers
  • Deep expertise with Kubernetes in production environments, particularly in providing it as a platform (i.e., single-tenant and multi-tenant deployment architectures)
  • Strong programming skills in Go (Golang) and Python, with experience building robust, maintainable backend services and automation
  • Extensive hands-on experience with at least one major Cloud Provider (AWS, GCP, or Azure); multi-cloud experience is a strong plus, especially in building abstractions over them
  • Proven experience designing and implementing Event-Driven Architecture and message queuing systems (e.g., Kafka, RabbitMQ, NATS) as shared services
  • Solid understanding and practical experience with CI/CD pipeline tools (especially GitLab CI) and experience establishing automated delivery processes for other teams
  • Demonstrable experience designing and operating Distributed Systems, with an understanding of patterns for creating reliable, shared components
  • Familiarity with Multi-Region Cloud Environments and strategies for building globally distributed and highly available platforms
  • Proficiency in establishing and utilizing comprehensive Observability and Monitoring platforms (e.g., Prometheus, Grafana, ELK stack, Datadog) for shared infrastructure
  • Strong experience with RESTful API design principles and building well-documented, consumable APIs
  • Knowledge of Service Mesh concepts and practical experience with solutions like Istio in a platform context
  • Hands-on experience with Relational Databases (e.g., MySQL, PostgreSQL), ideally in managing them as a service
  • Excellent communication skills and the ability to clearly articulate complex technical concepts to both technical and non-technical audiences
  • A strong customer-centric mindset, treating internal development teams as your primary customers
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical or military experience

Benefits

  • Work on a large-scale, cloud-native SaaS platform
  • Solve complex reliability challenges at scale
  • Influence platform architecture and engineering practices
  • Competitive compensation, benefits, and career growth

Additional Information

  • This role requires adherence to Saviynt’s information security and privacy policies, including annual security training.

About Saviynt

Saviynt is a technology company specializing in Identity and Access Management (IAM) and Identity Governance and Administration (IGA) solutions.

Job Details

  • Department: Engineering
  • Category: Infrastructure
  • Posted: 4 months ago