Remote (Global)

Keyrock is hiring a Site Reliability Engineer

As a Site Reliability Engineer, you will play a critical role in ensuring the stability, scalability, and security of our global cloud infrastructure. You will work at the intersection of development and operations, applying engineering principles to build automated, self-healing systems that support high-performance trading platforms. Your responsibilities will span infrastructure design, incident response, performance optimization, and proactive monitoring, all while fostering a culture of reliability and continuous improvement across engineering teams.

Responsibilities

  • Architect, deploy, and manage highly available and scalable AWS infrastructure using Infrastructure-as-Code tools
  • Operate and secure Kubernetes clusters, including EKS and self-managed setups, to support containerized services
  • Build and maintain CI/CD and GitOps pipelines to streamline application deployment and testing
  • Develop observability solutions using Prometheus, Grafana, Datadog, or equivalent tools to enhance system reliability
  • Enforce cloud security standards, including IAM policies and compliance with SOC2 and ISO 27001 frameworks
  • Diagnose and resolve infrastructure issues through root cause analysis and implement performance optimizations
  • Automate provisioning and configuration using Terraform, Ansible, or similar tools
  • Collaborate with engineering, architecture, and security teams to advance DevOps practices
  • Design disaster recovery, failover, and backup strategies to ensure continuous operations

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related technical discipline
  • Minimum of 5 years in cloud infrastructure, Site Reliability Engineering, or DevOps roles
  • Deep experience with AWS services including EC2, S3, Lambda, RDS, VPC, and IAM
  • Hands-on experience managing Kubernetes environments such as EKS, K3s, or self-hosted clusters
  • Proficiency in scripting and automation with Python, Bash, or equivalent languages
  • Proven experience with Infrastructure-as-Code tools like Terraform, CloudFormation, or Ansible
  • Familiarity with monitoring, logging, and observability platforms such as Prometheus, Grafana, or Datadog
  • Solid understanding of networking concepts including VPCs, DNS, load balancing, and firewalls
  • Experience working with CI/CD, DevOps, and GitOps methodologies
  • Background in operating low-latency, high-performance systems
  • Knowledge of serverless and event-driven architectures
  • Ability to work and communicate effectively in asynchronous environments
  • Demonstrated commitment to improving system availability and performance through data-driven insights
  • Strong problem-solving skills, ownership mindset, and collaborative approach
  • Exposure to cloud cost optimization and FinOps principles

Nice to Have

  • Interest in or experience with trading systems or financial markets
  • AWS Certified SysOps Administrator - Associate certification, held or pursued
  • Familiarity with Rust compilation workflows and tooling
  • Prior experience in cryptocurrency, traditional finance, or trading environments

Tech Stack

AWS, EC2, S3, Lambda, RDS, VPC, IAM, Kubernetes, EKS, K3s, Terraform, Ansible, CloudFormation, Prometheus, Grafana, Datadog, LGTM stack, Python, Bash, Infrastructure-as-Code (IaC), CI/CD, GitOps, Serverless, Event-driven computing, Rust

Benefits

  • Competitive compensation package with benefits tailored to employment or contractor status
  • Flexible working hours and full remote capability across global locations
  • Opportunity to shape and grow within an entrepreneurial, excellence-driven environment
  • Professional development plan with learning and certification support aligned to team and individual goals

Compensation

Competitive salary package, with benefits tailored to employment or contractor status

Work Arrangement

Fully remote with flexible hours, supporting a globally distributed team

Team

You will join a high-performing, globally distributed engineering team responsible for maintaining and scaling mission-critical infrastructure. The team emphasizes collaboration, knowledge sharing, and continuous learning, with a strong focus on operational excellence and proactive system design.

Additional Information

  • This role supports 24/7 systems with occasional on-call responsibilities and incident response duties
  • Regular participation in cross-team initiatives and architecture reviews is expected
  • Opportunities for mentorship, technical leadership, and process improvement are encouraged
  • The team follows agile practices with a focus on automation, observability, and security-by-design
  • Candidates must be comfortable working in a fast-paced environment with evolving technical challenges

Required Skills

AWS, EC2, S3, Lambda, RDS, VPC, IAM, Kubernetes, EKS, K3s, Python, Bash, Terraform, Ansible, CloudFormation, Prometheus, Grafana, Datadog, DevOps, SRE, Infrastructure, Automation, Scripting

About Keyrock

Keyrock is a leading digital asset market maker that trades across 80+ exchanges and runs desks in market making, options, OTC, and DeFi. The company is known for its tech-first approach and Rust-based trading systems, actively shaping the future of digital asset markets.

Job Details

  • Department: Information Technology
  • Category: Infrastructure
  • Posted: 2 months ago