As a Senior Site Reliability Engineer, you will play a pivotal role in ensuring the reliability, scalability, and security of mission-critical government systems. You will bridge the gap between development and operations by designing resilient infrastructure, automating deployment pipelines, and enforcing compliance with federal standards. This role demands a strong foundation in systems engineering, deep expertise in virtualized and containerized environments, and a proactive mindset toward incident prevention and resolution. You will work closely with development teams to reduce friction, improve system observability, and deliver high-impact solutions in fast-paced, high-stakes environments.

Responsibilities

Design and manage critical application deployments in virtualized or containerized environments such as VMware and Kubernetes, ensuring scalability, uptime, and adherence to federal standards.
Build and maintain automated CI/CD pipelines, monitoring systems, and configuration management processes to enable reliable software delivery and operational visibility across all environments.
Provision and support developer environments and toolchains to enable fast, secure, and efficient development aligned with mission objectives.
Detect and resolve obstacles in the software development lifecycle by implementing developer-centric solutions that improve productivity and workflow efficiency.
Foster strong customer relationships through technical leadership and deliver innovative, mission-driven solutions using deep systems expertise.

Requirements

Active Top Secret security clearance with eligibility for Sensitive Compartmented Information (SCI).
Possession of a DoD 8140-compliant certification such as Security+ or higher.
Minimum of 7 years of experience in software development, systems engineering, or IT operations with a focus on system reliability, performance, and availability.
Proven ability to integrate software engineering and systems administration to support scalable and highly available production systems.
Hands-on experience creating and managing monitoring, alerting, and observability frameworks to meet service level objectives.
Background in incident response, root cause analysis, and driving post-incident improvements.
Proficiency with Ansible and Desired State Configuration for infrastructure automation.
Experience using GitLab CI/CD and Bash scripting to streamline deployment workflows.
Familiarity with container-native and object storage technologies including MinIO, S3-compatible services, and Portworx.
Knowledge of enterprise load balancing platforms such as F5.
Ability to quickly contribute in high-pressure, mission-critical environments with minimal onboarding time.

Nice to Have

Bachelor’s degree in Computer Science or a related technical field; relevant professional experience may be considered in lieu of a degree.

Tech Stack

VMware, Kubernetes, Ansible, Desired State Configuration, GitLab CI/CD, Bash scripting, MinIO, S3-compatible services, Portworx, F5

Benefits

Comprehensive health, dental, and vision insurance coverage
401(k) retirement plan with company matching contributions
Paid time off and paid holidays
Parental leave and dependent care support
Flexible work arrangements including hybrid work options

Compensation

Salary range: $170,000 - $220,000. Includes performance-based bonuses, company-paid training and certifications, referral incentives, and additional rewards tied to individual and organizational performance.

Work Arrangement

Hybrid, with flexible work arrangements

Team

Part of a collaborative, high-performing team delivering technical solutions for federal government clients.

Deep commitment to employee well-being and development
Customer-focused mission delivery
Two decades of experience assembling top-tier technical teams
Comprehensive benefits, professional growth opportunities, and support for work-life balance

Additional Information

This role is designated as essential personnel and may require on-call availability during critical incidents.
Candidates must be willing to undergo periodic reinvestigations for security clearance maintenance.
Position requires collaboration across geographically distributed teams, including occasional travel to government facilities.
All systems and processes must comply with federal security standards such as NIST 800-53 and FISMA.
Regular participation in disaster recovery drills and system audits is expected.

MetroStar is hiring a Sr. Site Reliability Engineer


