Leidos is hiring a Senior Reliability Engineer to join a mission-critical program supporting a Department of War enterprise data and analytics initiative. You will work directly with government partners and fellow engineers to translate operational needs into scalable, resilient, production-ready solutions. You will also play a key role in product planning, execution, and continuous improvement within our mission-driven culture, which is focused on outthinking, outbuilding, and outpacing the status quo.
What You'll Do
- Develop and implement strategies leveraging free and open-source software (FOSS), commercial off-the-shelf (COTS), and government off-the-shelf (GOTS) technologies to enhance platform reliability, resiliency, and scalability.
- Conduct lab-based software-in-the-loop (SWIL) and hardware-in-the-loop (HWIL) testing to validate system performance and ensure components meet scalability and operational requirements.
- Identify performance bottlenecks, analyze usage patterns, and recommend improvements to enhance system efficiency and scalability.
- Identify, diagnose, and address recurring incidents by performing root cause analysis and implementing preventative measures.
- Produce and brief comprehensive resiliency and scalability assessments, providing insights into system behavior under load, failure modes, and recovery conditions.
- Translate findings into inputs for service-level agreements (SLAs) and key performance parameters (KPPs) to support informed decision-making by leadership.
- Prepare, maintain, and execute a Systems Engineering Plan (SEP) for managing all systems architecture and systems engineering aspects of the program.
- Conduct systems engineering activities required to specify, build, and maintain system engineering designs for the System.
- Design, prepare, and document systems engineering and cybersecurity artifacts for the System.
- Support the Government in recommending and conducting enterprise system architecture activities.
- Define, document, maintain, and promulgate APIs and technical standards for using and interoperating within and outside the System.
- Design, engineer, integrate, and continuously improve the underlying infrastructure of the System.
- Identify, prepare, track, secure, and integrate government, commercial, and open-source tools and services into the System.
- Design, architect, engineer, and continuously improve the UI and UX components of the Platform.
- Perform site reliability engineering to build and maintain a reliable, scalable, and efficient System by applying software engineering principles to operational tasks.
What We're Looking For
- Active Top Secret (TS) clearance with SCI eligibility.
- Bachelor’s degree in Computer Science, Engineering, Information Systems, or related technical discipline and 8–12 years of relevant experience OR Master’s degree in a related field and 6–10 years of relevant experience.
- Experience engineering and supporting enterprise cloud environments (AWS, Azure, or GCP).
- Experience implementing monitoring, observability, and performance management solutions.
- Experience conducting root cause analysis and implementing systemic reliability improvements.
- Experience integrating reliability engineering practices into DevSecOps pipelines.
- Experience operating within SAFe or large-scale Agile frameworks supporting enterprise systems.
- Experience with FOSS, COTS, and GOTS technologies.
- Proven experience conducting SWIL and HWIL testing.
- Strong understanding of system performance analysis and optimization.
- Experience performing root cause analysis and implementing preventative measures.
- Ability to produce and brief comprehensive technical assessments.
- Experience preparing and maintaining Systems Engineering Plans (SEPs).
- Strong documentation and communication skills.
Nice to Have
- Active TS/SCI clearance.
- Experience with DoD systems and environments.
- Familiarity with NIST security controls and Zero Trust compliance.
- Experience defining and tracking KPIs and SLOs.
- Knowledge of AI/ML model serving and deployment.
- Experience participating in Engineering Control Board (ECB) processes.
- Familiarity with cloud environments and DevSecOps practices.
- SAFe Agilist (SA) or related SAFe certification.
- Experience supporting multi-enclave DoD cloud environments.
- Experience implementing automated failover, redundancy, and capacity management solutions.
- Experience supporting enterprise-scale data, analytics, or AI platforms.
- Experience implementing Zero Trust-aligned resiliency patterns.
- Relevant cloud certification (AWS, Azure, or GCP).
Technical Stack
- FOSS, COTS, GOTS
- AWS, Azure, GCP
Benefits & Compensation
- Salary Range: $92,300.00 - $166,850.00
Leidos is an equal opportunity employer.




