Responsibilities
- Customer-centric engineering: Partnering directly with users and stakeholders to translate mission requirements into technical solutions, ensuring that Striveworks’ AI capabilities are fully integrated into their existing workflows.
- Tactical deployment and automation: Building and maintaining infrastructure-as-code (IaC) to deploy custom Kubernetes clusters across diverse environments (e.g., AWS, Azure, GCP, and on-prem).
- Integration and troubleshooting: Acting as the primary technical point of contact for troubleshooting complex integrations. You'll be deep in the logs, debugging containerized microservices and network and application configurations to unblock the customer.
- Product evolution: Feeding field insights back to our core product teams. You are the eyes and ears of the engineering org, identifying friction points in the installation process and automating them away.
- Mission execution: Leading software deployments on unclassified, CUI, and classified networks, ensuring that our AI remains reliable, adaptable, and ready to scale in dynamic environments.
Requirements
- 3–5+ years of hands-on experience in software, DevOps, site reliability, or systems engineering
- Customer-centric technical leadership that combines deep technical expertise with executive presence to lead cross-functional requirement gathering, manage professional incident response across diverse environments, and champion organizational interests to external stakeholders
- Production Kubernetes: Expertise in deploying and diagnosing microservices within K8s (using Helm and kubectl)
- Observability and monitoring: Experience with comprehensive observability solutions (metrics, logs, and traces) using tools such as Prometheus, Grafana, and OpenTelemetry to ensure system reliability and performance visibility
- Application assurance: Ability to design and execute testing strategies to validate application functionality, diagnose issues, perform root cause analysis, and implement effective long-term fixes that improve stability and performance
- Infrastructure-as-code: Deep proficiency in Terraform, Ansible, or similar tools to manage virtual machines (VMs) and containerized services
- Programming proficiency: Strong scripting and coding skills in Bash and/or Python for building custom automation and tooling
- Linux mastery: Ability to manage and rapidly troubleshoot Linux systems (RHEL, Ubuntu, Alpine)
- Active Secret (or above) US security clearance
- US citizenship
Nice to Have
- Experience with DOD networking, tools, infrastructure, security requirements, and policies
- Proficiency with US federal information system security policies, including Security Technical Implementation Guides (STIGs), NIST SP 800-171, NIST SP 800-53, CMMC, and ICD 503
- Experience with software deployments to on-premises and cloud-based unclassified, CUI, and classified networks within the DOD
- Experience with DevSecOps/DevOps and continuous integration and continuous delivery (CI/CD) for the administration and deployment of GPU-enabled servers
- Experience with deploying or maintaining Cloud Native Computing Foundation (CNCF) projects
- Experience with network-attached storage (NAS) and storage area network (SAN) technologies
- Experience with Kubernetes and cloud-native applications and services in denied, disrupted, intermittent, and limited (DDIL) environments
Benefits
- Medical/dental/vision insurance
- Voluntary life, long-term disability, accident, and hospital indemnity insurance
- HSA and FSA (including dependent care FSA) plans
- 401(k) plan
- Unlimited PTO
- Paid parental leave