Responsibilities
- Provide IT support for cloud-based compute and storage environments used to produce protected data products for research and dissemination.
- Administer distributed cloud platforms, including provisioning, configuration, monitoring, and lifecycle management for high-performance workloads.
- Support large-scale data processing environments using frameworks such as Apache Spark and related optimization tools.
- Establish and maintain operational parameters for deploying and executing data processing and dissemination workflows.
- Design and optimize cloud storage configurations, including data curation and removal of redundant datasets.
- Implement high-performance computing (HPC), storage, and workflow optimizations.
- Deploy and manage monitoring and dashboard solutions to track cluster usage, performance, availability, and cost efficiency.
- Optimize resource utilization by identifying and terminating idle or unused compute resources.
- Manage user access, including account provisioning, role-based access, and deprovisioning.
- Install, configure, and maintain software, operating systems, and platform services, including Linux patching and updates.
- Produce operational and usage reports related to system performance and data storage.
- Provide guidance on cloud cost management, budgeting, and optimization strategies.
- Support agile project execution, including sprint planning, standups, retrospectives, and backlog management.
- Maintain Kanban or sprint boards and track tasks, milestones, and dependencies.
- Coordinate communication across engineering, operations, data science, and stakeholder teams.
- Identify and help resolve technical, operational, and process-related obstacles.
- Provide guidance on agile best practices and continuous improvement.
- Support data governance, data quality, and privacy protection requirements.
- Define and maintain project roles, responsibilities, and reporting structures.
- Develop and manage project schedules, deliverables, risks, and opportunities.
- Document and distribute meeting notes, action items, and decisions.
- Establish data transfer governance, standardized intake, validation, error handling, and repeatable evaluation workflows.
- Develop and maintain production-ready code, tools, and workflows to support data processing, experimentation, and dissemination, including testing and validation of outputs.
- Implement, monitor, and optimize cloud compute and storage environments, leveraging automation, infrastructure-as-code, and performance tuning for distributed processing frameworks.
- Manage source code, system updates, and operational tools, ensuring efficient collaboration, workflow tracking, and reliable execution of production and prototype workloads.
Requirements
- Bachelor’s degree in Computer Science, Information Technology, Engineering, Data Science, or a related field, or equivalent professional experience.
- Experience supporting cloud-based compute and storage environments on AWS, including Amazon EMR.
- Hands-on experience with distributed data processing frameworks such as Apache Spark.
- Experience administering Linux-based systems.
- Proficiency in scripting or programming languages commonly used for automation and data processing.
- Experience with source code management platforms and collaborative development workflows.
- Understanding of cloud cost management and optimization concepts.
- Strong analytical, troubleshooting, and problem-solving skills.
- Ability to communicate technical concepts to both technical and non-technical audiences.
Nice to Have
- AWS Certified Solutions Architect – Associate or Professional.
- AWS Certified DevOps Engineer – Professional.
- AWS Certified Data Analytics – Specialty.
- Google Professional Data Engineer.
- Microsoft Certified: Azure Solutions Architect Expert.
- Certified Kubernetes Administrator (CKA).
- HashiCorp Terraform Associate.
- CompTIA Cloud+.
- CompTIA Security+.
- Certified Information Systems Security Professional (CISSP).
- Project Management Professional (PMP).
Benefits
- Medical Insurance.
- Vision Insurance.
- Dental Insurance.
- Life and AD&D Insurance.
- 401(k) Savings Plan.
- Education and Professional Training.
- Flexible Spending Accounts (FSA).
- Employee Referral and Merit Recognition Programs.
- Employee Assistance and Identity Theft Protection.
- Paid Holidays: 11 per year.
- Paid Time Off (PTO).
- Disability Insurance.
Additional Information
- Must be a U.S. citizen and able to obtain or maintain a Public Trust (High) clearance.