Responsibilities
- Set up, maintain, and manage Hadoop and AWS EMR systems.
- Use Terraform and infrastructure-as-code to deploy and manage cloud resources.
- Develop and maintain CI/CD pipelines for efficient software delivery.
- Automate deployment and operational workflows for Big Data platforms using DevOps tooling.
- Onboard users to Hadoop ecosystems and manage access via Kerberos, HDFS, Hive, HBase, and Yarn.
- Enforce security standards across HBase, HDFS, Kafka, Hive, and associated services.
- Optimize Hadoop clusters, MapReduce, and Spark jobs, and tune EMR for efficiency and cost savings.
- Monitor system health, logs, storage usage, and plan capacity proactively.
- Work with infrastructure, networking, database, application, and analytics teams to maintain platform stability.
- Support version upgrades and patching for EMR, HBase, Spark, and other data technologies.
- Diagnose and resolve system-level issues, including CPU, memory, OS, storage, and network performance.
- Deploy and configure Hadoop clusters with high availability, manage nodes, job scheduling, and backups.
- Enable seamless integration between cloud and on-premises networks and data platforms.
- Install, configure, and support tools such as Sentry, Spark, Kafka, Oozie, Solr, MongoDB, DocumentDB, and ELK stack.
- Integrate Active Directory or LDAP with Cloudera and manage access policies through Sentry.
Team
Reports to: Engineering Manager, Data Platforms