Job Description: Hadoop Administrator
Role Overview
We are looking for an experienced Hadoop Administrator to manage, maintain, and upgrade our on-premises Hadoop clusters. The role involves working on Apache Hadoop versions 2.7.1 and 3.4.x, ensuring cluster stability, performance, and scalability, along with ecosystem components like Hive and YARN.
Key Responsibilities
1. Cluster Administration:
- Install, configure, and manage on-prem Hadoop clusters based on Apache Hadoop (versions 2.7.1 and 3.4.x)
- Administer and monitor Hadoop Distributed File System (HDFS) including storage management, replication, and data balancing
- Manage resource allocation and scheduling using Apache YARN
2. Upgrade & Migration:
- Hadoop upgrades (2.7 → 3.4) in on-prem environments
- Perform Cluster migration, Data validation (HDFS checksums, block integrity) & Backward compatibility validation for jobs
- Experience with migration tools and distribution changes
3. Ecosystem Management
- Administer and optimize Apache Hive (query tuning, metastore management)
- Work with ecosystem tools like Spark (basic admin exposure preferred) & Tez (execution engine for Hive)
- Ensure high availability for Hive Metastore and critical services
4. Performance & Optimization
- Monitor cluster health, job performance, and resource utilization
- Tune YARN queues and capacity scheduler, Hive queries and execution plans, Disk, memory, and network utilization
- Implement performance improvements for large-scale data processing
5. Security & Governance
- Configure authentication/authorization
- Ensure data security and compliance policies
- Manage access controls and audit logs
6. Troubleshooting & Support
- Provide support for Hadoop cluster issues
- Troubleshoot Node failures, Job failures (Hive/YARN) & Data inconsistencies
- Perform log analysis and root cause identification
7. Backup & DR
- Plan and manage backup and disaster recovery strategies
- Implement DistCp-based replication & DR cluster setup (active-passive / active-active)
Required Skills
- Strong hands-on experience with Apache Hadoop (2.7.x and 3.x)
- Expertise in Apache YARN, Apache Hive & HDFS administration
- Experience in on-prem cluster setup and hardware-level optimization
- Linux administration (RHEL/CentOS/Ubuntu)
- Shell scripting for automation