We are seeking a talented and experienced Big Data Hadoop Developer to join our growing data engineering team. The ideal candidate will have 4-6 years of hands-on experience designing, developing, and optimizing big data solutions using the Hadoop ecosystem, with a strong focus on Apache Spark. You will be responsible for building and maintaining scalable data pipelines, processing large datasets, and collaborating with data scientists and analysts to deliver insights.
Responsibilities:
- Design, develop, and maintain robust and scalable ETL processes and data pipelines using Apache Hadoop and Apache Spark (an illustrative sketch follows this list).
- Write efficient, clear, and well-documented big data processing code, primarily in Scala or Python (PySpark).
- Implement data ingestion, transformation, and loading routines from various sources into Hadoop Distributed File System (HDFS) and other big data stores.
- Optimize existing Spark jobs and Hadoop ecosystem components for performance and scalability.
- Collaborate with data architects, data scientists, and other stakeholders to understand data requirements and translate them into technical solutions.
- Ensure data quality, integrity, and security across all big data platforms.
- Participate in code reviews, testing, and deployment of big data applications.
- Troubleshoot and resolve issues in big data environments.
- Stay up-to-date with the latest trends and technologies in the big data ecosystem.
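For candidates unfamiliar with this kind of work, the following is a minimal, illustrative PySpark sketch of the ingest-transform-load pattern described above. The HDFS paths, column names, and the reference table are hypothetical, and this is a sketch of the general approach, not a prescribed implementation:

# Illustrative PySpark ETL sketch; all paths, schemas, and names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import broadcast

spark = (
    SparkSession.builder
    .appName("orders-daily-etl")  # hypothetical job name
    .getOrCreate()
)

# Ingest: raw CSV landed by an upstream feed (path is an assumption).
orders = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("hdfs:///data/landing/orders/")
)

# Small reference table; broadcasting it avoids a shuffle on the join.
customers = spark.read.parquet("hdfs:///data/reference/customers/")

# Transform: basic cleansing and enrichment.
enriched = (
    orders
    .dropDuplicates(["order_id"])
    .withColumn("order_date", F.to_date("order_ts"))
    .join(broadcast(customers), on="customer_id", how="left")
)

# Load: write partitioned Parquet back to HDFS for downstream consumers.
(
    enriched
    .repartition("order_date")
    .write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("hdfs:///data/curated/orders_enriched/")
)

spark.stop()

The broadcast join and the partitioned Parquet write hint at the kind of performance and scalability tuning this role involves.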
Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related quantitative field.
- 4-6 years of professional experience in Big Data development.
- Proven experience with the Hadoop ecosystem, including HDFS, YARN, Hive, and other related technologies.
- Hands-on experience with SQL and shell scripting (a brief Spark SQL sketch follows this list).
- Strong expertise in Apache Spark for data processing and analysis.
- Proficiency in Scala or Python, including the PySpark API.
- Experience with building and optimizing large-scale data pipelines.
- Familiarity with data warehousing concepts and ETL methodologies.
- Solid understanding of distributed computing principles.
- Excellent problem-solving skills and attention to detail.
- Ability to work independently and as part of a collaborative team.
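As a brief illustration of the SQL skills listed above, plain SQL can be run directly against Spark data through a temporary view. The view, path, and column names below are hypothetical; this is a sketch, not a prescribed approach:

# Illustrative Spark SQL sketch; view, path, and columns are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-example").getOrCreate()

# Expose a curated dataset to SQL via a temporary view.
enriched = spark.read.parquet("hdfs:///data/curated/orders_enriched/")
enriched.createOrReplaceTempView("orders_enriched")

# Aggregate daily revenue per region with plain SQL.
daily_revenue = spark.sql("""
    SELECT order_date, region, SUM(amount) AS revenue
    FROM orders_enriched
    GROUP BY order_date, region
""")

daily_revenue.show(10, truncate=False)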
Preferred Qualifications:
- Experience with cloud-based big data services (e.g., AWS EMR, Azure HDInsight, Google Cloud Dataproc).
- Experience with Databricks platform.
- Knowledge of other big data tools like Kafka, HBase, Flink, or Presto.
- Experience with SQL and NoSQL databases.
- Familiarity with CI/CD practices and tools (e.g., Git, Jenkins).
- Understanding of machine learning concepts and how they apply to big data.
Education:
- Bachelor’s degree/University degree or equivalent experience
This job description provides a high-level review of the types of work performed. Other job-related duties may be assigned as required.
------------------------------------------------------
Job Family Group:
Technology
------------------------------------------------------
Job Family:
Applications Development
------------------------------------------------------
Time Type:
Full time
------------------------------------------------------
Most Relevant Skills
Please see the requirements listed above.
------------------------------------------------------
Other Relevant Skills
For complementary skills, please see above and/or contact the recruiter.
------------------------------------------------------
Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.
If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.