Citi

Senior Big Data Engineer - Vice President

Pune, Maharashtra, India

Full time

The Applications Development Technology Lead Analyst is a senior level position responsible for establishing and implementing new or revised application systems and programs in coordination with the Technology team. The overall objective of this role is to lead applications systems analysis and programming activities.

Responsibilities:

  • Design, develop, and optimize large-scale data processing jobs using Apache Spark (Java).
  • Build and maintain robust, scalable, and efficient ETL/ELT pipelines for data ingestion, transformation, and loading from various sources into data lakes and data warehouses.
  • Implement data governance, data quality, and data security standards within the data pipelines.
  • Collaborate with data scientists, analysts, and other engineers to understand data requirements and deliver appropriate data solutions.
  • Monitor, troubleshoot, and improve the performance of existing data pipelines and Spark applications.
  • Develop and maintain documentation for data pipelines, data models, and data processing logic.
  • Evaluate and implement new big data technologies and tools to enhance our data platform capabilities.
  • Participate in code reviews, design discussions, and contribute to the overall architectural vision of the data platform.
  • Ensure data solutions adhere to best practices for scalability, reliability, and cost-effectiveness.

Qualifications:

  • Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field.
  • 10+ years of experience in data engineering, with a strong focus on Apache Spark development.
  • Proficiency in a programming language used with Spark, specifically Java.
  • Solid understanding of distributed computing principles and big data technologies (Hadoop, HDFS, YARN, Hive, Kafka).
  • Strong SQL skills and experience with relational and NoSQL databases.
  • Experience with data warehousing concepts and data modelling (star schema, snowflake schema).
  • Familiarity with version control systems (Git) and CI/CD pipelines.
  • Excellent problem-solving, analytical, and communication skills.

Preferred Skills:

  • Experience with stream processing technologies such as Apache Kafka and Spark Streaming.
  • Knowledge of orchestration tools such as Apache Airflow, Azure Data Factory, or AWS Step Functions.
  • Familiarity with data visualization tools (e.g., Tableau, Power BI) and reporting.
  • Experience with containerization (Docker, Kubernetes).
  • Certification in Apache Spark or relevant cloud data engineering platforms.


Education:

  • Bachelor’s degree/University degree or equivalent experience
  • Master’s degree preferred


This job description provides a high-level review of the types of work performed. Other job-related duties may be assigned as required.

------------------------------------------------------

Job Family Group:

Technology

------------------------------------------------------

Job Family:

Applications Development

------------------------------------------------------

Time Type:

Full time

------------------------------------------------------

Most Relevant Skills

Please see the requirements listed above.

------------------------------------------------------

Other Relevant Skills

For complementary skills, please see above and/or contact the recruiter.

------------------------------------------------------

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.

 

If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.

View Citi’s EEO Policy Statement and the Know Your Rights poster.