
Junior Data Engineer

Job in Gandhinagar 732121, West Bengal, India
Listing for: ConveGenius.AI
Full Time position
Listed on 2026-02-04
Job specializations:
  • IT/Tech
    Data Engineer, Data Science Manager, Big Data, Data Warehousing
Job Description
Role Overview:
We're looking for a Junior Data Engineer to join our Data Platform team. You'll design and maintain scalable data pipelines and architectures using AWS services, enabling reliable data movement, transformation, and analytics. You'll collaborate with analytics, product, and engineering teams to support reporting, dashboards, and insights for millions of students and schools.

Key Responsibilities:

Design, build, and maintain ETL/ELT pipelines for large-scale data ingestion, transformation, and loading.
Develop and optimize Spark and PySpark jobs for batch and real-time data processing.
Work with AWS services such as S3, Glue, Lambda, Redshift, Athena, and EMR to manage the data ecosystem.
Support the design and implementation of Data Lake and Data Warehouse architectures.
Implement data validation, partitioning, and schema management for efficient query performance.
Collaborate with data analysts and BI teams to ensure data availability and consistency.
Maintain data lineage, metadata, and ensure data quality and governance.
Implement monitoring and alerting for data ingestion and transformation pipelines.
Use Git and CI/CD tools to manage code and automate deployment of data workflows.
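
As a hedged illustration of the pipeline work described above, the sketch below shows a minimal PySpark batch ETL job: it reads raw events from S3, applies basic validation, and writes partitioned Parquet. The bucket paths, column names, and app name are hypothetical placeholders, not details of an actual ConveGenius.AI pipeline.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("events-etl").getOrCreate()

    # Ingest: read raw JSON events from a hypothetical S3 landing zone.
    raw = spark.read.json("s3://example-bucket/raw/events/")

    # Validate and transform: drop rows missing key fields,
    # then derive a date column to partition on.
    clean = (
        raw.dropna(subset=["event_id", "student_id"])
           .withColumn("event_date", F.to_date("event_timestamp"))
    )

    # Load: write partitioned Parquet to a curated zone that
    # downstream tools such as Athena or Redshift Spectrum can query.
    (clean.write
          .mode("overwrite")
          .partitionBy("event_date")
          .parquet("s3://example-bucket/curated/events/"))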

Qualifications:

Bachelor's degree in Computer Science, Information Technology, Data Engineering, or related field.
1–3 years of hands-on experience in data engineering, data pipeline development, or cloud-based data systems.
Strong knowledge of SQL and experience with Python or PySpark.
Practical experience with the AWS data stack: S3, Glue, Lambda, Redshift, Athena, EMR, Step Functions, etc.
Understanding of data lake architecture, ETL/ELT frameworks, and data warehousing concepts.
Familiarity with Delta Lake, Spark SQL, or similar big data frameworks.
Good understanding of data modeling, partitioning, and performance tuning.
Excellent analytical, troubleshooting, and collaboration skills.
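
To illustrate the SQL and partitioning skills listed above, the following sketch (continuing from the previous one) queries the hypothetical partitioned dataset with Spark SQL; filtering on the partition column lets Spark prune partitions instead of scanning the full dataset. Table and column names are again illustrative assumptions.

    # Register the curated dataset as a temporary view for Spark SQL.
    spark.read.parquet("s3://example-bucket/curated/events/") \
         .createOrReplaceTempView("events")

    daily_counts = spark.sql("""
        SELECT event_date, COUNT(*) AS n_events
        FROM events
        WHERE event_date = DATE '2026-02-01'  -- partition filter enables pruning
        GROUP BY event_date
    """)
    daily_counts.show()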

Good to Have / Plus:
Exposure to GCP (BigQuery, Dataflow, Cloud Storage) or Azure (Data Factory, Synapse, ADLS, Databricks).
Experience with Databricks for scalable data processing and Delta Lake management.
Knowledge of PostgreSQL, MySQL, or NoSQL databases.
Familiarity with Airflow, Step Functions, or other orchestration tools.
Understanding of DevOps practices, CI/CD pipelines, and infrastructure automation.
Experience working in an EdTech or public data ecosystem is an advantage.
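
As a brief sketch of what orchestration looks like in practice, here is a minimal Airflow DAG, assuming Airflow 2.4 or later; the DAG id, schedule, and callable are hypothetical and stand in for a real deployment's Spark or Glue trigger.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def run_etl():
        # Placeholder: in practice this would trigger the Spark/Glue job.
        print("running daily events ETL")

    with DAG(
        dag_id="daily_events_etl",
        start_date=datetime(2026, 1, 1),
        schedule="@daily",  # Airflow 2.4+ keyword; older versions use schedule_interval
        catchup=False,
    ) as dag:
        PythonOperator(task_id="run_etl", python_callable=run_etl)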