Big Data Engineer

Job in Jersey City, Hudson County, New Jersey, 07390, USA
Listing for: TechDigital Group
Full Time position
Listed on 2025-12-25
Job specializations:
  • IT/Tech
    Data Engineer, Big Data, Cloud Computing, Data Analyst
Job Description

Mandatory Skills:

Apache Spark, Hive, Kafka, AWS Glue, Google Dataflow, Talend MDM, Hadoop, Presto; strong experience with MySQL, PostgreSQL, MongoDB, Cassandra.

Role: Big Data Engineer

Job Overview:
We're seeking a highly skilled Big Data Engineer to build scalable data pipelines, develop ML models, and integrate big data systems. You'll work with structured, semi-structured, and unstructured data, focusing on optimizing data systems, building ETL pipelines, and deploying AI models in cloud environments.

Key Responsibilities:

  • Data Ingestion: Build scalable ETL pipelines using Apache Spark, Talend, AWS Glue, Google Dataflow, or Apache NiFi. Ingest data from APIs, file systems, and databases (see the PySpark sketch after this list).
  • Data Transformation/Validation: Use Pandas, Apache Beam, and Dask for data cleaning, transformation, and validation. Automate data quality checks with Pytest or unittest (see the Pandas/pytest sketch after this list).
  • Big Data Systems: Process large datasets with Hadoop, Kafka, Apache Flink, and Apache Hive. Stream real-time data using Kafka or Google Cloud Pub/Sub (see the Kafka sketch after this list).
  • Task Queues: Manage asynchronous processing with Celery, RQ, RabbitMQ, or Kafka. Implement retry mechanisms and track task status (see the Celery sketch after this list).
  • Scalability: Optimize for performance with distributed processing (Spark, Flink), parallelization (joblib), and data partitioning (see the joblib sketch after this list).
  • Cloud Storage: Work with AWS, Azure, GCP, and Databricks. Store and manage data with S3, BigQuery, Redshift, Synapse Analytics, and HDFS (see the boto3 sketch after this list).
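
To ground the ingestion bullet, here is a minimal PySpark ETL sketch. The bucket paths and column names (event_id, event_ts) are hypothetical, not taken from the posting.

```python
from pyspark.sql import SparkSession, functions as F

# Minimal ETL sketch: read raw CSV events, clean them, write partitioned Parquet.
spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

raw = (spark.read
       .option("header", "true")
       .csv("s3://example-bucket/raw/events/"))        # hypothetical source path

cleaned = (raw
           .dropna(subset=["event_id"])                # drop rows missing the key
           .dropDuplicates(["event_id"])               # de-duplicate on the key
           .withColumn("event_ts", F.to_timestamp("event_ts"))
           .withColumn("event_date", F.to_date("event_ts")))

(cleaned.write
 .mode("overwrite")
 .partitionBy("event_date")                            # partition for downstream reads
 .parquet("s3://example-bucket/curated/events/"))      # hypothetical sink path
```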
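
For the transformation/validation bullet, a small Pandas cleaning function paired with a pytest-style data quality check might look like this; the clean_orders function and its columns are illustrative assumptions.

```python
import pandas as pd

def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical transformation: drop rows without a key, normalize types.
    out = df.dropna(subset=["order_id"]).copy()
    out["order_id"] = out["order_id"].astype(str).str.strip()
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    return out.dropna(subset=["amount"])

def test_clean_orders_enforces_quality():
    # Automated data quality check, runnable with `pytest`.
    raw = pd.DataFrame({"order_id": [" a1 ", None], "amount": ["10.5", "oops"]})
    cleaned = clean_orders(raw)
    assert cleaned["amount"].notna().all()              # all surviving amounts parse
    assert (cleaned["order_id"] == cleaned["order_id"].str.strip()).all()
```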
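
For the streaming bullet, a sketch using the kafka-python client against a local broker; the topic name and event payload are made up.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Hypothetical broker address and topic name.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("clickstream", {"user_id": 42, "page": "/home"})
producer.flush()

consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)    # handle each event as it arrives
    break                   # stop after one event in this sketch
```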
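
For the task-queue bullet, a Celery sketch with automatic retries; the broker URL, task name, and error type are assumptions.

```python
from celery import Celery

app = Celery("pipeline", broker="amqp://guest:guest@localhost//")  # hypothetical RabbitMQ broker

@app.task(bind=True, autoretry_for=(ConnectionError,), retry_backoff=True, max_retries=5)
def load_batch(self, batch_id: str) -> str:
    # Celery retries this task automatically (with backoff) on ConnectionError.
    # ... fetch and load the batch here ...
    return f"loaded {batch_id}"

# Task status can be tracked via the AsyncResult handle:
#   result = load_batch.delay("2024-01-01")
#   result.status   # PENDING / RETRY / SUCCESS / FAILURE
```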
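
For the scalability bullet, parallelization with joblib might look like this; the transform function and chunking scheme are illustrative.

```python
from joblib import Parallel, delayed

def transform(chunk):
    # Hypothetical per-partition work.
    return sum(chunk)

chunks = [range(i * 1_000_000, (i + 1) * 1_000_000) for i in range(8)]

# Fan the partitions out across local worker processes.
totals = Parallel(n_jobs=4)(delayed(transform)(c) for c in chunks)
print(totals)
```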
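
For the cloud-storage bullet, an S3 sketch with boto3; the bucket and key names are hypothetical and assume AWS credentials are already configured.

```python
import boto3

s3 = boto3.client("s3")

# Upload a local extract to a hypothetical bucket/key.
s3.upload_file("daily_extract.parquet", "example-bucket",
               "curated/daily_extract.parquet")

# List what landed under the curated prefix.
response = s3.list_objects_v2(Bucket="example-bucket", Prefix="curated/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```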
Required Skills:

  • ETL Data Processing: Expertise in Apache Spark, AWS Glue, Google Dataflow, Talend.
  • Big Data Tools: Proficient with Hadoop, Kafka, Apache Flink, Hive, Presto.
  • Databases: Strong experience with MySQL, PostgreSQL, MongoDB, Cassandra.
  • Machine Learning: Hands-on with TensorFlow, PyTorch, Scikit-learn, XGBoost.
  • Cloud Platforms: Experience with AWS, Azure, GCP, Databricks.
  • Task Management: Familiar with Celery, RQ, RabbitMQ, Kafka.
  • Version Control: Git for source code management.
Desirable Skills:

  • Real-time Data Processing: Experience with Apache Pulsar, Google Cloud Pub/Sub.
  • Data Warehousing: Familiarity with Redshift, BigQuery, Synapse Analytics.
  • Scalability Optimization: Knowledge of load balancing (NGINX, HAProxy) and parallel processing.
  • Data Governance: Use of MLflow, DVC, or other tools for model and data versioning.
Tools & Technologies:

  • ETL: Apache Spark, Talend, AWS Glue, Google Dataflow.
  • Big Data: Hadoop, Kafka, Apache Flink, Presto.
  • Databases: MySQL, PostgreSQL, MongoDB, Cassandra.
  • Cloud: AWS, GCP, Azure, Databricks.
  • Storage: S3, BigQuery, Redshift, Synapse Analytics, HDFS.
  • Version Control: Git.