Big Data Engineer
Listed on 2025-12-30
IT/Tech
Data Engineer, Cloud Computing
Design interfaces to data warehouses/data stores and machine learning/Big Data applications using open-source languages and tools such as Scala, Java, Python, Perl, and shell scripting.
Design and create data pipelines that maintain a stable data flow to the machine learning models, both in batch mode and near-real-time mode (a minimal illustrative sketch follows this list of responsibilities).
Interface with Engineering/Operations/System Admin/Data Scientist teams to ensure data pipelines and processes fit within the production framework.
Ensure that tools and environments adhere to strict security protocols.
Deploy machine learning models and serve their outputs via RESTful APIs.
Understand business needs in close collaboration with subject matter experts (SMEs) and Data Scientists to perform efficient feature engineering for machine learning models.
Maintain code and libraries in the code repository.
Work with the system administration team to proactively resolve issues and install tools and libraries on the AWS platform.
Research and propose the architectures and solutions most appropriate for the problems at hand.
Maintain and improve tools to assist Analytics in ETL, retrospective testing, efficiency, repeatability, and R&D.
Lead by example regarding software best practices, including code style and architecture, documentation, source control, and testing.
Support the Chief Data Scientist/Data Scientists/Big Data Engineers in creating novel approaches to solving challenging problems using Machine Learning, Big Data, and Cloud technologies.
Handle ad hoc requests to create reports for end users.
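For context only, the batch side of the pipeline work described above could look roughly like the minimal Scala/Spark sketch below; the bucket paths, column names, and cleansing steps are illustrative assumptions, not details of this role's actual codebase.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object FeatureBatchJob {
  def main(args: Array[String]): Unit = {
    // Hypothetical batch pipeline: paths, column names, and the cleansing
    // steps below are illustrative assumptions, not project specifics.
    val spark = SparkSession.builder()
      .appName("feature-batch-pipeline")
      .getOrCreate()

    // Read raw events from an assumed S3 landing zone.
    val raw = spark.read
      .option("header", "true")
      .csv("s3a://example-bucket/raw/events/")

    // Basic cleansing: deduplicate, drop rows missing required fields,
    // and derive a partition column for downstream feature engineering.
    val features = raw
      .dropDuplicates("event_id")
      .na.drop(Seq("user_id", "event_time"))
      .withColumn("event_date", to_date(col("event_time")))

    // Write curated features as partitioned Parquet for the ML models.
    features.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3a://example-bucket/curated/features/")

    spark.stop()
  }
}
```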
Required Skills
- Strong skills in Apache Spark (Spark SQL) and Scala, with 2+ years of experience.
- Understanding of AWS Big Data components and tools.
- Strong Java skills, with experience in web services and web development.
- Hands-on experience with model deployment.
- Hands-on experience deploying applications on Docker, Kubernetes, or similar technologies.
- Linux scripting is a plus.
- Fundamental understanding of AWS cloud components.
- 2+ years of experience ingesting, cleansing/processing, storing, and querying large datasets.
- 2+ years of experience engineering large-scale data solutions with Java/Tomcat/SQL/Linux.
- Experience working in a data-intensive role, including extraction of data (DB/web/API/etc.), transformation, and loading (ETL).
- Exposure to structured and/or unstructured data.
- Experience with data cleansing/preparation on the Hadoop/Apache Spark ecosystem: MapReduce/Hive/HBase/Spark SQL.
- Experience with distributed streaming tools such as Apache Kafka.
- Experience with multiple file formats (Parquet, Avro, ORC).
- Knowledge of the Agile development cycle.
- Efficient coding skills to improve the performance and cost efficiency of jobs running on the AWS platform.
- Experience building stable, scalable, high-speed live data streams and serving web platforms.
- Enthusiastic self-starter with the ability to work in a team environment.
- Graduate (MS) or undergraduate degree in Computer Science, Engineering, or a relevant field.
- Strong software development experience.
- Ability to write custom MapReduce programs to clean/prepare complex data.
- Familiarity with streaming data processing: experience with distributed real-time computation systems such as Apache Storm/Apache Spark Streaming.
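As a rough illustration of the streaming skills listed above, a near-real-time ingestion job with Spark Structured Streaming and Kafka might be sketched as follows; the broker address, topic name, and S3 paths are placeholder assumptions, and the job would also need the spark-sql-kafka connector on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClickstreamStreamingJob {
  def main(args: Array[String]): Unit = {
    // Hypothetical near-real-time job: broker, topic, and paths are placeholders.
    val spark = SparkSession.builder()
      .appName("kafka-structured-streaming")
      .getOrCreate()

    // Consume an assumed Kafka topic and keep the payload as a string column.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "clickstream")
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS payload")
      .withColumn("ingest_time", current_timestamp())

    // Append micro-batches to Parquet with a checkpoint for fault tolerance.
    val query = stream.writeStream
      .format("parquet")
      .option("path", "s3a://example-bucket/streaming/clickstream/")
      .option("checkpointLocation", "s3a://example-bucket/checkpoints/clickstream/")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```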
All your information will be kept confidential according to EEO guidelines.