Data Engineer
Listed on 2026-02-16
IT/Tech
Data Engineer, Cloud Computing
Data Engineer with Databricks Concentration (Remote)
Company: New Math Data
Location: Houston, TX (Remote)
Salary: $117.20K - $140.70K/yr (Estimated pay)
Job Type: Full-time
Benefits: Medical, Dental, Vision, Retirement, PTO
About New Math Data
New Math Data is an award‑winning data and AI consulting firm and an AWS Advanced Tier Services Partner. We help startups, SMBs, enterprises, and public‑sector organizations build modern data platforms and AI‑enabled systems that operate reliably within real‑world governance, security, and regulatory constraints. Our work spans data foundations, analytics, machine learning, and generative AI, with a strong emphasis on production readiness and long‑term operability.
Position Overview
The Data Engineer plays a hands‑on role in designing, building, and operating modern data platforms centered on Databricks and cloud‑native services. This role focuses on Lakehouse architectures, scalable data pipelines, and high‑quality datasets that support analytics, machine learning, and generative AI workloads. You will work closely with solution architects, ML engineers, and consultants to translate business requirements into reliable, maintainable data systems running primarily on AWS with Databricks as a core platform.
Key Responsibilities
- Participate in technical discovery to understand business objectives, data sources, existing platforms, and operational constraints.
- Design and implement Databricks‑based Lakehouse architectures using Delta Lake for batch and streaming workloads.
- Build and maintain scalable data ingestion, transformation, and orchestration pipelines using Databricks, Apache Spark, and AWS‑native services.
- Develop ELT and data modeling workflows that support analytics, BI, machine learning, and generative AI use cases.
- Integrate Databricks with cloud storage and leverage services such as Amazon S3, Glue, Lambda, Step Functions, EventBridge, and OpenSearch.
- Implement data quality checks, schema evolution strategies, and performance optimization for large‑scale datasets.
- Support AI and GenAI workflows by preparing curated datasets, embeddings, and feature‑ready data for downstream models and agents.
- Contribute production‑grade code, notebooks, infrastructure, and documentation that support secure and maintainable deployments.
- Participate in design reviews, sprint planning, and delivery ceremonies; identify risks and propose pragmatic improvements.
- Apply established data governance, security, and compliance patterns including access controls, encryption, logging, and data lifecycle management.
- Communicate complex data engineering concepts clearly to both technical and non‑technical stakeholders.
- Collaborate with solution architects and ML engineers to implement architectural designs aligned with cloud and data best practices.
- Contribute to internal standards, reference architectures, and reusable templates for Databricks and AWS‑based data platforms.
Qualifications
- 4+ years of experience in data engineering, software engineering, or analytics engineering roles.
- Hands‑on experience building production data pipelines using Databricks and Apache Spark.
- Strong experience with Lakehouse concepts, Delta Lake, and modern data modeling approaches.
- Proficiency in Python and SQL for data transformation, analysis, and pipeline development.
- Experience integrating Databricks with AWS services.
- Familiarity with batch and streaming data patterns using tools such as Spark Structured Streaming or Kafka‑based services.
- Experience with infrastructure as code.
- Understanding of data security and cloud architecture principles, including least privilege, encryption in transit and at rest, and network isolation.
- Bachelor's degree in Computer Science, Engineering, MIS, or equivalent practical experience.
- Strong written and verbal communication skills, with the ability to work effectively in a consulting environment.
Preferred Qualifications
- Experience supporting machine learning or generative AI workloads using Databricks, AWS, or similar platforms.
- Familiarity with vector search, embeddings, or feature engineering for AI use cases.
- Exposure to MLOps…