Data Engineer
Listed on 2026-02-16
IT/Tech
Data Engineer, Cloud Computing
Data Engineer with Databricks Concentration (Remote)
Company: New Math Data
Location: Houston, TX (Remote)
Salary: $117.20K - $140.70K/yr (Estimated pay)
Job Type: Full-time
Benefits: Medical, Dental, Vision, Retirement, PTO
About New Math Data
New Math Data is an award‑winning data and AI consulting firm and an AWS Advanced Tier Services Partner. We help startups, SMBs, enterprises, and public‑sector organizations build modern data platforms and AI‑enabled systems that operate reliably within real‑world governance, security, and regulatory constraints. Our work spans data foundations, analytics, machine learning, and generative AI, with a strong emphasis on production readiness and long‑term operability.
Position Overview
The Data Engineer plays a hands‑on role in designing, building, and operating modern data platforms centered on Databricks and cloud‑native services. This role focuses on Lakehouse architectures, scalable data pipelines, and high‑quality datasets that support analytics, machine learning, and generative AI workloads. You will work closely with solution architects, ML engineers, and consultants to translate business requirements into reliable, maintainable data systems running primarily on AWS with Databricks as a core platform.
Key Responsibilities
- Participate in technical discovery to understand business objectives, data sources, existing platforms, and operational constraints.
- Design and implement Databricks‑based Lakehouse architectures using Delta Lake for batch and streaming workloads.
- Build and maintain scalable data ingestion, transformation, and orchestration pipelines using Databricks, Apache Spark, and AWS‑native services.
- Develop ELT and data modeling workflows that support analytics, BI, machine learning, and generative AI use cases.
- Integrate Databricks with cloud storage and leverage services such as Amazon S3, Glue, Lambda, Step Functions, EventBridge, and OpenSearch.
- Implement data quality checks, schema evolution strategies, and performance optimization for large‑scale datasets.
- Support AI and GenAI workflows by preparing curated datasets, embeddings, and feature‑ready data for downstream models and agents.
- Contribute production‑grade code, notebooks, infrastructure, and documentation that support secure and maintainable deployments.
- Participate in design reviews, sprint planning, and delivery ceremonies; identify risks and propose pragmatic improvements.
- Apply established data governance, security, and compliance patterns including access controls, encryption, logging, and data lifecycle management.
- Communicate complex data engineering concepts clearly to both technical and non‑technical stakeholders.
- Collaborate with solution architects and ML engineers to implement architectural designs aligned with cloud and data best practices.
- Contribute to internal standards, reference architectures, and reusable templates for Databricks and AWS‑based data platforms.
Qualifications
- 4+ years of experience in data engineering, software engineering, or analytics engineering roles.
- Hands‑on experience building production data pipelines using Databricks and Apache Spark.
- Strong experience with Lakehouse concepts, Delta Lake, and modern data modeling approaches.
- Proficiency in Python and SQL for data transformation, analysis, and pipeline development.
- Experience integrating Databricks with AWS services.
- Familiarity with batch and streaming data patterns using tools such as Spark Structured Streaming or Kafka‑based services.
- Experience with infrastructure as code.
- Understanding of data security and cloud architecture principles, including least privilege, encryption in transit and at rest, and network isolation.
- Bachelor's degree in Computer Science, Engineering, MIS, or equivalent practical experience.
- Strong written and verbal communication skills, with the ability to work effectively in a consulting environment.
Preferred Qualifications
- Experience supporting machine learning or generative AI workloads using Databricks, AWS, or similar platforms.
- Familiarity with vector search, embeddings, or feature engineering for AI use cases.
- Exposure to MLOps…