Senior Data Engineer
Listed on 2026-04-30
Software Development
Data Engineer, Machine Learning/ML Engineer, AI Engineer
About The Role
As a Senior Data Engineer focused on AI/ML, you will architect, build, and operate the specialized data infrastructure that powers Tebra’s intelligent features. You will serve as a technical subject matter expert in data systems, partnering closely with Machine Learning Engineers to transform raw, messy healthcare data into high‑quality training sets and real‑time inference features. This hands‑on role involves owning large data sub‑systems, translating business requirements into software solutions that accelerate our ability to deploy AI, and tackling technical challenges from data versioning to feature serving.
Area of Focus
- Architect and write software that solves complex business problems, specifically designing scalable pipelines for feature extraction, training data generation, and model monitoring logs.
- Own and serve as a Subject Matter Expert (SME) for large software systems, such as the organization’s Feature Store or Data Lakehouse, ensuring data availability for both experimentation and production inference.
- Continuously monitor data pipelines in production, detect data drift or quality anomalies, and implement automated recovery systems to ensure the reliability and freshness of features and training data over time.
- Lead Engineering Design Reviews, providing well‑articulated and reasoned explanations for architecture decisions (e.g., choosing between batch processing for training vs. real‑time streaming for inference).
- Write software frameworks that can be extended by others on the team, such as automated data quality checks and schema validation tools that prevent training‑serving skew.
- Translate business requirements into software solutions, bridging the gap between raw data sources and the structured inputs needed for advanced ML models.
- Know when and how to optimize complex code, specifically tuning Spark jobs or SQL queries to handle massive datasets required for Large Language Model (LLM) fine‑tuning or deep learning.
- Collaborate cross‑functionally with ML engineers to implement MLOps best practices, including data versioning, lineage tracking, and reproducibility.
- Scope tasks expertly, breaking down complex data infrastructure initiatives into manageable deliverables for the squad.
Qualifications
- 5+ years of professional software development experience.
- Deep technical subject matter expertise in 3+ general areas of software development (e.g., Big Data Processing, Distributed Systems, Data Modeling).
- 3+ years of hands‑on experience in Data Engineering with a focus on supporting analytics or data science teams.
- Advanced proficiency in Python and SQL. Comfortable writing production‑grade code for data transformation and orchestration (not just scripts).
- Proven ability to architect and write software that enables ML at scale—moving beyond simple ETL to building robust data platforms.
- Strong background in modern data infrastructure relevant to AI (e.g., Spark, Airflow, Kafka, Vector Databases).
- Experience with Data Lake/Lakehouse architectures (e.g., Databricks, Snowflake, Delta Lake) and understanding how to structure data for efficient model training.
- Familiarity with MLOps concepts, including the difference between a training set and a test set, what “data leakage” is, and how to prevent it in the pipeline.
- Proven ability to deploy and maintain data systems in production with CI/CD, monitoring, and alerting.
- Excellent technical communication and a product mindset—comfortable driving initiatives from concept to delivery.
Preferred Qualifications
- Background in healthcare software operations or working with structured business data.
- Experience implementing or managing a Feature Store (e.g., Feast, Tecton).
- Familiarity with data version control tools (e.g., DVC, lakeFS).
- Published research or conference papers in data engineering, distributed systems, or machine learning.
- Experience with retrieval‑augmented generation (RAG) pipelines or vector search infrastructure.
- Contributions to open‑source data or ML infrastructure projects.
Zone 1 (National Average): $142,000 USD – $162,500 USD. Compensation is thoughtfully determined by experience, qualifications, specific role requirements, and geographic zone. In addition to base compensation, Tebra offers eligible employees the opportunity for variable pay and a robust benefits package reflecting a commitment to overall well‑being.
EEO Statement
Tebra is an equal opportunity employer. All applicants will be considered for employment without attention to age, race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.