
Data Engineer, Platform

Job in New York, New York County, New York, 10261, USA
Listing for: Basis Research Institute
Full Time position
Listed on 2026-05-25
Job specializations:
  • IT/Tech
    Data Engineer, Data Scientist, Data Analyst, Data Science Manager
Salary/Wage Range or Industry Benchmark: 80000 - 100000 USD Yearly
Job Description & How to Apply Below
Location: New York

About Basis

Basis is a nonprofit applied AI research organization with two mutually reinforcing goals.

The first is to understand and build intelligence. This means to establish the mathematical principles of what it means to reason, to learn, to make decisions, to understand, and to explain; and to construct software that implements these principles.

The second is to advance society’s ability to solve intractable problems. This means expanding the scale, complexity, and breadth of problems that we can solve today, and even more importantly, accelerating our ability to solve problems in the future.

To achieve these goals, we’re building both a new technological foundation that draws inspiration from how humans reason, and a new kind of collaborative organization that puts human values first.

About the Role

Data Engineers on the Platform team at Basis build trustworthy data pipelines with comprehensive provenance and quality gates, curate documented datasets for training and evaluation, and ensure data infrastructure scales reliably. You will work on both platform-specific data needs and cross-project data coordination, preventing duplicate work and facilitating shared datasets.

We are looking for people who are technically excellent and treat data quality as a first-class concern. The ideal Data Engineer has experience with ML data pipelines, understands the full lifecycle from raw data through model training and evaluation, and brings rigor to data provenance, lineage tracking, and quality assurance. You combine software engineering discipline with deep understanding of data systems and ML requirements.

This role is embedded across Platform and Research teams, working on infrastructure that supports both commercial offerings and internal research. You will help Basis scale data operations to support medium-scale models, ensure data governance as we serve external customers, and build systems that researchers can trust for reproducible experiments.

We seek individuals who aspire to do rigorous, high-quality, robust data engineering, but are not afraid to iterate, learn from real usage, and explore different approaches to achieve excellence.

Basis is a collaborative effort, both internally and with our external partners; we are looking for people who enjoy building data foundations for problems larger than ones they can tackle alone.

We expect you to:
  • Have demonstrated significant achievements in data engineering for ML/AI systems. Examples include:

    • Building data pipelines for model training or evaluation at scale

    • Developing feature stores or data platforms serving multiple teams

    • Creating data quality frameworks and implementing governance systems

    • Designing data architectures that enabled new ML capabilities

  • Possess strong proficiency in data technologies including SQL (expert level), Python for data processing, distributed computing frameworks (Spark, Dask), and workflow orchestration tools (Airflow, Dagster, Prefect).

  • Have experience with cloud data platforms including data warehouses (Snowflake, BigQuery, Redshift), data lakes, object storage (S3), and streaming systems (Kafka, Kinesis, Flink) for both batch and real-time processing.

  • Understand ML data requirements including feature engineering, training/validation/test splits, data versioning, experiment reproducibility, and the specific data needs of different model types and training procedures.

  • Be skilled at data quality and governance including implementing validation frameworks, anomaly detection, data lineage tracking, metadata management, and ensuring compliance with privacy and security policies.

  • Have knowledge of data modeling principles for both relational and NoSQL systems, understanding of schema design, normalization/denormalization tradeoffs, and performance optimization.

  • Value data provenance and documentation. You ensure data pipelines are transparent, decisions are documented, and others can understand and trust the data you deliver.

  • Progress with autonomy on complex data challenges. You can scope data projects, make sound architectural decisions, and deliver complete solutions from ingestion through consumption.

  • Be excited about enabling rigorous research through trustworthy data infrastructure that advances our ability to solve intractable problems.

In addition, the following would be an advantage:

  • Experience with feature stores (Tecton, Feast) or building feature platforms.

  • Background in ML research or research engineering that provides an understanding of data needs across the experiment lifecycle.

  • Experience with data lineage tools (Apache Atlas, DataHub, Monte Carlo) and metadata management.

  • Knowledge of vector databases and embedding pipelines for modern AI applications.

  • Contributions to data engineering open-source projects (Airflow, dbt, Great Expectations).

  • Understanding of responsible AI and data governance practices.

Responsibilities:
  • Design and build data pipelines for training and evaluation across Basis research projects and platform offerings, ensuring…
