
Senior Software Engineer, Data

Job in Seattle, King County, Washington, 98127, USA
Listing for: Allen Institute for AI (AI2)
Full Time position
Listed on 2026-05-22
Job specializations:
  • IT/Tech
    Data Scientist, Data Engineering
Salary/Wage Range or Industry Benchmark: 126,000 - 189,000 USD yearly
Job Description & How to Apply Below

Persons in these roles are expected to work from our offices in Seattle. On‑site requirements vary based on position and team. If you have questions about on‑site work arrangements for this role, please ask your recruiter. Our base salary range is $126,000 - $189,000, and in addition we have generous bonus plans to provide a competitive compensation package.

Who You Are

The Allen Institute for AI (Ai2) is hiring a Senior Data Engineer to build the data infrastructure behind AI research agents that explore and reason over scholarly literature. You'll work on the Semantic Scholar corpus, expanding what it covers and improving the quality of what’s already there, and create the APIs and tooling that these agents rely on at scale.

This role sits at the intersection of data engineering and applied ML. You'll own pipelines, design schemas, and ship production services, but you'll also apply practical ML techniques (entity resolution, text classification, embedding‑based similarity) to improve data quality and enrich metadata at scale, directly shaping what the agents can do. We're looking for a strong engineer who is comfortable across that full range.

Who We Are

The Agentic Applications team builds open, production‑grade systems that power scientific discovery and large‑scale AI research. We focus on creating high‑quality structured datasets, integrating diverse content types, and enabling downstream applications across search, citation analysis, and model training. The team combines strong engineering practices with close collaboration across Ai2’s product and research orgs to deliver tools and infrastructure used by millions of researchers and developers worldwide.

Responsibilities
  • Improve the coverage and quality of the Semantic Scholar corpus across academic papers, patents, and new domain‑specific datasets
  • Build and maintain scalable data pipelines for corpus integration, citation resolution, and metadata enrichment
  • Develop and deploy ML models for entity disambiguation, author linking, and topic classification
  • Design and extend APIs that expose structured scholarly data to academic researchers and AI agent workflows
  • Contribute to dashboards and tools for evaluating data quality and model precision
  • Collaborate across engineering and research teams to ensure maintainability, test coverage, and robust deployment
What You’ll Need
Required
  • Bachelor’s degree and 8+ years of technical experience; relevant experience may substitute for education.
  • Strong Python engineering skills, especially for building and maintaining data pipelines
  • Experience with SQL and schema design in production settings (PostgreSQL preferred)
  • Familiarity with ML workflows (training classifiers, tuning models, deploying for inference), particularly for large‑scale or ambiguous structured datasets
  • Comfortable working with structured data formats (XML/JSON/Parquet) and writing ETL code
  • Experience with workflow orchestration tools (Airflow or similar) and cloud infrastructure (AWS, S3, Docker)
  • Strong communicator with a strong sense of ownership over results
Preferred
  • Experience with author disambiguation, entity resolution, or record linkage problems
  • Experience applying vector‑based similarity or topic modeling techniques to real‑world corpora at scale
  • Exposure to citation networks or scholarly data systems (e.g., arXiv, OpenAlex, USPTO)
  • Familiarity with building APIs or data services consumed by automated or agent‑based workflows
Physical Demands and Work Environment
  • Must be able to remain in a stationary position for long periods of time.
  • Must be able to communicate information and ideas so others will understand, and to exchange accurate information when doing so.
  • Must be able to observe details at close range.
A Little More About Ai2

Ai2 is a Seattle‑based non‑profit AI research institute founded in 2014 by the late Paul Allen. Our mission is building breakthrough AI to solve the world’s biggest problems. We develop foundational AI research and innovation to deliver real‑world impact through large‑scale open models, data, robotics, conservation, and beyond.

Ai2 is proud to be an Equal Opportunity employer. We do not discriminate based upon…

Position Requirements
10+ years of work experience