
Founding Data Engineer

Job in Oakland, CA, USA
Listing for: Elicit
Full Time position
Listed on 2025-10-28
Job specializations:
  • Software Development
    Data Engineer
Salary/Wage Range or Industry Benchmark: 185,000 - 270,000 USD yearly
Job Description & How to Apply Below
Location: Oakland

About Elicit

Elicit is an AI research assistant that uses language models to help professional researchers and high-stakes decision makers break down hard questions, gather evidence from scientific/academic sources, and reason through uncertainty.

What we're aiming for:

  • Elicit radically increases the amount of good reasoning in the world.

    • For experts, Elicit pushes the frontier forward.

    • For non-experts, Elicit makes good reasoning more accessible. People who don't have the tools, expertise, time, or mental energy to make carefully-reasoned decisions on their own can do so with Elicit.

  • Elicit is a scalable ML system based on human-understandable task decompositions, with supervision of process, not outcomes. This expands our collective understanding of safe AGI architectures.

  • Visit our Twitter to learn more about how Elicit is helping researchers and making progress on our mission.

    Why we're hiring for this role

    Two main reasons:

  • Currently, Elicit operates over academic papers and clinical trials. One of your key initial responsibilities will be to build a complete corpus of these documents, available as soon as they're published, combining different data sources and ingestion methods. Once that's done, there is a growing list of other document types and sources we'd love to integrate!

  • One of our main initiatives is to broaden the sorts of tasks you can complete in Elicit. We need a data engineer to figure out the best way to ingest massive amounts of heterogeneous data in a way that makes it usable by LLMs. We need your help to integrate with our customers' custom data providers so that they can create task-specific workflows over them.

    In general, we're looking for someone who can architect and implement robust, scalable solutions to handle our growing data needs while maintaining high performance and data quality.

    Our tech stack
    • Data pipeline:
      Python, Flyte, Spark

    • Probably less relevant to you, but ICOI:

      • Backend:
        Node and Python, event sourcing

      • Frontend:
        Next.js, TypeScript, and Tailwind

    • We like static type checking in Python and TypeScript!

    • All infrastructure runs in Kubernetes across a couple of clouds

    • We use GitHub for code reviews and CI

    • We deploy using the GitOps pattern (i.e. deploys are defined and tracked by diffs in our k8s manifests)

    Am I a good fit?

    Consider these questions:

    • How would you optimize a Spark job that's processing a large amount of data but running slowly?

    • What are the differences between RDD, DataFrame, and Dataset in Spark? When would you use each?

    • How does data partitioning work in distributed systems, and why is it important?

    • How would you implement a data pipeline to handle regular updates from multiple academic paper sources, ensuring efficient deduplication?

    If you have a solid answer for these—without reference to documentation—then we should chat!

    Location and travel

    We have a lovely office in Oakland, CA; there are people there every day but we don't all work from there all the time. It's important to us to spend time with our teammates, however, so we ask that all Elicians spend about 1 week out of every 6 with teammates.

    We wrote up more details on this page.

    What you'll bring to the role
    • 5+ years of experience as a data engineer: owning make-or-break decisions about how to ingest, manage, and use data

    • Strong proficiency in Python (5+ years experience)

    • You have created and owned a data platform at rapidly-growing startups—gathering needs from colleagues, planning an architecture, deploying the infrastructure, and implementing the tooling

    • Experience architecting and optimizing large data pipelines, ideally with Spark, and ideally pipelines that directly support user-facing features (rather than internal BI, for example)

    • Strong SQL skills, including understanding of aggregation functions, window functions, UDFs, self-joins, partitioning, and clustering approaches

    • Experience with columnar data storage formats like Parquet

    • Strong opinions, weakly held, about approaches to data quality management

    • Creative and user-centric problem-solving

    • You should be excited to play a key role in shipping new features to users—not just building out a data platform!

    Nice…
    Position Requirements
    5+ Years work experience