Academic Researcher Job Austin area, Texas USA, Software Development

Join a leading AI lab's cutting-edge GenAI team to be at the core of the AI revolution, where your expertise fuels the development of the most advanced Large Language Models.

This is a W2 employment position with Cincinnatus LLC, with the opportunity to be placed at a leading AI Lab as part of their extended workforce.

1 Overview

We are seeking Professors and PhD students across all academic disciplines — STEM (ML, Coding, Data Science, CS, Physics, Mathematics, Engineering, Statistics) as well as professional and quantitative domains (Finance, Accounting, Economics, Law, Business) — to contribute to a project supporting a frontier-model evaluation effort focused on coding and agentic workflows.

You’ll design and validate challenging benchmark tasks to help surface and diagnose reasoning and problem-solving gaps in a target model. The work centers on building robust, real-world tasks with executable Python tests and then analyzing model/agent behavior. All applicants are expected to have working proficiency in Python.

2 Key Responsibilities

Design challenging, real-world, domain-specific problems drawn from your area of expertise (e.g., financial modeling, legal reasoning, econometrics, ML, coding, scientific computation) that serve as the foundation for agentic tasks. Problems should be constructed to target specific core capability-loss failures identified in a frontier AI model.
Integrate the problems into an agentic development environment, preparing all necessary components using Python.
Evaluate the target model’s performance on the tasks.
Identify tasks where the target model fails to pass all tests, specifically classifying the failure as a logical reasoning failure.

3 Core Qualifications

Current or retired professor, OR PhD student, in any of the following areas:
STEM: ML, Coding, Data Science, CS, Physics, Mathematics, Engineering, Statistics, Biology, Chemistry
Professional / Quantitative: Finance, Accounting, Economics, Law, Business
Degree (or PhD in progress) from a top university in your field.
Working proficiency in Python, applied in research, industry, GitHub, or coursework (not theoretical familiarity).
Ability to engage reliably for at least 30 hours/week on weekdays (i.e., at least 6 hours/day).
Past experience in AI training, model evaluation and data annotation is preferred.
Basic ability to work independently and manage one’s time.
Verbal and written communication skills, problem-solving skills, and interpersonal skills.

Equal Employment Opportunity

We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or any other legally protected characteristic. We are committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans throughout the job application process.
