AI Analysis Specialist Job, Buffalo area, New York, USA, IT/Tech

Remote (US-Based) · Optional hybrid in New York, NY

About Locked In AI

Locked In AI is the #1 real-time AI interview and meeting copilot, trusted by over one million users worldwide. We are a fast-growing company building the most advanced career preparation platform on the market.

Our platform delivers real-time, AI-powered assistance during live job interviews, coding assessments, and professional meetings — helping candidates communicate with clarity, confidence, and competence.

Role Overview

We are looking for a detail-driven AI Analysis Specialist to measure, evaluate, and optimize the performance of Locked In AI's systems, from model accuracy and response quality to user-facing impact and business outcomes.

This is an insights-to-action role: you will design evaluation frameworks, analyze large-scale datasets, uncover patterns that reveal how our AI is performing, and translate those findings into concrete recommendations that make the platform smarter and more reliable for over one million users.

As an AI Analysis Specialist, you will sit at the intersection of data analysis, AI evaluation, and product intelligence. Your scope spans the full AI lifecycle — analyzing training data quality, benchmarking model outputs, monitoring production performance, and measuring the downstream impact of AI features on user engagement, satisfaction, and success.

The ideal candidate combines deep analytical rigor with a practical understanding of how AI models behave in production. You are equally comfortable building evaluation pipelines, querying large-scale datasets, designing dashboards, and presenting findings to leadership.

Key Responsibilities

AI Model Evaluation & Quality Analysis

Design and maintain comprehensive evaluation frameworks to measure AI model performance, including accuracy, relevance, latency, hallucination rate, and contextual correctness across LLMs and speech-to-text systems
Build and manage benchmark datasets, golden answer sets, and scoring rubrics to systematically assess model output quality and track improvements over time
Conduct deep-dive analyses on model failure modes, edge cases, and quality regressions — identifying root causes and recommending targeted fixes to engineering and research teams
Evaluate retrieval-augmented generation (RAG) pipeline performance, measuring contextual relevance, retrieval accuracy, and groundedness of AI-generated responses

Data Analysis & Insight Generation

Analyze large-scale datasets spanning user interactions, model outputs, conversation logs, and product events to uncover trends, patterns, and opportunities for AI improvement
Develop and automate analytical workflows for ongoing monitoring of key AI performance metrics — including response quality, user satisfaction signals, completion rates, and error frequency
Perform exploratory data analysis to identify correlations between AI behavior and user outcomes, translating findings into hypotheses that inform product and model decisions
Build statistical models and conduct A/B test analysis to quantify the impact of AI changes, prompt updates, and model upgrades on user-facing metrics

AI Observability, Monitoring & Reporting

Design and build real-time dashboards and automated reporting systems that give engineering, product, and leadership clear visibility into AI system health and performance
Implement monitoring and alerting for model drift, latency spikes, hallucination rate increases, and other production anomalies that could degrade user experience
Track and report on AI cost efficiency metrics — including token usage, model routing decisions, and inference costs — to optimize spend across LLM providers
Create executive-level reports and presentations that translate complex AI performance data into clear business insights and strategic recommendations

Product Analytics & User Impact Measurement

Measure the user-facing impact of AI features by analyzing adoption rates, engagement patterns, retention signals, and user feedback data
Design and analyze experiments (A/B tests, holdback tests) to quantify how AI improvements translate into measurable product outcomes
Partner with…