AI Analysis Specialist
Listed on 2026-06-05
IT/Tech
AI Engineer, Data Analyst, Data Scientist, Data Science Manager
Remote (US-Based) · Optional hybrid in New York, NY
Locked In AI is the #1 real-time AI interview and meeting copilot, trusted by over one million users worldwide. We are a fast-growing company building the most advanced career preparation platform on the market.
Our platform delivers real-time, AI-powered assistance during live job interviews, coding assessments, and professional meetings — helping candidates communicate with clarity, confidence, and competence.
Role Overview
We are looking for a detail-driven AI Analysis Specialist to measure, evaluate, and optimize the performance of Locked In AI's AI systems across every dimension, from model accuracy and response quality to user-facing impact and business outcomes.
This is an insights-to-action role: you will design evaluation frameworks, analyze large-scale datasets, uncover patterns that reveal how our AI is performing, and translate those findings into concrete recommendations that make the platform smarter and more reliable for over one million users.
As an AI Analysis Specialist, you will sit at the intersection of data analysis, AI evaluation, and product intelligence. Your scope spans the full AI lifecycle — analyzing training data quality, benchmarking model outputs, monitoring production performance, and measuring the downstream impact of AI features on user engagement, satisfaction, and success.
The ideal candidate combines deep analytical rigor with a practical understanding of how AI models behave in production. You are equally comfortable building evaluation pipelines, querying large-scale datasets, designing dashboards, and presenting findings to leadership.
Key Responsibilities
AI Model Evaluation & Quality Analysis
- Design and maintain comprehensive evaluation frameworks to measure AI model performance, including accuracy, relevance, latency, hallucination rate, and contextual correctness across LLMs and speech-to-text systems
- Build and manage benchmark datasets, golden answer sets, and scoring rubrics to systematically assess model output quality and track improvements over time
- Conduct deep-dive analyses on model failure modes, edge cases, and quality regressions — identifying root causes and recommending targeted fixes to engineering and research teams
- Evaluate retrieval-augmented generation (RAG) pipeline performance, measuring contextual relevance, retrieval accuracy, and groundedness of AI-generated responses
- Analyze large-scale datasets spanning user interactions, model outputs, conversation logs, and product events to uncover trends, patterns, and opportunities for AI improvement
- Develop and automate analytical workflows for ongoing monitoring of key AI performance metrics — including response quality, user satisfaction signals, completion rates, and error frequency
- Perform exploratory data analysis to identify correlations between AI behavior and user outcomes, translating findings into hypotheses that inform product and model decisions
- Build statistical models and conduct A/B test analysis to quantify the impact of AI changes, prompt updates, and model upgrades on user-facing metrics
- Design and build real-time dashboards and automated reporting systems that give engineering, product, and leadership clear visibility into AI system health and performance
- Implement monitoring and alerting for model drift, latency spikes, hallucination rate increases, and other production anomalies that could degrade user experience
- Track and report on AI cost efficiency metrics — including token usage, model routing decisions, and inference costs — to optimize spend across LLM providers
- Create executive-level reports and presentations that translate complex AI performance data into clear business insights and strategic recommendations
- Measure the user-facing impact of AI features by analyzing adoption rates, engagement patterns, retention signals, and user feedback data
- Design and analyze experiments (A/B tests, holdback tests) to quantify how AI improvements translate into measurable product outcomes
- Partner with…