AI/ML Operations Analyst
Listed on 2026-02-17
IT/Tech
AI Engineer, Data Analyst, Data Science Manager
Job Title: AI Platform Operations Analyst
Client
Location: Remote (candidates must be near Orlando, Glendale, Anaheim, or Seattle)
Starting: 02/19/2026
Pay Comments:
Minimum Pay (per hour): 68.57
Maximum Pay (per hour): 76.19
Hours: Full-time
Duration: 22 months
Job Description
Ignite Innovation: Drive the Future of AI at a Global Leader!
Imagine being at the forefront of innovation for a global leader, shaping the future of entertainment and technology. This organization is building a cutting‑edge Generative AI platform and Center of Excellence, a critical hub powering all AI capabilities across its vast operations, from marketing content generation to core business functions. This is your chance to make a profound impact, ensuring the reliability, efficiency, and cost‑effectiveness of advanced AI systems that touch millions.
As an integral partner with Aquent, you will play a pivotal role in this transformative journey.
As an AI/ML Ops Analyst, you will be the operational nerve center of this groundbreaking AI platform. You'll directly influence the performance, cost, and stability of models, agents, and knowledge bases that drive critical business processes. Your expertise will ensure seamless operations, optimize cloud spend, and provide crucial insights that guide strategic decisions, making a tangible difference in how AI is leveraged across the entire organization.
You will be instrumental in building a robust, scalable, and responsible AI ecosystem, contributing to projects that redefine how AI is integrated into daily operations and strategic initiatives.
- AI/ML Operations: Manage operational workflows for model deployments, updates, and versioning across multi-cloud environments. Monitor critical model performance metrics including latency, throughput, error rates, token usage, and inference quality. Proactively track model drift, accuracy degradation, and performance anomalies, escalating issues to engineering teams as needed. Support knowledge base operations, ensuring the health of vector embedding pipelines, chunk quality, and timely refresh cycles within cloud AI services. Maintain a comprehensive model inventory and documentation across diverse cloud environments. Coordinate model evaluation cycles with Responsible AI and Core Engineering teams.
- Agent & Server Operations: Monitor the health, performance, and reliability of AI agents. Track agent execution metrics such as task completion rates, tool call success/failure, latency, and error patterns. Support agent deployment and configuration management workflows. Document agent behaviors, known issues, and operational runbooks. Coordinate with Core Engineering on agent updates, testing, and rollouts. Monitor the availability, connection health, and integration status of context management servers.
- FinOps & Cost Management: Track and analyze AI/ML cloud spend across cloud AI services. Develop cost dashboards with breakdowns by model, application team, use case, and environment. Monitor token consumption, inference costs, and embedding/storage costs to identify trends. Identify and implement cost optimization opportunities through model selection, caching strategies, batching, and rightsizing. Provide accurate cost allocation reporting for chargeback/showback to consuming application teams. Forecast spend trends and proactively flag budget anomalies, partnering closely with Infrastructure and Finance teams on AI cost governance.
- Monitoring, Dashboarding & Reporting: Build and maintain comprehensive dashboards for platform performance, model health, agent metrics, and key operational KPIs. Create executive and stakeholder reports on platform adoption, usage trends, and detailed cost allocation. Develop Responsible AI dashboards tracking hallucination rates, accuracy metrics, guardrail triggers, and safety incidents. Monitor API gateway traffic patterns and API consumption trends. Provide regular reporting to product management on use case performance and impact.
- Release Operations Support: Support…