Staff II Software Engineer AI/ML Ops

Job in Pleasanton, Alameda County, California, 94566, USA
Listing for: BlackLine
Full Time position
Listed on 2026-06-05
Job specializations:
  • IT/Tech
    Machine Learning/ML Engineer, AI Engineer (Applied/Software)
Salary/Wage Range or Industry Benchmark: 245,000 - 307,000 USD yearly
Job Description & How to Apply Below
Position: Staff II Software Engineer AI/ML Ops

As a Machine Learning Operations Engineer, you will play a pivotal role in bridging the gap between data science and production environments. You will collaborate with cross‑functional teams to streamline the machine learning lifecycle, ensuring seamless integration into operational systems.

Responsibilities
  • Leadership and Strategy
    • Partner with data science, security, and product teams to set evaluation and governance standards (guardrails, bias, drift, latency SLAs).
    • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments.
    • Lead incident response and reliability strategies for ML/AI systems.
  • AI System Deployment and Integration
    • Collaborate with development teams to integrate AI solutions into existing workflows and applications.
    • Ensure seamless integration with different platforms and technologies.
    • Define and manage MCP Registry onboarding, lifecycle versioning, and dependency governance.
    • Build CI/CD pipelines that automate LLM agent deployment, policy validation, and prompt evaluation workflows.
    • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics.
    • Implement logging, metering, and auditing for agent behavior, function calls, and compliance alignment.
    • Create scalable observability systems—tracking conversation outcomes, factual accuracy, latency, escalation patterns, and safety events.
    • Architect end‑to‑end guardrails for AI agents including prompt injection protection, identity‑aware routing, and tool usage authorization.
    • Collaborate cross‑functionally to standardize authentication, authorization, and session governance for multi‑agent runtimes.
  • Model Deployment and Integration
    • Architect and standardize model registries and feature stores to support version tracking, lineage, and reproducibility across environments.
    • Lead the deployment of machine learning models into production environments, ensuring scalability, reliability, and efficiency.
    • Collaborate with software engineers to integrate machine learning models into existing applications and systems.
    • Implement and maintain APIs for model inference.
  • Infrastructure and Environment Management
    • Design and manage training infrastructure including distributed training orchestration, GPU/TPU resource allocation, and automatic scaling.
    • Implement CI/CD for model workflows using pipelines integrated with model validation, bias checks, and rollback automation.
    • Build standardized experimentation frameworks for reproducible training, tuning, and deployment cycles (MLflow, W&B, Kubeflow).
    • Manage and optimize the cloud infrastructure required for machine learning operations.
    • Work closely with other teams to ensure the availability, security, and performance of machine learning systems.
  • Monitoring and Maintenance
    • Implement robust monitoring solutions for deployed machine learning models to detect issues and ensure performance.
    • Collaborate with data scientists and engineers to address and resolve model performance and data quality issues.
    • Conduct regular system maintenance, updates, and optimizations to ensure optimal performance of machine learning solutions.
  • Automation and Orchestration
    • Develop and maintain automation scripts and tools for managing machine learning workflows.
    • Implement orchestration systems to streamline the end‑to‑end machine learning lifecycle, from data preparation to model deployment.
  • Collaboration with Data Science Teams
    • Collaborate with data scientists to understand model requirements and constraints for deployment.
    • Facilitate the transition of machine learning models from research to production, ensuring scalability and efficiency.
  • Performance Optimization
    • Identify and implement optimizations to enhance the performance and efficiency of machine learning models in production.
    • Conduct performance analysis and implement improvements based on resource utilization metrics.
  • Security and Compliance
    • Implement security measures to protect machine learning systems and data.
    • Ensure compliance with regulatory requirements and industry standards related to machine learning and data privacy.
    • Integrate audit controls, metadata storage, and lineage…