
AIOps Engineer

Job in Stanford, Santa Clara County, California, 94305, USA
Listing for: Select Source International
Full Time position
Listed on 2025-12-31
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly

Job Description

Location:

408 Panama Mall, Stanford, CA 94305 (Hybrid – 2 days on campus)

Duration: 12 months

Shift: 1st Shift (9am – 6pm)

Position Overview

The AI‑Ops Engineer is a key technical contributor responsible for evolving traditional DevOps into AI‑Ops. This role leverages AI and machine learning to automate and enhance IT operations, including performance monitoring, anomaly detection, root‑cause analysis, and automated remediation.

Key Responsibilities
  • AI‑Driven Operations & Automation

    • Implement AI‑Ops solutions that use ML algorithms to automate performance monitoring, workload scheduling, and infrastructure management.
    • Build anomaly detection systems that identify infrastructure issues before they impact users.
    • Develop automated root‑cause analysis capabilities using ML to correlate events and filter noise from critical alerts.
    • Create predictive maintenance workflows that analyze historical patterns to proactively mitigate issues.
    • Design and implement automated remediation scripts that respond to incidents without human intervention.
  • Observability & Intelligent Monitoring

    • Architect comprehensive observability platforms that aggregate data from disparate sources into unified dashboards.
    • Implement intelligent alerting systems using NLP and ML to reduce alert fatigue and surface actionable insights.
    • Build real‑time analytics dashboards for coordinated diagnosis across teams.
    • Deploy application performance monitoring (APM) solutions integrated with AI‑driven analytics, ensuring end‑to‑end visibility across cloud infrastructure, applications, and AI/ML workloads.
  • Cloud Infrastructure & DevOps

    • Design, build, and maintain scalable, secure AWS infrastructure using Infrastructure as Code (CloudFormation, Terraform, or CDK).
    • Implement and manage containerised environments using Docker, AWS ECS, Fargate, and Kubernetes (EKS).
    • Build CI/CD pipelines for continuous delivery, integrating AI‑powered code quality and deployment optimisation.
    • Manage cloud automation and optimisation to improve cost‑efficiency and resource utilisation.
    • Ensure compliance with Stanford and regulatory standards (FERPA, GDPR) for secure data handling and governance.
  • Collaboration & Continuous Improvement

    • Partner with cross‑functional teams to implement domain‑agnostic AI‑Ops solutions across the organisation.
    • Use Git‑based version control and code review best practices as part of a collaborative, agile workflow.
    • Document operational procedures, runbooks, and AI‑Ops workflows for team knowledge sharing.
    • Continuously evaluate and adopt emerging AI‑Ops tools, AWS services, and AI‑driven automation technologies.
    • Contribute to building an AI‑first operational culture that prioritises automation and predictive capabilities.
  • Requirements

    Required Qualifications
    • Education & Certifications: Bachelor’s degree in Computer Science, DevOps, Cloud Engineering, or a related field (Master’s preferred). AWS certification preferred (Solutions Architect, SysOps Administrator, or DevOps Engineer); professional‑level certification is a plus.
    • Experience: 3+ years in DevOps, SRE, or Cloud Engineering. 2+ years of hands‑on AWS experience (EC2, ECS, Lambda, S3, IAM, VPC) and scaling monitoring/observability solutions.
    • Familiarity: ML/AI concepts and their application to operational automation.
    Technical Skills
    • Languages: Python (required); Bash, Go, or TypeScript (preferred).
    • AIOps & Monitoring: CloudWatch, X‑Ray, Prometheus, Grafana, Datadog, or Splunk with ML capabilities.
    • Infrastructure as Code: AWS CloudFormation, Terraform, or AWS CDK.
    • Containers & Orchestration: Docker, AWS ECS/Fargate, Kubernetes (EKS).
    • AWS Services: Lambda, EC2, S3, API Gateway, EventBridge, CloudWatch, IAM, VPC, CodePipeline, SageMaker.
    • CI/CD Tools: GitHub Actions, AWS CodePipeline, Jenkins, or GitLab CI.
    • Data & Analytics: Log aggregation, metrics analysis, and event correlation platforms.
    Desired Attributes
    • Strong understanding of AI‑Ops principles – using AI to enhance, not just support, IT operations.
    • Excellent problem‑solving, debugging, and root‑cause analysis skills.
    • Rapid learning, adaptability, and continuous improvement mindset.
    • Strong communication and collaboration skills with…