
AWS DevOps Engineer AI/ML

Job in Plano, Collin County, Texas, 75094, USA
Listing for: 3B Staffing
Full Time position
Listed on 2026-06-20
Job specializations:
  • IT/Tech
    AI Engineer (Applied/Software), Machine Learning/ML Engineer
Position: AWS DevOps Engineer with AI/ML
Job Title: AWS DevOps Engineer with AI/ML

Location: Plano, TX
Duration: 6 months (RTH)


Job Description:

Must-have skills:

  • AI/ML knowledge (2-3 years)
  • AWS
  • Observability tools
    - Grafana, Splunk, Dynatrace
  • Automation scripting languages
    - Bash, shell scripting, Groovy
Nice-to-have skills:
  • Programming language: Java
  • DevOps skill set
  • Terraform
Key Responsibilities

• Design, develop, and deploy AI/ML models and agent-based systems that automate technology and platform workflows for the internal Deposits 2.0 platform.

• Lead the integration of intelligent agents into operational processes to improve decision-making, workflow execution, and process optimization across engineering and operations.

• Build AI-assisted tooling that improves Infrastructure as Code (IaC) development, validation, and change management (examples: Terraform, Ansible, CloudFormation-style patterns) across cloud and on-prem environments.

• Partner with DevOps and platform engineering teams to enhance CI/CD pipelines using AI/ML for signal detection, predictive analytics, and automated remediation.

• Develop AI-powered observability automation to monitor, analyze, and proactively manage application and infrastructure health for internal platform services.

• Automate alert triage, root cause analysis assistance, and incident response workflows using ML-driven techniques, with clear guardrails and measurable outcomes.

• Engineer or enable data pipelines and feature workflows needed to support model training, evaluation, and real-time or near-real-time inference use cases.

• Implement and operate MLOps capabilities (deployment patterns, monitoring, quality gates, rollback strategies, documentation) aligned to enterprise expectations for reliability and risk management.

• Collaborate with cross-functional teams (engineering, product, SRE, operations, architecture, controls) to identify high-value automation opportunities and deliver outcomes that can be adopted at scale.

• Continuously evaluate emerging AI/ML approaches and tooling, and translate them into practical, secure, and maintainable platform capabilities.

Required Qualifications (Minimum)

• Bachelor's or Master's degree in Computer Science, Engineering, or related field, or equivalent practical experience.

• 5+ years total engineering experience with 3+ years hands-on experience delivering AI/ML engineering solutions in production environments.

• Strong programming skills in Python; Java experience is a plus, especially for enterprise platform integration.

• Experience with ML frameworks such as PyTorch, TensorFlow, and scikit-learn, including model training and evaluation workflows.

• Hands-on experience building agent-based systems and integrating them into real operational or engineering workflows.

• Experience applying AI/ML to automate or improve Infrastructure as Code workflows (examples: generation assistance, validation, policy checks, drift detection, change risk scoring).

• Familiarity with observability fundamentals and toolsets (examples: Prometheus, Grafana, ELK stack) and experience automating operational workflows using AI/ML.

• Strong foundations in data structures, algorithms, machine learning, statistics, and software engineering best practices.

• Experience integrating AI capabilities into modern software development practices and supporting legacy modernization or code transformation initiatives.

• Strong communication and collaboration skills, with the ability to work effectively across engineering and operations stakeholders.

Preferred Qualifications

• Depth in one or more areas such as large language models, NLP, knowledge graphs, reinforcement learning, ranking and recommendation, or time-series analysis.

• Experience with retrieval-augmented generation patterns or tool-using agents, including multi-step workflows and structured evaluation approaches.

• Experience with MLOps practices and concepts such as model registries, feature store patterns, CI/CD for ML, and model monitoring and drift detection.

• Experience building production systems on public cloud platforms (examples: AWS, Azure, GCP) and operating containerized workloads (examples: Docker, Kubernetes).

• Experience operating ML solutions in regulated enterprise environments, including producing controls-oriented documentation and supporting auditability expectations.

• Experience with platform-scale operational automation, including incident response automation and measurable reductions in operational toil.