AWS DevOps Engineer AI/ML
Job in
Plano, Collin County, Texas, 75094, USA
Listed on 2026-06-20
Listing for:
3B Staffing
Full Time position
Job specializations:
- IT/Tech
- AI Engineer (Applied/Software), Machine Learning/ML Engineer
Job Description & How to Apply Below
Job Title: AWS DevOps Engineer with AI/ML
Location: Plano, TX
Duration: 6-month RTH (right to hire)
Job Description:
Must-Have Skills:
- AI/ML knowledge (2-3 years)
- AWS
- Observability tools: Grafana, Splunk, Dynatrace
- Automation scripting languages: Bash, shell script, Groovy
- Programming language: Java
- DevOps skillset
- Terraform
• Design, develop, and deploy AI/ML models and agent-based systems that automate technology and platform workflows for the internal Deposits 2.0 platform.
• Lead the integration of intelligent agents into operational processes to improve decision-making, workflow execution, and process optimization across engineering and operations.
• Build AI-assisted tooling that improves Infrastructure as Code (IaC) development, validation, and change management (examples: Terraform, Ansible, CloudFormation-style patterns) across cloud and on-prem environments.
• Partner with DevOps and platform engineering teams to enhance CI/CD pipelines using AI/ML for signal detection, predictive analytics, and automated remediation.
• Develop AI-powered observability automation to monitor, analyze, and proactively manage application and infrastructure health for internal platform services.
• Automate alert triage, root cause analysis assistance, and incident response workflows using ML-driven techniques, with clear guardrails and measurable outcomes.
• Engineer or enable data pipelines and feature workflows needed to support model training, evaluation, and real-time or near-real-time inference use cases.
• Implement and operate MLOps capabilities (deployment patterns, monitoring, quality gates, rollback strategies, documentation) aligned to enterprise expectations for reliability and risk management.
• Collaborate with cross-functional teams (engineering, product, SRE, operations, architecture, controls) to identify high-value automation opportunities and deliver outcomes that can be adopted at scale.
• Continuously evaluate emerging AI/ML approaches and tooling, and translate them into practical, secure, and maintainable platform capabilities.
Required Qualifications (Minimum)
• Bachelor's or Master's degree in Computer Science, Engineering, or related field, or equivalent practical experience.
• 5+ years of total engineering experience, including 3+ years of hands-on experience delivering AI/ML engineering solutions in production environments.
• Strong programming skills in Python; Java experience is a plus, especially for enterprise platform integration.
• Experience with ML frameworks such as PyTorch, TensorFlow, and scikit-learn, including model training and evaluation workflows.
• Hands-on experience building agent-based systems and integrating them into real operational or engineering workflows.
• Experience applying AI/ML to automate or improve Infrastructure as Code workflows (examples: generation assistance, validation, policy checks, drift detection, change risk scoring).
• Familiarity with observability fundamentals and toolsets (examples: Prometheus, Grafana, ELK stack) and experience automating operational workflows using AI/ML.
• Strong foundations in data structures, algorithms, machine learning, statistics, and software engineering best practices.
• Experience integrating AI capabilities into modern software development practices and supporting legacy modernization or code transformation initiatives.
• Strong communication and collaboration skills, with the ability to work effectively across engineering and operations stakeholders.
Preferred Qualifications
• Depth in one or more areas such as large language models, NLP, knowledge graphs, reinforcement learning, ranking and recommendation, or time-series analysis.
• Experience with retrieval-augmented generation patterns or tool-using agents, including multi-step workflows and structured evaluation approaches.
• Experience with MLOps practices and concepts such as model registries, feature store patterns, CI/CD for ML, and model monitoring and drift detection.
• Experience building production systems on public cloud platforms (examples: AWS, Azure, GCP) and operating containerized workloads (examples: Docker, Kubernetes).
• Experience operating ML solutions in regulated enterprise environments, including producing controls-oriented documentation and supporting auditability expectations.
• Experience with platform-scale operational automation, including incident response automation and measurable reductions in operational toil.