Principal IT Software Engineer
Listed on 2026-05-25
Software Development
Cloud Engineer - Software, AI Engineer, DevOps, Software Engineer
Job Summary
As an AI-Native Cloud Software Engineer, you won't just manage environments; you will build the software engines, intelligent pipelines, and autonomous systems that power our cloud presence. We are shifting from rigid configuration management to AI-driven, self-healing software architectures.
Role Overview
You will design, develop, and optimize highly available, distributed cloud applications and infrastructure services across AWS, Azure, and GCP. Treating the entire cloud ecosystem as a programmable, AI-orchestrated software entity, you will bridge the gap between deep systems engineering, application development, and LLM-powered systems orchestration.
Responsibilities
- Multi-Cloud Generative IaC & Software-Defined Infrastructure:
Architect and maintain scalable cloud systems across AWS, Azure, and GCP using Pulumi, AWS CDK, or Terraform. Integrate AI development workflows and custom LLM agents to accelerate safe infrastructure compilation, drift detection, and automated cross-cloud refactoring.
- Intelligent Automation & Agentic Workflows:
Engineer custom software utilities, internal services, and autonomous agents using TypeScript/Node.js, Go, or Python, alongside frameworks like LangChain or CrewAI, to orchestrate complex provisioning, predictive auto-scaling, and closed-loop self-healing systems.
- AI-Driven Cloud Governance & Economics:
Leverage predictive machine learning models to analyze multi-cloud spend patterns, autonomously executing real-time resource-optimization strategies via API-driven software actions such as dynamic spot-instance bidding and intelligent right-sizing across AWS, Azure, and GCP.
- Cognitive Observability & Infrastructure Security:
Implement next-gen observability frameworks (OpenTelemetry, Prometheus) coupled with AI anomaly detection. Embed security directly into the deployment pipeline, utilizing LLMs to automatically audit Cloud IAM policies, scan for vulnerabilities, and generate contextual patches.
- Intelligent Container Orchestration:
Manage production-grade Kubernetes clusters (EKS, AKS, GKE). Optimize resource allocation, cluster auto-scaling, and service meshes using AI-driven traffic routing and predictive capacity planning.
- Autonomous Incident Response:
Act as a tier-3 software escalation engineer for complex distributed systems anomalies. Help design and train our internal "On-Call AI Agent" to ingest logs, perform automated Root Cause Analysis (RCA), and submit pre-validated Pull Requests to resolve underlying system defects.
- Software Engineering & AI Orchestration:
Strong software engineering fundamentals in TypeScript (Node.js), Go, or Python. Experience interfacing with LLM APIs (OpenAI, Anthropic, Google Vertex AI, AWS Bedrock), vector databases, and prompt engineering for systems-level orchestration.
- Multi-Cloud & Containers:
Deep proficiency in at least two major cloud platforms (AWS, Azure, GCP) with a strong architectural understanding of the third. Expert-level knowledge of Kubernetes (CKA preferred) and cloud-native networking.
- Next-Gen CI/CD:
Experience building intelligent delivery pipelines using GitHub Actions or GitLab CI, featuring integrated automated testing, security gates, and AI-assisted code reviews.
- Systems Mastery:
Deep understanding of Linux internals, distributed systems architecture, asynchronous programming patterns, and performance tuning.
6+ years of experience in Cloud Software Engineering, Site Reliability Engineering (SRE), or Distributed Systems Infrastructure.
2+ years of hands‑on experience integrating AI tools, LLMs, or predictive analytics into deployment workflows, pipelines, or software platforms.
Proven track record of architecting and operating large-scale, high-throughput distributed systems.
Preferred
Agentic Problem‑Solving:
A mindset that moves past "how do I automate this task?" to "how do I build an autonomous system that solves this permanently?"
Collaborative AI‑First Culture:
Ability to partner with Core AI/ML teams to bridge the gap between model deployment and high‑availability cloud infrastructure.
The compensation offered for this position will depend on…