
Principal, Software Engineer

Job in Joplin, Jasper County, Missouri, 64803, USA
Listing for: Walmart
Full Time position
Listed on 2026-01-06
Job specializations:
  • IT/Tech
    AI Engineer, Systems Engineer
Job Description & How to Apply Below
Position: (USA) Principal, Software Engineer

Position Summary. Serve as a technical thought leader driving the next phase of Walmart’s Performance and Resiliency Engineering. Architect, build, and scale intelligent agentic AI/ML systems that proactively optimize speed, reliability, and business continuity across Walmart’s global platforms. Operate at the intersection of engineering, data science, and business—translating visionary ideas into actionable architecture and tangible solutions.

About Team. Building the right technology foundation for infrastructure and platforms is vital to success at Walmart’s scale. Our team builds and maintains the foundational technologies that support the tech organization, including data platforms, enterprise architecture, DevOps, cloud computing, and infrastructure. These products and services run on scalable, powerful infrastructure that ensures a secure and seamless employee and customer experience across stores, digital channels, and distribution centers.

Key Responsibilities
  • AI/ML & Agentic System Leadership
    • Design, fine-tune, and deploy Generative AI models (including LLMs) and agentic frameworks (e.g., RAG, CrewAI) for performance monitoring, anomaly detection, and automated remediation.
    • Develop and optimize LLM-based agents for multi-step reasoning, knowledge grounding, and decision-making.
    • Architect scalable, distributed AI systems with a focus on performance, fault tolerance, and disaster recovery.
    • Integrate external data sources (vector databases, observability stacks) to build dynamic, context-aware, and self-healing systems.
    • Lead the development of LLM evaluation pipelines (factuality, consistency, relevance) and implement safety guardrails.
  • Performance Engineering
    • Architect and implement AI/ML-driven solutions for continuous performance monitoring, automated tuning, and predictive scaling.
    • Establish and enforce performance benchmarks, SLAs, and SLOs; integrate performance testing into CI/CD pipelines.
    • Leverage advanced observability tools (Grafana, ELK, Splunk, Prometheus) and distributed tracing for actionable insights.
    • Optimize LLM inference (prompt caching, quantization, retrieval filtering) and system throughput.
  • Resiliency & Chaos Engineering
    • Champion resilient architectures that maintain business continuity during failures or spikes.
    • Lead chaos engineering initiatives: design and execute controlled failure scenarios, analyze impact, and drive improvements.
    • Leverage AI/ML for predictive failure detection, drift monitoring, and autonomous remediation.
    • Develop and maintain playbooks for critical/non-critical dependency failures and disaster recovery.
  • Technical Leadership & Collaboration
    • Guide engineering teams on best practices, technical design, and architectural decisions for AI/ML and agentic systems.
    • Collaborate with data scientists, ML engineers, SRE, and product teams to operationalize AI/ML models and integrate them into production.
    • Mentor engineers, foster a culture of continuous learning, and contribute to internal platform standards and engineering playbooks.
    • Drive experimentation (A/B testing, multi-armed bandits, causal inference) and champion innovation.
  • Product Integration & Delivery
    • Partner with cross-functional teams to deliver end-to-end, cloud-native solutions (GCP, Azure, Kubernetes, Docker).
    • Shape the architecture and roadmap for AI-powered performance and resiliency systems.
    • Ensure high standards for quality, security, and performance through rigorous design and code reviews.
What you’ll bring
  • Proven experience with LLMs, GenAI, RAG, agentic frameworks, and embedding-based workflows.
  • Deep expertise in distributed systems, cloud-native architectures, and scalable microservices (GCP, Azure, Kubernetes, Docker).
  • Strong programming skills: Python, Java, SQL; hands‑on with ML frameworks (PyTorch, TensorFlow, Hugging Face Transformers).
  • Experience with performance engineering, chaos engineering, and building resilient, fault‑tolerant systems.
  • Demonstrated success in technical leadership, mentoring, and cross‑functional collaboration.
  • Strong experimentation background (A/B testing, causal inference) and MLOps (CI/CD, monitoring, drift detection).
  • Excellent…