Principal Systems/Software Administrator Job Vancouver BC Canada,IT/Tech

Principal Systems/Software Administrator

This role has been designated as ‘Remote/Teleworker’, which means you will primarily work from home.

Who We Are:

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today’s complex world. Our culture thrives on finding new and better ways to accelerate what’s next.

We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.

Job Description:

HPE / Mist is seeking a Principal Systems/Software Engineer (SRE) to join our cloud infrastructure team. In this role, you will support and scale highly available SaaS platforms powered by AI-driven cloud technologies.

You will play a critical role in maintaining production stability, improving reliability, and enabling rapid growth across multi-cloud environments (AWS and GCP). Your primary focus will be incident management, release management, and operational excellence for large-scale distributed systems.

Key Responsibilities

Ensure high availability, reliability, and performance of large-scale cloud infrastructure across AWS and GCP, meeting defined SLAs and SLOs
Operate and support infrastructure components, data streaming platforms, and databases, including:
Kubernetes, Kafka, Flink, Storm, Spark;
Cassandra, Elasticsearch, Redis, Postgres, ArangoDB, and related technologies
Monitor, troubleshoot, and resolve production issues across microservices and distributed systems
Partner closely with software engineering teams to debug and resolve complex production incidents
Participate in a 24x7 on‑call rotation supporting a multi‑cloud environment
Monitor system metrics, application performance, and infrastructure health
Own the full incident management lifecycle, including detection, mitigation, RCA creation, and post‑incident reviews
Develop, maintain, and improve runbooks and automated operational processes
Perform capacity planning using performance, usage, and utilization data
Apply and promote SRE best practices, operational standards, and continuous improvement initiatives

Required Qualifications

Bachelor’s degree in Computer Science, Computer Engineering, or equivalent practical experience
10+ years of overall DevOps / Site Reliability Engineering experience
7+ years of hands‑on experience with cloud platforms such as AWS or GCP, including:
Compute (EC2 / GCE), IAM, object storage (S3 / GCS);
Docker, Kubernetes (pods and clusters); CI/CD tools such as Jenkins;
Monitoring and observability tools (Prometheus, CloudWatch, Stackdriver);
Linux‑based systems and configuration management (Ansible)
7+ years of experience deploying and managing production workloads using CI/CD pipelines in AWS or GCP environments
5+ years of administration experience with distributed systems and streaming platforms, including Kafka, Cassandra, Elasticsearch, Spark, Flink, Storm, and cloud services such as EMR, Dataproc, ElastiCache, AWS RDS, or GCP SQL
5+ years of automation experience using Python, Go, and/or Rust, plus shell scripting
5+ years of experience designing and implementing metrics to monitor infrastructure and application health
Working knowledge of Infrastructure as Code (Terraform, CloudFormation, or equivalent)

Nice to Have

Open‑source software contributions
Experience with AIOps or Generative AI technologies
Workflow and automation experience using GitHub Actions, Google Workflows, Jenkins, GitLab, Slack, and Jira/Confluence
Experience managing microservices release operations at scale

Additional Skills:

Not listed

What We Can Offer You:

Health & Wellbeing

We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing.

Personal & Professional Development

We also invest in your…

