Principal Systems/Software Administrator

Remote / Online - Candidates ideally in Vancouver, BC, Canada
Listing for: Hewlett Packard Enterprise
Remote/Work from Home position
Listed on 2026-03-10
Job specializations:
  • IT/Tech
    Cloud Computing, SRE/Site Reliability, Systems Engineer
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 CAD yearly

Principal Systems/Software Administrator

This role has been designated as ‘Remote/Teleworker’, which means you will primarily work from home.

Who We Are:

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today’s complex world. Our culture thrives on finding new and better ways to accelerate what’s next.

We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.

Job Description:

HPE / Mist is seeking a Principal Systems/Software Engineer (SRE) to join our cloud infrastructure team. In this role, you will support and scale highly available SaaS platforms powered by AI-driven cloud technologies.

You will play a critical role in maintaining production stability, improving reliability, and enabling rapid growth across multi-cloud environments (AWS and GCP). Your primary focus will be incident management, release management, and operational excellence for large-scale distributed systems.

Key Responsibilities
  • Ensure high availability, reliability, and performance of large-scale cloud infrastructure across AWS and GCP, meeting defined SLAs and SLOs
  • Operate and support infrastructure components, data streaming platforms, and databases, including:
    Kubernetes, Kafka, Flink, Storm, Spark;
    Cassandra, Elasticsearch, Redis, Postgres, ArangoDB, and related technologies
  • Monitor, troubleshoot, and resolve production issues across microservices and distributed systems
  • Partner closely with software engineering teams to debug and resolve complex production incidents
  • Participate in a 24x7 on‑call rotation supporting a multi‑cloud environment
  • Monitor system metrics, application performance, and infrastructure health
  • Own the full incident management lifecycle, including detection, mitigation, RCA creation, and post‑incident reviews
  • Develop, maintain, and improve runbooks and automated operational processes
  • Perform capacity planning using performance, usage, and utilization data
  • Apply and promote SRE best practices, operational standards, and continuous improvement initiatives
Required Qualifications
  • Bachelor’s degree in Computer Science, Computer Engineering, or equivalent practical experience
  • 10+ years of overall DevOps / Site Reliability Engineering experience
  • 7+ years of hands‑on experience with cloud platforms such as AWS or GCP, including:
    Compute (EC2 / GCE), IAM, object storage (S3 / GCS);
    Docker, Kubernetes (pods and clusters); CI/CD tools such as Jenkins;
    Monitoring and observability tools (Prometheus, CloudWatch, Stackdriver);
    Linux‑based systems and configuration management (Ansible)
  • 7+ years of experience deploying and managing production workloads using CI/CD pipelines in AWS or GCP environments
  • 5+ years of administration experience with distributed systems and streaming platforms, including Kafka, Cassandra, Elasticsearch, Spark, Flink, Storm, and cloud services such as EMR, Dataproc, ElastiCache, AWS RDS, or GCP SQL
  • 5+ years of automation experience using Python, Go, and/or Rust, plus shell scripting
  • 5+ years of experience designing and implementing metrics to monitor infrastructure and application health
  • Working knowledge of Infrastructure as Code (Terraform, CloudFormation, or equivalent)
Nice to Have
  • Open‑source software contributions
  • Experience with AIOps or Generative AI technologies
  • Workflow and automation experience using GitHub Actions, Google Workflows, Jenkins, GitLab, Slack, and Jira/Confluence
  • Experience managing microservices release operations at scale

What We Can Offer You:

Health & Wellbeing

We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing.

Personal & Professional Development

We also invest in your…
