Site Reliability Engineer
About Scale Pad
Scale Pad is a market‑leading SaaS company headquartered in Vancouver, Toronto, Montreal, and Phoenix, AZ. With employees around the world, we serve more than 12,000 MSPs, helping them increase client value through integrated, automated products. Scale Pad has earned multiple industry awards, including MSP Today’s Product of the Year and G2’s 2024 Fastest Growing Product.
Site Reliability Engineer – Overview
As an SRE at Scale Pad, you will enable reliable, scalable infrastructure and developer platforms. You’ll automate operations, optimize performance, and maintain high availability across our cloud environments.
Qualifications
- Strong proficiency in system operations, observability, and infrastructure monitoring
- Thorough understanding of AWS offerings, including core compute, networking, storage, and IAM
- Experience with Infrastructure as Code (IaC) tools such as Terraform
- Proficiency in scripting and automation using Python, Bash, or equivalent languages
- Basic knowledge of Java, Go, and Python is a strong plus
- Knowledge of CI/CD pipelines and best practices for continuous integration and delivery
- Experience with containerization and orchestration technologies such as Kubernetes and Docker
- Strong understanding of SLOs, SLAs, and incident management best practices
- Ability to troubleshoot and resolve complex system issues in a high‑availability environment
- Familiarity with Agile methodologies and DevOps culture
Responsibilities
- Participate in the 24/7 on‑call rotation, responding to and resolving system outages
- Maintain and improve system uptime and reliability according to established Service Level Objectives (SLOs)
- Monitor and optimize system performance using observability tools like Prometheus and Grafana
- Implement and maintain alerting systems to proactively detect and resolve issues
- Execute capacity planning and scaling activities, ensuring infrastructure efficiency
- Respond to and resolve production incidents within defined Service Level Agreements (SLAs)
- Document incident responses and contribute to post‑mortem analysis to improve system resilience
- Implement preventive measures based on insights from incidents
- Manage escalations and coordinate with teams to resolve complex system issues
- Develop and maintain Infrastructure as Code (IaC) to enable automated infrastructure management
- Create and optimize CI/CD pipelines, ensuring smooth and reliable software releases
- Write automation scripts for routine operational tasks, reducing manual workload
- Implement monitoring solutions and dashboards to provide real‑time system visibility
- Work closely with development teams, ensuring seamless integration of SRE principles into application design
- Participate in team planning and retrospective meetings, contributing to continuous improvement
- Document technical processes and procedures, making knowledge accessible across teams
- Contribute to knowledge base maintenance, sharing best practices and troubleshooting insights
Benefits
- Everyone’s an Owner: Through our Employee Stock Option Plan (ESOP), each team member has a stake in our success and shares in the rewards.
- Growth, Longevity and Stability: Benefit from insights and training from our leadership and founder, whose extensive experience creates a stable environment for long‑term career growth.
- Annual Training & Development: Every employee receives an annual budget for professional development.
- Hybrid Flexibility: Work from our world‑class offices in Vancouver, Toronto, and Montreal, or from home with cutting‑edge gear.
- Wellness at Work: Our Vancouver office features a fitness facility and outdoor ping‑pong tables.
- Comprehensive Benefits: 100% medical and dental coverage, RRSP matching after one year, and a monthly stipend to offset hybrid costs.
- Flexible Time Off: Unlimited flex‑time plus accrued vacation for a healthy work‑life balance.
Applicants must be eligible to work in Canada. Scale Pad is committed to fostering an environment of diversity, equity, inclusion, and belonging. We value every individual’s unique experiences and perspectives.
Application Process
Please apply online. Only successful applicants will be contacted. We do not accept recruiter solicitations or phone calls.
Seniority level: Entry level
Employment type: Full‑time
Job function: Engineering and Information Technology
Industries: Software Development