Senior Site Reliability Engineer
Job in
Grande Prairie, Alberta, Canada
Listed on 2026-06-03
Listing for:
Shippo Dev
Full Time position
Job specializations:
- Software Development
Job Description & How to Apply Below
About the Role
As a Senior Site Reliability Engineer (SRE) on our team, you will leverage platform engineering principles to ensure that Shippo's services are reliable, scalable, and performant. You will be a hybrid software development and operations engineer, responsible for designing, building, and maintaining the infrastructure that supports our applications. Your work will directly impact our ability to meet and exceed SLAs, and you will collaborate closely with other engineering teams to create services that are automatable, measurable, and resilient to failure.
Responsibilities
- Design, scale, and secure infrastructure to stay ahead of business needs through fault‑tolerant architecture design, performance testing, profiling, and tuning, and capacity planning
- Design, build, deploy, and maintain automation, monitoring, and alerting systems, as well as design, implement, and test disaster recovery solutions
- Ensure scalability and maintainability through microservices adoption, decoupling of concerns and data models, job queuing, and application layering
- Enhance and maintain our CI/CD pipeline for smooth and safe production releases via automated testing and verification
- Verify the correctness and performance of systems, including response time and throughput
- Participate in peer reviews, testing, and design reviews for new features, products, and systems, and contribute to automated test suites
- Participate in an on‑call rotation
Requirements
- Experience developing, managing and troubleshooting highly available distributed systems, including operational experience with Kubernetes in a production environment
- Extensive expertise with at least one public cloud provider (AWS, GCP, Azure)
- Exceptional verbal, written, and interpersonal communication skills
- Interest in and understanding of best‑in‑class security practices, and automation and testing methods
- Familiarity with configuration and maintenance of common infrastructure components such as Redis, Elasticsearch, and Hadoop
- Deep understanding of customer needs and passion for customer success
- BS or MS degree in Computer Science or equivalent experience
- Advanced knowledge of managing and optimizing PostgreSQL server configuration
- 3+ years of experience in software development
- Experience with:
- Defining and monitoring Service-Level Objectives (SLOs) and Service-Level Agreements (SLAs) to ensure that systems meet reliability and performance targets
- Monitoring tools like New Relic, Prometheus, Grafana, and/or Datadog
- OpenTelemetry for distributed tracing and metrics collection, including experience using it in production environments
- Managing Python and Golang applications in production
- DevOps tooling such as Docker, Terraform, ArgoCD, Argo Workflows, CircleCI, GitHub Actions, New Relic, PagerDuty, etc.
- AWS/Cloud services such as EKS, EC2, S3, Lambda, Route 53, CloudFront, Cloudflare, IAM, etc.
All qualified individuals are encouraged to apply. If you need assistance, or a reasonable accommodation during the application and recruiting process, please contact us at
Position Requirements
10+ years of work experience