Senior Site Reliability Engineer Job, Grande Prairie area, Alberta, Canada, Software Development

About the Role

As a Senior Site Reliability Engineer (SRE) on our team, you will leverage platform engineering principles to ensure that Shippo's services are reliable, scalable, and performant. You will be a hybrid software development and operations engineer, responsible for designing, building, and maintaining the infrastructure that supports our applications. Your work will directly impact our ability to meet and exceed SLAs, and you will collaborate closely with other engineering teams to create services that are automatable, measurable, and resilient to failure.

Responsibilities

Design, scale, and secure infrastructure to stay ahead of business needs through fault‑tolerant architecture design, capacity planning, and performance testing, profiling, and tuning
Design, build, deploy, and maintain automation, monitoring, and alerting systems, as well as design, implement, and test disaster recovery solutions
Ensure scalability and maintainability through microservices adoption, decoupling of concerns and data model, queuing of jobs and application layering
Enhance and maintain our CI/CD pipeline for smooth and safe production releases via automated testing and verification
Verify and ensure performance and correctness of systems in response time and throughput
Participate in peer reviews, testing, and design reviews for new features, products, and systems, and contribute to automated test suites
Participate in an on‑call rotation

Qualifications

Experience developing, managing and troubleshooting highly available distributed systems, including operational experience with Kubernetes in a production environment
Extensive expertise with at least one public cloud provider (AWS, GCP, Azure)
Exceptional verbal, written, and interpersonal communication skills
Interest in and understanding of best‑in‑class security practices, and automation and testing methods
Familiarity with configuration and maintenance of common infrastructure components such as Redis, Elasticsearch, and Hadoop
Deep understanding of customer needs and passion for customer success
BS or MS degree in Computer Science or equivalent experience

Bonus

Advanced knowledge of managing and optimizing PostgreSQL server configuration
3+ years of experience in software development
Experience with:
Defining and monitoring Service-Level Objectives (SLOs) and Service-Level Agreements (SLAs) to ensure that systems meet reliability and performance targets
Monitoring tools such as New Relic, Prometheus, Grafana, and/or Datadog
OpenTelemetry for distributed tracing and metrics collection, including experience using it in production environments
Managing Python and Golang applications in production
DevOps tooling such as Docker, Terraform, ArgoCD, Argo Workflows, CircleCI, GitHub Actions, New Relic, PagerDuty, etc.
AWS/Cloud services such as EKS, EC2, S3, Lambda, Route 53, CloudFront, Cloudflare, IAM, etc.

All qualified individuals are encouraged to apply. If you need assistance, or a reasonable accommodation during the application and recruiting process, please contact us at
