Site Reliability Engineer
- Position: Site Reliability Engineer
- Client: Financial Services - Capital Markets Technology
- Duration: 12-month contract with potential extensions
- Location: Toronto, Canada - 2 to 3 days onsite per week
- Language: English
- Hours: 37.5 hours/week
Our client is looking for a Site Reliability Engineer (SRE) to enhance the reliability, performance, and efficiency of mission‑critical batch workloads across Capital Markets Technology. The SRE will serve as a technical lead focused on automation, application development, systems performance engineering, and observability using Dynatrace. This position is pivotal in driving operational excellence and maturing reliability practices across the organization.
Qualifications
- Expert‑level Python skills, including performance tuning, concurrency (async/multiprocessing), testing, and packaging.
- Strong Linux systems engineering expertise (kernel tuning, networking, process management, file system optimization).
- Proven experience optimizing batch workloads for performance, reliability, and cost efficiency.
- Deep knowledge of Dynatrace for observability (dashboards, KPIs, tagging, alerts, anomaly detection).
- Hands‑on experience with Apache Airflow (DAG design, scheduler tuning, SLA management).
- Strong understanding of distributed systems concepts: retries, idempotency, backpressure, and data integrity.
- Experience with CI/CD pipelines (GitHub Actions, Azure DevOps, Jenkins) and Infrastructure as Code (Terraform, Ansible).
- Familiarity with containers and orchestration tools (Docker, Kubernetes).
- Excellent incident management, troubleshooting, and communication skills.
Responsibilities
- Reliability & Performance: Engineer resilient and performant batch processing pipelines by reducing runtime and minimizing failures.
- Observability: Implement and maintain Dynatrace dashboards, alerts, and runbooks to ensure deep visibility into system health.
- Systems Engineering: Configure and tune Linux and Windows environments for optimal reliability and speed.
- Automation & Orchestration: Design and refine Airflow DAGs, automate deployments with CI/CD pipelines, and reduce operational toil through code.
- Incident Management: Lead incident response, conduct root‑cause analysis, and implement improvements based on post‑mortems and SLOs.
- Security & Compliance: Ensure all reliability and automation processes adhere to security best practices and regulatory compliance standards.
Please note this is for a contract position with one of our clients and not a full-time employment role with Kyndryl Canada.
- Seniority level: Mid‑Senior level
- Employment type: Contract
- Job function: Information Technology
- Industries: IT Services and IT Consulting