Infrastructure Engineer – Meteorological Development combines infrastructure engineering, platform reliability, and software development to deliver highly available, scalable, and high-performance systems supporting global meteorological data. Working in a modern cloud-native environment, you design and operate distributed systems on Kubernetes (AWS EKS and GCP GKE) with Istio service mesh, ECS, and Docker containers, apply infrastructure as code (Terraform, Terragrunt), and manage CI/CD pipelines built on Jenkins.
You operate across AWS and/or GCP services, manage application delivery with Helm, and support stateful and caching layers such as Redis and ElastiCache. Reliability is maintained through Linux operations and observability tooling (Prometheus, Grafana, Kibana, Elasticsearch, Jaeger, Kiali). You create playbooks and runbooks using Bash and Ansible. You also collaborate with technical leads, developers, operations teams, and other infrastructure administrators to modernize and standardize infrastructure through automation, observability, and cloud/platform engineering best practices.
This role ensures platform stability, scalability, security, and operational excellence across critical meteorological systems and data pipelines that power The Weather Network, MétéoMédia, ElTiempo.es, and related subsidiaries. The role is a contract position until February 2027, requires Canadian work eligibility, and is performed in a hybrid model at the Oakville, ON location.
- Design and implement core meteorological infrastructure services on Kubernetes, Istio, ECS, and AWS/GCP, deploying via Jenkins CI/CD and Helm.
- Use AI-assisted development tools to improve productivity, code quality, and operational efficiency.
- Provide technical advice, estimate effort, and execute work based on priorities.
- Collaborate with cross-functional teams to integrate monitoring systems and ensure all systems function as intended.
- Work with Technology Operations to build end-to-end data monitoring, from acquisition to delivery, into software design.
- Leverage database knowledge to build applications that are distributed, multi-tier, and handle large data volumes.
- Support operational teams for production systems and maintain availability and stability.
- Maintain operational runbooks, automation playbooks, and reliable cloud infrastructure procedures for production systems.
- Degree or diploma in Computer Science, Engineering, Mathematics, or equivalent practical experience.
- 3+ years designing, implementing, and managing containerized environments on Docker and Kubernetes (EKS & GKE).
- Strong knowledge of AWS, GCP, CloudStack, and Proxmox.
- Experience with core networking concepts (TCP/IP, NAT, DNS, load balancing, firewalls).
- Strong experience building and supporting production-grade distributed systems in Linux.
- Proficiency in at least one language: Go, Python, C/C++, or Rust.
- Experience with scripting and automation (Python, Bash, Terraform, Helm, Terragrunt, Groovy, or JavaScript).
- Understanding of data structures, algorithms, and performance optimization.
- Experience designing and supporting highly available, scalable systems handling large data volumes.
- Experience with relational and NoSQL databases, including caching technologies such as Redis or AWS ElastiCache.
- Experience with CI/CD practices and tools such as Jenkins.
- Experience building and consuming RESTful APIs and services.
- Familiarity with security best practices and secure infrastructure design principles.
- Experience with monitoring and observability tools: Prometheus, Grafana, Elasticsearch, Kibana, Jaeger, Kiali.
- Understanding of cloud cost optimization.
- Strong problem-solving and communication skills, with the ability to gather requirements and collaborate across teams.
- Experience maintaining operational runbooks, automation playbooks, and reliable cloud infrastructure procedures.
- Experience with AI/ML infrastructure or platforms is an asset.
- Interest in data, data mining, and problem solving.
- Experience with spatiotemporal data sets and scientific formats (netCDF, HDF, GRIB, BUFR).
- Experience with Ansible.
- Knowledge of applying ML/AI to augment data analysis.
- Contract role until February 2027.
- Eligible to work in Canada and able to work in a hybrid model at the Oakville, ON location.
- Transparent communication and open forums with leadership.
- Employee pulse surveys and an anonymous reporting platform that promote inclusive feedback.
- Focus on doing the right thing and delivering real-user impact quickly.
- Learning opportunities and openness to new technologies.
- Reduced unnecessary meetings.