Lead Platform Engineer
Job in
Baltimore, Anne Arundel County, Maryland, 21212, USA
Listed on 2026-06-02
Listing for:
3B Staffing
Full Time position
Job specializations:
- IT/Tech
Systems Engineer, SRE/Site Reliability, Cloud Computing, IT Support
Job Description & How to Apply Below
Location:
Hybrid - Must reside within 50 miles of Baltimore, MD / Wilmington, DE / Charlotte, NC / Dallas, TX / New York, NY / Evansville, IN (local candidates only)
Duration: 6+ Months Contract
Employment Type: Contract
Work Authorization: GC, USC
Required skills:
SRE, OpenTelemetry, Elastic Observability, Grafana, OpsRamp, BigPanda, AWS/Azure, Kubernetes, Docker, CloudWatch, Monitoring & Alerting, SLI/SLO, Automation, Python/Bash/PowerShell, Distributed Systems, Incident Management, CI/CD, Terraform/Ansible.
Job Overview:
We are seeking an experienced Lead Platform Engineer to join a high-performing Monitoring & Observability Engineering team within a fast-paced enterprise environment. The ideal candidate will have strong expertise in Site Reliability Engineering (SRE), OpenTelemetry, Elastic Observability, cloud monitoring, and enterprise platform reliability.
This role will focus on designing and enhancing observability frameworks, telemetry pipelines, monitoring standards, dashboards, alerting systems, and automation capabilities to improve system reliability, reduce MTTR, and support mission-critical production platforms.
Key Responsibilities:
• Design, deploy, and maintain OpenTelemetry-based telemetry pipelines and observability frameworks
• Build and support enterprise monitoring solutions using Grafana, Elastic Stack, OpsRamp, BigPanda, AWS CloudWatch, and Azure Monitor
• Implement SRE best practices including SLIs, SLOs, error budgets, and reliability dashboards
• Collaborate with development, infrastructure, and platform teams to define observability and monitoring standards
• Develop actionable alerts, dashboards, tracing, and telemetry for distributed systems and business-critical applications
• Automate monitoring, incident response, self-healing workflows, and operational tasks
• Improve system reliability, performance tuning, capacity planning, and proactive issue detection
• Support incident management, root cause analysis, escalation, and recovery processes
• Maintain technical documentation, monitoring standards, runbooks, and diagnostic guides
• Mentor junior engineers and promote reliability engineering and operational excellence across teams
Required Skills & Experience:
• 5+ years of experience in Platform Engineering, SRE, Reliability Engineering, or Monitoring Engineering roles
• Strong hands-on expertise with OpenTelemetry, Elastic Observability (APM, Logs, Metrics, Traces), Grafana, OpsRamp, BigPanda, CloudWatch, and Azure Monitor
• Experience building scalable monitoring, dashboarding, telemetry, and alerting solutions across distributed environments
• Strong scripting/programming experience with Bash, PowerShell, Python, JavaScript, or C-family languages
• Expertise with AWS and/or Azure cloud platforms
• Strong experience with Kubernetes, Docker, and containerized platforms
• Understanding of distributed systems, networking, DevSecOps, security, and performance engineering
• Excellent troubleshooting, analytical, and communication skills
• Ability to work cross-functionally in large enterprise environments
Preferred Qualifications:
• Experience with CI/CD tools such as Jenkins, GitHub, GitLab CI, or CircleCI
• Knowledge of Infrastructure as Code tools like Terraform or Ansible
• Experience with REST APIs, JSON, and ServiceNow
• Familiarity with microservices and event-driven architectures
• Experience with time-series data visualization and analytics
Education:
• Bachelor's degree in Computer Science, Information Technology, or related field preferred