Senior Site Reliability Engineer Job, Charlotte area, North Carolina, USA, IT/Tech

Job Description:

AssetMark is a leading strategic provider of innovative investment and consulting solutions serving independent financial advisors. We provide investment, relationship, and practice management solutions that advisors use in helping clients achieve wealth, independence, and purpose.

The Opportunity

We are seeking a Site Reliability Engineer (SRE) to join our Charlotte-based engineering team. This role sits at the center of platform resilience - ensuring high availability, performance, recoverability, and operational maturity across AssetMark's production systems.

This is not a traditional operations role. Our SREs are engineers first: designing automation, building observability frameworks, improving deployment safety, defining reliability standards, and reducing operational toil through code. You will influence architectural decisions, strengthen incident management practices, and raise the reliability bar across both legacy and cloud-native systems.

You will work on systems that operate 24/7, support financial transactions and advisor workflows, and must meet strict regulatory and security requirements.

The right candidate is energized by complex distributed systems, high-stakes production environments, and the responsibility of building durable, scalable financial infrastructure.

At AssetMark, reliability is a first-order expression of client obsession. Our SRE team plays a critical role in delivering the consistent, trusted technology experience that advisors depend on to run their businesses.

We can only consider candidates who are able to accommodate a hybrid work schedule and who live close to our Charlotte, NC office.

Key Responsibilities

Reliability Engineering & Operations

* Design, implement, and continuously improve the reliability, availability, and performance of critical AssetMark systems (batch, APIs, integrations, and customer-facing platforms)

* Define and operationalize SLIs, SLOs, and error budgets for critical services in partnership with engineering and product teams

* Participate in on-call rotations, incident response, and major incident management

* Lead and contribute to blameless post-incident reviews, driving root cause analysis and measurable reliability improvements

* Proactively identify reliability risks and lead remediation efforts before they impact clients

Observability & Monitoring

* Build and maintain end-to-end observability across applications, infrastructure, and integrations (metrics, logs, traces, alerts)

* Implement actionable monitoring and alerting to reduce noise and improve signal quality

* Partner with application teams to instrument services using best-in-class observability practices

* Ensure visibility into system health, capacity, performance, and failure modes across environments

Automation & Toil Reduction

* Identify repetitive operational tasks and automate them through code

* Improve deployment reliability through automation, self-service tooling, and safe rollout patterns

* Reduce manual intervention in batch processing, integrations, and operational workflows

* Apply Infrastructure-as-Code and configuration automation to improve consistency and repeatability

Cloud, Platform & Infrastructure Reliability

* Support reliability of Azure-based infrastructure, containerized workloads, and hybrid environments

* Partner with platform, DevOps, and infrastructure teams to improve resilience, scalability, and recovery

* Contribute to capacity planning, performance tuning, and cost-aware reliability decisions

* Ensure systems meet RTO/RPO, backup, and disaster recovery expectations

Secure & Compliant Operations

* Embed security, compliance, and risk controls into operational practices

* Work closely with Security and Compliance teams to meet financial services regulatory requirements

* Ensure production systems follow least privilege, secure configuration, and auditability standards

* Support vulnerability remediation and secure operational processes

Collaboration & Enablement

* Partner with application engineering teams to improve production readiness and operational maturity

* Influence system design by advocating for reliability-first architectural decisions

* Provide guidance on operational best practices, deployment safety, and observability standards

* Document operational patterns, runbooks, and reliability guidelines in Confluence

* Act as a reliability advocate across AssetMark engineering teams

Knowledge, Skills, Abilities

* Strong software engineering skills in .NET / C# (or Python, Java, or similar)

* Experience operating distributed systems in production

* Deep understanding of SRE principles: SLIs/SLOs, error budgets, toil reduction, incident management

* Experience with Azure (or AWS/GCP), including compute, networking, and managed services

* Knowledge of containerization and orchestration (Docker, Kubernetes preferred)

* Experience with monitoring, logging, tracing, and alerting tools

* Familiarity with CI/CD pipelines, automation, and…