Site Reliability Engineer Job, Columbia area, Maryland, USA, IT/Tech

👋🏼 Hey!

Bitwise is a leading provider of mission‑focused intelligence solutions that advance national security for the Intelligence Community and Department of Defense. We’re a small and growing company, so you can expect to hop in on the ground floor with us and be a consequential member of the team. You'll be more than a contract performer for us – you’ll also be asked for ideas to improve our company and your career, and you’ll contribute to our team culture.

We value growth and community above almost all else, so we gather regularly for game nights, happy hours, tech talks, and plenty more. We think you’ll like it here!

Remember, Bitwise is not a cult. 🖖🏼

🎯 What We Look For

Bitwise hires talented engineers who are driven by purpose and who value a culture of technical excellence, growth, and overall wellness. We deliver new and innovative intelligence solutions to our customers at the very forefront of our country’s national security missions. And we do it every single day. Our work matters, and so will you.

We ask that every new hire be able to:

Contribute meaningful thought leadership — if not right away, over time
Interact with our customers and earn their confidence in your abilities
Be detailed, even if that means taking a little more time to get it right
Contribute their ideas and ideals to improve all aspects of our company
Allow us to invest in their education so we both grow together
Know and live our core values every single day

💡 About this Role

You will be on a team of 20 full‑stack developers building AI‑powered cybersecurity tools that actually get used. Our work only matters if it lands in front of users and they love it—this philosophy drives everything we do. We've built a fleet of cloud‑based AI/ML products that our customers depend on, but we haven't cracked the SRE culture internally yet. Our developers are incredible at shipping features and talking to users, but infrastructure reliability needs love.

We need someone who gets genuinely excited about uptime, observability, and making systems more resilient. This role is about embedding with our full‑stack team and becoming personally invested in reliability. You'll be tinkering with and improving existing systems, not just keeping the lights on. If you're the kind of person who sees a flaky deployment pipeline or a mysterious latency spike and thinks "I need to figure this out," we want to talk to you.

We're looking for someone with SRE experience who also has a specific interest in keeping AI/ML infrastructure running smoothly. Passion for the SRE domain comes first. We need someone who can evangelize why this work matters.

Responsibilities will include:

Own the reliability and uptime of our cloud‑based product fleet, with particular focus on AI/ML infrastructure components.
Build and improve monitoring, alerting, and observability systems so we actually know what's happening in production.
Investigate and resolve incidents, then push for the systemic changes that prevent them from happening again.
Work directly with developers to improve deployment pipelines, reduce friction, and build reliability into the development process.
Drive SRE practices and mindset across the team: evangelize why this work matters and get others excited about infrastructure.
Automate toil away so you can focus on making things better, not just keeping them running.
Identify and address reliability risks before they become customer‑facing problems.
Tinker with and optimize existing systems, looking for improvement opportunities everywhere.

📋 Requirements

Active TS/SCI with Polygraph. Candidates without a current clearance will not be considered.
3+ years of experience in SRE, DevOps, or infrastructure engineering roles.
Hands‑on experience with cloud platforms (AWS, Azure, GCP, or similar).
Strong understanding of monitoring and observability tools (we need you to teach us what good looks like).
Experience with infrastructure as code and configuration management.
Solid scripting and automation skills (Python, Bash, Go, or similar).
Understanding of CI/CD pipelines and deployment strategies.
Ability to debug complex distributed systems and trace issues across multiple…

