This is a U.S.-based position. All of the programs we support require U.S. citizenship to be eligible for employment. All work must be conducted within the continental U.S.
Who we are:
Raft (https://TeamRaft.com) is a customer-obsessed, non-traditional defense tech company dedicated to empowering U.S. military and government agencies with cutting-edge AI/ML and data solutions. We are a leader in autonomous data fusion and Agentic AI, with a purposeful focus on Distributed Data Systems, Platforms at Scale, and Complex Application Development. With headquarters in McLean, VA, our range of clients includes innovative federal and public agencies leveraging design thinking, a cutting-edge tech stack, and a cloud-native ecosystem. We build digital solutions that impact the lives of millions of Americans.
About the role:
Raft is building mission-critical data platforms for the Department of War that process billions of events per day from hundreds of sensors and operational sources, delivering intelligence to operators who use it to make time-sensitive decisions. Our platform runs across multiple classification levels and deployment environments.
As a Senior DevOps Engineer at Raft, you won’t be operating in a pure infrastructure lane. You will be expected to understand the software you’re deploying, contribute to it when needed, and engage with the data pipelines flowing through the systems you manage. This is a role for someone who thinks end-to-end, from data ingest and pipeline performance through to Kubernetes-based deployment, observability, and secure operations in defense environments.
You will work across cloud and on-premises environments, partner closely with software and data engineers, and help Raft maintain the operational rigor and platform reliability that our most demanding customers depend on.
What You’ll Do
- Design, implement, and maintain secure Kubernetes-based infrastructure supporting data platform workloads across cloud and on-premises environments
- Build, manage, and improve CI/CD pipelines using GitLab and GitOps-based delivery patterns, enabling reliable, repeatable deployments across multiple classification levels
- Develop and maintain Infrastructure as Code (IaC) using tools such as Terraform and Ansible to provision, configure, and lifecycle-manage platform infrastructure
- Collaborate directly with software engineers to understand service architectures, dependencies, and runtime behavior, and contribute code-level changes where needed to improve deployability, reliability, or observability
- Support and optimize data streaming and processing pipelines built on technologies such as Kafka, Kafka Streams, Flink, and Pinot, diagnosing bottlenecks, tuning configurations, and ensuring data integrity across the platform
- Implement and manage platform observability using monitoring (Prometheus, Grafana), logging (Fluentbit, Loki, Kibana), and alerting tooling to maintain operational awareness in production environments
- Apply and enforce DevSecOps practices including container hardening, vulnerability scanning, software supply chain security, and compliance-driven deployment patterns in regulated government environments
- Manage and debug complex Helm chart deployments, service mesh configurations (Istio), and Kubernetes networking across multi-cluster and multi-environment topologies
- Support operations across multiple deployment targets, including cloud-hosted (AWS, Azure), on-premises data centers, and edge/tactical environments, adapting platform patterns to the constraints of each
- Write clean, maintainable automation and tooling in Java or Go to accelerate platform operations, reduce toil, and improve developer experience across engineering teams
- Engage directly with customers at the most operationally demanding locations in the Department of War
What we are looking for:
- 5+ years of relevant hands-on experience in DevOps or platform engineering roles
- 5+ years of production experience with Docker and Kubernetes, including provisioning, operating, and troubleshooting clusters in real-world environments
- Strong experience building and maintaining CI/CD pipelines, with hands-on proficiency in GitLab CI, GitOps workflows (Flux, ArgoCD), and modern software delivery practices
- Experience supporting data-intensive platforms using streaming technologies such as Kafka or Flink, including configuration, tuning, and operational support
- Solid understanding of data engineering fundamentals, including ETL/ELT pipeline design, data storage patterns, data governance concepts, and integration with downstream consumers
- Proficiency with Infrastructure as Code tooling, particularly Terraform; experience with Ansible or similar configuration management tools
- Strong Helm proficiency, including authoring and maintaining charts for complex multi-service deployments
- Hands-on experience with platform observability tooling: Prometheus, Grafana, Fluentbit, Loki or Elasticsearch/Kibana
- Demonstrable software development skills in Java and/or Go; comfortable reading, modifying, and contributing to application codebases, not just deploying them
- Experience with cloud infrastructure on AWS and/or Azure, including networking, IAM, storage, and managed Kubernetes services
- Strong systems thinking, troubleshooting discipline, and the ability to work independently in a fast-moving environment with competing priorities
- Experience applying secure and compliant deployment practices in regulated or government environments
- Active Secret clearance required; must be eligible for and willing to obtain a Top Secret/SCI clearance
- Ability to obtain Security+ certification within the first 90 days of employment
- Ability to travel up to 25%
Highly preferred:
- Experience with service mesh technologies, particularly Istio, including traffic management, mTLS, and observability integration
- Familiarity with Kubernetes-based ML/AI platforms such as Kubeflow, KServe, or Ray, and experience supporting GPU-enabled workloads
- Experience with software supply chain security tools including container image scanning, SBOM generation, and runtime vulnerability management
- Background supporting deployments across multiple classification levels or air-gapped / disconnected environments
- Experience with package and dependency management across polyglot environments (Maven, Gradle, NPM, Yarn, pip)
- Familiarity with compliance frameworks relevant to DoW software deployment, including RMF, STIGs, and IL4/IL5/IL6 requirements
- Contributions to or ownership of internal developer platforms, golden path tooling, or shared infrastructure services
- Experience with distributed tracing and APM tooling (e.g., OpenTelemetry, Jaeger, Tempo)
- Existing TS/SCI clearance strongly preferred
What Success Looks Like
- Platform deployments are reliable, repeatable, and secure across every environment Raft operates in, from commercial cloud to classified on-premises
- Engineering teams move faster because CI/CD workflows, infrastructure tooling, and deployment patterns are solid, well-documented, and easy to use
- Data pipelines running through Raft’s platform are stable, observable, and performant, with clear ownership of issues when they arise
- You’ve earned the trust of software engineers by understanding what they’ve built and engaging meaningfully in conversations about architecture, runtime behavior, and operational trade-offs
- Compliance and security posture across deployment environments is continuously maintained, not bolted on
Clearance Requirements:
- Minimum active Secret clearance with ability to obtain and maintain an active TS/SCI security clearance
Salary Range: $150,000.00 - $200,000.00
Work Type:
- Hybrid with up to 25% travel
- Active Secret clearance required to start; TS/SCI eligibility required
What we will offer you:
- Highly competitive salary
- Fully covered healthcare, dental, and vision
- 401(k) and company match
- Take-as-you-need PTO + 11 paid holidays
- Education & training benefits
- Generous Referral Bonuses
- And More!
Our Vision Statement:
We bridge the gap between humans and data through radical transparency and our obsession with the mission.
Our Customer Obsession:
We will approach every deliverable like it's a product. We will adopt a customer-obsessed mentality. As we grow and our footprint becomes larger, teams and employees will treat each other not only as teammates but also as customers. We must live the customer-obsessed mindset, always. This will help us scale, and it will translate to the interactions that our Rafters have with their clients and the other product teams they integrate with. Our culture will enable our success and set us apart from other companies.
How do we get there?
Public-sector modernization is critical for us to live in a better world. We, at Raft, want to innovate and solve complex problems. And, if we are successful, our generation and the ones that follow us will live in a delightful, efficient, and accessible world where out-of-the-box thinking and collaboration are the norm.
Raft’s core philosophy is Ubuntu: I Am Because We Are. We support our “nadi” by elevating the other Rafters. We work as a hyper-collaborative team where each team member brings a unique perspective, adding value that did not exist before. People make Raft special. We celebrate each other and our cognitive and cultural diversity. We are devoted to our practice of innovation and collaboration.
We’re an equal opportunity employer. All applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.