QTS Data Centers

Site Reliability and Security Engineer

Ashburn, VA Full time

The Senior Site Reliability and Security Engineer is responsible for ensuring the reliability, observability, and security posture of the QTS OS and SDP platforms deployed on AWS.

This role combines deep technical expertise in cloud operations and application-level security with leadership in incident response, production monitoring, and proactive threat detection.

The engineer will work closely with DevOps, backend, and application development teams to identify risks, resolve incidents, and drive continuous improvement in availability and security.

RESPONSIBILITIES

  • Monitor and analyze production environments for reliability, performance, and security risks across QTS OS and SDP platforms.
  • Lead troubleshooting of production incidents, guiding development teams to identify root causes and implement permanent fixes.
  • Collaborate with engineering teams to design and maintain robust observability practices — including metrics, logs, traces, and alerts — using AWS CloudWatch and related tools.
  • Identify and mitigate security threats across cloud infrastructure (IAM, Security Groups, VPC/PrivateLink, WAF) and application layers (API Gateway, Lambda, ECS).
  • Perform reviews of AWS IAM policies, roles, and network security configurations to detect privilege escalation or exposure risks.
  • Participate in and facilitate threat-modeling sessions with development teams.
  • Analyze code changes and third-party dependencies to ensure alignment with internal security and compliance policies.
  • Partner with developers to design secure patterns for integrating new AWS services and frameworks.
  • Support incident response, participate in the on-call rotation, and contribute to post-incident root cause analysis (RCA) documentation and follow-up actions.
  • Review service metrics, dashboards, and alarms to ensure coverage for critical user paths and backend systems.
  • Recommend and implement process improvements in monitoring, alerting, and escalation workflows.
  • Work with DevOps and architecture teams to assess the impact of new deployments on reliability and security posture.
  • Contribute to internal standards and best practices for secure and resilient system design.
  • Document technical findings, detection strategies, and mitigations for recurring risks.

TECHNICAL EXPERTISE

  • AWS Services: ECS/Fargate, IAM, Security Groups, VPC/PrivateLink, WAF, Lambda, CloudWatch, S3, SQS/SNS, API Gateway, CloudFront.
  • Languages: Python, Java, TypeScript — with focus on scripting, automation, and code reviews for secure patterns.
  • Infrastructure & Tools: Terraform (IaC reviews and security validation), GitHub Actions (CI/CD observability).
  • Monitoring & Observability: AWS CloudWatch, CloudTrail, metrics/alarms configuration, log correlation, anomaly detection.
  • Security Practices: Threat modeling, SCA (Software Composition Analysis), IAM least-privilege design, network isolation, runtime behavior analysis.
  • Incident Management: Root cause analysis, mitigation design, documentation, and coordination during live incidents.
  • Preferred familiarity: Snyk, CodeQL, AWS Config, or similar tools for vulnerability management.

BASIC QUALIFICATIONS

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • Due to the nature of systems and data supported, U.S. citizenship is required for this position.
  • 5+ years of experience supporting production systems in AWS, focusing on reliability or cloud security.
  • Strong understanding of AWS networking, IAM, and monitoring services.
  • Proven ability to guide teams during incident response and troubleshooting.
  • Demonstrated experience detecting and resolving infrastructure or application security risks.
  • Excellent diagnostic and analytical skills for complex distributed systems.

PREFERRED QUALIFICATIONS

  • Hands-on experience securing and monitoring large-scale AWS microservice deployments (ECS/Fargate).
  • Familiarity with automated dependency scanning, SCA tools, and security compliance monitoring.
  • Experience facilitating threat modeling sessions and implementing mitigation strategies.
  • Working knowledge of software development in Python, Java, or TypeScript.
  • Experience collaborating with DevOps and application teams to enforce secure SDLC practices.
  • Solid understanding of observability patterns, alert tuning, and SLO/SLA-driven reliability.

KNOWLEDGE, SKILLS, AND ABILITIES

  • Strong interpersonal skills for collaboration with engineering and operations teams at all levels.
  • Ability to balance security rigor with operational pragmatism and delivery timelines.
  • Excellent written and verbal communication for documenting incidents, standards, and remediation steps.
  • Capable of working independently and leading technical investigations of varying complexity.
  • Demonstrated ownership mindset for system health, reliability, and security posture.
  • Willingness to participate in a limited on-call rotation supporting production environments.

TOTAL REWARDS

This role is eligible for a competitive benefits package that includes medical, dental, vision, life, and disability insurance; a 401(k) retirement plan; flexible spending and HSA accounts; paid holidays; paid time off; paid volunteer days; an employee assistance program; tuition assistance; parental leave; military leave assistance; a QTS scholarship for dependents; a wellness program; and other company benefits.

This position is bonus-eligible.

We conform to all laws, statutes, and regulations concerning equal employment opportunities and affirmative action. We strongly encourage women, minorities, individuals with disabilities, and veterans to apply to all of our job openings. We are an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability status, genetic information and testing, family and medical leave, protected veteran status, or any other characteristic protected by law. We prohibit retaliation against individuals who bring forth any complaint, orally or in writing, to the employer or the government, or against any individuals who assist or participate in the investigation of any complaint or discrimination claim.

The "Know Your Rights" Poster is included here:

Know Your Rights (English)

Know Your Rights (Spanish)

The pay transparency policy is available here:

Pay Transparency Nondiscrimination Poster-Formatted

QTS is committed to working with and providing reasonable accommodations to individuals with disabilities. If you need a reasonable accommodation because of a disability for any part of the employment process, please send an e-mail to talentacquisition@qtsdatacenters.com and let us know the nature of your request and your contact information.