Job Summary
Synechron is seeking an experienced Cloud SRE Architect to lead the strategy, design, and implementation of scalable, resilient, and secure cloud platform solutions. This role involves establishing enterprise-wide reliability standards, managing large-scale cloud infrastructure, and fostering a culture of automation, observability, and continuous improvement. The ideal candidate will drive operational excellence, oversee incident response processes, and guide cross-functional teams to ensure high system availability and performance at scale.

Software Requirements

Required:
- In-depth knowledge of cloud platforms such as AWS, Azure, or GCP, with extensive experience in core AWS services including ECS/Fargate, EKS/Kubernetes, EC2, S3, Auto Scaling, and VPCs
- Proven experience in designing and operating container platforms (ECS, Kubernetes)
- Strong understanding of Infrastructure as Code (Terraform, CloudFormation)
- Expertise in monitoring, logging, and observability tools such as Prometheus, Grafana, Datadog, Splunk, or Dynatrace
- Solid experience implementing security best practices such as IAM policy design, least-privilege access, and cloud guardrails
- Proficiency in automation and scripting using Python, Bash, or similar languages

Preferred:
- Experience with multi-region, multi-cloud architectures
- Knowledge of service mesh architectures and advanced traffic management techniques
- Familiarity with cost optimization (FinOps) practices in cloud environments
- Experience with ITSM platforms such as ServiceNow

Overall Responsibilities

Define and drive enterprise-wide cloud reliability strategies, standards, and reference architectures
Architect and evolve highly available, scalable cloud infrastructure and platforms, ensuring they meet security, compliance, and performance benchmarks
Lead the design and governance of SLI/SLO frameworks, error budgets, and KPIs across cloud services and microservice ecosystems
Establish and mature incident management processes including incident response, post-incident reviews, and operational readiness
Develop and implement observability architectures including metrics, logs, traces, and synthetic monitoring tools
Partner with security teams to define and enforce cloud security models, access controls, and audit policies
Promote automation in provisioning, deployment, and operational tasks, reducing manual effort and operational risk
Mentor engineering teams on resilience patterns such as multi-AZ and multi-region deployments and graceful degradation
Influence platform evolution and ensure infrastructure aligns with the organization's cloud roadmap and scalability targets
Act as the escalation point for complex production issues and systemic reliability risks

Technical Skills (By Category)

Cloud Technologies:
AWS (ECS/Fargate, EKS, S3, EC2, VPC, IAM), Azure, and GCP core services for deployment, scaling, and security

Infrastructure as Code:
Terraform, CloudFormation, or similar tools for automation and resource management

Containerization & Orchestration:
Docker, Kubernetes (EKS or alternative) for containerized workloads

Monitoring & Observability:
Prometheus, Grafana, Datadog, Splunk, Dynatrace for system health, performance, and troubleshooting

Security & Compliance:
Implementation of least-privilege policies, encryption, security guardrails, and compliance with industry standards

Automation & Scripting:
Proficient in Python, Bash, or PowerShell for automation tasks and system scripting

Experience Requirements

Minimum of 15 years of experience in Site Reliability Engineering, Platform Engineering, or cloud infrastructure roles
Proven expertise in designing, deploying, and managing large-scale cloud platforms with high availability and security standards
Extensive hands-on experience with AWS services like ECS, EKS, S3, IAM, CloudFormation, and auto-scaling solutions
Demonstrated leadership in incident management, operational readiness, and reliability governance
Experience with multi-cloud and multi-region architectures is desirable
Proven ability to lead cross-functional teams across DevOps, security, and product areas

Day-to-Day Activities

Develop and enforce cloud reliability frameworks, standards, and best practices across enterprise platforms
Architect and optimize cloud infrastructure, ensuring high availability, security, and scalability
Lead incident response efforts, root cause analysis, and post-incident reviews for systemic issues
Monitor system health through observability tools, automate recovery and scaling processes, and improve system resilience
Collaborate with product, security, and engineering teams to implement automation, security guardrails, and cost management strategies
Influence platform roadmaps and technical strategies aligned with enterprise objectives
Provide escalation support for complex outages and systemic reliability concerns

Qualifications

Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field
Certifications such as AWS Certified Solutions Architect, Azure Solutions Architect, or GCP Professional Cloud Architect are preferred
Extensive experience in cloud platform architecture, automation, and high-availability systems in large enterprise environments
Proven leadership in reliability engineering, incident management, and operational excellence in cloud environments

Professional Competencies

Strong analytical and troubleshooting skills for complex systemic issues
Excellent communication skills for engaging stakeholders and cross-team collaboration
Leadership capabilities to guide and mentor engineering teams on best practices
Strategic thinking aligned with the enterprise cloud roadmap and operational goals
Ability to adapt quickly to technological advancements and evolving operational needs
Effective time and project management skills to handle multiple priorities with precision

SYNECHRON’S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture, promoting equality, diversity, and an environment that is respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, and sexual orientations, as well as applicants with disabilities, to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.

All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disability or veteran status, or any other characteristic protected by law.

Cloud SRE Architect | AWS, Kubernetes, Infrastructure as Code, Observability, Reliability Frameworks