Job Summary
Synechron is seeking an experienced Cloud SRE Architect to lead the strategy, design, and implementation of scalable, resilient, and secure cloud platform solutions. This role involves establishing enterprise-wide reliability standards, managing large-scale cloud infrastructure, and fostering a culture of automation, observability, and continuous improvement. The ideal candidate will drive operational excellence, oversee incident response processes, and guide cross-functional teams to ensure high system availability and performance at scale.
Software Requirements
Required:
In-depth knowledge of a major cloud platform (AWS, Azure, or GCP), with extensive experience in its core services (on AWS: ECS/Fargate, EKS/Kubernetes, EC2, S3, Auto Scaling, and VPC)
Proven experience in designing and operating container platforms (ECS, Kubernetes)
Strong understanding of Infrastructure as Code (Terraform, CloudFormation)
Expertise in monitoring, logging, and observability tools such as Prometheus, Grafana, Datadog, Splunk, or Dynatrace
Solid experience implementing security best practices such as IAM, least-privilege policies, and cloud guardrails
Automation and scripting proficiency using Python, Bash, or similar
Preferred:
Experience with multi-region, multi-cloud architectures
Knowledge of service mesh architectures and advanced traffic management techniques
Familiarity with cost optimization (FinOps) practices in cloud environments
Experience with ITSM platforms such as ServiceNow
Overall Responsibilities
Define and drive enterprise-wide cloud reliability strategies, standards, and reference architectures
Architect and evolve highly available, scalable cloud infrastructure and platforms, ensuring they meet security, compliance, and performance benchmarks
Lead the design and governance of SLI/SLO frameworks, error budgets, and KPIs across cloud services and microservices ecosystems
Establish and mature incident management processes including incident response, post-incident reviews, and operational readiness
Develop and implement observability architectures covering metrics, logs, traces, and synthetic monitoring
Partner with security teams to define and enforce cloud security models, access controls, and audit policies
Promote automation in provisioning, deployment, and operational tasks, reducing manual efforts and operational risks
Mentor engineering teams on resilience patterns such as multi-AZ and multi-region deployments and graceful degradation
Influence platform evolution and ensure infrastructure aligns with the organization's cloud roadmap and scalability targets
Act as the escalation point for complex production issues and systemic reliability risks
Technical Skills (By Category)
Cloud Technologies:
AWS (ECS/Fargate, EKS, S3, EC2, VPC, IAM), Azure, GCP: core services for deployment, scaling, and security
Infrastructure as Code:
Terraform, CloudFormation, or similar tools for automation and resource management
Containerization & Orchestration:
Docker, Kubernetes (EKS or alternative) for containerized workloads
Monitoring & Observability:
Prometheus, Grafana, Datadog, Splunk, Dynatrace for system health, performance, and troubleshooting
Security & Compliance:
Implementation of least-privilege policies, encryption, security guardrails, and compliance with industry standards
Automation & Scripting:
Proficiency in Python, Bash, or PowerShell for automation tasks and system scripting
Experience Requirements
At least 15 years of experience in Site Reliability Engineering, Platform Engineering, or cloud infrastructure roles
Proven expertise in designing, deploying, and managing large-scale cloud platforms with high availability and security standards
Extensive hands-on experience with AWS services such as ECS, EKS, S3, IAM, CloudFormation, and Auto Scaling
Demonstrated leadership in incident management, operational readiness, and reliability governance
Experience with multi-cloud and multi-region architectures is desirable
Proven ability to lead cross-functional teams across DevOps, security, and product areas
Day-to-Day Activities
Develop and enforce cloud reliability frameworks, standards, and best practices across enterprise platforms
Architect and optimize cloud infrastructure, ensuring high availability, security, and scalability
Lead incident response efforts, root cause analysis, and post-incident reviews for systemic issues
Monitor system health through observability tools, automate recovery and scaling processes, and improve system resilience
Collaborate with product, security, and engineering teams to implement automation, security guardrails, and cost management strategies
Influence platform roadmaps and technical strategies aligned with enterprise objectives
Provide escalation support for complex outages and systemic reliability concerns
Qualifications
Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field
Certifications such as AWS Certified Solutions Architect, Azure Solutions Architect, or GCP Professional Cloud Architect are preferred
Extensive experience in cloud platform architecture, automation, and high-availability systems in large enterprise environments
Proven leadership in reliability engineering, incident management, and operational excellence in cloud environments
Professional Competencies
Strong analytical and troubleshooting skills for complex systemic issues
Excellent communication skills for engaging stakeholders and cross-team collaboration
Leadership capabilities to guide and mentor engineering teams on best practices
Strategic thinking aligned with the enterprise cloud roadmap and operational goals
Ability to adapt quickly to technological advancements and evolving operational needs
Effective time and project management skills to handle multiple priorities with precision
SYNECHRON’S DIVERSITY & INCLUSION STATEMENT
Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture – promoting equality, diversity and an environment that is respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.
All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.