Synechron

Cloud SRE Architect | AWS, Kubernetes, Infrastructure as Code, Observability, Reliability Frameworks

Bengaluru - EC-2 Gateway campus | Full time

Job Summary
Synechron is seeking an experienced Cloud SRE Architect to lead the strategy, design, and implementation of scalable, resilient, and secure cloud platform solutions. This role involves establishing enterprise-wide reliability standards, managing large-scale cloud infrastructure, and fostering a culture of automation, observability, and continuous improvement. The ideal candidate will drive operational excellence, oversee incident response processes, and guide cross-functional teams to ensure high system availability and performance at scale.

Software Requirements

  • Required:

    • In-depth knowledge of cloud platforms such as AWS, Azure, or GCP, with extensive experience in core AWS services including ECS/Fargate, EKS/Kubernetes, EC2, S3, Auto Scaling, and VPCs

    • Proven experience in designing and operating container platforms (ECS, Kubernetes)

    • Strong understanding of Infrastructure as Code (Terraform, CloudFormation)

    • Expertise in monitoring, logging, and observability tools such as Prometheus, Grafana, Datadog, Splunk, or Dynatrace

    • Solid experience in implementing security best practices like IAM, least-privilege policies, and cloud guardrails

    • Automation and scripting proficiency in Python, Bash, or a similar language

  • Preferred:

    • Experience with multi-region, multi-cloud architectures

    • Knowledge of service mesh architectures and advanced traffic management techniques

    • Familiarity with cost optimization (FinOps) practices in cloud environments

    • Experience with ITSM platforms such as ServiceNow

Overall Responsibilities

  • Define and drive enterprise-wide cloud reliability strategies, standards, and reference architectures

  • Architect and evolve highly available, scalable cloud infrastructure and platforms, ensuring they meet security, compliance, and performance benchmarks

  • Lead design and governance of SLI/SLO frameworks, error budgets, and KPIs across cloud services and microservices ecosystems

  • Establish and mature incident management processes including incident response, post-incident reviews, and operational readiness

  • Develop and implement observability architectures covering metrics, logs, traces, and synthetic monitoring

  • Partner with security teams to define and enforce cloud security models, access controls, and audit policies

  • Promote automation in provisioning, deployment, and operational tasks, reducing manual efforts and operational risks

  • Mentor engineering teams on resilience patterns such as multi-AZ and multi-region deployments and graceful degradation

  • Influence platform evolution and ensure infrastructure aligns with organizational cloud roadmap and scalability targets

  • Act as the escalation point for complex production issues and systemic reliability risks

Technical Skills (By Category)

  • Cloud Technologies:
    AWS (ECS/Fargate, EKS, S3, EC2, VPC, IAM), Azure, GCP — core services for deployment, scaling, and security

  • Infrastructure as Code:
    Terraform, CloudFormation, or similar tools for automation and resource management

  • Containerization & Orchestration:
    Docker, Kubernetes (EKS or alternative) for containerized workloads

  • Monitoring & Observability:
    Prometheus, Grafana, Datadog, Splunk, Dynatrace for system health, performance, and troubleshooting

  • Security & Compliance:
    Implementation of least-privilege policies, encryption, security guardrails, and compliance with industry standards

  • Automation & Scripting:
    Proficient in Python, Bash, or PowerShell for automation tasks and system scripting

Experience Requirements

  • Minimum of 15 years of experience in Site Reliability Engineering, Platform Engineering, or cloud infrastructure roles

  • Proven expertise in designing, deploying, and managing large-scale cloud platforms with high availability and security standards

  • Extensive hands-on experience with AWS services such as ECS, EKS, S3, IAM, CloudFormation, and Auto Scaling

  • Demonstrated leadership in incident management, operational readiness, and reliability governance

  • Experience with multi-cloud and multi-region architectures is desirable

  • Proven ability to lead cross-functional teams across DevOps, security, and product areas

Day-to-Day Activities

  • Develop and enforce cloud reliability frameworks, standards, and best practices across enterprise platforms

  • Architect and optimize cloud infrastructure, ensuring high availability, security, and scalability

  • Lead incident response efforts, root cause analysis, and post-incident reviews for systemic issues

  • Monitor system health through observability tools, automate recovery and scaling processes, and improve system resilience

  • Collaborate with product, security, and engineering teams to implement automation, security guardrails, and cost management strategies

  • Influence platform roadmaps and technical strategies aligned with enterprise objectives

  • Provide escalation support for complex outages and systemic reliability concerns

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field

  • Certifications such as AWS Certified Solutions Architect, Azure Solutions Architect, or GCP Professional Cloud Architect are preferred

  • Extensive experience in cloud platform architecture, automation, and high-availability systems in large enterprise environments

  • Proven leadership in reliability engineering, incident management, and operational excellence in cloud environments

Professional Competencies

  • Strong analytical and troubleshooting skills for complex systemic issues

  • Excellent communication skills for engaging stakeholders and cross-team collaboration

  • Leadership capabilities to guide and mentor engineering teams on best practices

  • Strategic thinking aligned with enterprise cloud roadmap and operational goals

  • Ability to adapt quickly to technological advancements and evolving operational needs

  • Effective time and project management skills to handle multiple priorities with precision

SYNECHRON’S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture – promoting equality, diversity and an environment that is respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and abilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.


All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.

Candidate Application Notice