Important Information

Experience: 10+ years

Job Mode: Full-time

Work Mode: Remote

ID: 20192

Job Summary

We are seeking a Senior AWS Infrastructure & Platform Engineer with deep expertise in designing, optimizing, and operating cloud infrastructure on AWS. This role is responsible for improving the reliability, performance, and cost-efficiency of our AWS environments while enabling engineering teams to deploy faster and with greater confidence. The ideal candidate combines hands-on AWS architecture skills with a strategic mindset for infrastructure optimization and a track record of measurable improvements in system performance, deployment velocity, and cloud spend.

Responsibilities and Duties

AWS Infrastructure Strategy & Cost Optimization

AWS Platform Strategy: Define and implement the strategic vision for LendKey’s AWS infrastructure, ensuring alignment with business goals while continuously optimizing for performance, reliability, and cost.
Cost Optimization: Proactively analyze and reduce AWS spend using Cost Explorer, Trusted Advisor, Compute Optimizer, and custom reporting. Implement rightsizing, Reserved Instance/Savings Plan strategies, spot instance utilization, and architectural efficiencies to drive measurable cost reductions.
Infrastructure Efficiency: Identify and eliminate waste across compute, storage, networking, and data transfer. Establish tagging strategies and cost allocation models to drive accountability across teams.

Reliability & Performance Engineering

System Reliability: Design and implement highly available, fault-tolerant architectures across multiple AWS availability zones. Drive improvements to meet or exceed the 99.9% SLA through proactive capacity planning, auto-scaling, and chaos engineering practices.
Performance Optimization: Continuously profile, benchmark, and optimize system performance across compute, networking, storage, and database layers. Reduce latency, improve throughput, and eliminate bottlenecks.
Disaster Recovery: Design, document, and regularly test disaster recovery procedures to ensure business continuity across all critical systems.

Deployment Speed & Developer Experience

CI/CD Pipeline Optimization: Design, maintain, and continuously improve CI/CD pipelines to minimize build times, reduce deployment friction, and enable engineering teams to release frequently and safely.
Deployment Strategies: Implement and manage blue/green, canary, and rolling deployment patterns to minimize downtime and reduce deployment risk.
Developer Self-Service: Build internal tooling, templates, and self-service capabilities that enable engineering teams to independently provision resources, deploy services, and troubleshoot issues within established guardrails.
Infrastructure as Code (IaC): Champion Terraform best practices including module development, state management, code review workflows, and automated drift detection to ensure all infrastructure is version-controlled and reproducible.

Monitoring, Observability & Incident Management

Observability: Implement and maintain comprehensive monitoring, logging, and tracing solutions (e.g., CloudWatch, New Relic, Prometheus/Grafana, OpenTelemetry) to provide full-stack visibility into system health and performance.
SLI/SLO Framework: Define and instrument Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to create data-driven reliability targets and alerting.
Incident Response: Lead a rotational on-call schedule. Drive blameless post-incident reviews and implement preventive measures to reduce incident recurrence.

What We Are Looking For

A hands-on AWS infrastructure engineer who architects and optimizes cloud environments for reliability, speed, and cost-efficiency.
A systems thinker who proactively identifies bottlenecks—whether in deployment pipelines, cloud costs, or system performance—and drives measurable improvements.
A builder who creates tools, automation, and self-service platforms that multiply the effectiveness of engineering teams.
A leader who thrives in collaborative environments and can independently drive complex projects from concept to production.
Someone who stays current with AWS services, cloud-native architecture patterns, and infrastructure best practices.
A professional who brings a FinOps mindset—balancing cost, performance, and reliability in every infrastructure decision.

Essential Qualifications

10+ years in the technology field, including 3–5 years focused specifically on AWS infrastructure engineering, platform engineering, or cloud/site reliability operations.

Proven track record of managing and optimizing large-scale AWS environments in production, with demonstrated cost savings or performance improvements.

AWS Cloud Services & Architecture

Deep, hands-on experience with core AWS services: ECS, EC2, EKS, EBS, S3, RDS/Aurora, Lambda, CloudFront, Route 53, ALB/NLB, and VPC networking.
Demonstrated ability to design highly available, fault-tolerant, and auto-scaling architectures across multiple availability zones.
Hands-on experience with AWS cost management tools (Cost Explorer, Trusted Advisor, Compute Optimizer) and proven implementation of cost reduction strategies including rightsizing, Reserved Instances, Savings Plans, and spot instance optimization.
Working knowledge of the AWS Well-Architected Framework, particularly the Cost Optimization, Reliability, and Performance Efficiency pillars.

Container Orchestration

Experience managing and scaling Kubernetes clusters (EKS) in production, including cluster lifecycle management, node group optimization, and workload rightsizing.
Understanding of Kubernetes networking (CNI, service mesh), resource management (requests/limits, HPA/VPA), and container security best practices.

Infrastructure as Code & Automation

Strong proficiency in Terraform for building, managing, and versioning AWS infrastructure, including module authoring, remote state management, and workspace strategies.
Experience with configuration management tools (Ansible, Chef, or equivalent) for operational automation.
Proficiency in scripting languages (Python, Bash, or Go) for building infrastructure automation, tooling, and custom integrations.

CI/CD & Deployment Engineering

Hands-on experience designing and optimizing CI/CD pipelines (e.g., GitHub Actions, Jenkins, or AWS CodePipeline) for fast, safe, and repeatable deployments.
Experience implementing and managing deployment strategies such as blue/green, canary, and rolling deployments to minimize risk and downtime.

Monitoring & Observability

Experience implementing and managing monitoring, logging, and alerting solutions (CloudWatch, New Relic, Prometheus/Grafana, ELK/OpenSearch).
Demonstrated ability to define SLIs/SLOs and build dashboards and alerts that drive proactive issue detection and resolution.

Documentation & Communication

Strong capabilities in creating and maintaining comprehensive technical documentation including runbooks, architecture decision records (ADRs), and operational playbooks.
Ability to communicate complex technical concepts clearly to both engineering teams and non-technical business stakeholders.

Preferred Qualifications

AWS certifications such as Solutions Architect Professional, DevOps Engineer Professional, or SysOps Administrator.
Experience with FinOps practices, including building cost visibility dashboards and driving cost accountability across engineering teams.
Experience with GitOps workflows and tools (ArgoCD, Flux).
Familiarity with service mesh technologies or API gateway management.
Experience migrating or re-architecting legacy workloads to cloud-native patterns on AWS.

About Encora

Encora is a global company that offers Software and Digital Engineering solutions. Our practices include Cloud Services, Product Engineering & Application Modernization, Data & Analytics, Digital Experience & Design Services, DevSecOps, Cybersecurity, Quality Engineering, AI & LLM Engineering, among others.

At Encora, we hire professionals based solely on their skills and do not discriminate based on age, disability, religion, gender, sexual orientation, socioeconomic status, or nationality.

Senior AWS Infrastructure & Platform Architect
