Job Description:
Rakuten International oversees 7 businesses with over 4,000 employees globally. The brand is recognized for its leadership and innovation in e-commerce, digital content, advertising, entertainment and communications, bringing the joy of discovery and access to more than 1 billion members across the world. Our teams deliver on the company’s mission to delight merchants and customers through innovation, optimism, and teamwork.
Rakuten Viki is a global entertainment streaming platform that specializes in Asian content. Our platform enables millions of viewers to discover and enjoy primetime shows and movies, subtitled in over 150 languages. Headquartered in San Mateo, California, we also have offices in Singapore, Seoul, and Shanghai, ensuring a strong global presence and a deep connection to the heart of Asian entertainment. Our platform is home to a large and loyal community of fans who share a passion for Asian culture and entertainment. Join us in our mission to bridge cultures and connect the world to Asian entertainment. At Rakuten Viki, we offer a chance to be part of a global community that celebrates culture, creativity, and connection.
We are looking for a Senior Engineer, SRE to join our team and support our business growth. This role is based in Singapore and reports to the SRE Manager.
About the SRE Team:
The Site Reliability Engineering (SRE) team at Viki builds and operates the platform that powers Viki’s large-scale, distributed systems. We develop and maintain the services behind Viki's API and business intelligence, and we make architecture changes to keep them scalable, reliable, secure, and cost-efficient. Our scope spans performance engineering, FinOps, security, reliability engineering, and CI/CD. We run our systems on GCP with GKE and our media pipeline on AWS. Our tools include Spinnaker, Cloudbuild, Datadog, PostgreSQL, and Redis, to name a few.
Our team has recently delivered significant cost savings by optimizing GCP infrastructure and routing traffic efficiently across multiple geographical regions. We’ve led deep-dive network security reviews, including traffic analysis and evaluating modern WAF solutions to strengthen our defences. We're also actively using AI to boost developer productivity across Engineering.
Key Responsibilities:
Work closely with developers to help build systems that abstract out infrastructure for the organization.
Instill current SRE principles into various development teams.
Drive Incident Management, Capacity Planning and SLO definition with various stakeholders.
Automate infrastructure tasks at every opportunity with an Infrastructure as Code mindset.
Develop software and tools, including AI agents and tooling, to boost developer productivity, reduce SRE toil, and drive meaningful efficiency improvements.
Obsess over non-functional requirements such as security, performance, and reliability.
Continuously improve and scale our systems, and create clear guidelines for developers.
Embed security and FinOps practices into the development lifecycle.
Own architectural decisions, mentor junior team members and represent SRE in cross-functional planning.
Be part of the on-call roster to ensure the reliability and availability of the platform.
Requirements:
Bachelor's degree in Computer Science/Engineering or equivalent.
Minimum 5 years of experience in SRE, DevOps or Platform Engineering domains.
Experience in backend development.
Passion for building scalable systems and delivering top-tier services with worldwide impact.
We don't require experience in any particular technology, but you should have the ability to chew through difficult technical problems and gain insights from them.
A solid understanding of practical operating system concepts in Linux/Unix and a grasp of basic networking are essential.
Familiarity with Docker/Kubernetes or equivalent systems is a must.
Familiarity with AWS, GCP, or Azure is a must.
Familiarity with IaC, CI/CD, performance engineering, and observability concepts and tools would be an added advantage.
Experience in scaling systems or working with high-scale systems is a must.
Curiosity about applying AI, including LLMs and agents, to operational and engineering challenges.
Rakuten provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type. Rakuten considers applicants for employment without regard to race, color, religion, age, sex, national origin, disability status, genetic information, protected veteran status, sexual orientation, gender, gender identity or expression, or any other characteristic protected by federal, state, provincial or local laws.
Five Principles for Success
Our worldwide practices describe specific behaviors that make Rakuten unique and united across the world. We expect Rakuten employees to model these 5 Shugi Principles of Success.
Always Improve, Always Advance - Only be satisfied with complete success - Kaizen
Passionately Professional - Take an uncompromising approach to your work and be determined to be the best
Hypothesize - Practice - Validate - Shikumika - Use the Rakuten Cycle to succeed in unknown territory
Maximize Customer Satisfaction - The greatest satisfaction for our teams is seeing their customers smile
Speed!! Speed!! Speed!! - Always be conscious of time - take charge, set clear goals, and engage your team