Owkin

Data Engineer

London - Germany - Remote in UK and Germany Full Time

About us

Owkin is an AI company on a mission to solve the complexity of biology. It is building the first Biology Super Intelligence (BASI) by combining powerful biological large language models, multimodal patient data, and agentic software. At the heart of this system are Owkin K, an AI copilot, and Owkin Zero, its new LLM fine-tuned on biology, used by researchers, clinicians, and drug developers to better understand biology, validate scientific hypotheses, and deliver better diagnostics and therapies faster.

The position is based in our London office or remote in the UK and Germany.

Please submit your CV in English

About the role:

You will be part of the Engineering team. This role involves designing, building, and optimizing scalable ETL/ELT pipelines with Airflow to process complex datasets efficiently while ensuring reliability and performance. You will organize and structure data systems, aligning them with business objectives, and demonstrate expertise in scientific and healthcare information systems to deliver data products tailored for machine learning and AI research. Clear reporting and meticulous attention to detail are essential, as is the ability to manage high-volume, complex workstreams while prioritizing multiple deadlines. The role requires professional interpersonal skills to collaborate with diverse stakeholders in biotechnology and the ability to streamline production workflows for scientific processing and quality assurance.

  • Organize and structure data systems at both macro and micro levels, designing and implementing data architectures that support business goals
  • Optimize data pipelines for performance, reliability, and scalability
  • Design, build, and maintain scalable ETL/ELT pipelines with Airflow to process large-scale, complex datasets
  • Demonstrate the ability to deliver data products useful for machine learning and AI research and development (data models, metadata and semantics)
  • Strong organizational skills to effectively manage high-volume, complex workstreams while prioritizing multiple deadlines
  • Demonstrate knowledge of scientific and healthcare information systems and data sources and relevant software tools
  • Demonstrate the ability to handle a variety of activities across operational delivery, development, and initiatives
  • Demonstrate professional interpersonal skills, with the ability to work both independently and collaboratively with a variety of stakeholders on complex biotechnology areas
  • Streamline the process of taking scientific processing and quality checks into production, ensuring proper monitoring of production workflows

In particular, you will:

  • Design and optimize data pipelines using Airflow
  • Develop robust solutions in Python and SQL
  • Design, develop, and operate scalable ETL/ELT pipelines to process and transform datasets.
  • Work with cross-functional teams, including data scientists, business developers, software engineers and biomedical researchers to deliver high-quality data solutions.
  • Manage and monitor containerized data infrastructures with Docker, Kubernetes, and cloud platforms.
  • Implement and enforce best practices for data governance, security, and compliance.
  • Build, optimize and maintain data architectures, including data lakes, data warehouses, and analytical insights.
  • Productionize the data processing pipelines, setting and enforcing standards and best practices across scientific teams to deliver high-quality data in an efficient and scalable way.

 

About you

Required qualifications / experience:

  • Master's degree in computer science or a specialization in data
  • Significant experience (5+ years) as a Data Engineer and good knowledge of DataOps practices
  • Experience in Python and SQL, and familiarity with R
  • Experience in architectural design of complex data platforms
  • Proficient in technologies such as Airflow, AWS Step Functions, PostgreSQL, Docker, Kubernetes, Grafana, and Infrastructure as Code
  • Autonomous and meticulous, and you enjoy teamwork
  • Software development with a focus on code quality, simplicity, maintainability
  • Experience in designing data architecture and building data products
  • Experience handling sensitive personal information
  • Fluent in English

Preferred qualifications/bonus:

  • Knowledge of healthcare or biology
  • Data quality tools: Great Expectations, Pydantic, Pandera, SQLMesh, etc.
  • Debugging and refactoring skills

 


What we offer

  • Flexible work organization 
  • Friendly and informal working environment
  • Opportunity to work with an international team with high technical and scientific backgrounds

Recruitment Process & Security

  • Please complete the form and submit your CV.
  • Owkin is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, sex, gender, sexual orientation, age, color, religion, national origin, protected veteran status or on the basis of disability.
  • Owkin is a great place to work. As a coveted workplace we are, unfortunately, vulnerable to recruitment phishing scams. We urge all job seekers and candidates to be wary of potential scams. Most of these have individuals posing as representatives of prominent companies, including Owkin, with the aim of obtaining personal, sensitive, or financial information from applicants. These scams prey upon an individual’s desire to obtain a job and can sometimes “feel” like a genuine recruitment process. Some red flags are identified below. Should you encounter a recruitment process that claims to be for Owkin but is not consistent with the below, please do not provide any personal or financial information:
  • Legitimate Owkin recruitment processes include communication with candidates through recognized professional networks, such as LinkedIn. 
  • Communication is always through an official Owkin email address (from the @owkin.com domain), over the phone or through our applicant tracking system (Greenhouse).
  • The Owkin talent team does use platforms such as LinkedIn and JobTeaser; however, if you have any concern or doubt about a contact, please ask them to send an email from an @owkin.com address.
  • The Owkin talent team will not solicit personal data from candidates during the application phase, including, but not limited to, date of birth, social security numbers, or bank account information.
  • Legitimate Owkin interviews may be conducted over the phone, in person, or via an approved enterprise videoconferencing service (Google Meet). They will not occur via Signal, Telegram or Messenger.
  • Owkin offers of employment are based on merit and only extended once a candidate has interviewed with members of the talent and hiring team. Offers will be extended both verbally and in written format.

 

If you think that you have been a victim of fraud,