The Compensation Range is the span between the minimum and maximum base salary for a position. The midpoint of the range is approximately halfway between the minimum and the maximum and represents an employee who possesses full job knowledge, qualifications and experience for the position. In the normal course, employees will be hired, transferred or promoted between the minimum and midpoint of the salary range for a job.
Note: Applications will be accepted until 11:59 PM on the Posting End Date.
Job End Date
March 31, 2027
At UBC, we believe that attracting and sustaining a diverse workforce is key to the successful pursuit of excellence in research, innovation, and learning for all faculty, staff and students. Our commitment to employment equity helps achieve inclusion and fairness, brings rich diversity to UBC as a workplace, and creates the necessary conditions for a rewarding career.
Job Summary
The Research Methodologist for Data Science will be a key member of the research team in the Regulatory Science Lab at the University of British Columbia’s (UBC) Academy of Translational Medicine (ATM). They will contribute to ongoing and emerging research that augments health data and data systems to enable rapid evidence generation for innovative health products. The Research Methodologist will support data preparation and the application of natural language processing (NLP) and other machine learning (ML) methods, including training and deploying large language models (LLMs) for semi-structured electronic health records (EHRs). They will also implement benchmarks and conduct statistical analyses to evaluate model performance. Other responsibilities include maintaining reproducible code repositories, supporting grant applications, writing reports and manuscripts, presenting results to multidisciplinary audiences, and engaging in other forms of knowledge exchange. The role requires both independent research and work as a core member of project teams, with opportunities to engage with interdisciplinary researchers, health care stakeholders, and policy makers.
This is a full-time, 1-year term position, with the possibility of extension.
Organizational Status
The Research Methodologist for Data Science works independently and reports directly to Dr. Dean Regier on technical and project management aspects. The incumbent will interact and work with other faculty, research staff, and graduate students within the Regulatory Science Lab and the ATM, and external collaborators.
Housed within the Faculty of Medicine, the School of Population and Public Health (SPPH) is an innovative unit that encompasses many of the health-related groupings at UBC as a collaborative venture. The School is structured around four divisions: Occupational and Environmental Health; Health Services and Policy; Epidemiology, Biostatistics and Public Health Practice; and Health in Populations. The resulting mix of professions and disciplines is seen as a means of connecting individuals and learners to galvanize the relationship between health research, public health and health services and to enhance learning.
The ATM at the University of British Columbia (UBC) was established by the Faculty of Medicine. The ATM is a nucleus for translational and regulatory science research, education and training, and innovation. The ATM is a strategic imperative for the Faculty of Medicine and has a mandate to collaborate with other units in the Faculty of Medicine, the wider University, and beyond to accelerate the translational medicine continuum. The ATM will drive impactful medical and policy research to create new knowledge to improve health outcomes and benefit society. The ATM will also serve as a hub for the development of new academic and training programs, creating a cutting-edge ecosystem in which top educators and researchers train the next generation of health innovators. The ATM has supported the recruitment and establishment of more than 36 new faculty members.
Work Performed
Develop, program, test, validate, and report on ML pipelines, including LLMs and other machine learning models, for extracting and augmenting heterogeneous health data to support downstream research questions.
Develop and implement benchmarks and continuous model performance monitoring systems across various data sources and scenarios, while assessing safety and risks.
Review internal databases and code repositories and ensure compliance with ethics, privacy, and data security requirements.
Conduct bias audits to ensure model predictions are fair across all relevant data sources.
Stay current with the emerging scientific literature on language models, optimization, and evaluation techniques, and review and integrate promising new approaches where appropriate.
Define the scope and data elements required for machine learning projects.
Document project data and methodologies, including data sources, training protocols, and compliance with data safety and ethics requirements, and ensure project deliverables are met.
Develop scripts and data processing pipelines in R or Python, applying advanced expertise in data processing, model development, and data visualization for both model training and deployment.
Work with senior staff members to develop and modify models, and mentor junior staff and students, as needed.
Develop, write, coordinate, and edit scientific manuscripts, reports and other knowledge translation products, including design, compilation, synthesis, dissemination and evaluation.
Contribute to development of new research proposals and other knowledge translation activities.
Consequence of Error/Judgement
The incumbent is given wide latitude for exercising independent initiative and judgment in performing specialized duties and responsibilities. Poor judgment could harm the research and funding of the research team and its partner organizations. The incumbent will interact with multiple researchers across various organizations to address their data needs and discuss research findings, and discretion is vital.
Consequences of poor judgment exercised by the incumbent include:
1. Loss of funding opportunities and collaborative partnerships.
2. Compromising the quality of research findings.
3. Missed project deadlines.
4. Damage to the reputation of any or all of the investigators and/or their affiliated organizations.
Failure to maintain a high degree of attention to detail could negatively affect the accuracy of research findings.
This position requires employees to work under strict confidentiality requirements; internal procedures and policies to protect personal information must be followed and adherence to these requirements will be regularly reviewed by the employer.
Supervision Received
The incumbent will work independently with minimal supervision and report regularly to the Regulatory Science Lab Principal Investigator. The incumbent will also receive support from research collaborators as needed. Performance will be reviewed periodically based on the quality and timeliness of work.
Supervision Given
The incumbent may assist the Regulatory Science Lab Principal Investigator with the supervision and mentoring of junior research trainees, such as summer students.
Minimum Qualifications
Post-graduate degree in Statistics. Minimum of three years of related experience in research analysis, or the equivalent combination of education and experience.
- Willingness to respect diverse perspectives, including perspectives in conflict with one’s own
- Demonstrates a commitment to enhancing one’s own awareness, knowledge, and skills related to equity, diversity, and inclusion
Preferred Qualifications
Graduate degree with demonstrated educational background in machine learning or language models.
Minimum of two years of related experience applying popular machine learning/NLP libraries (PyTorch, LangChain, Ollama, Transformers, PyTorch Lightning, etc.) to specialized tasks, or the equivalent combination of education and experience.
Minimum of two years of experience with relational databases and tabular data tools (SQL, PostgreSQL, pandas, Polars, etc.)
Proficiency in Git-based collaboration using branching and merging workflows, as demonstrated through a portfolio or contributions to open-source projects.
Graduate degree in Computer Science, Statistics, Biostatistics, Health Economics, Epidemiology, Public Health, or other relevant fields (preferred emphasis on health research).
Experience developing and applying deep learning research to novel applications
Demonstrated competence in deep learning training frameworks (PyTorch, TensorFlow, Transformers, Accelerate, PyTorch Lightning)
Experience working with EHR datasets (MIMIC-IV, eICU, AmsterdamUMCdb, etc.).
Experience with LLM deployment frameworks (Transformers, LangChain, Ollama, etc.)
Ability to reproduce academic papers by translating their methods into code
Familiarity with Linux-based high-performance computing environments (Slurm, virtual environment management, resource allocation, SSH)
Experience working with and interpreting multimodal medical/health-related data in a machine learning context. Experience with pre-processing and analyzing complex, large-scale, potentially incomplete data from varying sources.
Experience implementing LLM performance improvement techniques such as prompt engineering, Retrieval-Augmented Generation (RAG), parameter-efficient fine-tuning (PEFT), supervised fine-tuning (SFT), or instruction tuning for unstructured data applications using locally deployed models.
Detailed knowledge of study design, data collection, and analysis.
Prior TCPS 2 or CITI certification
Knowledge of differential privacy frameworks (OpenDP, DP-SGD via Opacus, private-transformers)
Publication record in relevant conferences and venues (NeurIPS, ICML, ICLR, AAAI, AISTATS, CHIL, MLHC, MICCAI, NAACL, EMNLP, KDD, FAccT, npj Digital Medicine, The Lancet Digital Health, etc.)
Knowledge of software engineering best practices (linting, Git hooks, vulnerability scanning, containerization with Docker/Podman)
Experience with health research projects in an academic setting and familiarity with writing, editing, and reviewing reports and journal manuscripts.
Excellent communication skills, including the ability to build and maintain effective working relationships both internally and externally.
Ability to exercise tact, discretion, initiative, confidentiality and judgment.
Demonstrated analytical and problem-solving skills including the ability to comprehend complex issues and related data/information and present information in concise and meaningful ways.
Ability to maintain accuracy and attention to detail.
Demonstrated organizational skills to develop work plans that meet project deadlines within scope and budget.
Ability to work effectively both independently and in a diverse team environment.
Ability to handle multiple concurrent tasks.