About the Opportunity

POSITION SUMMARY

You will develop, implement, and validate AI scoring systems that evaluate student transcripts using research-validated rubrics. This role combines natural language processing, prompt engineering, and educational assessment methodology to create scalable evaluation tools. You will work closely with the research team to ensure AI systems achieve acceptable agreement with human expert ratings.

ESSENTIAL QUALIFICATIONS

Graduate student or professional in Computer Science, Data Science, Computational Linguistics, or a related field
Strong Python programming skills (pandas, scikit-learn, numpy)
Experience with large language model APIs (OpenAI GPT-4, Claude, or similar)
Demonstrated ability to design and implement NLP pipelines
Understanding of evaluation metrics (precision, recall, F1, correlation, agreement statistics)
Experience with version control (Git) and collaborative coding practices

HIGHLY DESIRED

Experience with prompt engineering and LLM optimization
Background in natural language processing or computational linguistics
Familiarity with educational assessment or psychometrics
Knowledge of inter-annotator agreement statistics (Cohen's kappa, Krippendorff's alpha)
Experience with retrieval-augmented generation (RAG) systems
Understanding of bias detection and fairness metrics in AI systems

RESPONSIBILITIES

Design AI scoring architecture combining rule-based and LLM components
Develop prompt engineering strategies for rubric-aligned evaluation
Implement quantitative indicator scoring (frequency counts, behavioral thresholds)
Create LLM-based qualitative judgment systems for complex indicators
Build data processing pipelines for transcript preparation and analysis
Compare AI ratings against human expert ground truth (n=200+ transcripts)
Calculate agreement statistics (correlation, Cohen's kappa, confusion matrices)
Identify systematic rating discrepancies and refine prompts/algorithms
Implement confidence scoring to flag uncertain AI judgments
Conduct bias audits across demographic subgroups (gender, race/ethnicity)
Deploy validated scoring system for 300+ student transcripts
Create technical documentation for scoring methodology
Develop user-facing explanations of AI ratings for educators
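
As an illustration of the agreement analysis listed above, here is a minimal sketch of how AI ratings might be compared against human expert ratings on a 4-level rubric. It is not the project's actual method: the function names `cohens_kappa` and `confusion_matrix` and the sample ratings are hypothetical, and the code uses only the Python standard library.

```python
from collections import Counter

def cohens_kappa(human, ai):
    """Unweighted Cohen's kappa for two equal-length lists of ratings."""
    if len(human) != len(ai) or not human:
        raise ValueError("ratings must be non-empty and the same length")
    n = len(human)
    # Observed agreement: share of transcripts where both raters match.
    observed = sum(h == a for h, a in zip(human, ai)) / n
    # Chance agreement: product of each rater's marginal proportions.
    human_counts = Counter(human)
    ai_counts = Counter(ai)
    expected = sum(human_counts[k] * ai_counts[k] for k in human_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)

def confusion_matrix(human, ai, levels=(1, 2, 3, 4)):
    """Rows are human ratings, columns are AI ratings."""
    index = {level: i for i, level in enumerate(levels)}
    matrix = [[0] * len(levels) for _ in levels]
    for h, a in zip(human, ai):
        matrix[index[h]][index[a]] += 1
    return matrix

# Made-up ratings for eight transcripts (illustration only, not project data).
human_ratings = [1, 2, 2, 3, 4, 4, 3, 2]
ai_ratings = [1, 2, 3, 3, 4, 3, 3, 2]

kappa = cohens_kappa(human_ratings, ai_ratings)
matrix = confusion_matrix(human_ratings, ai_ratings)
print(f"kappa = {kappa:.3f}")
for row in matrix:
    print(row)
```

Because rubric levels are ordered, a weighted kappa that penalizes large disagreements more than adjacent ones may be a better fit in practice; the unweighted form is shown here only because it is the simplest to read.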

TECHNICAL COMPONENTS

Prompt engineering for 4-level rubric criteria interpretation
Evidence extraction and citation from transcript text
Confidence scoring for rating certainty
Batch processing optimization for cost efficiency

Position Type

Temporary

Additional Information

Northeastern University considers factors such as candidate work experience, education and skills when extending an offer.

Northeastern has a comprehensive benefits package for benefit-eligible employees. This includes medical, vision, dental, paid time off, tuition assistance, wellness & life, and retirement, as well as commuting & transportation. Visit https://hr.northeastern.edu/benefits/ for more information.

All qualified applicants are encouraged to apply and will receive consideration for employment without regard to race, religion, color, national origin, age, sex, sexual orientation, disability status, or any other characteristic protected by applicable law.

This job is for a current or anticipated job vacancy.

Pay Rate:

$25/hour

AI/ML Engineer for Assessment Systems
