Northeastern

AI/ML Engineer for Assessment Systems

Boston, MA (Main Campus), Part-time

About the Opportunity

POSITION SUMMARY

You will develop, implement, and validate AI scoring systems that evaluate student transcripts using research-validated rubrics. This role combines natural language processing, prompt engineering, and educational assessment methodology to create scalable evaluation tools. You will work closely with the research team to ensure AI systems achieve acceptable agreement with human expert ratings.

ESSENTIAL QUALIFICATIONS

  • Graduate student or professional in Computer Science, Data Science, Computational Linguistics, or a related field
  • Strong Python programming skills (pandas, scikit-learn, numpy)
  • Experience with large language model APIs (OpenAI GPT-4, Claude, or similar)
  • Demonstrated ability to design and implement NLP pipelines
  • Understanding of evaluation metrics (precision, recall, F1, correlation, agreement statistics)
  • Experience with version control (Git) and collaborative coding practices

HIGHLY DESIRED

  • Experience with prompt engineering and LLM optimization
  • Background in natural language processing or computational linguistics
  • Familiarity with educational assessment or psychometrics
  • Knowledge of inter-annotator agreement statistics (Cohen's kappa, Krippendorff's alpha)
  • Experience with retrieval-augmented generation (RAG) systems
  • Understanding of bias detection and fairness metrics in AI systems

RESPONSIBILITIES

  • Design AI scoring architecture combining rule-based and LLM components
  • Develop prompt engineering strategies for rubric-aligned evaluation
  • Implement quantitative indicator scoring (frequency counts, behavioral thresholds)
  • Create LLM-based qualitative judgment systems for complex indicators
  • Build data processing pipelines for transcript preparation and analysis
  • Compare AI ratings against human expert ground truth (n=200+ transcripts)
  • Calculate agreement statistics (correlation, Cohen's kappa, confusion matrices)
  • Identify systematic rating discrepancies and refine prompts/algorithms
  • Implement confidence scoring to flag uncertain AI judgments
  • Conduct bias audits across demographic subgroups (gender, race/ethnicity)
  • Deploy the validated scoring system for 300+ student transcripts
  • Create technical documentation for scoring methodology
  • Develop user-facing explanations of AI ratings for educators

TECHNICAL COMPONENTS

  • Prompt engineering for 4-level rubric criteria interpretation
  • Evidence extraction and citation from transcript text
  • Confidence scoring for rating certainty
  • Batch processing optimization for cost efficiency

Position Type

Temporary

Additional Information

Northeastern University considers factors such as candidate work experience, education, and skills when extending an offer.

Northeastern has a comprehensive benefits package for benefit-eligible employees. This includes medical, vision, dental, paid time off, tuition assistance, wellness & life, and retirement, as well as commuting & transportation. Visit https://hr.northeastern.edu/benefits/ for more information.

All qualified applicants are encouraged to apply and will receive consideration for employment without regard to race, religion, color, national origin, age, sex, sexual orientation, disability status, or any other characteristic protected by applicable law.

This job is for a current or anticipated job vacancy.

Pay Rate:

$25/hour