Improving Embeddings Representations for Comparing Higher Education Curricula: A Use Case in Computing

Federal University of Rio Grande do Sul, Cardiff University, Weber State University
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Abstract

We propose an approach for comparing curricula of study programs in higher education. Pre-trained word embeddings are fine-tuned in a study program classification task, where each curriculum is represented by the names and content of its courses. By combining metric learning with a novel course-guided attention mechanism, our method obtains more accurate curriculum representations than strong baselines. Experiments on a new dataset with curricula of computing programs demonstrate the intuitive power of our approach via attention weights, topic modeling, and embeddings visualizations. We also present a use case comparing computing curricula from the USA and Latin America to showcase the capabilities of our improved embeddings representations.

Dataset

We collected curricula from university study programs from different countries and categorized them into five computing disciplines: Computer Science (CS), Computer Engineering (CE), Information Technology (IT), Information Science (IS), and Software Engineering (SE).


Approach

To obtain better representations of textual curricula, we fine-tune pre-trained BERT embeddings on a computing-discipline classification task, combining a novel course-guided attention mechanism with metric learning. The figure shows an overview of our method. Course-guided attention identifies the most and least important courses, following the intuition of core versus elective courses, while metric learning learns boundaries that form well-defined groups.
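The two components can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the attention vector `w`, the margin value, and the toy dimensions are all assumptions, and the course embeddings stand in for vectors produced by a pre-trained encoder such as BERT.

```python
import numpy as np

def course_attention_pooling(course_embs, w):
    """Pool per-course embeddings into a single curriculum vector.

    course_embs: (n_courses, dim) array of course-text embeddings.
    w:           (dim,) attention parameter scoring course importance.
    Returns the weighted curriculum embedding and the attention weights,
    which indicate which courses the model treats as most important.
    """
    scores = course_embs @ w                  # one relevance score per course
    weights = np.exp(scores - scores.max())   # softmax over courses
    weights /= weights.sum()
    return weights @ course_embs, weights

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Metric-learning objective: pull same-discipline curricula together
    and push different-discipline curricula apart by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy usage with random "course embeddings" (sizes are illustrative only).
rng = np.random.default_rng(0)
courses = rng.normal(size=(6, 8))   # 6 courses, 8-dim embeddings
w = rng.normal(size=8)
curr_emb, att = course_attention_pooling(courses, w)
```

Here `att` plays the role of the attention weights visualized later: higher-weight courses are the ones the classifier relies on, matching the core/elective intuition.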


Results

Quantitative Experiments. Our approach outperforms competitive baselines.


Embeddings Visualization. Our approach separates computing programs more clearly than BERT.


Attention Weights Visualization. Our approach identifies core courses per computing career.


BibTeX


@inproceedings{murrugarra-llerena-etal-2022-improving,
    title = "Improving Embeddings Representations for Comparing Higher Education Curricula: A Use Case in Computing",
    author = "Murrugarra-Llerena, Jeffri  and
      Alva-Manchego, Fernando  and
      Murrugarra-LLerena, Nils",
    editor = "Goldberg, Yoav  and
      Kozareva, Zornitsa  and
      Zhang, Yue",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-main.776/",
    doi = "10.18653/v1/2022.emnlp-main.776",
    pages = "11299--11307",
    abstract = "We propose an approach for comparing curricula of study programs in higher education. Pre-trained word embeddings are fine-tuned in a study program classification task, where each curriculum is represented by the names and content of its courses. By combining metric learning with a novel course-guided attention mechanism, our method obtains more accurate curriculum representations than strong baselines. Experiments on a new dataset with curricula of computing programs demonstrate the intuitive power of our approach via attention weights, topic modeling, and embeddings visualizations. We also present a use case comparing computing curricula from USA and Latin America to showcase the capabilities of our improved embeddings representations."
}