Improving Embeddings Representations for Comparing Higher Education Curricula: A Use Case in Computing

Federal University of Rio Grande do Sul, Cardiff University, Weber State University
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Abstract

We propose an approach for comparing curricula of study programs in higher education. Pre-trained word embeddings are fine-tuned in a study program classification task, where each curriculum is represented by the names and content of its courses. By combining metric learning with a novel course-guided attention mechanism, our method obtains more accurate curriculum representations than strong baselines. Experiments on a new dataset with curricula of computing programs demonstrate the intuitive power of our approach via attention weights, topic modeling, and embeddings visualizations. We also present a use case comparing computing curricula from the USA and Latin America to showcase the capabilities of our improved embeddings representations.

Dataset

We collected curricula from university study programs from different countries and categorized them into five computing disciplines: Computer Science (CS), Computer Engineering (CE), Information Technology (IT), Information Science (IS), and Software Engineering (SE).


Approach

To obtain better representations of textual curricula, we fine-tune pre-trained BERT embeddings on a computing-discipline classification task, combining a novel course-guided attention mechanism with metric learning. The figure shows an overview of our method. Course-guided attention identifies the most and least important courses, following the intuition of core versus elective courses, while metric learning learns boundaries that form well-defined groups.
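The two components can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the attention vector `w`, the margin value, and the toy dimensions are all assumptions, and the course embeddings stand in for vectors produced by a pre-trained encoder such as BERT.

```python
import numpy as np

def course_attention_pooling(course_embs, w):
    """Pool per-course embeddings into a single curriculum vector.

    course_embs: (n_courses, dim) array of course-text embeddings.
    w:           (dim,) attention parameter scoring course importance.
    Returns the weighted curriculum embedding and the attention weights,
    which indicate which courses the model treats as most important.
    """
    scores = course_embs @ w                  # one relevance score per course
    weights = np.exp(scores - scores.max())   # softmax over courses
    weights /= weights.sum()
    return weights @ course_embs, weights

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Metric-learning objective: pull same-discipline curricula together
    and push different-discipline curricula apart by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy usage with random "course embeddings" (sizes are illustrative only).
rng = np.random.default_rng(0)
courses = rng.normal(size=(6, 8))   # 6 courses, 8-dim embeddings
w = rng.normal(size=8)
curr_emb, att = course_attention_pooling(courses, w)
```

Here `att` plays the role of the attention weights visualized later: higher-weight courses are the ones the classifier relies on, matching the core/elective intuition.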


Results

Quantitative Experiments. Our approach outperforms competitive baselines.


Embeddings Visualization. Our approach separates computing programs more clearly than BERT.


Attention Weights Visualization. Our approach identifies core courses per computing career.


BibTeX


@inproceedings{murrugarra-llerena-etal-2022-improving,
    title = "Improving Embeddings Representations for Comparing Higher Education Curricula: A Use Case in Computing",
    author = "Murrugarra-Llerena, Jeffri  and
      Alva-Manchego, Fernando  and
      Murrugarra-LLerena, Nils",
    editor = "Goldberg, Yoav  and
      Kozareva, Zornitsa  and
      Zhang, Yue",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-main.776/",
    doi = "10.18653/v1/2022.emnlp-main.776",
    pages = "11299--11307",
    abstract = "We propose an approach for comparing curricula of study programs in higher education. Pre-trained word embeddings are fine-tuned in a study program classification task, where each curriculum is represented by the names and content of its courses. By combining metric learning with a novel course-guided attention mechanism, our method obtains more accurate curriculum representations than strong baselines. Experiments on a new dataset with curricula of computing programs demonstrate the intuitive power of our approach via attention weights, topic modeling, and embeddings visualizations. We also present a use case comparing computing curricula from USA and Latin America to showcase the capabilities of our improved embeddings representations."
}