We collected curricula of university study programs from different countries and categorized them into five computing disciplines: Computer Science (CS), Computer Engineering (CE), Information Technology (IT), Information Science (IS), and Software Engineering (SE).
To obtain better representations of textual curricula, we propose to use pre-trained BERT embeddings that have been fine-tuned on a computing discipline classification task, using an approach that combines a novel course-guided attention mechanism with metric learning. Figure shows an overview of our method. Course-guided attention identifies the most and least important courses, following the intuition that core courses matter more than electives, while metric learning learns decision boundaries that form well-defined groups.
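As a rough illustration of how these two components might fit together, the sketch below pools course embeddings into a single curriculum vector via a learned attention vector, and applies a triplet-style metric loss. The dot-product attention form, the function names, and the triplet margin are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def curriculum_embedding(course_embs, attn_vec):
    """Pool course embeddings into one curriculum vector.

    course_embs: (n_courses, d) array of course text embeddings
    attn_vec:    (d,) attention parameter (assumed dot-product form)
    Returns (curriculum_vector, attention_weights).
    """
    scores = course_embs @ attn_vec   # one relevance score per course
    weights = softmax(scores)         # core-like courses get higher weight
    return weights @ course_embs, weights

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Metric-learning objective: pull same-discipline curricula
    together, push different-discipline curricula apart."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

In a full model the attention vector and the encoder would be trained jointly so that the loss shapes both the curriculum embeddings and the per-course weights; the weights themselves can then be inspected to see which courses the model treats as core.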
Quantitative Experiments. Our approach outperforms competitive baselines.
Embeddings Visualization. Our approach separates computing programs more clearly than BERT.
Attention Weights Visualization. Our approach identifies core courses for each computing discipline.
@inproceedings{murrugarra-llerena-etal-2022-improving,
title = "Improving Embeddings Representations for Comparing Higher Education Curricula: A Use Case in Computing",
author = "Murrugarra-Llerena, Jeffri and
Alva-Manchego, Fernando and
Murrugarra-Llerena, Nils",
editor = "Goldberg, Yoav and
Kozareva, Zornitsa and
Zhang, Yue",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.emnlp-main.776/",
doi = "10.18653/v1/2022.emnlp-main.776",
pages = "11299--11307",
abstract = "We propose an approach for comparing curricula of study programs in higher education. Pre-trained word embeddings are fine-tuned in a study program classification task, where each curriculum is represented by the names and content of its courses. By combining metric learning with a novel course-guided attention mechanism, our method obtains more accurate curriculum representations than strong baselines. Experiments on a new dataset with curricula of computing programs demonstrate the intuitive power of our approach via attention weights, topic modeling, and embeddings visualizations. We also present a use case comparing computing curricula from USA and Latin America to showcase the capabilities of our improved embeddings representations."
}