Automated Class Numbers Prediction for Books: an AI/ML Based Approach Using Annif
- DOI
- 10.2991/978-94-6463-712-0_12
- Keywords
- Annif; Automated indexing; DDC; F1@5; Library classification; Neural network; NDCG
- Abstract
In this study, we explore the possibilities of an AI/ML-based automated indexing system for the vast collections of a library. Library classification systems have long been regarded as pre-coordinated indexing approaches. Various machine learning techniques are being applied to synthesize classification numbers. A recently popular technique uses a supervised learning algorithm to train a model on a set of documents that trained library professionals have manually indexed and classified using controlled vocabularies and standardized terminology. The trained model learns patterns from this reference data and then predicts the subject and class number for new documents. In the preliminary phase, we gathered around 2 lakh (200,000) MARC-21 formatted bibliographic records containing Tag 082 (DDC Call Number), Tag 245 (Title of Document), Tag 520 (Summary Note), and Tag 650 (Subject Descriptors). We then processed this data using the data wrangling software OpenRefine. The dataset was subsequently divided into three sections: (i) a training dataset, (ii) a validation dataset, and (iii) a test dataset. We used Annif, an open-source AI environment, to analyze the dataset using the Dewey Decimal Classification (DDC) scheme. Training Annif involved a substantial set of bibliographic records, based on the MARC-21 tags mentioned previously. In the next stage, the framework was trained using a variety of backend algorithms, such as Omikuji, fastText, and SVC (associative group), as well as the simple and neural network ensembles. To assess the effectiveness of these models, all of these machine learning backends were finally compared using two crucial retrieval metrics: F1@5 and NDCG.
For automated class number building, we found that the neural network model outperforms all other backends. The overall framework is based on open-source software, an open dataset, and open standards.
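The two metrics named in the abstract, F1@5 and NDCG, can be illustrated for a single record with a minimal Python sketch. It assumes binary relevance (a suggested class number is either a correct label or not) and a ranked list without duplicates. The function names and the sample DDC numbers are hypothetical, and Annif's own evaluation may differ in details such as how scores are averaged across documents.

```python
import math


def f1_at_k(ranked, relevant, k=5):
    """F1 score over the top-k suggestions for one document.

    ranked   -- suggested class numbers, best first (assumed unique)
    relevant -- set of correct class numbers for the document
    """
    top_k = ranked[:k]
    hits = sum(1 for label in top_k if label in relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def ndcg(ranked, relevant, k=None):
    """Normalized discounted cumulative gain with binary relevance."""
    if k is not None:
        ranked = ranked[:k]
    # Each correct suggestion contributes 1 / log2(rank + 1), rank starting at 1.
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, label in enumerate(ranked)
        if label in relevant
    )
    # Ideal ranking: all correct labels placed at the top of the list.
    ideal_hits = min(len(relevant), len(ranked))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical example: the correct class number appears in third place.
suggestions = ["025.04", "006.31", "025.431", "020", "004"]
gold = {"025.431"}

print("F1@5:", round(f1_at_k(suggestions, gold, k=5), 3))
print("NDCG:", round(ndcg(suggestions, gold), 3))
```

In this example one of five suggestions is correct, so precision is 0.2 and recall is 1.0, giving an F1@5 of about 0.333. NDCG is 0.5 because the correct number sits in third place rather than first. F1@5 rewards finding the right class number anywhere in the top five, while NDCG also rewards ranking it higher.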
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Soumik Kerketta
AU - Parthasarathi Mukhopadhyay
PY - 2025
DA - 2025/05/15
TI - Automated Class Numbers Prediction for Books: an AI/ML Based Approach Using Annif
BT - Proceedings of the International Conference on Marching Beyond the Libraries (ICMBL): Leadership, Creativity, and Innovation (ICMBL 2024)
PB - Atlantis Press
SP - 140
EP - 147
SN - 2352-5428
UR - https://doi.org/10.2991/978-94-6463-712-0_12
DO - 10.2991/978-94-6463-712-0_12
ID - Kerketta2025
ER -