Automated Class Numbers Prediction for Books: an AI/ML Based Approach Using Annif

Soumik Kerketta; Parthasarathi Mukhopadhyay

doi:10.2991/978-94-6463-712-0_12

Authors

Soumik Kerketta¹,*, Parthasarathi Mukhopadhyay²

¹Junior Research Fellow (JRF), Department of Library and Information Science, University of Kalyani, Kalyani, Nadia, 741235, West Bengal, India

²Professor, Department of Library and Information Science, University of Kalyani, Kalyani, Nadia, 741235, West Bengal, India

*Corresponding author. Email: soumik.kerketta.bdy@gmail.com

Corresponding Author

Soumik Kerketta

Available Online 15 May 2025.

DOI: 10.2991/978-94-6463-712-0_12
Keywords: Annif; Automated indexing; DDC; F1@5; Library classification; Neural network; NDCG
Abstract: This study explores the possibilities of an AI/ML-based automated indexing system for the vast collections in a library. Library classification systems have long been regarded as pre-coordinated indexing approaches, and various machine learning techniques are now being applied to synthesize classification numbers. A recently popular technique trains a supervised learning model on a set of documents that trained library professionals have manually indexed or classified using controlled vocabularies. The trained model learns patterns from this reference data and then predicts the subject and class number for new documents. In the preliminary phase, we gathered around 2 lakh (200,000) MARC-21 formatted bibliographic records containing Tag 082 (DDC Call Number), Tag 245 (Title of Document), Tag 520 (Summary Note), and Tag 650 (Subject Descriptors). We then processed the data with the data wrangling software OpenRefine and divided the dataset into three sections: (i) a training dataset, (ii) a validation dataset, and (iii) a test dataset. We used Annif, an open-source AI environment, to analyze the dataset using the Dewey Decimal Classification (DDC) scheme. Training Annif involved a substantial set of bibliographic records based on the MARC-21 tags mentioned above. In the next stage, the framework was trained using a variety of backend algorithms: Omikuji, fastText, and SVC (associative group), and the simple and neural network ensembles (ensemble group), the latter based on a neural network model. To assess the effectiveness of these models, all of the machine learning backends were finally compared using two crucial retrieval metrics: F1@5 and NDCG.
For automated class number building, we found that the neural network model outperforms all other backends. The overall framework is based on open-source software, an open dataset, and open standards.
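
The two metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes binary relevance (a predicted class number is either in the manually assigned set or not) and scores a single document. The function names, the cutoff `k`, and the sample class numbers are hypothetical and are not taken from the chapter, and Annif's own evaluation may average or truncate differently.

```python
import math
from typing import Sequence, Set


def f1_at_k(ranked: Sequence[str], gold: Set[str], k: int = 5) -> float:
    """F1 score computed over the top-k suggestions for one document."""
    top = list(ranked[:k])
    hits = sum(1 for label in top if label in gold)
    if hits == 0:
        return 0.0
    precision = hits / len(top)   # share of the top-k suggestions that are correct
    recall = hits / len(gold)     # share of the gold labels found in the top-k
    return 2 * precision * recall / (precision + recall)


def ndcg_at_k(ranked: Sequence[str], gold: Set[str], k: int = 5) -> float:
    """Normalized discounted cumulative gain with binary relevance."""
    top = list(ranked[:k])
    # A correct label at rank i (0-based) contributes 1 / log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2)
              for i, label in enumerate(top) if label in gold)
    # Ideal ranking: all gold labels placed first, limited to k positions.
    ideal_hits = min(len(gold), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical example: five ranked DDC suggestions for one book,
# two of which match the class numbers assigned by a cataloguer.
suggested = ["005.133", "006.31", "025.431", "020", "004"]
assigned = {"006.31", "025.431"}

f1_score = f1_at_k(suggested, assigned)
ndcg_score = ndcg_at_k(suggested, assigned)
print(f"F1@5 = {f1_score:.3f}, NDCG@5 = {ndcg_score:.3f}")
```

F1@5 rewards finding the right class numbers anywhere in the top five, while NDCG also rewards ranking them near the top, which is one reason to report both when comparing backends.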
Copyright: © 2025 The Author(s)
Open Access: Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title: Proceedings of the International Conference on Marching Beyond the Libraries (ICMBL): Leadership, Creativity, and Innovation (ICMBL 2024)
Series: Advances in Economics, Business and Management Research
Publication Date: 15 May 2025
ISBN: 978-94-6463-712-0
ISSN: 2352-5428

Cite this article

TY  - CONF
AU  - Soumik Kerketta
AU  - Parthasarathi Mukhopadhyay
PY  - 2025
DA  - 2025/05/15
TI  - Automated Class Numbers Prediction for Books: an AI/ML Based Approach Using Annif
BT  - Proceedings of the International Conference on Marching Beyond the Libraries (ICMBL): Leadership, Creativity, and Innovation (ICMBL 2024)
PB  - Atlantis Press
SP  - 140
EP  - 147
SN  - 2352-5428
UR  - https://doi.org/10.2991/978-94-6463-712-0_12
DO  - 10.2991/978-94-6463-712-0_12
ID  - Kerketta2025
ER  -
