Proceedings of the International Conference on Marching Beyond the Libraries (ICMBL): Leadership, Creativity, and Innovation (ICMBL 2024)

Automated Class Numbers Prediction for Books: an AI/ML Based Approach Using Annif

Authors
Soumik Kerketta1, *, Parthasarathi Mukhopadhyay2
1Junior Research Fellow (JRF), Department of Library and Information Science, University of Kalyani, Kalyani, Nadia, 741235, West Bengal, India
2Professor, Department of Library and Information Science, University of Kalyani, Kalyani, Nadia, 741235, West Bengal, India
*Corresponding author. Email: soumik.kerketta.bdy@gmail.com
Corresponding Author
Soumik Kerketta
Available Online 15 May 2025.
DOI
10.2991/978-94-6463-712-0_12
Keywords
Annif; Automated indexing; DDC; F1@5; Library classification; Neural network; NDCG
Abstract

In this study, we explore the possibilities of an AI/ML-based automated indexing system for the vast collections of a library. Library classification systems have long been regarded as pre-coordinated indexing approaches, and various machine learning techniques are now being applied to synthesize classification numbers. A recently popular technique uses a supervised learning algorithm to train a model on a set of documents that trained library professionals have manually indexed or classified using controlled vocabularies. The trained model learns patterns from this reference data and then predicts the subject and class number for new documents. In the preliminary phase, we gathered around 2 lakh (200,000) MARC-21 formatted bibliographic records containing Tag 082 (DDC Call Number), Tag 245 (Title of Document), Tag 520 (Summary Note), and Tag 650 (Subject Descriptors). We then processed this data using the data wrangling software OpenRefine. The dataset was subsequently divided into three sections: (i) a training dataset, (ii) a validation dataset, and (iii) a test dataset. We used Annif, an open-source AI environment, to analyze the dataset using the Dewey Decimal Classification (DDC) scheme. Training Annif involved a substantial set of bibliographic records based on the MARC-21 tags mentioned above. In the next stage, the framework was trained using a variety of backend algorithms, such as Omikuji, fastText, SVC (associative group), and simple and neural network ensembles. To assess the effectiveness of these models, all of these machine learning backends were finally compared using two crucial retrieval metrics: F1@5 and NDCG.
For automated class number building, we found that the neural network model outperforms all other backends. The overall framework is based on open-source software, an open dataset, and open standards.
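
The two metrics named in the abstract can be illustrated with a small sketch. The code below is not the authors' evaluation code and does not call Annif. It is a minimal, self-contained illustration that assumes binary relevance (a predicted class number either matches a manually assigned one or it does not) and per-document averaging. The function names and the sample class numbers are hypothetical.

```python
import math

def f1_at_k(ranked, gold, k=5):
    """F1 over the top-k predicted class numbers for one document.

    ranked: predicted class numbers, best first (assumed unique).
    gold:   set of manually assigned class numbers.
    """
    top = ranked[:k]
    if not top or not gold:
        return 0.0
    hits = sum(1 for c in top if c in gold)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)

def ndcg(ranked, gold, k=None):
    """Normalized discounted cumulative gain with binary relevance."""
    top = ranked if k is None else ranked[:k]
    # A correct prediction at 0-based position i contributes 1 / log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2) for i, c in enumerate(top) if c in gold)
    # The ideal ranking places every gold class number first.
    ideal_hits = len(gold) if k is None else min(len(gold), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def mean_score(metric, predictions, references):
    """Average a per-document metric over a test set."""
    scores = [metric(p, g) for p, g in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical example: one book whose correct DDC number is ranked second.
ranked = ["025.4", "006.3", "020", "005.1", "510"]
gold = {"006.3"}
print(f1_at_k(ranked, gold), ndcg(ranked, gold))
```

In this example one of the five suggestions is correct, so precision is 0.2 and recall is 1.0. NDCG additionally rewards placing the correct class number near the top of the list, which F1@5 ignores.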

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Marching Beyond the Libraries (ICMBL): Leadership, Creativity, and Innovation (ICMBL 2024)
Series
Advances in Economics, Business and Management Research
Publication Date
15 May 2025
ISBN
978-94-6463-712-0
ISSN
2352-5428

Cite this article

TY  - CONF
AU  - Soumik Kerketta
AU  - Parthasarathi Mukhopadhyay
PY  - 2025
DA  - 2025/05/15
TI  - Automated Class Numbers Prediction for Books: an AI/ML Based Approach Using Annif
BT  - Proceedings of the International Conference on Marching Beyond the Libraries (ICMBL): Leadership, Creativity, and Innovation (ICMBL 2024)
PB  - Atlantis Press
SP  - 140
EP  - 147
SN  - 2352-5428
UR  - https://doi.org/10.2991/978-94-6463-712-0_12
DO  - 10.2991/978-94-6463-712-0_12
ID  - Kerketta2025
ER  -