Proceedings of the International Conference on Applied Science and Technology on Engineering Science 2025 (iCAST-ES 2025)

Extracting and Classifying Beneficiary Information from CSR Reports Using Rule-Based NER and Similarity-Based Approach

Authors
Rifda Qurrotul ‘Ain1, *, Entin Martiana Kusumaningtyas1, Ali Ridho Barakbah1, Renovita Edelani1
1Electronic Engineering Polytechnic Institute of Surabaya, Surabaya, Indonesia
*Corresponding author. Email: rifda.q.a@gmail.com
Corresponding Author
Rifda Qurrotul ‘Ain
Available Online 31 December 2025.
DOI
10.2991/978-94-6463-926-1_72
Keywords
cosine similarity; Corporate Social Responsibility; information extraction; named entity recognition; text classification
Abstract

Extracting structured beneficiary information from Corporate Social Responsibility (CSR) reports remains a practical challenge when key data is embedded in narratives shared through platforms such as WhatsApp. Reliable reporting is essential to measure program reach and demonstrate impact, yet inconsistent formats and increasing volumes often require significant manual effort. This study presents an automated system that combines rule-based Named Entity Recognition (NER) with cosine similarity-based classification to extract and organize beneficiary data from Indonesian CSR articles. The system identifies five core entities (datetime, program, branch, city, and beneficiary) and classifies each article into one of four CSR pillars using TF-IDF-weighted keyword vectors. Experimental results show that the NER component achieved high accuracy, with Branch (F1 = 0.99), City (F1 = 0.96), and Beneficiary (F1 = 0.95) performing strongest, while Program and Datetime also scored above 0.92. Classification reached 94.9% overall accuracy, with robust results in Education, Health, and Environment, and weaker performance in Economy (F1 = 0.73) due to sparse data and conceptual overlap with Education. The extracted information is compiled into a structured tabular format aligned with internal reporting standards at PT United Tractors Tbk., reducing manual effort and improving consistency. Although developed for a specific organizational context, the approach is transparent, low-cost, and adaptable to other semi-structured reporting environments.
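
The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword lists, the smoothed IDF formula, and the function names (`tokenize`, `build_idf`, `classify`) are all assumptions chosen for illustration. Only the general idea, which is to score an article against TF-IDF-weighted keyword vectors for the four pillars by cosine similarity, comes from the abstract.

```python
import math
import re
from collections import Counter

# Hypothetical Indonesian keyword lists per CSR pillar (illustrative only,
# not taken from the paper).
PILLAR_KEYWORDS = {
    "Education": "sekolah siswa guru beasiswa pelatihan",
    "Health": "kesehatan donor darah posyandu vaksin",
    "Environment": "lingkungan pohon sampah mangrove penanaman",
    "Economy": "umkm usaha modal pelatihan koperasi",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(token_lists):
    """Smoothed inverse document frequency over the pillar keyword lists."""
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens, idf):
    """TF-IDF vector as a dict; terms outside the keyword vocabulary are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(article):
    """Return (best pillar or None, per-pillar similarity scores)."""
    pillar_tokens = {p: tokenize(k) for p, k in PILLAR_KEYWORDS.items()}
    idf = build_idf(list(pillar_tokens.values()))
    pillar_vectors = {p: tfidf(toks, idf) for p, toks in pillar_tokens.items()}
    query = tfidf(tokenize(article), idf)
    scores = {p: cosine(query, vec) for p, vec in pillar_vectors.items()}
    best = max(scores, key=scores.get)
    # No keyword overlap at all means no pillar can be assigned.
    return (best if scores[best] > 0 else None), scores


if __name__ == "__main__":
    label, scores = classify("Penanaman 500 pohon mangrove di pesisir Surabaya")
    print(label, scores)
```

In this toy vocabulary the word "pelatihan" (training) appears under both Education and Economy, so it receives a lower IDF weight and contributes to both scores. That is one simple way the conceptual overlap between those two pillars, noted in the abstract, can show up in a similarity-based classifier.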

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Applied Science and Technology on Engineering Science 2025 (iCAST-ES 2025)
Series
Advances in Engineering Research
Publication Date
31 December 2025
ISBN
978-94-6463-926-1
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Rifda Qurrotul ‘Ain
AU  - Entin Martiana Kusumaningtyas
AU  - Ali Ridho Barakbah
AU  - Renovita Edelani
PY  - 2025
DA  - 2025/12/31
TI  - Extracting and Classifying Beneficiary Information from CSR Reports Using Rule-Based NER and Similarity-Based Approach
BT  - Proceedings of the International Conference on Applied Science and Technology on Engineering Science 2025 (iCAST-ES 2025)
PB  - Atlantis Press
SP  - 640
EP  - 648
SN  - 2352-5401
UR  - https://doi.org/10.2991/978-94-6463-926-1_72
DO  - 10.2991/978-94-6463-926-1_72
ID  - ‘Ain2025
ER  -