Extracting and Classifying Beneficiary Information from CSR Reports Using Rule-Based NER and Similarity-Based Approach
- DOI
- 10.2991/978-94-6463-926-1_72
- Keywords
- cosine similarity; Corporate Social Responsibility; information extraction; named entity recognition; text classification
- Abstract
Extracting structured beneficiary information from Corporate Social Responsibility (CSR) reports remains a practical challenge when key data is embedded in narratives shared through platforms such as WhatsApp. Reliable reporting is essential to measure program reach and demonstrate impact, yet inconsistent formats and increasing volumes often require significant manual effort. This study presents an automated system that combines rule-based Named Entity Recognition (NER) with cosine similarity-based classification to extract and organize beneficiary data from Indonesian CSR articles. The system identifies five core entities (datetime, program, branch, city, and beneficiary) and classifies each article into one of four CSR pillars using TF-IDF-weighted keyword vectors. Experimental results show that the NER component achieved high F1 scores, with Branch (F1 = 0.99), City (F1 = 0.96), and Beneficiary (F1 = 0.95) performing strongest, while Program and Datetime also scored above 0.92. Classification reached 94.9% overall accuracy, with robust results in Education, Health, and Environment, and weaker performance in Economy (F1 = 0.73) due to sparse data and conceptual overlap with Education. The extracted information is compiled into a structured tabular format aligned with internal reporting standards at PT United Tractors Tbk., reducing manual effort and improving consistency. Although developed for a specific organizational context, the approach is transparent, low-cost, and adaptable to other semi-structured reporting environments.
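To make the two stages described in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It pairs a few regex and gazetteer rules for entity extraction with a cosine-similarity classifier over TF-IDF-weighted keyword vectors. Every name in it is an assumption made for illustration: the keyword lists in `PILLAR_KEYWORDS`, the `CITIES` gazetteer, the beneficiary nouns, the smoothed IDF formula, and the functions `extract_entities` and `classify`. Only three of the five entities are covered, since the abstract does not describe the actual rules.

```python
import math
import re
from collections import Counter

# Hypothetical keyword lists per CSR pillar (illustrative, not from the paper).
PILLAR_KEYWORDS = {
    "Education": ["sekolah", "siswa", "beasiswa", "pelatihan", "guru"],
    "Health": ["kesehatan", "posyandu", "donor", "darah", "puskesmas"],
    "Environment": ["lingkungan", "pohon", "mangrove", "sampah", "penanaman"],
    "Economy": ["umkm", "usaha", "pelatihan", "koperasi", "modal"],
}

# Hypothetical rules: Indonesian dates, a tiny city gazetteer, counted groups.
MONTHS = ("Januari|Februari|Maret|April|Mei|Juni|Juli|Agustus|"
          "September|Oktober|November|Desember")
DATE_RE = re.compile(r"\b(\d{1,2}) (" + MONTHS + r") (\d{4})\b")
BENEFICIARY_RE = re.compile(r"\b(\d+) (siswa|warga|guru|petani)\b", re.IGNORECASE)
CITIES = ["Balikpapan", "Jakarta", "Surabaya"]


def extract_entities(text):
    """Return the first match of each rule, or None when a rule does not fire."""
    date = DATE_RE.search(text)
    beneficiary = BENEFICIARY_RE.search(text)
    cities = [c for c in CITIES if re.search(r"\b" + re.escape(c) + r"\b", text)]
    return {
        "datetime": date.group(0) if date else None,
        "city": cities[0] if cities else None,
        "beneficiary": beneficiary.group(0) if beneficiary else None,
    }


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def idf_table(docs):
    """Smoothed inverse document frequency over the pillar keyword lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector restricted to the known keyword vocabulary."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


IDF = idf_table(list(PILLAR_KEYWORDS.values()))
PILLAR_VECTORS = {p: tfidf(words, IDF) for p, words in PILLAR_KEYWORDS.items()}


def classify(article):
    """Return (best pillar or None, similarity score per pillar)."""
    vec = tfidf(tokenize(article), IDF)
    scores = {p: cosine(vec, pv) for p, pv in PILLAR_VECTORS.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


article = ("Pada 12 Maret 2025, cabang Balikpapan menanam pohon mangrove "
           "bersama 150 warga.")
print(extract_entities(article))
print(classify(article)[0])
```

In this sketch, `classify` returns `None` when no pillar keyword appears in the article, rather than defaulting to an arbitrary pillar. The keyword `pelatihan` (training) is deliberately listed under both Education and Economy to show how shared vocabulary lowers the separation between two pillars, which is the kind of overlap the abstract reports for Economy.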
- Copyright
- © 2025 The Author(s)
- Open Access
- This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Rifda Qurrotul ‘Ain
AU - Entin Martiana Kusumaningtyas
AU - Ali Ridho Barakbah
AU - Renovita Edelani
PY - 2025
DA - 2025/12/31
TI - Extracting and Classifying Beneficiary Information from CSR Reports Using Rule-Based NER and Similarity-Based Approach
BT - Proceedings of the International Conference on Applied Science and Technology on Engineering Science 2025 (iCAST-ES 2025)
PB - Atlantis Press
SP - 640
EP - 648
SN - 2352-5401
UR - https://doi.org/10.2991/978-94-6463-926-1_72
DO - 10.2991/978-94-6463-926-1_72
ID - ‘Ain2025
ER -