Proceedings of the International Conference on Applied Science and Technology on Engineering Science 2025 (iCAST-ES 2025)

Dataset Construction for Multimodal Detection of Online Gambling Advertisements

Authors
I Wayan Budi Sentana1, *, I Nyoman Gede Arya Astawa1, Junda Lu2, I Made Ari Dwi Suta Atmaja1, Ni Ketut Pradani Gayatri Sarja1, Ni Nyoman Harini Puspita1
1Information Technology Department, Politeknik Negeri Bali, Bali, Indonesia
2School of Computing, Western Sydney University, Sydney, Australia
*Corresponding author. Email: budisentana@pnb.ac.id
Available Online 31 December 2025.
DOI
10.2991/978-94-6463-926-1_6
Keywords
Online Gambling Ad; Multimodal Dataset Type; Semantic Type Dataset; Visual Type Dataset
Abstract

This study presents the construction of a multimodal dataset for detecting online gambling advertisements that have infiltrated websites. The dataset combines visual (image-based) and textual data extracted from compromised web pages. Data collection begins with a Google search engine scraper that uses advanced search operators (a practice commonly known as Google hacking) to find URLs containing keywords frequently associated with online gambling in Bahasa Indonesia. An automated Selenium-based module then visits each identified URL and extracts the page content, which is split into visual and textual components. The textual data is passed to a large language model (LLM) through the OpenAI API for a preliminary classification of gambling-related content, and final verification and labeling are performed manually to ensure accuracy. The resulting dataset comprises 600 samples: 300 labeled as containing online gambling advertisements and 300 labeled as non-infiltrated, forming a balanced, validated corpus for developing future multimodal detection models.
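The collection and labeling flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The keyword `"slot gacor"`, the `site:` filters, the `Sample` fields and all function names are hypothetical. The Selenium extraction and OpenAI API steps are omitted so the sketch runs offline. It shows three things: how a keyword might be combined with Google's `intext:` and `site:` operators, how the manual label is treated as authoritative over the LLM's preliminary label, and how an equal number of positive and negative samples could be drawn.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


def build_dork_queries(keywords: list[str], sites: list[str]) -> list[str]:
    """Pair every keyword with every site filter using Google's intext: and
    site: operators. Keywords and sites here are illustrative assumptions."""
    return [f'intext:"{kw}" site:{site}' for kw, site in product(keywords, sites)]


@dataclass
class Sample:
    url: str
    text: str                            # textual component of the page
    screenshot_path: str                 # visual component (hypothetical field)
    llm_label: Optional[bool] = None     # preliminary LLM suggestion only
    manual_label: Optional[bool] = None  # final human-verified label


def finalize(samples: list[Sample]) -> list[Sample]:
    """Keep only manually verified samples. The LLM label is a pre-screening
    hint and never becomes the final label on its own."""
    return [s for s in samples if s.manual_label is not None]


def balance(samples: list[Sample], per_class: int) -> list[Sample]:
    """Take the first per_class positives and per_class negatives."""
    pos = [s for s in samples if s.manual_label is True][:per_class]
    neg = [s for s in samples if s.manual_label is False][:per_class]
    if len(pos) < per_class or len(neg) < per_class:
        raise ValueError("not enough verified samples for a balanced set")
    return pos + neg
```

For example, `build_dork_queries(["slot gacor"], ["go.id", "ac.id"])` yields one query per keyword and site pair. A 300/300 corpus like the one reported would correspond to `balance(finalize(samples), per_class=300)`. Raising an error instead of returning an unbalanced set keeps a shortage of verified samples from going unnoticed.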

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Series
Advances in Engineering Research
Publication Date
31 December 2025
ISBN
978-94-6463-926-1
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - I Wayan Budi Sentana
AU  - I Nyoman Gede Arya Astawa
AU  - Junda Lu
AU  - I Made Ari Dwi Suta Atmaja
AU  - Ni Ketut Pradani Gayatri Sarja
AU  - Ni Nyoman Harini Puspita
PY  - 2025
DA  - 2025/12/31
TI  - Dataset Construction for Multimodal Detection of Online Gambling Advertisements
BT  - Proceedings of the International Conference on Applied Science and Technology on Engineering Science 2025 (iCAST-ES 2025)
PB  - Atlantis Press
SP  - 39
EP  - 46
SN  - 2352-5401
UR  - https://doi.org/10.2991/978-94-6463-926-1_6
DO  - 10.2991/978-94-6463-926-1_6
ID  - Sentana2025
ER  -