Proceedings of the Conference on Social and Sustainable Innovation in Technology & Engineering (SASI-ITE 2025)

Benchmarking Data Science Prowess in LLMs: A Holistic Evaluation Framework

Authors
Mithilesh Reddy Maddi¹,*
¹University of Colorado Denver, Denver, CO, 80204, USA
*Corresponding author. Email: mithul24reddy@gmail.com
Available Online 31 December 2025.
DOI
10.2991/978-94-6463-940-7_13
Keywords
Large Language Models; Data Science Evaluation; Benchmarking; Task-Function-Code; Frequency Analysis
Abstract

This paper introduces a new benchmarking framework, “DataBench360,” designed to evaluate the ability of Large Language Models (LLMs) to solve practical data science problems. Unlike earlier benchmarks that rely on narrow or simplified measures, DataBench360 follows a structured process to build reliable ground truths, check outputs against explicit validation rules, and measure performance across six key data science areas. The framework applies a simple Task-Function-Code (TFC) breakdown that makes evaluations more transparent and reproducible. Testing 23 models, both API-based and open-source, reveals clear performance differences, with API-based systems handling complex tasks more effectively. The novelty of this work lies in offering a broader, multi-dimensional evaluation than existing single-metric benchmarks, making DataBench360 a valuable tool for advancing AI in real-world data science applications.
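
The abstract names a Task-Function-Code (TFC) breakdown, ground truths, validation rules, and per-area scoring but does not give their concrete form. The Python sketch below is one hypothetical way to wire these pieces together, not the DataBench360 implementation: the `TFCItem` schema, the `exact_match` and `numeric_close` rules, `score_by_area`, the area labels, and the example items are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


# Hypothetical schema: the paper's actual TFC record format is not described here.
@dataclass
class TFCItem:
    area: str                             # illustrative data science area label
    task: str                             # Task: natural-language problem statement
    function: str                         # Function: API or operation the task should exercise
    code: str                             # Code: reference solution used to derive ground_truth
    ground_truth: Any                     # reference answer from the trusted solution
    validate: Callable[[Any, Any], bool]  # validation rule: (model_output, ground_truth) -> pass?


def exact_match(output: Any, truth: Any) -> bool:
    """Pass only if the model output equals the ground truth exactly."""
    return output == truth


def numeric_close(tol: float) -> Callable[[Any, Any], bool]:
    """Build a rule that passes numeric outputs within an absolute tolerance."""
    def rule(output: Any, truth: Any) -> bool:
        try:
            return abs(float(output) - float(truth)) <= tol
        except (TypeError, ValueError):
            return False  # non-numeric output fails instead of crashing the run
    return rule


def score_by_area(items: List[TFCItem], outputs: List[Any]) -> Dict[str, float]:
    """Return the fraction of items passed in each area for one model's outputs."""
    if len(items) != len(outputs):
        raise ValueError("expected exactly one model output per benchmark item")
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for item, out in zip(items, outputs):
        total[item.area] += 1
        if item.validate(out, item.ground_truth):
            passed[item.area] += 1
    return {area: passed[area] / total[area] for area in total}


# Usage with made-up items and outputs (not data from the paper).
items = [
    TFCItem("cleaning", "Count missing values in column 'age'", "Series.isna",
            "df['age'].isna().sum()", 3, exact_match),
    TFCItem("statistics", "Mean of [1, 2, 3, 4]", "statistics.mean",
            "statistics.mean([1, 2, 3, 4])", 2.5, numeric_close(1e-9)),
    TFCItem("statistics", "Median of [1, 3, 5]", "statistics.median",
            "statistics.median([1, 3, 5])", 3, numeric_close(1e-9)),
]
outputs = [3, "2.5", 4]
print(score_by_area(items, outputs))  # {'cleaning': 1.0, 'statistics': 0.5}
```

Keeping validation rules as per-item callables lets numeric tasks tolerate formatting differences (such as `"2.5"` versus `2.5`) while other tasks still require exact matches; averaging the per-area pass rates would give a single headline number per model if one is needed.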

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Series
Advances in Intelligent Systems Research
Publication Date
31 December 2025
ISBN
978-94-6463-940-7
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Mithilesh Reddy Maddi
PY  - 2025
DA  - 2025/12/31
TI  - Benchmarking Data Science Prowess in LLMs: A Holistic Evaluation Framework
BT  - Proceedings of the Conference on Social and Sustainable Innovation in Technology & Engineering (SASI-ITE 2025)
PB  - Atlantis Press
SP  - 174
EP  - 181
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-940-7_13
DO  - 10.2991/978-94-6463-940-7_13
ID  - Maddi2025
ER  -