Proceedings of the International Conference on Intelligent Information Systems Design and Indian Knowledge System Applications (ICISDIKSA 2026)

Demystifying the Databricks Ecosystem: An Industry-Oriented Guide to Lakehouse Architecture

Authors
Saikiran Gogineni1, *, Yuvaraju Chinnam1, Kanaka Durga Returi2, Vaka Murali Mohan3, G. Suryanarayana4
1Department of Computer Science and Engineering, Malla Reddy (MR) Deemed to Be University, Hyderabad, India
2Department of Computer Science and Engineering, Malla Reddy Vishwavidyapeeth (Deemed to Be University), Hyderabad, India
3Department of Computer Science and Engineering, Malla Reddy College of Engineering and Technology, Hyderabad, India
4Department of Computer Science and Engineering, Symbiosis Institute of Technology, Hyderabad Campus, Symbiosis International (Deemed University), Pune, India
*Corresponding author. Email: goginenisaikiran31677@gmail.com
Corresponding Author
Saikiran Gogineni
Available Online 29 December 2025.
DOI
10.2991/978-94-6463-976-6_5
Keywords
Databricks; Lakehouse; Delta Lake; Data Engineering; PySpark; Unity Catalog; MLflow; Azure Synapse; Data Factory; Power BI; Real-time Analytics; Cloud Data Platforms; Fraud Detection; Performance Benchmarking; Industry Use Case
Abstract

The exponential growth of data has created a need for platforms that can store both structured and unstructured data, process and analyze it efficiently, and support the development of machine learning models. Traditional data lakes and data warehouses often lack the flexibility and performance to provide all of these capabilities, which motivated the Lakehouse paradigm. Databricks is a cloud-native implementation of the Lakehouse paradigm built on Apache Spark. However, little substantial academic work describes its ecosystem. This paper presents a comprehensive description of the Databricks ecosystem, both as an architecture and as a platform already in use. We examine the main components of the Databricks architecture, including Delta Lake, Unity Catalog, cluster management, and Azure integrations, and discuss their roles in building secure, scalable, and cost-effective data engineering workflows. To bridge the gap between theory and practice, we also present an experiment on a synthetic dataset of financial transactions that demonstrates data ingestion, feature engineering, and basic fraud detection. The experimental procedure and metrics such as query latency, processing throughput, and speed are described in detail together with the performance results, offering a reproducible benchmark for evaluating similar workloads in Databricks. This work is intended as a technical resource for industry practitioners, data engineers, and researchers interested in using the Databricks environment for large-scale analytics.
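As a rough illustration of the kind of workflow the abstract outlines (ingestion, feature engineering, basic fraud flagging, and a throughput measurement), the following minimal sketch uses only the Python standard library. It is not the authors' code and does not use Spark, Delta Lake, or any Databricks API. The function names, the median-ratio rule, the threshold of 5.0, and the data sizes are all assumptions chosen for illustration.

```python
import random
import statistics
import time


def make_transactions(n, seed=0):
    """Stand-in for ingestion: generate n synthetic transactions (hypothetical schema)."""
    rng = random.Random(seed)
    return [
        {"txn_id": i, "account": f"acct_{i % 10}", "amount": round(rng.uniform(1.0, 200.0), 2)}
        for i in range(n)
    ]


def add_features(rows):
    """Feature engineering: ratio of each amount to its account's median amount."""
    by_account = {}
    for row in rows:
        by_account.setdefault(row["account"], []).append(row["amount"])
    medians = {acct: statistics.median(vals) for acct, vals in by_account.items()}
    return [dict(row, ratio_to_median=row["amount"] / medians[row["account"]]) for row in rows]


def flag_suspicious(rows, threshold=5.0):
    """Basic rule (an assumption, not the paper's method): flag large deviations from the median."""
    return [row for row in rows if row["ratio_to_median"] > threshold]


start = time.perf_counter()
transactions = make_transactions(10_000)
# Inject one obviously anomalous transaction so the rule has something to find.
transactions.append({"txn_id": 10_000, "account": "acct_3", "amount": 50_000.0})
featured = add_features(transactions)
flagged = flag_suspicious(featured)
elapsed = time.perf_counter() - start

# Processing throughput in rows per second for the whole pipeline.
throughput = len(transactions) / elapsed if elapsed > 0 else float("inf")
print(f"flagged {len(flagged)} of {len(transactions)} rows")
print(f"throughput: {throughput:,.0f} rows/s")
```

In a Databricks setting the same three stages would typically be expressed with PySpark DataFrames over Delta tables, and the timing would be taken per query or per job rather than around an in-memory loop.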

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Intelligent Information Systems Design and Indian Knowledge System Applications (ICISDIKSA 2026)
Series
Advances in Intelligent Systems Research
Publication Date
29 December 2025
ISBN
978-94-6463-976-6
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Saikiran Gogineni
AU  - Yuvaraju Chinnam
AU  - Kanaka Durga Returi
AU  - Vaka Murali Mohan
AU  - G. Suryanarayana
PY  - 2025
DA  - 2025/12/29
TI  - Demystifying the Databricks Ecosystem: An Industry-Oriented Guide to Lakehouse Architecture
BT  - Proceedings of the International Conference on Intelligent Information Systems Design and Indian Knowledge System Applications (ICISDIKSA 2026)
PB  - Atlantis Press
SP  - 67
EP  - 81
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-976-6_5
DO  - 10.2991/978-94-6463-976-6_5
ID  - Gogineni2025
ER  -