Proceedings of the 2025 4th International Conference on Mathematical Statistics and Economic Analysis (MSEA 2025)

Utilizing Large Language Models for Information Extraction from Real Estate Transactions

Authors
Yu Zhao1, *, Haoxiang Gao2, Jinghan Cao3, Shiqi Yang4
1Rotman School of Management, University of Toronto, Toronto, Canada
2ECE Department, Carnegie Mellon University, Mountain View, USA
3Department of Computer Science, San Francisco State University, San Francisco, USA
4New York University, New York, NY, USA
*Corresponding author. Email: yzqr.zhao@mail.utoronto.ca
Corresponding Author
Yu Zhao
Available Online 20 February 2026.
DOI
10.2991/978-94-6463-992-6_25
Keywords
Artificial intelligence; machine learning; large language models
Abstract

Real estate sales contracts contain crucial information for property transactions, but manual data extraction can be time-consuming and error-prone. This paper explores the application of large language models, specifically transformer-based architectures, to automated information extraction from real estate contracts. We discuss challenges, techniques, and future directions in leveraging these models to improve the efficiency and accuracy of real estate contract analysis. To facilitate fine-tuning, we generated synthetic contracts based on a real-world transaction dataset and used them to fine-tune a large language model. The fine-tuned models were evaluated on both information retrieval and reasoning tasks, demonstrating a 15% improvement in BERT F1-score over the LLaMA-8B baseline. Qualitative analysis further reveals that the fine-tuned model provides more concise and relevant answers, reducing verbosity and irrelevant content.

Copyright
© 2026 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Volume Title
Proceedings of the 2025 4th International Conference on Mathematical Statistics and Economic Analysis (MSEA 2025)
Series
Advances in Economics, Business and Management Research
Publication Date
20 February 2026
ISBN
978-94-6463-992-6
ISSN
2352-5428

Cite this article

TY  - CONF
AU  - Yu Zhao
AU  - Haoxiang Gao
AU  - Jinghan Cao
AU  - Shiqi Yang
PY  - 2026
DA  - 2026/02/20
TI  - Utilizing Large Language Models for Information Extraction from Real Estate Transactions
BT  - Proceedings of the 2025 4th International Conference on Mathematical Statistics and Economic Analysis (MSEA 2025)
PB  - Atlantis Press
SP  - 261
EP  - 274
SN  - 2352-5428
UR  - https://doi.org/10.2991/978-94-6463-992-6_25
DO  - 10.2991/978-94-6463-992-6_25
ID  - Zhao2026
ER  -