Multimodal Deepfake Detection using Multi-Scale Transformers – A Detailed Review

S. K. Vishal; S. Kanmani

doi:10.2991/978-94-6239-616-6_39

Authors

S. K. Vishal¹,*, S. Kanmani²,*

¹Department of Information Technology, Puducherry Technological University, Puducherry, India

²Department of Information Technology, Puducherry Technological University, Puducherry, India

*Corresponding author. Email: skvishal1008@gmail.com

*Corresponding author. Email: kanmani@ptuniv.edu.in

Corresponding Authors

S. K. Vishal, S. Kanmani

Available Online 31 March 2026.

DOI: 10.2991/978-94-6239-616-6_39
Keywords: Multimodal Deepfake Detection; Multi-Scale Transformers; Cross-Modal Analysis; Synthetic Media; Audio-Visual Forensics
Abstract: This paper reviews recent progress in multimodal deepfake detection, with an emphasis on multi-scale transformer architectures. It examines the challenges of detecting manipulations across both visual and audio modalities, focusing on cross-modal inconsistencies and synchronization issues. Approaches such as multi-scale attention, hybrid CNN-transformer models, and multimodal fusion are analyzed. Benchmark datasets used for training and evaluation, including FaceForensics++, Celeb-DF, DFDC, and FakeAVCeleb, are discussed. The study highlights the limitations of CNN-based methods and the advantages of transformers in capturing spatial, temporal, and auditory cues. Finally, it outlines future directions for robust and scalable deepfake detection.
Copyright: © 2026 The Author(s)
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title: Proceedings of the International Conference on Artificial Intelligence and Secure Data Analytics (ICAISDA 2025)
Series: Advances in Intelligent Systems Research
Publication Date: 31 March 2026
ISBN: 978-94-6239-616-6
ISSN: 1951-6851
DOI: 10.2991/978-94-6239-616-6_39

Cite this article

TY  - CONF
AU  - S. K. Vishal
AU  - S. Kanmani
PY  - 2026
DA  - 2026/03/31
TI  - Multimodal Deepfake Detection using Multi-Scale Transformers – A Detailed Review
BT  - Proceedings of the International Conference on Artificial Intelligence and Secure Data Analytics (ICAISDA 2025)
PB  - Atlantis Press
SP  - 518
EP  - 530
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-616-6_39
DO  - 10.2991/978-94-6239-616-6_39
ID  - Vishal2026
ER  -