
Open Access

ARTICLE

Deepfake Detection Method Based on Spatio-Temporal Information Fusion

Xinyi Wang*, Wanru Song, Chuanyan Hao, Feng Liu
Department of Digital Media Technology, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
* Corresponding Author: Xinyi Wang
(This article belongs to the Special Issue: Applications of Artificial Intelligence for Information Security)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.062922

Received 31 December 2024; Accepted 25 February 2025; Published online 18 March 2025

Abstract

As Deepfake technology continues to evolve, the distinction between real and fake content becomes increasingly blurred. Most existing Deepfake video detection methods rely on single-frame facial image features, which limits their ability to capture temporal differences between frames. Current methods also exhibit limited generalization, struggling to detect content generated by unknown forgery algorithms. Moreover, the diversity and complexity of forgery techniques introduced by Artificial Intelligence Generated Content (AIGC) pose significant challenges for traditional detection frameworks, which must balance high detection accuracy with robust performance. To address these challenges, we propose a novel Deepfake detection framework that combines a two-stream convolutional network with a Vision Transformer (ViT) module to enhance spatio-temporal feature representation. The ViT model extracts spatial features from the forged video, while the 3D convolutional network captures temporal features. The 3D convolution enables cross-frame feature extraction, allowing the model to detect subtle facial changes between frames. The confidence scores from the ViT and 3D convolution submodels are fused at the decision layer, enabling the model to handle unknown forgery techniques more effectively. Focusing on Deepfake videos and GAN-generated images, the proposed approach is evaluated on two widely used public face forgery datasets. Compared to existing state-of-the-art methods, it achieves higher detection accuracy and better generalization, offering a robust solution for Deepfake detection in real-world scenarios.
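The decision-layer fusion described above can be sketched minimally as follows. This is an illustrative outline only: the fusion weight and decision threshold are assumptions for demonstration, not values reported in the paper, and the function names are hypothetical.

```python
# Decision-level fusion sketch: combine the forgery-confidence scores of the
# two submodels (ViT spatial stream and 3D-conv temporal stream).
# NOTE: the weight w_vit and the 0.5 threshold are illustrative assumptions,
# not values taken from the paper.

def fuse_scores(vit_score: float, c3d_score: float, w_vit: float = 0.5) -> float:
    """Weighted average of the two submodels' confidence scores in [0, 1]."""
    return w_vit * vit_score + (1.0 - w_vit) * c3d_score

def is_fake(vit_score: float, c3d_score: float, threshold: float = 0.5) -> bool:
    """Label the input as forged when the fused confidence exceeds the threshold."""
    return fuse_scores(vit_score, c3d_score) > threshold
```

In a setup like this, each stream can be trained independently, and fusing at the decision layer (rather than concatenating features) lets either stream flag a forgery that the other misses, which is one way such two-stream designs aim at generalization to unseen forgery methods.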

Keywords

Deepfake detection; vision transformer; spatio-temporal information