Abstract
As technology improves, criminals find new ways to gain unauthorised access. Accordingly, face spoofing
has become more prevalent in face recognition systems, with attackers gaining illegal access using simple,
non-intrusive presentation attacks, such as replaying a video containing the victim’s face. With social media
making it easy to obtain images and videos without raising suspicion, we must detect these presentation
attacks to prevent attackers from causing harm. Traditional face antispoofing methods relied on human-engineered features, whose limited representation capacity left a gap that deep learning has filled in recent years. However, these deep learning methods still need further improvements,
especially for presentation attack detection in the wild. In this study, we use generative models as a data
augmentation strategy to improve the face antispoofing performance of a vision transformer. Furthermore,
we propose an unsupervised keyframe selection process to remove near-duplicate frames and increase
the variation among the samples. More specifically, we trained a separate StyleGAN3-R model for each attack vector and used these models to generate candidate samples. We implemented two generative data augmentation
approaches: one trained on all the available frames (GAN3) and the other trained with only the keyframes
(KFGAN3). To provide a point of comparison with our generative approach, we also generated candidate samples using traditional data augmentation methods. We preserved each candidate sample's label by using only the following geometric transformations: random horizontal flips, rotations (within 15 degrees), and enlargements (within 20%).
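As a minimal sketch (not taken from our implementation), these label-preserving geometric augmentations could be expressed with torchvision; the flip probability below is an illustrative assumption, while the rotation and scale bounds mirror the description above.

import torchvision.transforms as T

# Label-preserving geometric augmentations: horizontal flips, rotations
# within 15 degrees, and enlargements of up to 20%.
geometric_augmentations = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                 # flip probability is an assumption
    T.RandomRotation(degrees=15),                  # rotation sampled in [-15, 15] degrees
    T.RandomAffine(degrees=0, scale=(1.0, 1.2)),   # enlargement of up to 20%
])

# Hypothetical usage on a PIL face crop:
# candidate_sample = geometric_augmentations(face_crop)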
We selected a ViT-B/32 Vision Transformer, pre-trained on the ImageNet dataset, as our baseline face
antispoofing model. We constructed a separate face antispoofing pipeline for each data augmentation approach, distinguished by the candidate samples it used. We conducted our experiments on
the Spoof in the Wild (SiW) dataset and CASIA Face Antispoofing Database (CASIA-FASD) using the
following data augmentation percentages: 5%, 10%, 20%, and 30%. Our GAN3 approach performed
the best on SiW protocol 2, achieving an Average Classification Error Rate (ACER) of 3.29%, and our
KFGAN3 approach performed the best on protocol 3, achieving an ACER of 7.37%. As for CASIA-FASD,
our GAN3 approach achieved the best Equal Error Rate (EER) of 1.72%, and our KFGAN3 achieved the
best ACER of 1.34%. We conducted an ablation study using dependent frame analysis to classify each
video. Our KFGAN3 approach achieved an ACER of 0% on both SiW protocols, using a window size of
15 frames. Furthermore, our GAN3 approach achieved an ACER of 1.11% on CASIA-FASD protocol
7, using a window size of 7 frames. Accordingly, we achieved state-of-the-art performance on both datasets in terms of ACER. We found that keyframes were essential for improving the performance
of unknown presentation attack detection. Our results suggest that GAN-based data augmentation is an
effective method for enhancing face antispoofing performance, especially when the models are trained
using keyframes.
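For reference, the error metrics reported above follow the standard ISO/IEC 30107-3 conventions; the definitions below are standard restatements, not contributions of this work:

\[
\mathrm{ACER} = \frac{\mathrm{APCER} + \mathrm{BPCER}}{2},
\]

where APCER (Attack Presentation Classification Error Rate) is the proportion of attack presentations misclassified as bona fide, BPCER (Bona Fide Presentation Classification Error Rate) is the proportion of bona fide presentations misclassified as attacks, and the EER is the operating point at which the false acceptance rate equals the false rejection rate.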