Physiological-physical feature fusion for automatic voice spoofing detection
Biometric speech recognition systems are frequent targets of spoofing attacks, most commonly speech synthesis and voice conversion. A successful attack causes the system to accept spoofed speech as genuine, which compromises its security. Researchers have made many efforts to address this problem, but existing voice spoofing detection methods consider only the physical features of speech, which limits their detection performance.
To solve the problem, a research team led by Junxiao Xue published their new research on March 27, 2023 in Frontiers of Computer Science.
The team proposed a voice spoofing detection method based on physiological-physical feature fusion. The method comprises a feature extractor, a densely connected convolutional neural network with squeeze and excitation blocks (SE-DenseNet), and a feature fusion strategy. Compared to existing methods, the tandem decision cost function (t-DCF) and equal error rate (EER) scores improved by 5% and 7% respectively.
Specifically, physiological features were first extracted from the audio using a pre-trained convolutional network. SE-DenseNet was then used to extract the physical features. Such a densely connected model has high parameter efficiency, and the squeeze and excitation blocks enhance the efficiency of feature transmission. Finally, the two sets of features were integrated in the classification network for voice spoofing detection.
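As a rough illustration of the two ideas in this pipeline, the sketch below shows a generic squeeze and excitation step (pool each channel to a scalar, pass the result through a small bottleneck, and rescale the channels with the resulting gates) followed by a simple fusion step. It is not the authors' implementation: the function names, the weight matrices w1 and w2, and the use of plain concatenation as the fusion strategy are all assumptions made for this example, since the article does not specify them.

```python
import math


def squeeze_excite(feature_maps, w1, w2):
    """Reweight channels of `feature_maps` (a list of 2-D lists, one per channel).

    w1 and w2 are hypothetical weight matrices for the bottleneck layers:
    w1 has shape (hidden, channels), w2 has shape (channels, hidden).
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excitation, step 1: fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: fully connected layer followed by a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]
    return scaled, gates


def fuse(physiological, physical):
    """Fuse two feature vectors by concatenation (an assumed strategy)."""
    return list(physiological) + list(physical)
```

In a real system the gate weights would be learned during training, and the fused vector would be passed to the classification network.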
They compared the proposed model with some of the best single systems. The experiments showed that the proposed model performed better on both EER and t-DCF. To validate the effectiveness of the physiological (face) features, they also evaluated several baseline models with face features added. The baseline methods showed different degrees of performance improvement when combined with the face features, suggesting that the face features are also useful for the baseline models.
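For readers unfamiliar with the EER metric, it is the error rate at the decision threshold where the false acceptance rate (spoofed speech accepted) equals the false rejection rate (genuine speech rejected); lower is better. The sketch below approximates it by sweeping thresholds over the observed scores. The function name and the convention that a higher score means "more likely genuine" are assumptions for this example, and it does not reproduce the evaluation tooling used in the paper.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Assumes higher scores indicate genuine speech; a trial is accepted
    when its score is at or above the threshold.
    """
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(genuine_scores) | set(spoof_scores)):
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: spoofed trials scored at or above it.
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With genuine scores [0.9, 0.8, 0.7, 0.4] and spoof scores [0.6, 0.3, 0.2, 0.1], the two error rates meet at a threshold of 0.6, where one of four genuine trials is rejected and one of four spoofed trials is accepted, giving an EER of 0.25.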
Future work may attempt to extract more accurate face features and study more effective feature fusion strategies to detect spoofing attacks.
More information: Junxiao Xue et al, Physiological-physical feature fusion for automatic voice spoofing detection, Frontiers of Computer Science (2022). DOI: 10.1007/s11704-022-2121-6