| Anti-spoofing using ResNet50 with linear discriminant analysis for automatic speaker verification | |
|---|---|
| DOI | |
| Title | Anti-spoofing using ResNet50 with linear discriminant analysis for automatic speaker verification |
| Creator | Peemapot Uparakool |
| Contributor | Waree Kongprawechnon, Advisor |
| Publisher | Thammasat University |
| Publication Year | 2025 (B.E. 2568) |
| Keyword | Anti-spoofing, Automatic speaker verification, Linear discriminant analysis, Principal component analysis, ResNet50 |
| Abstract | Deep-learning-based models have shown significant potential in speech spoof detection, which is crucial to ensuring the authenticity of speech signals. This work extends deep-learning-based spoof detection by integrating ResNet50 with linear discriminant analysis (LDA) for dimensionality reduction. Using the logical access (LA) subset of the ASVspoof 2019 dataset, we generated mel-spectrogram and gammatone-spectrogram representations of the speech signals. ResNet50 was used to extract deep features from these spectrograms, and LDA was then applied to reduce feature dimensionality and improve classification accuracy. Our method significantly outperformed the baseline ResNet50 model, reducing the equal error rate (EER) by 43.55% and increasing balanced accuracy by 48.59% for the duplicated mel-spectrogram tensor, by 8.95% and 15.52% for the differentiated mel-spectrogram tensor, and by 44.14% and 44.77% for the differentiated gammatone-spectrogram tensor, respectively. These results demonstrate the effectiveness of combining ResNet50 with gammatone spectrograms and LDA, providing a more robust solution for audio spoof detection. To further investigate our approach, we extended the evaluation by applying traditional classifiers such as Random Forest (RF), k-Nearest Neighbors (KNN), and Naïve Bayes (NB) to the deep features extracted by ResNet50 and reduced by LDA or principal component analysis (PCA). Among all combinations, the LDA-reduced features paired with the Naïve Bayes classifier achieved the best result, reaching 88.18% balanced accuracy and 2.80% EER. These findings confirm that the proposed framework not only improves spoof detection performance under a threshold-based scheme but is also compatible with various machine learning classifiers, making it a flexible and effective solution for audio spoof detection tasks. |
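
The abstract reports results as equal error rate (EER) and balanced accuracy under a threshold-based scheme. The following is a minimal sketch of how these two metrics can be computed from detector scores. It is not the author's evaluation code: the function names, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for illustration.

```python
def error_rates(bonafide_scores, spoof_scores, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted as bona fide."""
    # False acceptance: a spoofed utterance scores at or above the threshold.
    far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
    # False rejection: a bona fide utterance scores below the threshold.
    frr = sum(s < threshold for s in bonafide_scores) / len(bonafide_scores)
    return far, frr


def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by trying every observed score as a threshold."""
    candidates = sorted(set(bonafide_scores) | set(spoof_scores))
    candidates.append(candidates[-1] + 1.0)  # a threshold that rejects everything
    best_gap, eer, best_threshold = None, None, None
    for t in candidates:
        far, frr = error_rates(bonafide_scores, spoof_scores, t)
        gap = abs(far - frr)
        # Keep the threshold where FAR and FRR are closest to each other.
        if best_gap is None or gap < best_gap:
            best_gap, eer, best_threshold = gap, (far + frr) / 2, t
    return eer, best_threshold


def balanced_accuracy(bonafide_scores, spoof_scores, threshold):
    """Mean of the per-class recalls at the given threshold."""
    far, frr = error_rates(bonafide_scores, spoof_scores, threshold)
    # Recall is (1 - FRR) for bona fide and (1 - FAR) for spoof.
    return ((1 - frr) + (1 - far)) / 2


# Toy scores, invented for illustration only (higher = more likely bona fide).
bonafide = [0.9, 0.8, 0.75, 0.6, 0.4]
spoof = [0.7, 0.5, 0.3, 0.2, 0.1]

eer, threshold = equal_error_rate(bonafide, spoof)
bacc = balanced_accuracy(bonafide, spoof, threshold)
print(f"EER = {eer:.2f} at threshold {threshold}, balanced accuracy = {bacc:.2f}")
```

Balanced accuracy averages the two per-class recalls, so it is not inflated when one class greatly outnumbers the other. When the threshold is set at the EER operating point, as in this sketch, balanced accuracy equals one minus the EER; the two metrics diverge when the decision comes from a separate classifier, as with the RF, KNN, and NB experiments described in the abstract.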