The Application of Generative Artificial Intelligence Technology in Voice Conversion
DOI
Creator Anon Bangsan
Title The Application of Generative Artificial Intelligence Technology in Voice Conversion
Contributor Payap Sirinam
Publisher Navaminda Kasatriyadhiraj Royal Air Force Academy
Publication Year 2568 BE (2025 CE)
Journal Title NKRAFA Journal of Science and Technology
Journal Vol. 22
Journal No. 2
Page no. 135-157
Keyword Generative Artificial Intelligence, Generative Adversarial Network, Voice Conversion, Cyber Warfare
URL Website https://ph02.tci-thaijo.org/index.php/nkrafa-sct
Website title NKRAFA Journal of Science and Technology
ISSN 3057-0913
Abstract This research aims to 1) explore the appropriate application of artificial intelligence (AI) technology for voice spoofing, 2) develop a generative AI-based voice spoofing model and investigate optimization strategies to enhance its suitability for cyber domain applications, 3) evaluate the performance and deception potential of synthetic voices generated by the model, and 4) propose practical applications of generative AI technology in offensive cyber operations. The findings indicated that MaskCycleGAN-VC was a highly effective generative AI model for voice spoofing in the Thai language. The model generated synthetic voices that closely resembled the original in naturalness, including rhythm, intonation, and emotional expression. A key feature was that the model could be developed and trained within one day using only moderate computational resources. Listeners judged the synthetic voices to be genuine in up to 56% of cases, while genuine voices were misclassified as synthetic in up to 59% of cases, which highlights the difficulty of distinguishing genuine from synthetic voices in noisy environments. Performance metrics included a Mean Opinion Score (MOS) of up to 3.9 for naturalness and up to 4.2 for similarity, a minimum Mel Cepstral Distortion (MCD) of 5 dB, and a Kernel Deep Speech Distance (KDSD) of 15.9 mKDSD. The model demonstrated significant potential for applications in security and offensive cyber operations, including support for intelligence activities, creating confusion in emergency scenarios, and simulated training exercises. However, its use should be approached with caution to prevent misuse in unethical contexts.