"Making sense of AI-generated speech: challenges and opportunities for phonetics and speech science research"
Date and time | Monday, October 27, 2025, 16:00-17:15 |
---|---|
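Format | (1) In person: please come to Conference Room 3, 4th floor, Building 11. (2) Zoom: the participation URL will be sent in the automatic reply email once registration is complete. |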
Audience | Students, faculty and staff, and the general public |
Speaker | James Tanner (Research Associate, University of Glasgow) https://waseda.box.com/s/ewre42iszgni3qj47povzbskc2v6t0sl |
Abstract | Recent years have seen a substantial increase in the use and adoption of Artificial Intelligence (AI) across many spheres of daily life. For speech, AI models are capable of producing fluent and natural speech of a given speaker with only limited input data (‘voice cloning’), raising a wide range of risks for security, privacy, and personal identity. In spite of these risks, however, there remains little scientific understanding of the properties of AI-generated speech, including how AI speech models learn linguistic information from the speech signal and the extent to which AI-generated speech patterns similarly to or differently from human speech. Decades of research within linguistics and phonetics, however, have uncovered how speakers manipulate patterns of acoustic variability to signal linguistic structure, social indexicality, and speaker-specific properties, and so provide a unique perspective with which to explain the behaviours of AI speech models. In this talk, I will explore how methodologies from phonetics and speech science research help provide a window into understanding the properties of AI-generated speech, and how these properties may systematically differ from human speech. An examination of patterns of acoustic variability in AI-generated English and Japanese stops, in both the same language as the target speaker (e.g., Japanese-Japanese) and the opposite language (e.g., Japanese-English), shows that AI-generated speech both differs from the expected human speech patterns and exhibits a preference for the language-specific phonetic implementation of stops when synthesising a speaker into the opposite language. These initial findings demonstrate the potential of acoustic-phonetic approaches to the study of AI-generated speech and point the way towards phonetically motivated and interpretable methods for AI speech detection. |
Organizer | 篠原 靖明 (Associate Professor, Faculty of Commerce, Waseda University) |
Registration | Please register here. Note: the deadline is Thursday, October 23, 17:00. |
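Co-hosted by | School of Commerce, Waseda University; Research Institute of Business Administration (産業経営研究所); 森田彰 Subcommittee (森田彰分科会); Institute of Language and Speech Science (ことばの科学研究所), Comprehensive Research Organization, Waseda University |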
Supported by | The Phonetic Society of Japan (日本音声学会) |