As part of the ImPACT Tough Robotics Challenge Program [1], an initiative of the Cabinet Office of Japan, a Japanese research group has developed the first system worldwide that can detect acoustic signals, such as voices, from victims needing rescue, even when the victims are difficult to find or are in places where cameras cannot be used. The system combines three technological elements: microphone array technology [2] for the “robot ears,” an interface that visualizes otherwise invisible sounds, and a microphone array that is easily connected to a drone and can be used even in rainy weather.
“Robot audition” is a research area that was proposed to the world by Adjunct Professor Kazuhiro Nakadai of Tokyo Institute of Technology (Tokyo Tech) and Professor Hiroshi G. Okuno of Waseda University in 2000. Until then, robots had not been able to recognize voices unless a microphone was near a person’s mouth. Research to construct “robot ears” began advancing under the idea that robots, like humans, should hear sound with their own ears. The entry barrier for this research area was high since it involves a combination of signal processing, robotics, and artificial intelligence. However, vigorous activities since its proposal, including the publication of open source software, culminated in its official registration as a research area in 2014 by the IEEE Robotics and Automation Society (RAS), the largest community for robot research.
The three keys for making “robot ears” a reality are (1) sound source localization technology to estimate where a sound is coming from, (2) sound source separation technology to extract the sound originating from a particular direction, and (3) automatic speech recognition technology to recognize the separated sounds despite background noise, similar to how humans can recognize speech across a noisy room. The research team pursued techniques to implement these keys in real environments and in real time. They developed technology that, like the legendary Japanese Prince Shotoku [3], could distinguish simultaneous speech from multiple people. Among other projects, they have demonstrated simultaneous meal ordering by 11 people and created a robot game show host that can handle multiple contestants answering simultaneously.
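To make the first key, sound source localization, more concrete, the following is a minimal sketch of one textbook approach: estimating the arrival-time difference between two microphones by cross-correlation and converting it to an angle. It is not the method used by the research team or by HARK, which works with larger microphone arrays. The function names, the 343 m/s speed of sound, and the two-microphone far-field geometry are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag in samples by which sig_b trails sig_a.

    Brute-force cross-correlation over lags in [-max_lag, max_lag].
    A positive result means the sound reached microphone A first.
    """
    n = len(sig_a)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival(lag, sample_rate_hz, mic_distance_m):
    """Convert a sample lag to an angle in degrees from broadside.

    Assumes a distant source (plane wave) and two microphones
    separated by mic_distance_m. Positive angles point toward mic A.
    """
    ratio = SPEED_OF_SOUND * lag / sample_rate_hz / mic_distance_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past +/-1
    return math.degrees(math.asin(ratio))


# Hypothetical test signal: a short windowed tone burst,
# and a copy of it delayed by 3 samples at the second microphone.
burst = [math.sin(0.3 * i) * math.exp(-((i - 50) / 10.0) ** 2) for i in range(128)]
delayed = [0.0] * 3 + burst[:-3]

lag = estimate_delay(burst, delayed, max_lag=10)
angle = direction_of_arrival(lag, sample_rate_hz=16_000, mic_distance_m=0.1)
print(lag, round(angle, 1))
```

With more than two microphones, the same idea extends to scanning candidate directions and picking the one that best explains all pairwise delays, which is closer in spirit to what a microphone array does.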
Overview of research achievements
This technology is the result of extreme audition research performed as a research challenge from the Japanese Cabinet Office initiative ImPACT Tough Robotics Challenge and led by Program Manager Satoshi Tadokoro of Tohoku University. A system that can detect voices, mobile device sounds, and other sounds from disaster victims through the background noise of a drone has been developed to assist in faster victim recovery.
Assistant Professor Taro Suzuki of Waseda University provided the high-accuracy point cloud map data, an outcome of his research on high-performance GPS. The extreme audition research group, consisting of Nakadai, Okuno, and Associate Professor Makoto Kumon of Kumamoto University, was central to developing this system, the first of its kind worldwide.
This system is made up of three main technical elements. The first is the microphone array technology based on the robot audition open source software HARK (HRI-JP Audition for Robots with Kyoto University) [4]. HARK has been updated every year since its 2008 release and had exceeded 120,000 total downloads as of December 2017. The software was extended to support embedded use while maintaining its noise robustness. Researchers then embedded this version of HARK on a drone to reduce the weight of the system and take advantage of high-speed data processing. They realized that microphone array processing could be performed inside a microphone array device attached to the drone, so it is not necessary to send all of the captured signals to a base station wirelessly. As a result, the total data transmission volume was reduced to less than 1/100. The result is a system that can detect sound sources even through the noise generated by the drone itself.
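To give a sense of scale for the reduction to less than 1/100, the following back-of-the-envelope calculation estimates the raw data rate of an uncompressed multichannel stream and the budget that a hundredfold reduction would imply. Only the 16-microphone count comes from this article; the 16 kHz sampling rate and 16-bit sample size are assumptions chosen for illustration, not specifications of the actual device.

```python
def raw_stream_rate(channels, sample_rate_hz, bytes_per_sample):
    """Bytes per second needed to send every captured sample uncompressed."""
    return channels * sample_rate_hz * bytes_per_sample


def reduced_budget(raw_rate, reduction_factor):
    """Bytes per second available after reducing traffic by the given factor."""
    return raw_rate / reduction_factor


# 16 microphones (from the article); 16 kHz and 2 bytes per sample are assumed.
raw = raw_stream_rate(channels=16, sample_rate_hz=16_000, bytes_per_sample=2)
budget = reduced_budget(raw, reduction_factor=100)

print(raw)     # 512000 bytes per second for the raw stream
print(budget)  # 5120.0 bytes per second after a 1/100 reduction
```

Under these assumptions, the drone would need to send only about 5 kB per second or less, which is the kind of volume that processed results such as source directions occupy, rather than full multichannel audio.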
The second element is a three-dimensional sound source location estimation technology with map display. This made it possible to construct an easily understood visual user interface out of invisible sound sources.
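One simple way to picture how a direction estimate becomes a point on a map is to cast a ray from the drone along the estimated direction and find where it meets the ground. The sketch below does this for a flat ground plane, which is a strong simplification: the actual system uses point cloud map data, and its estimation method is not described here. The function name, the angle conventions, and the flat-ground assumption are all hypothetical.

```python
import math


def locate_on_ground(drone_pos, azimuth_deg, elevation_deg, ground_z=0.0):
    """Intersect a direction-of-arrival ray with the plane z = ground_z.

    drone_pos is (x, y, z) in metres in a world frame. Azimuth is measured
    in the horizontal plane from the x axis; elevation is measured from the
    horizon, negative when the sound comes from below the drone.
    Returns (x, y, z) of the estimated source, or None if the ray never
    reaches the ground.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Unit vector pointing from the drone toward the sound source.
    dx = math.cos(el) * math.cos(az)
    dy = math.cos(el) * math.sin(az)
    dz = math.sin(el)
    if dz >= 0.0:
        return None  # ray points at or above the horizon
    t = (ground_z - drone_pos[2]) / dz  # distance along the ray
    if t < 0.0:
        return None  # drone is below the assumed ground plane
    return (drone_pos[0] + t * dx, drone_pos[1] + t * dy, ground_z)


# Drone hovering 10 m up, sound arriving from 45 degrees below the horizon.
print(locate_on_ground((0.0, 0.0, 10.0), azimuth_deg=0.0, elevation_deg=-45.0))
```

Replacing the flat plane with a terrain model would change only the intersection step, and the resulting coordinates could then be drawn as markers on the map.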
The final element is an all-weather microphone array consisting of 16 microphones all connected by one cable for easy installation on a drone. This makes it possible to perform a search and rescue even in adverse weather.
It is generally accepted that the probability of survival drops drastically for victims who are not rescued within the first 72 hours after a disaster. Establishing technology for swift search and rescue has therefore been a pressing issue.
Most existing drone-based technologies for finding disaster victims rely on cameras or similar devices. These cannot be used when victims are difficult to find or are in areas where cameras are ineffective, such as when victims are buried or in the dark, which has been a major impediment in search and rescue operations. Since this technology detects sounds made by disaster victims, it may be able to mitigate such problems. It is expected to become a promising tool for rescue teams in the near future as drones for finding victims needing rescue in disaster areas become widely available.
The research group will continue to improve the system, making it easier to use and more robust, through further demonstrations and experiments in simulated disaster conditions. One goal is to add functionality for classifying sound source types, instead of simply detecting them, so that relevant sound sources from victims can be distinguished from irrelevant ones. Another goal is to develop the system as a package of intelligent sensors that can be connected to various types of drones.
Explanation of Technical Terms
- 1 ImPACT Tough Robotics Challenge (TRC): an R&D Program from the Japanese Cabinet Office’s Impulsing Paradigm Change through Disruptive Technologies Program.
- 2 Microphone array technology: a technology that uses a microphone array to estimate the direction of sound or to isolate and extract specific sounds; can be effective even in noisy conditions.
- 3 Prince Shotoku: a member of the imperial family of Japan in the seventh century. Legend has it that when ten people vying for him to hear their petitions all talked at once, he understood all the words uttered by each person and was able to give an appropriate reply to each.
- 4 HARK: the abbreviated name for Honda Research Institute Japan Audition for Robots with Kyoto University. It is open source software for robot audition developed by Honda Research Institute Japan Co., Ltd. (HRI-JP) and Kyoto University. “Hark” is a Middle English word for “listen.”
- Impulsing Paradigm Change through Disruptive Technologies Program (ImPACT) by the Cabinet Office: http://www.jst.go.jp/impact/
- Program Manager: Satoshi Tadokoro
- R&D Program: Tough Robotics Challenge
- R&D Challenge: Search and identification of sound sources using microphone arrays installed on UAV (R&D Manager: Kazuhiro Nakadai, Research period: 2014–2018)
- * Corresponding researcher’s email: email@example.com
About Tokyo Institute of Technology
Tokyo Institute of Technology stands at the forefront of research and higher education as the leading university for science and technology in Japan. Tokyo Tech researchers excel in a variety of fields, such as material science, biology, computer science and physics. Founded in 1881, Tokyo Tech has grown to host 10,000 undergraduate and graduate students who become principled leaders of their fields and some of the most sought-after scientists and engineers at top companies. Embodying the Japanese philosophy of “monotsukuri,” meaning technical ingenuity and innovation, the Tokyo Tech community strives to make significant contributions to society through high-impact research.
About Kumamoto University
Kumamoto University is a globally active research university with roots in local communities. We are one of the oldest universities in Japan and now have nearly 8,000 undergraduate students and 1,300 graduate students, including 500 international students from 49 countries. We were recently selected by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) for three projects: the Program for Promoting the Enhancement of Research Universities, the Top Global University Project, and the Center of Community Project. Consequently, we have increased international exchange and collaboration programs with top universities from around the world. We strive to contribute to the harmonious coexistence of humans and the environment with sustainable societal development.
About Waseda University
Waseda University is a leading private, non-profit institution of higher education based in central Tokyo, with over 50,000 students in 13 undergraduate and 20 graduate schools. Founded in 1882, Waseda cherishes three guiding principles: academic independence, practical innovation and the education of enlightened citizens. Established to mold future leaders, Waseda continues to fulfill this mission, counting among its alumni seven prime ministers and countless other politicians, business leaders, journalists, diplomats, scholars, scientists, actors, writers, athletes and artists.
Waseda is number one in Japan in international activities, including number of incoming and outgoing study abroad students, with the broadest range of degree programs taught fully in English, and exchange partnerships with over 600 top institutions in 84 countries.
About Japan Science and Technology Agency
Japan Science & Technology Agency (JST), an advanced network-based research institute that promotes state-of-the-art R&D projects, will boldly lead the way for co-creation of innovation for tomorrow’s world together with society.