Nowadays, we hear the word "AI" on a daily basis. The term has spread throughout society, yet many of us may not know what AI actually is or have a firm understanding of it. In this article, we explain AI through an interview with a key figure in the field.
For the first round, we had IT journalist Hiromi Yuzuki interview Yoshitaka Ushiku of OMRON SINIC X Corporation*1. Ms. Yuzuki asked Dr. Ushiku, a researcher and leading authority on image caption generation in computer vision and natural language processing, about AI from its past to the present.
*1 A strategic base that creates OMRON's vision of "near-future design."
Ubiquity Means It Is No Longer Labeled as "AI"
Hiromi Yuzuki (Yuzuki): In what kinds of products is AI currently used in our daily lives?
Yoshitaka Ushiku (Ushiku): That question sounds simple, but it's actually difficult to answer. The moment a machine can do something it could not do before, it becomes AI. For example, when you use a smartphone, you can take a perfectly focused picture of a person's face; when you enter text into a search engine, candidate words and sentences appear in the search bar; when you use a video streaming service, it recommends content with prompts such as "Do you want to watch this?" All of these things are done by making the most of the core technology of today's AI, known as "machine learning."
Nowadays, facial recognition is a standard feature of cameras, and when things that machines can do become a common part of daily life, they cease to be called AI. The development of machine learning created internet search, a service that lets us find what we most want to know; however, such technology is called a "search engine," not AI. A smart speaker's voice recognition is also a result of the machine learning that came out of the current third AI boom, but now it is simply referred to as a "voice recognition engine."
When machines can do what they could not before, they are labeled as AI, and when they become commonplace, they lose the AI label. This phenomenon is called the AI effect.
During the second AI boom in the 1980s, many AI-powered home electrical appliances, such as rice cookers, were produced. Those features became widespread over time and are now positioned as something that gives us a bit of intelligent control. Things around us equipped with the fruits of the third AI boom's machine learning, including cameras and smart speakers, are now in the same category.
AI Perceives More than What Humans Feel
Yuzuki: We generally have an image of AI as something that responds to what we say, like Apple's Siri, Amazon's Alexa, and Google's Google Assistant, but I guess that's not all.
Ushiku: That's right. In terms of recognizing sound, such services are definitely labeled as AI. AI technology is rapidly evolving not just in the field of sound but also in perceiving what humans see, hear, touch, and feel.
For instance, in online shopping, the data showing "other people also bought these products" increases automatically as people use the service. Meanings and interpretations (information about what search queries others entered and what kinds of products were purchased) are also automatically added and accumulated as big data. Against this background, AI engines have developed along with the spread of the Internet.
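The "other people also bought" mechanism above can be sketched very simply: count which products co-occur with a given item across purchase histories. This is a toy illustration with hypothetical product names; real recommendation engines are far more elaborate.

```python
from collections import Counter

# Hypothetical purchase baskets, one set of products per customer.
purchases = [
    {"camera", "sd_card"},
    {"camera", "tripod"},
    {"camera", "sd_card", "bag"},
]

def also_bought(item):
    """Rank products that co-occur with `item` across all baskets."""
    counts = Counter()
    for basket in purchases:
        if item in basket:
            counts.update(basket - {item})  # count everything bought alongside
    return [product for product, _ in counts.most_common()]

print(also_bought("camera"))  # "sd_card" ranks first (bought twice with a camera)
```

Note that the data itself does the work: as more baskets accumulate, the rankings update automatically, which is exactly why such services improve simply by being used.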
The research now being done is to create an AI engine from a state in which massive numbers of photos are uploaded to Facebook and Google Photos. Since users do not label what is in a photo when they upload it, the data is not converted into text or symbols.
Because of that, the machine itself has to understand the meaning of something that is not accompanied by text or symbols in order to reach an interpretation. However, the ability of machines to understand such data on their own has shown significant technological progress over the past ten years.
In the case of automated driving, multiple sensors installed in cars, such as cameras and LiDAR, are used to identify the car's surrounding environment in three dimensions. In other words, the technology for reproducing something close to what the human eye can see is still in progress. In another instance, OMRON SINIC X is doing research on how AI can intelligently understand human perception. This research involves attaching sensors that can reproduce humans' tactile senses onto robotic hands so that AI can understand the tactile sensation of the hands when grasping an object and handling it to assemble parts.
Yuzuki: Does AI being able to cleverly understand human perception mean that machines can do nearly everything that humans can do?
Ushiku: Yes. Moreover, there are things machines can do that humans cannot. For example, humans can't see well in the dark or during torrential rain, but machines with sensors can find cars and people in the area using infrared rays. I think it would be even better if machines could do such things that humans can't.
Yoshitaka Ushiku of OMRON SINIC X Corporation
The Third AI Boom Has Arrived
Yuzuki: You mentioned the term machine learning earlier, but is machine learning used for nurturing AI, or is it one field of AI?
Ushiku: That's a good question for learning about AI. In fact, the term "AI" and the specialized field of "machine learning" came together only in the last decade. Until the current third AI boom, machine learning and AI were separate things.
Research conducted during the first AI boom, from the 1950s to the 1960s, concerned search programs, such as those for solving mazes. In 2015, AI defeated a professional Go player in a game of Go; this also used machine learning in addition to search. It was the result of development spurred by a breakthrough in the search algorithms that evolved from the first AI boom.
During the first AI boom, by focusing exclusively on search, it became possible for machines to explore mazes using intelligent algorithms and to answer mathematical proof problems. At that time, it was thought that if we left the search algorithm--namely, the theoretical thinking--up to AI, it could understand everything that human intelligence could.
However, more research and development made it clear that intelligent algorithms alone were insufficient. In the second AI boom, researchers thought that if they created software that replicated what human experts do and implemented it in machines, it would be AI.
For example, let's say a doctor collects mucous membrane samples from a patient's mouth to identify the bacteria there. First, researchers interview the doctor about this procedure. The researchers ask what type of test is done first; if the test comes back positive, the doctor narrows the bacteria down to several candidates, and then conducts another test to narrow down the culprit still further. The final step ultimately determines the bacterium to be, for example, a pneumonia-causing Pseudomonas.
The doctor's procedure is programmed in a programming language using "if" and "then" rules, in the manner of "if this, then do that." The resulting product is called an expert system (because it responds to specific problems like an expert).
Besides doctors, various experts, such as mining experts skilled in finding coal mines and experts in efficiently harvesting farm crops, were called in to build if-then rules. This is one of the things that characterizes the second AI boom.
Another point about the second AI boom was storing and utilizing such human knowledge in database form. At that time, machine learning was not involved at all.
AI and the specialized field of machine learning merged with the advent of deep learning during the third AI boom.
Let AI Do the Same Learning Design as Humans Have Done
Yuzuki: What is the deep learning you mentioned earlier?
Ushiku: Up to that point, humans designed the information called a "feature vector," the data required for machine learning. For instance, if we wanted to use machine learning to recognize the type of cat shown in a photo, we had humans look at many cat pictures and think about the feature vector. These people looked at the overall shapes, colors, coats, and edge components (the borders where the color changes). Engineers thought about how to extract these elements skillfully and digitize the colors and edge components.
Based on such numbers, we had machines learn to identify a cat as, say, a Persian when given a cat photo. Until then, machine learning required training in advance so that the machine could automatically identify the type of cat when a specific pattern came up.
With machine learning, feature vector design is vital. In particular, the field of image recognition requires the digitization of image data, which is time-consuming. If we then wanted to identify apples after identifying cats, it would be incredibly demanding because we would have to come up with a new feature vector design. With deep learning, everything from the feature vector design onward is automated through machine learning.
The key characteristic of deep learning is that it enables machines to automatically learn the entire process, from collecting images and designing the feature vector to identifying the type of cat in the images.
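To make the "hand-designed feature vector" concrete, here is a minimal sketch of one classic manual step: digitizing an image's colors into a fixed-length histogram. The tiny grid of (r, g, b) pixels is a made-up example; real pipelines also hand-crafted edge and texture features, and it is precisely this manual design that deep learning replaces.

```python
def color_histogram(image, bins=4):
    """Digitize pixel colors into a fixed-length feature vector:
    `bins` brightness buckets per channel (R, G, B)."""
    hist = [0] * (bins * 3)
    for row in image:
        for (r, g, b) in row:
            for channel, value in enumerate((r, g, b)):
                # Map a 0-255 value to one of `bins` buckets.
                hist[channel * bins + min(value * bins // 256, bins - 1)] += 1
    return hist

# A hypothetical 2x2 "image": three light, warm pixels and one dark pixel.
image = [[(200, 180, 150), (210, 190, 160)],
         [(30, 30, 30), (220, 200, 170)]]
features = color_histogram(image)
print(features)  # 12 numbers: 4 brightness buckets for each of R, G, B
```

A classifier would then be trained on vectors like `features` to separate, say, Persian cats from other breeds; with deep learning, the network learns its own features directly from the raw pixels instead.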
Yuzuki: It sounds like, in terms of learning a language, machine learning is taking lectures from a teacher using a textbook, while deep learning is going to study abroad and learning by yourself?
Ushiku: That's a clear analogy. I might use that explanation, too.
Hiromi Yuzuki, IT journalist
Dr. Ushiku explained half a century of AI research. In the next article, he will talk about the latest AI, focusing primarily on his research on the "integration of vision and natural language through deep learning."
OMRON SINIC X Corporation
Dr. Ushiku completed his doctorate at the Graduate School of Information Science and Technology, the University of Tokyo, in 2014 and joined NTT Communication Science Laboratories. After becoming a lecturer at his alma mater in 2016, he was appointed a principal investigator at OMRON SINIC X Corporation in October 2018. He has also been the Chief Research Officer of Ridge-i Inc. since 2019. Dr. Ushiku primarily studies cross-media understanding through machine learning, such as image caption generation.
Ms. Yuzuki pens Apple-related articles, including tips for using iPads for work, and produces video reports on overseas tech information. She has appeared on "The World Unknown to Matsuko" as an iPhone case expert. Her YouTube channel is called Gadgetouch.