Do you always learn from your experiences? Google has designed an AI that learned to play the Chinese board game Go by making mistakes, and it beat the world champion earlier this month.
That’s what one of the members of my local Go club said on Wednesday evening, the day after professional Go player Lee Sedol first lost to a computer. The feeling was shared among many of the others. Just a few months ago, we all believed that it would take years, maybe decades, for a computer program to play Go at a professional level. And in just a few weeks, we humans had lost our title as masters of Go as first the European champion Fan Hui, then the world champion Lee Sedol were soundly defeated by Google DeepMind’s Go-playing AI, AlphaGo.
It was a bit of a shock, to put it mildly.
Go, the ancient Chinese board game, has been around for thousands of years. It is played on a 19x19 board, where black and white stones are placed on intersections to gain, and then control, territory. Computers, on the other hand, have only been able to play the game at a competitive amateur level for a little over a decade. Games like chess were conquered by machines through brute-force methods because the total number of possible positions is more limited. Chess has around 10^40 possible board positions, which is no small number, but it is small enough that a computer can simply calculate most of them.
Go, on the other hand, is far more complex, with about 10^170 possible board positions. That's more than the number of atoms in the universe, and more than a googol times as many positions as chess, which makes it impossible for a computer to calculate all the combinations directly. Computer scientists needed to develop new techniques to teach computers Go. "We needed to invent a way to teach computers intuition so that they can play Go on the highest level," said computer scientist Petr Baudis, the creator of Pachi, another Go program that plays at a strong amateur level.
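To get a feel for that gap, a quick back-of-the-envelope check (using only the round order-of-magnitude figures quoted above) shows just how much larger Go's state space is:

```python
# Rough state-space figures from the article (orders of magnitude only)
chess_positions = 10 ** 40
go_positions = 10 ** 170
googol = 10 ** 100

# How many times more positions Go has than chess
ratio = go_positions // chess_positions  # a 131-digit number, i.e. 10**130

print(ratio > googol)  # True: the gap alone exceeds a googol
```

The point of the comparison: even the *ratio* between the two games' state spaces dwarfs a googol, so speeding up a chess-style brute-force search could never close the difference.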
To teach intuition, AlphaGo developers used a technique called reinforcement learning. This technique involved pitting slightly different versions of AlphaGo against one another millions of times in a kind of endless knockout match. The strongest version would play match after match until it lost, at which point the new strongest version would take its place. As Baudis puts it, “classic data mining and machine learning is all about learning from data, but AlphaGo used the technique of reinforcement learning to learn from experience instead.” AlphaGo could teach itself the game, which made for much faster development and a much stronger AI.
AlphaGo first made waves in late January, when developer DeepMind announced that it had defeated reigning European champion Fan Hui 5-0 in an even match. Before that, most experts had believed it would take years for a program to defeat a professional player without a handicap. "About six to ten," said Baudis. "Most people believed that some more fundamental changes than just adding neural networks would be required to start beating pros." When DeepMind announced AlphaGo's victory in January, it was seen as a massive leap forward for reinforcement learning.
Reinforcement learning has been used before, for instance to teach a helicopter to perform aerial maneuvers or to teach a neural network to beat a level of Super Mario World. But AlphaGo represents perhaps the most significant achievement of reinforcement learning to date. It's also one of the most far-reaching, because machine learning experts often use Go as a sandbox to test their most cutting-edge ideas. Go has both huge complexity and straightforward victory conditions, so it's often easier to develop techniques with Go and later adapt them for use in real-world applications.
There are many future technologies that could be explored with reinforcement learning, such as medical diagnosis and self-driving cars. Any problem that requires some form of decision-making and is too complex for brute-force data mining stands to benefit from the solutions worked out by the AlphaGo team. "Our hope is that in the future we can apply these techniques to other challenges — from instant translation to smartphone assistants to advances in health care," said DeepMind CEO Demis Hassabis in a press release. In that sense, AlphaGo is a huge victory for humanity.
DeepMind's work isn't done yet, either. Google's reinforcement learning approach requires that AlphaGo play tens of millions of games against itself in order to reach professional strength, which demands an immense amount of computing power. "It is enough for humans to play a small fraction [of] games to gain similar experience, can we make computers do the same thing?" asked Baudis. If a computer could learn after playing only a few thousand games, it would save a great deal of time and money for others using reinforcement learning approaches.
As for the future of Go, opinions are a bit mixed. “Professional Go tournaments [will] not be as popular as before,” said Myungwan Kim, one of the professional commentators of the AlphaGo match. “They became a minor league.” When there are machines that can play Go at a much higher level, who would waste their time watching humans? Personally, I take a more optimistic view. I believe that humans can learn from and improve with AI, and Go will become a much more interesting game in the future.
There's a question that Go players sometimes like to ask each other: how much of a head start would a top professional need to win a game against a perfect player? A perfect player would always play the best possible move in every situation, with no mistakes. It's an interesting question, but since nobody has ever been able to try it, it has remained more a philosophical question than a scientific one. With the advent of ultra-strong AI that can play better than any human, however, an answer may not be far off. It's not difficult to imagine an even stronger version of AlphaGo, with a few more years of development, playing a perfect game.
Many people have given their thoughts on this question over the years. The consensus seems to be that a top pro would need a head start of about three moves to beat a perfect player. I can’t wait to see if they’re right.