AlphaZero AI system able to teach itself how to play games, play at highest levels

Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess) as well as Go. Credit: DeepMind Technologies Ltd

A team of researchers with the DeepMind group and University College London, both in the U.K., has developed an AI system capable of teaching itself how to play and master three difficult board games. In their paper published in the journal Science, the group describes the new system and explains why they believe it represents another big step forward in AI systems development. Murray Campbell, with IBM's T.J. Watson Research Center in the U.S., offers a Perspective piece on the team's work in the same journal issue.

It has been over 20 years since a supercomputer known as Deep Blue beat world chess champion Garry Kasparov, showing the world just how far AI computing had come. In the years since, computers have grown ever smarter and now beat humans at such games as chess, shogi and Go. But such systems have all been tweaked to make them really good at just one game. In this new effort, the researchers have created an AI system that is not only good at more than one game, but gains such expertise on its own.

The new system, called AlphaZero, is a reinforcement learning system, which, as its name implies, means it learns by repeatedly playing a game and learning from its experiences. This is, of course, very similar to how humans learn. A basic set of rules is laid out and then the computer plays the game against itself; it does not need any other opponent. It plays itself repeatedly, noting which moves tend to lead to wins and which tend to lead to losses. Over time, it improves. Eventually, it becomes so good it can beat not just humans, but other dedicated board game AI systems. The system also uses a search method known as Monte Carlo tree search. Combining the two techniques allows the system to teach itself how to get better at game playing. The researchers also gave their test system a lot of computing power, employing 5,000 tensor processing units, which puts it on a par with large supercomputers.
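To give a feel for what learning by self-play means, here is a minimal sketch in Python. It is not AlphaZero's method: there is no neural network and no Monte Carlo tree search, only a lookup table of move values trained on a toy game (a Nim variant in which players alternately take 1, 2 or 3 stones and whoever takes the last stone wins). The game, the function names and all parameter values are assumptions chosen for illustration. What it shares with the description above is the loop itself: the program starts with nothing but the rules, plays both sides, and adjusts its estimates of which moves lead to wins.

```python
import random
from collections import defaultdict


def legal_moves(stones):
    """Rules of the toy game (an assumption): take 1, 2 or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]


def best_move(q, stones):
    """Greedy choice: the move with the highest learned value for the mover."""
    return max(legal_moves(stones), key=lambda m: q[(stones, m)])


def train_self_play(max_stones=10, episodes=20000, epsilon=0.3, alpha=0.5, seed=0):
    """Learn move values by playing both sides of the game with one shared table."""
    rng = random.Random(seed)
    q = defaultdict(float)  # q[(stones, move)]: estimated result for the player to move
    for _ in range(episodes):
        stones = rng.randint(1, max_stones)  # random start so every pile size is seen
        while stones > 0:
            moves = legal_moves(stones)
            # Mostly play the current best guess, sometimes explore a random move.
            if rng.random() < epsilon:
                move = rng.choice(moves)
            else:
                move = best_move(q, stones)
            remaining = stones - move
            if remaining == 0:
                target = 1.0  # taking the last stone wins
            else:
                # The opponent moves next, so their best value is our loss.
                target = -q[(remaining, best_move(q, remaining))]
            q[(stones, move)] += alpha * (target - q[(stones, move)])
            stones = remaining  # the same table now plays the other side
    return q


if __name__ == "__main__":
    q = train_self_play()
    print({n: best_move(q, n) for n in range(1, 11)})
```

In this Nim variant the known winning strategy is to leave the opponent a multiple of four stones, so after training the table should prefer that move whenever one exists. A table like this only works because the toy game has a handful of positions; chess, shogi and Go have far too many, which is why a system such as AlphaZero needs a learned evaluation and a search procedure instead.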

Tournament evaluation of AlphaZero in chess, shogi, and Go, as games won, drawn or lost from AlphaZero’s perspective, in matches against Stockfish, Elmo, and AlphaGo Zero (AG0) that was trained for three days. Credit: DeepMind Technologies Ltd

Thus far, AlphaZero has mastered chess, shogi and Go, games that are particularly well suited to AI applications. Campbell suggests the next step for such systems might be to branch out into games such as poker, or even popular video games.

In chess, AlphaZero first outperformed Stockfish after just 4 hours; in shogi, AlphaZero first outperformed Elmo after 2 hours; and in Go, AlphaZero first outperformed the version of AlphaGo that beat the legendary player Lee Sedol in 2016 after 30 hours. Note: each training step represents 4,096 board positions. Credit: DeepMind Technologies Ltd
AlphaZero searches only a small fraction of the positions considered by traditional chess engines. Credit: DeepMind Technologies Ltd

More information: David Silver et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, Science (2018). DOI: 10.1126/science.aar6404
Journal information: Science

Provided by Science X Network

© 2018 Science X Network

Citation: AlphaZero AI system able to teach itself how to play games, play at highest levels (2018, December 7) retrieved 17 October 2019 from
User comments

Dec 07, 2018
"This is, of course, very similar to how humans learn."

It makes random iterations of the games to find the winning strategies, computing the equivalent of thousands of years of human playing time to randomly walk around the problem space.

The Monte Carlo search engine is also about using probabilities - throwing a random number to pick which one of the previously identified winning strategies to use instead of evaluating all of them. That allows a speed-up in searching, which allows the designers to increase the database size to include millions more pre-computed games in the computer's memory.

This isn't at all like how humans learn or operate. This is a blind robot chicken pecking really fast until it finds most of the seeds.

We need a new term to say, "Doing a dumb thing fast enough that nobody notices you're doing it." The term "Artificial Intelligence" gives the wrong impression about it.

Dec 07, 2018
Many strong chess programs can be beaten by contemporary openings that postdate their programming. For example, I've beaten some with the "Grand Prix" opening when it was first developed. So, it's probably more accurate to say these are programs that play like us.

When the "AI" starts developing its own openings, then that will be something to call intelligent.

Dec 07, 2018
Several games, chess included, are considered solved because every possible move and reaction can be computed. Go has more possible board states, but was considered solved some time ago as well by that Google AI. Brute force goes a long way.

Dec 08, 2018
I would love to see a time when such a system can actually win money playing poker.

Dec 08, 2018
"When the 'AI' starts developing its own openings, then that will be something to call intelligent."

AlphaZero did develop all the standard chess openings itself. It started with nothing but the rules.
It quickly developed the French Defence, which was its most used opening in the first 2 hours of playing itself.
It then found the Caro-Kann, which became its most used for the next while. Later it discovered the English Opening and the Queen's Gambit, which became its most used (favourite, ha, ha) openings.
It does have an 'alien' style of play.
See some of the games here:

Dec 08, 2018
You nailed it!
It would be a huge disappointment for most people to realize that virtually all higher forms (nth degree) of AI are just brute-force number-crunching algorithms; nothing much more elaborate than a Commodore 64 could have performed, given the processing power. (Including the 'breakthrough' Kasparov/Deep Blue match.)
...albeit the approach is a clever disguise, and may well pave the way for much refinement in actual AI, as inelegant as it is beneath the surface.
Thoughts anyone?
