December 8, 2017 weblog
AlphaZero algorithm can pick up victory moves in chess
The BBC reported that, according to details published on arXiv, the AlphaZero algorithm was able to outperform the chess engine Stockfish only four hours after being given the rules of chess and being told to learn by playing simulations against itself.
The DeepMind team's paper describing the work is on arXiv. The authors reported that the software had been generalized and was able to learn other games.
The authors wrote about the AlphaZero algorithm achieving, "tabula rasa, superhuman performance in many challenging domains," not just in chess. With no knowledge other than the game rules, the algorithm achieved within 24 hours what the authors called "a superhuman level of play" in chess, shogi (Japanese chess) and Go, "and convincingly defeated a world-champion program in each case."
"Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" is the title of the paper, which was submitted December 5.
What's remarkable, though, goes beyond the chess win alone. James Vincent in The Verge wrote that the truly remarkable feat was that "in less than 24 hours, the same computer program was able to teach itself how to play three complex board games at superhuman levels. That's a new feat for the world of AI." The three games were Go, chess and shogi.
University of Oxford's Prof. Michael Wooldridge was quoted by the BBC. "The general trajectory in DeepMind seems to be to solve a problem and then demonstrate it can really ramp up performance, and that's very impressive." At the same time, Wooldridge observed that the three games were fairly "closed" in the sense they had limited sets of rules to contend with. "In the real world we don't know what is round the corner," he explained. "Coping when you don't know what is coming is much more complicated, and things will get even more exciting when DeepMind moves on to more open problems."
AlphaZero wasn't specifically designed to play chess. James Vincent wrote in The Verge: "In each case, it was given some basic rules (like how knights move in chess, and so on) but was programmed with no other strategies or tactics." The program simply got better, he wrote, by playing itself over and over again at an accelerated pace, a method of training AI known as reinforcement learning.
The authors said that the AlphaZero algorithm was "a more generic version" of the AlphaGo Zero algorithm that was first introduced in the context of Go. "It replaces the handcrafted knowledge and domain-specific augmentations used in traditional game-playing programs with deep neural networks and a tabula rasa reinforcement learning algorithm."
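For readers who want a feel for what learning "tabula rasa" from self-play means, here is a minimal sketch in Python. It is not AlphaZero: it swaps the deep neural networks described in the paper for a simple lookup table, and it plays a toy stone-taking game (take one to three stones, whoever takes the last stone wins) rather than chess. The game, the function names and all parameter values are assumptions made for illustration only.

```python
import random


def legal_moves(stones):
    """The only knowledge given to the learner: the rules of the toy game."""
    return [m for m in (1, 2, 3) if m <= stones]


def train_by_self_play(start=10, episodes=20000, epsilon=0.2, lr=0.02, seed=0):
    """Learn move values from scratch by playing both sides of every game."""
    rng = random.Random(seed)
    q = {}  # q[(stones, move)]: estimated result for the player making the move
    for _ in range(episodes):
        stones, history = start, []
        while stones > 0:
            moves = legal_moves(stones)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # occasionally explore a random move
            else:
                move = max(moves, key=lambda m: q.get((stones, m), 0.0))
            history.append((stones, move))
            stones -= move
        # The player who made the final move won (+1). Walking backwards
        # through the game, the sign flips each step because players alternate.
        result = 1.0
        for key in reversed(history):
            old = q.get(key, 0.0)
            q[key] = old + lr * (result - old)
            result = -result
    return q


def best_move(q, stones):
    """Pick the move with the highest learned value."""
    return max(legal_moves(stones), key=lambda m: q.get((stones, m), 0.0))


if __name__ == "__main__":
    table = train_by_self_play()
    for pile in (5, 6, 7, 10):
        print(pile, "stones -> take", best_move(table, pile))
```

With these settings the learned table should come to prefer moves that leave the opponent a multiple of four stones, which is the known winning strategy for this toy game. No strategy was programmed in; it emerges only from the rules and repeated self-play.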
Abstract: The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.
© 2017 Tech Xplore