
DeepCube: A new deep reinforcement learning approach solves the Rubik’s cube with no human help

  • 4 min read
  • 29 Jul 2018


Humans have long excelled at games, indoors and out. In recent years, however, machines powered by machine learning algorithms have been playing, and winning, popular board games such as Go and chess against human players. And if you think machines are only good at the black-and-white boards, think again: the latest achievement is DeepCube, a machine that solves a far more colorful puzzle, the Rubik’s cube.

The Rubik’s cube is a challenging puzzle that has captivated people since childhood, and solving it remains a brag-worthy accomplishment for most adults. A group of UC Irvine researchers has now developed a new algorithm, Autodidactic Iteration (used by DeepCube), which can solve a Rubik’s cube with no human assistance.

The Erno Rubik’s cube conundrum


The Rubik’s cube, a popular three-dimensional puzzle, was invented by Erno Rubik in 1974. Rubik himself worked for a month to figure out the first algorithm to solve the cube. The UC Irvine researchers note, “Since then, the Rubik’s Cube has gained worldwide popularity and many human-oriented algorithms for solving it have been discovered. These algorithms are simple to memorize and teach humans how to solve the cube in a structured, step-by-step manner.”

Once the cube became popular among mathematicians and computer scientists, the question of how to solve it in as few turns as possible took center stage. In 2014, it was proved that any configuration of the cube can be solved in at most 26 quarter turns.

More recently, computer scientists have looked for ways to let machines solve the Rubik’s cube on their own. A natural first step was to reuse the approach that had proved so successful in Go and chess. However, that approach does not transfer well to the Rubik’s cube.

The approach: Rubik’s cube vs Chess and Go


The algorithms used for Go and chess are given the rules of the game and then play against themselves. The deep learning system is rewarded according to its performance at every step it takes. This reward process matters because it lets the machine distinguish a good move from a bad one; guided by it, the machine gradually learns to play well.

For the Rubik’s cube, on the other hand, rewards are very hard to determine. After a turn, it is difficult to judge whether the new configuration is any closer to a solution, and since the number of possible turn sequences is effectively unlimited, reaching the single rewarded end state by chance is extremely rare.

Chess and Go also have large search spaces, but each individual move can be evaluated and rewarded. That isn’t the case for the Rubik’s cube, as the sketch below illustrates.
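
To make the contrast concrete, here is a purely illustrative Python sketch, not taken from the paper, that places a sparse cube reward next to a dense, chess-style per-move signal. The 54-sticker string encoding and the toy material count are assumptions made for this example.

```python
# Illustrative sketch only: contrasts the cube's sparse reward with a dense,
# chess-style per-move signal. The sticker-string encoding and the toy
# material count are assumptions for this example, not DeepCube's code.

SOLVED = "W" * 9 + "Y" * 9 + "G" * 9 + "B" * 9 + "O" * 9 + "R" * 9  # 54 stickers

def cube_reward(state: str) -> float:
    """Sparse reward: only the single solved configuration pays off, so random
    turns almost never produce any learning signal."""
    return 1.0 if state == SOLVED else -1.0

PIECE_VALUE = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def material_balance(board: str) -> int:
    """Toy chess heuristic: uppercase pieces count for White, lowercase for Black."""
    return sum(PIECE_VALUE[c.lower()] * (1 if c.isupper() else -1)
               for c in board if c.lower() in PIECE_VALUE)

def chess_move_signal(board_before: str, board_after: str) -> int:
    """Dense(ish) signal: every move can be scored immediately, for example by
    the change in material it produces."""
    return material_balance(board_after) - material_balance(board_before)
```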

With the Autodidactic Iteration method behind DeepCube, the UC Irvine researchers have found a way for the machine to create its own set of rewards.


Autodidactic Iteration: Solving the Rubik’s Cube without human knowledge


DeepCube’s Autodidactic Iteration (ADI) is a form of deep learning known as deep reinforcement learning (DRL). It combines classic reinforcement learning, deep learning, and Monte Carlo Tree Search (MCTS).

When DeepCube is given an unsolved cube, it must decide whether a specific move improves on the existing configuration, which means it needs a way to evaluate that move.

Autodidactic Iteration starts from the finished cube and works backwards to find a configuration that is similar to the proposed move. Although this process is imperfect, deep learning helps the system figure out which moves are generally better than others.
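
Below is a minimal, self-contained sketch of that backwards-from-solved idea. It uses a toy cyclic-shift puzzle in place of the cube and a tabular cost-to-solve table in place of DeepCube’s deep neural network; every name and number in it is an assumption made for illustration, not the paper’s implementation.

```python
import random

# Toy stand-in for the cube: a ring of N numbers that can be rotated left or right;
# "solved" means sorted. A real cube has 12 quarter-turn moves and ~4.3e19 states.
N = 5
SOLVED = tuple(range(N))
MOVES = ("left", "right")

def apply_move(state, move):
    s = list(state)
    return tuple(s[1:] + s[:1]) if move == "left" else tuple(s[-1:] + s[:-1])

# Tabular cost-to-solve estimate; DeepCube learns this with a deep neural network.
value = {SOLVED: 0.0}

def autodidactic_iteration(scramble_depth=8, batches=200):
    for _ in range(batches):
        # Generate training states by scrambling BACKWARDS from the solved state.
        state = SOLVED
        for _ in range(random.randint(1, scramble_depth)):
            state = apply_move(state, random.choice(MOVES))
            # Label each state with a one-step lookahead using the current estimates:
            # one move of cost plus the best child's estimated cost-to-solve.
            target = min(1.0 + value.get(apply_move(state, m), scramble_depth)
                         for m in MOVES)
            value[state] = 0.0 if state == SOLVED else target

autodidactic_iteration()

def greedy_solve(state, max_steps=20):
    """Follow the learned estimates greedily back to the solved state."""
    path = []
    while state != SOLVED and len(path) < max_steps:
        move = min(MOVES, key=lambda m: value.get(apply_move(state, m), float("inf")))
        path.append(move)
        state = apply_move(state, move)
    return path

scrambled = apply_move(apply_move(SOLVED, "left"), "left")
print(greedy_solve(scrambled))  # e.g. ['right', 'right']
```

In the paper, ADI trains a combined value-and-policy network on such backwards-generated scrambles; the toy table above keeps only the value part.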

Researchers trained a network using ADI for 2,000,000 iterations. They further reported, “The network witnessed approximately 8 billion cubes, including repeats, and it trained for a period of 44 hours. Our training machine was a 32-core Intel Xeon E5-2620 server with three NVIDIA Titan XP GPUs.”

After training, the network uses a standard search tree to hunt for suggested moves for each configuration.
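
Continuing the toy example above (and reusing its SOLVED, MOVES, apply_move, and value), the sketch below shows one simple way a search could use the learned cost-to-solve estimates: a best-first search over the move tree. It is a heavily simplified stand-in for the Monte Carlo Tree Search that DeepCube combines with its trained network.

```python
import heapq

def tree_search(start, max_expansions=1000):
    """Best-first search over move sequences, ordered by the learned cost-to-solve
    estimate (a simplified stand-in for DeepCube's network-guided MCTS)."""
    counter = 0  # tie-breaker so the heap never has to compare puzzle states
    frontier = [(value.get(start, float("inf")), counter, start, [])]
    seen = {start}
    while frontier and max_expansions > 0:
        _, _, state, path = heapq.heappop(frontier)
        if state == SOLVED:
            return path
        max_expansions -= 1
        for m in MOVES:
            child = apply_move(state, m)
            if child not in seen:
                seen.add(child)
                counter += 1
                heapq.heappush(frontier, (value.get(child, float("inf")),
                                          counter, child, path + [m]))
    return None  # no solution found within the expansion budget

print(tree_search(apply_move(SOLVED, "left")))  # e.g. ['right']
```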

The researchers in their paper said, “Our algorithm is able to solve 100% of randomly scrambled cubes while achieving a median solve length of 30 moves — less than or equal to solvers that employ human domain knowledge.”

The researchers also wrote, “DeepCube is able to teach itself how to reason in order to solve a complex environment with only one reward state using pure reinforcement learning.” Furthermore, this approach has the potential to provide approximate solutions to a broad class of combinatorial optimization problems.

To explore deep reinforcement learning, check out our latest releases, Hands-On Reinforcement Learning with Python and Deep Reinforcement Learning Hands-On.
