An Indonesian translation by Jordan Silaen
A Turkish translation by Zoltan Solak
DNA image courtesy of Paul Thiessen (I have modified the image - the originals are beautiful. Take a look!)
I enjoy playing with genetic algorithms - watching them grow and evolve. In my experiments, which always entailed organisms that wandered around my screen looking for food while overcoming obstacles, I have watched, fascinated, as great civilizations grew and decayed. I rediscovered so many of the basic principles that have probably been known to biologists for decades, and just maybe, a few that weren't.
One day I speculated as to whether a genetic algorithm could evolve to master a simple game. I thought perhaps the game of TicTacToe might be a good experiment to begin with. I knew from an analysis I had attempted years ago that it might not be a simple task. Have you ever sat down and tried to analyze the game? You start drawing little diagrams of the game board, which soon become trees that grow exponentially as you work through all the possible permutations of game play. You will probably give up within a short time as your desk becomes littered with little diagrams that you can't follow anymore. I did. Life is too short!
It occurred to me that there must be a way to automate the task of analyzing the game. I wrote little algorithms to play games against each other by making random moves. The idea was that if they played millions of games, they were bound to play every possible permutation of the game eventually. I would only need to capture every board position generated, looking it up in the growing table to see whether it was already there, and adding it if not. In the end, then, I would have a table representing every possible situation. But before I began, I considered a few simple rules to narrow down the complexity.
At each move, I decided, a 'filter' would analyze the board from the point of view of the current player, and complete an obvious win pattern for him or block an obvious win attempt by his opponent. I decided that such a trivial task would not be part of the organism; instead he would have a far more difficult skill to learn - strategy. I split the data into two tables. I assigned the X player as the one that always moves first, and he would have a table of every possible board he can be presented with, beginning with an empty board awaiting his first move. I did the same for O, who would also develop a database of every position he could ever encounter, beginning with a board with a single X on it.
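A minimal sketch of such a filter might look like the following, assuming the board is a simple list of 9 cells holding '', 'X', or 'O'. The names and representation here are illustrative only, not the original code:

```python
# The eight winning lines of the board, as index triples.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def forced_move(board, player):
    """Return the square that completes an obvious win for `player`,
    else the square that blocks the opponent's obvious win, else None."""
    opponent = 'O' if player == 'X' else 'X'
    for who in (player, opponent):          # complete a win first, then block
        for a, b, c in LINES:
            cells = [board[a], board[b], board[c]]
            if cells.count(who) == 2 and cells.count('') == 1:
                return (a, b, c)[cells.index('')]
    return None
```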
Finally, to narrow down the number of possible boards, I developed code to rotate and/or 'reflect' each board as necessary during each comparison, to eliminate duplications that were simply the same board rotated or reflected in space.
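One way to sketch that idea, again with illustrative names: treat two boards as the same position if any of the eight rotations/reflections of one matches the other, and keep a single canonical representative for the table.

```python
# Each symmetry is a permutation of indices: new_board[i] = board[perm[i]].
ROTATE_90 = [6, 3, 0, 7, 4, 1, 8, 5, 2]   # 90-degree clockwise rotation
MIRROR    = [2, 1, 0, 5, 4, 3, 8, 7, 6]   # left-right reflection

def apply(board, perm):
    return [board[i] for i in perm]

def symmetries(board):
    """Yield all 8 rotations/reflections of a board."""
    b = board
    for _ in range(4):
        yield b
        yield apply(b, MIRROR)
        b = apply(b, ROTATE_90)

def canonical(board):
    """Pick one fixed representative (the lexicographically smallest
    symmetry), so duplicate orientations collapse to one table entry."""
    return min(tuple(s) for s in symmetries(board))
```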
Now that I had cut the task down to a manageable size, I allocated tables to hold thousands of boards if necessary, and let the games begin. I ran this 'brute force' algorithm for a million games or so, and was amazed to discover that in the end, under the rules established above, there are only 57 possible 'boards' that X has to be able to respond to, and only 38 possible boards that O will ever encounter. This was far simpler than I had imagined!
X has only 3 unique choices for the first move, if you rotate and/or reflect the board as required to eliminate duplications. He can play a corner move, a side move, or take the center. I discovered that he has 7 choices on the 2nd move, 5 choices on the 3rd move, 3 choices on the 4th move, and only one choice on the 5th move. That gives an upper limit of 3 * 7 * 5 * 3 * 1 = 315 possible games, though in reality only a small fraction of these possible paths are taken beyond a few moves. Many paths lead to early termination of the game, and others lead to the same positions when rotated or reflected in space. Filtering out all obvious wins or obvious needs to block eliminates many other positions from the database. In the end I found that as few as just 17 boards covered every situation a highly evolved organism could ever encounter.
The game board itself, having 9 squares, was assigned to a simple one-dimensional array with 9 elements. It didn't matter to the algorithm how those squares were laid out in the real world. It only mattered that he was able to evolve an appropriate response to any particular pattern handed to him - a number 0 through 8 representing his move in response to the pattern presented.
Now it was possible to create genetic algorithms to play the game. The X player would have 57 'genes', in reality a simple one-dimensional array of 57 elements, each of which can take on a value 0 through 8, in combination with his data table of the 57 possible situations he could encounter in any game.
An algorithm would look up the current board position in his table. When found, the index of its position in the board table becomes the index into his gene table to look up his response. Similarly, the O player algorithm would have a table of the 38 possible situations he could encounter in a game, and an array of 38 genes. -Here- is an example. Now finally we could begin to evolve a civilization of tic tac toe players.
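As a rough sketch of that lookup (the class and field names here are assumptions for illustration, and the board table is assumed to hold the 57 or 38 positions as tuples):

```python
class Player:
    def __init__(self, board_table, genes):
        self.board_table = board_table   # every position this player can face
        self.genes = genes               # one move (a square 0-8) per position

    def choose_move(self, board):
        # In the full system the comparison would rotate/reflect the board as
        # described above, and the answering square would then be mapped back
        # into the actual board's orientation.
        index = self.board_table.index(tuple(board))
        return self.genes[index]
```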
To begin then, I had an array of a thousand players. Each of those 1000 players was assigned a random (legal) value for each gene - his choice of a move to one of the unoccupied squares on the board. I developed X players and O players separately. Whichever the player, he played against the random number generator.
Each of the 1000 organisms plays 1000 games and accumulates a score. Only the 10 best players are selected to become the fathers of the next generation. Each of those 10 best players has 99 offspring. Each of those offspring suffers a single random mutation to one of his genes (though I have found from experience that it pays to give a small number more than one mutation). So now we have a whole new generation to put to the test, consisting of the original 10 winners and all their mutant offspring. Each of this new generation plays a thousand games, then once more the best 10 players become the fathers of the subsequent generation.
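The generation loop can be sketched roughly like this. The fitness function below is only a stand-in so the skeleton runs; in the real run it would be the score accumulated over 1000 games against the random-moving opponent, and each gene would be constrained to a legal (unoccupied) square for its board:

```python
import random

GENE_COUNT = 57     # one gene per board the X player can encounter
POPULATION = 1000
PARENTS = 10
OFFSPRING = 99
GENERATIONS = 10

def random_genes():
    return [random.randrange(9) for _ in range(GENE_COUNT)]

def mutate(genes):
    child = list(genes)
    child[random.randrange(GENE_COUNT)] = random.randrange(9)  # one random mutation
    return child

def score(genes):
    # Stand-in fitness: the real version tallies points over 1000 games.
    return random.random()

population = [random_genes() for _ in range(POPULATION)]
for generation in range(GENERATIONS):
    ranked = sorted(population, key=score, reverse=True)
    parents = ranked[:PARENTS]
    # The 10 winners survive unchanged; each gets 99 single-mutation offspring,
    # giving a fresh population of 10 + 10 * 99 = 1000.
    population = parents + [mutate(p) for p in parents for _ in range(OFFSPRING)]
```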
What a marvel it was to behold! Within 6 to 10 generations they had reached their peak, and I began to experiment with the points awarded for each win, draw, or loss. I discovered that if they were awarded 1 point for a win, 0 points for a draw, and -1 point for a loss, they evolved to increase winning at an astonishing rate. In games between two players making random moves, X (the one I assign to always go first) beats O by about 2 to 1. Clearly it is a strong advantage to have the first move. However, by the time my organisms had evolved through several generations, O was beating X by a ratio of 12 to 1 (with X making random moves when not blocking or taking simple wins).
After this first success, I wondered if I could evolve organisms that never lose. I succeeded in this endeavor by drastically increasing the punishment for losing - such as deducting 100 points for every loss. After a few generations, nobody ever lost a game anymore. But the number of wins also dropped dramatically.
I discovered the right ratio of reward and punishment to get the very best possible performance out of my organisms: rewarding them 1 point for every win and penalizing them about 6 or 7 points for every loss.
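In other words, a fitness function along these lines (the helper and its parameter are hypothetical, but the weights follow the text: +1 per win, 0 per draw, roughly -6 or -7 per loss):

```python
def fitness(wins, draws, losses, loss_penalty=6):
    return wins - loss_penalty * losses   # draws contribute nothing
```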
I was very pleased with the results. Not only had I evolved a race of champion players, but what they had learned to play was strategy. Remember, all simple wins or blocks were filtered out before ever reaching the organisms, and their opponents were also given the ability to unfailingly block a simple win or play their own simple win before generating a random move.
One could say that the genetic organisms had to have developed the capacity to 'anticipate' more than one move ahead, in order to be able to prevent their opponents from ever being able to set a trap such as the dreaded 'two way split' for them. They also of course had to be able to 'think' far enough ahead to lay their own traps, or they would never win. Since their opponents were given the ability to always block a simple win, the only way to win a game was by setting up a strategic play at least one move in advance.
I now had an invincible organism, a champion of champions. What could I do with it? Perhaps one could put a pigeon in a Skinner box with 9 levers to peck at. If he pecked the correct lever in response to the board pattern presented, he would be rewarded with a grain of corn, and thus eventually learn to play TicTacToe. Or, I could use it to train -A Neural Network that plays TicTacToe-!