PROBABLE ADVANTAGE IN ROCK, PAPER, SCISSORS

I took a class at the UCLA Anderson School of Management called Introduction to Complexity Science. I learned that people use agent-based computer simulations to predict real-world situations and then make decisions based on that information. I was a little concerned that important decisions, such as how banks should invest money, were in part being decided by simulations that can only be as accurate as the subjective judgment of the person who designed the model. For my final project in that class I decided that I would try to prove something impossible using my own version of a subjective agent-based model.

What if there was a “winningest” strategy for Rock, Paper, Scissors? When it comes to settling disagreements between friends, you could use that information to make sure that you get your way the maximum number of times in your life. I wanted to find that information, so I built an agent-based computer simulation in which thousands of players were paired up at random for thousands of games using all possible combinations of strategy. This is a report on what I found out.


There are many situations in life where decisions have to be made by chance: situations where multiple interests are in conflict and the decision has to be left to an impartial decider. There are many such devices, including the coin flip, drawing straws, and names in a hat. One of the most common and readily available is the game Rock, Paper, Scissors (RPS). RPS is often used for decision making between friends because of its simplicity and its availability. It is commonly thought of as a truly fair and impartial decision maker, but unlike the other devices listed above, RPS has certain characteristics that make advantage strategies possible. If it is reasonable to expect to make many decisions in one’s lifetime through an impartial mechanism of some sort, it would be beneficial to have some strategies for gaining an advantage in one of the most common and readily at hand.

People have been using games of this kind as a random decision mechanism for several thousand years. The modern form is a derivative of the Japanese game Jan-ken-pon [ http://en.wikipedia.org/wiki/Janken ], which was invented in the late 19th century and became popular worldwide throughout the 20th century. One of the most notable features of the game is its simple and balanced symmetry. It is a game for two players: each player chooses one of three symbols, both players reveal their choices at the same time, and the player with the superior choice wins. Games are often played in sets of three throws where two out of three wins the match.

From a mathematical perspective, all three choices stand in a non-transitive relation to one another. That is to say, rock beats scissors and scissors beats paper, but paper beats rock. In this relationship it should be impossible for there to be one universally superior choice. This is what gives the game its balanced tension. But on closer inspection, other factors become relevant.
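The non-transitive relation can be written down in a few lines. This is a minimal Python sketch for illustration only; the names `BEATS` and `throw_result` are hypothetical and do not come from the original simulation.

```python
# Each symbol maps to the single symbol it defeats (a cycle, not a ranking).
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def throw_result(a, b):
    """Return 1 if a beats b, -1 if b beats a, and 0 for a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

# Every symbol beats exactly one symbol and loses to exactly one,
# so no single throw is universally superior.
for symbol in BEATS:
    wins = sum(throw_result(symbol, other) == 1 for other in BEATS)
    losses = sum(throw_result(symbol, other) == -1 for other in BEATS)
    print(symbol, "wins:", wins, "losses:", losses)
```

Because the relation is a cycle, each symbol comes out with one win and one loss against the field, which is the balance described above.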

First, the symbols that players choose from are not arbitrary or abstract. They are references to tangible cultural objects that have connections in many societies around the world. These symbols, while probably having meaning to most people who play the game, do not have the same meaning to all people who play the game. One person’s connection to the symbol of rock is likely to be different from the next person’s. These aspects lead to various tendencies towards cultural, sexual, and individual bias in the subconscious perception of the inherent balance between the three symbols. For instance, it could be that males have a higher probability of choosing rock on their opening move whereas females have a higher probability of choosing scissors.

It is also important to notice that a game of RPS does not usually happen as a blind exchange. People usually play face to face, and there are no rules against talking to each other while playing. This introduces the normal competitive advantages and weaknesses found in many forms of gambling, especially card games like poker. Many people have unconscious tendencies that can give away their next move, such as body language, hesitation, or recurring patterns in iterated games. Players may also try to skew the odds in their favor through psychological manipulation. This includes tactics like projecting, where a player suggests to their opponent what would be a good next move or announces what they themselves intend to choose. This aspect shares many of the same considerations as the iterated version of the prisoner’s dilemma.

The considerations that I have noted above contribute to the existence of competitive strategies in rock, paper, scissors. The idea of a strategy in a game that is commonly understood as a random decider is usually met with skepticism and curiosity at first, but a history of tournaments for both human players and computer algorithms adds confidence to the idea that competitive strategy could be effective in some situations.

There are three dominant strategies that persist across many forms of competitive rock, paper, scissors. The first is projection and detection, where the same basic concepts as in competitive card playing apply. The second is random play, where the player’s primary goal is to maintain a total absence of any recognizable pattern, in the belief that over the long run randomness (under the assumption that random play produces an even distribution of throws) will yield the highest frequency of wins (33%). The third is gambit play. A gambit is a combination of three throws decided in advance of a best two out of three match. With three throws in a game and three states for each throw, there are 27 possible combinations. Inevitably everyone will play one of these 27 combinations regardless of their strategy. The idea behind gambit play is that, given that some combinations are culturally not as effective as others and that projection/detection is a common and effective competitive strategy, it makes sense to decide on all three throws before the game begins. This has a couple of advantages. First, it allows the player to focus more concentration on projection/detection during the game. Second, based on the belief that randomness is exceedingly difficult for a human player to maintain dynamically, a gambit strategy is preferable to an accidental improvised pattern.
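The count of 27 follows from three throws with three options each (3 × 3 × 3). As a small illustrative sketch in Python, with `SYMBOLS` and `GAMBITS` as hypothetical names, the full set can be enumerated directly:

```python
from itertools import product

SYMBOLS = ("R", "P", "S")  # rock, paper, scissors

# A gambit is an ordered triple of throws, so the set of gambits is the
# Cartesian product of the symbols with itself three times.
GAMBITS = list(product(SYMBOLS, repeat=3))

print(len(GAMBITS))   # 27
print(GAMBITS[:3])    # [('R', 'R', 'R'), ('R', 'R', 'P'), ('R', 'R', 'S')]
```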

The aspect I found particularly interesting in gambit play is that eight of the 27 gambits are regarded as more effective than the rest. The idea that one combination could be probabilistically more likely to win is interesting to me because it’s possible culturally but impossible mathematically.

According to The Official Rock Paper Scissors Strategy Guide:

The mathematically inclined will quickly realize that there are only twenty-seven possible Gambits. All of them have been used and documented in tournament play. Each has several names from a variety of locales. There is no such thing as a “new” Gambit.
The Great Eight Gambits are the eight deemed to be the most historically significant and widely employed. They also happen to be the only eight Gambits where there is near unanimous consent upon the names. [ The Official Rock Paper Scissors Strategy Guide; Fireside Books, November 9, 2004 by Douglas Walker and Graham Walker ]

The first task that I undertook was to build a simple model that would spread a random distribution of the 27 Gambit combinations across 1,000 players and have each player randomly paired up for 4,000 games of Rock, Paper, Scissors, where a game is the best two out of three throws. There was no communication between players, no access to player histories, and no cultural preference among the symbols. What I found was that no Gambit strategy persistently dominated the system. There were some marginal leaders, but those changed every time the trial was run. This test is not completely representative of an actual lifetime of RPS decision making, because there is no preferential attachment to the cultural concepts that the symbols represent and no possibility of bluffing or learning. But in a strictly mathematical sense, it should not matter what Gambit you choose from game to game, because ultimately it’s a wash.
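A baseline model of this kind might be sketched as follows. This is not the original simulation code; it is a minimal Python reconstruction under stated assumptions: 4,000 is read as the total number of matches, a match with equal throw wins on both sides counts as a draw, and all names (`match_result`, `simulate`, `exact_record`) are hypothetical.

```python
import random
from itertools import product

BEATS = {"R": "S", "S": "P", "P": "R"}   # symbol -> symbol it defeats
GAMBITS = list(product("RPS", repeat=3))  # all 27 three-throw gambits

def match_result(g1, g2):
    """+1 if g1 wins more throws than g2, -1 if fewer, 0 for a drawn match.

    Assumption: the source does not say how tied matches are resolved,
    so they are scored as draws here.
    """
    score = 0
    for a, b in zip(g1, g2):
        if a != b:
            score += 1 if BEATS[a] == b else -1
    return (score > 0) - (score < 0)

def simulate(n_players=1000, n_games=4000, seed=0):
    """Give each player a random gambit, then play randomly paired matches.

    Returns wins minus losses, accumulated per gambit.
    """
    rng = random.Random(seed)
    players = [rng.choice(GAMBITS) for _ in range(n_players)]
    net = {g: 0 for g in GAMBITS}
    for _ in range(n_games):
        i, j = rng.sample(range(n_players), 2)  # two distinct players
        r = match_result(players[i], players[j])
        net[players[i]] += r
        net[players[j]] -= r
    return net

def exact_record(g):
    """(wins, losses, draws) of gambit g against each of the 27 gambits once."""
    results = [match_result(g, h) for h in GAMBITS]
    return results.count(1), results.count(-1), results.count(0)

# Leaders differ from seed to seed, which mirrors the shifting marginal leaders.
for seed in (0, 1):
    net = simulate(seed=seed)
    print(seed, max(net, key=net.get))

# Against a uniformly random opponent, every gambit has the same record.
print({exact_record(g) for g in GAMBITS})   # {(10, 10, 7)}
```

The exhaustive check at the end gives the reason behind the "wash": under these assumptions, any gambit played against all 27 gambits wins 10, loses 10, and draws 7, so differences between gambits in a given run are sampling noise.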

The problem of cultural preference is exceedingly difficult to model with any realistic accuracy without a more comprehensive survey of games played in various cultures around the world. For instance, suppose there is a theory that Rock is the most common opening throw in a game of best two out of three and that this is attributed to subconscious cultural or sexual influences. It is also possible that it is strictly a physiological reflex arising from the way the game is played (with three beats of a closed fist leading in to the players revealing their throws). And that’s assuming that it actually is a trend and not just the perception of a trend.

In my study I chose to focus on the strategy of projection and detection and left other theoretical strategies out of the model. I wanted to find out if there was a competitive strategy that consistently did better than others in casual games of best two out of three played between strangers who have no information on each other’s playing history or preferences. This is how most games of RPS are played, especially when the game is used as a random decider.

Every player in the system that I designed is randomly assigned a preference either to project their own gambit strategy or to detect the opponent’s gambit strategy. Each player also has a preference either to trust or to distrust.

What I found when I ran this system is that, for any number of players and any number of games, there is never a consistently advantageous gambit, but there is every time an advantageous project/detect preference. Players who have the preference to detect and the preference to trust consistently showed better results than the other three possible combinations of project, detect, trust, and distrust. By better results I do not mean that their average or total wins were much better than the others’, but that their total losses were significantly lower. Appendix A shows the typical results that this configuration produces. The image below shows a screenshot of a visualization of the system in process:

In this visualization a player is represented by a circle. Its radius represents the accumulation of its wins and losses: when a player wins a best two out of three match its radius increases by 0.2 pixels, and when it loses its radius decreases by 0.2 pixels. Its color represents its specific strategy; if two players have exactly the same gambit and project/detect strategies, they will also have the same color. As the system runs, all players are free to move along the X and Y axes. They have a grouping preference with a threshold of 0.5 pixels. The idea behind this kind of visualization was that if islands of homogeneous color emerged in groups, that would indicate a specific strategy showing consistent results. As you can see in the image above, there is no significant grouping of color taking place. In the image below, however, there is. That image is from the same system with the same number of players and the same rules, but this time the color coding was changed to represent only the project/detect strategies and not the gambit strategies:

As you can see, in this version there are clear separations between the colors, indicating very consistent performances from all of the strategies being observed. I found it quite surprising that these two images are from the exact same system. It’s an eye-opening experience to be looking for a certain correlation that simply does not exist and then to suddenly discover that there was a correlation all along, just not the one that you expected.

From these results it’s possible to say that there is no specific gambit strategy that will give you a consistent advantage over your opponent in a random pairing with no profile or history on the other player. It is advantageous to choose a gambit for the other benefits discussed earlier in this paper, but the specific gambit you choose shouldn’t matter unless additional information about your opponent is available. If your opponent is trying to project, you can gain a probabilistic advantage by trusting their projection and playing the gambit that beats it.
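“The gambit that beats it” can be made concrete: for each projected throw, play the symbol that defeats it. The following is a short Python sketch with hypothetical names (`LOSES_TO`, `counter_gambit`), and it assumes the opponent actually plays the gambit they project.

```python
# symbol -> the symbol that defeats it
LOSES_TO = {"R": "P", "P": "S", "S": "R"}

def counter_gambit(projected):
    """Return the gambit that beats the projected gambit throw for throw."""
    return tuple(LOSES_TO[throw] for throw in projected)

# If the opponent projects rock, paper, scissors:
print(counter_gambit(("R", "P", "S")))   # ('P', 'S', 'R')
```

The counter only pays off to the extent that the projection is honest, which is why the advantage is probabilistic rather than guaranteed.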

1,000 players each randomly select one of 108 possible strategy combinations and play 4,000 games of best two out of three against randomly paired opponents. The results show the trends of the project and detect strategies without regard to individual throw strategies.
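The figure of 108 is consistent with 27 gambits combined with two binary preferences (project or detect, trust or distrust), since 27 × 2 × 2 = 108. That factorization is an inference from the text rather than something stated outright, and the names in this Python sketch are hypothetical:

```python
from itertools import product

GAMBITS = list(product("RPS", repeat=3))   # 27 three-throw gambits
ROLES = ("project", "detect")              # assumed first preference
ATTITUDES = ("trust", "distrust")          # assumed second preference

# One full strategy = (gambit, role, attitude).
STRATEGIES = list(product(GAMBITS, ROLES, ATTITUDES))

print(len(STRATEGIES))   # 108
```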