Potential Applications of Game Theory in S&D

Dr. Doug Liebe
6 min read · Sep 27, 2021

In the game of Rock, Paper, Scissors, there are only 3 actions a player can take, and the same is true for the player’s opponent. Each choice wins against 1 of the opponent’s 3 options, ties against 1, and loses against the last. Since a win, a tie, and a loss are all equally likely, playing randomly results in a win percentage of 33.3% and an expected value (EV) of +0 wins per game. We can show this in a table:

                  Opponent: Rock   Paper   Scissors
    You: Rock               0      −1       +1
    You: Paper             +1       0       −1
    You: Scissors          −1      +1        0

The Expected Value (EV) = the sum, over every possible outcome, of (the value of that outcome × the chance of that outcome happening)
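That zero EV can be checked directly. A minimal Python sketch, assuming the standard +1 win / 0 tie / −1 loss payoff convention (the move names are just illustrative):

```python
# Expected value of random play in Rock, Paper, Scissors.
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    """+1 for a win, 0 for a tie, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

# EV = sum over every outcome of (value of outcome x chance of outcome).
ev = sum(payoff(m, t) * (1 / 3) * (1 / 3) for m in MOVES for t in MOVES)
print(ev)  # 0.0
```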

If we are a rational player, we will change our strategy to extract maximum EV from our opponent. But now consider that your opponent is also a rational player. Just as you want to change your strategy to take advantage of them, they also want to do that to you! As soon as you start to play differently, they must change their strategy too. Let’s say you start playing more Paper because your opponent plays more Rock. Your opponent will just start playing more Scissors! This seems like an endless cycle, but is it? No, because we can reach a Nash equilibrium.

A Nash equilibrium is a set of strategies under which no player can improve their EV by unilaterally changing strategy.

In a zero-sum game like RPS, the Nash equilibrium is pretty clear: just play each strategy with equal probability. If either player moved away from the optimal strategy, their opponent would attempt to take advantage of them! If you always play Rock, I will always play Paper.
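We can check that the uniform mix really is unexploitable: against any opponent mix whatsoever, it earns an EV of 0. A small sketch, reusing the same ±1/0 payoff convention:

```python
# The uniform 1/3 strategy is unexploitable: its EV is 0 against ANY mix.
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def ev_uniform_vs(opp_probs):
    """EV of playing 1/3-1/3-1/3 against an opponent mix over (R, P, S)."""
    return sum((1 / 3) * p * payoff(m, t)
               for m in MOVES
               for t, p in zip(MOVES, opp_probs))

print(ev_uniform_vs([1.0, 0.0, 0.0]))  # all-Rock opponent: 0.0
print(ev_uniform_vs([0.4, 0.3, 0.3]))  # Rock-heavy opponent: ~0 (float error)
```

The flip side, of course, is that the uniform mix also never wins more than 33.3% — it is safe, not greedy.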

In RPS, if either player plays randomly, the EV = 0, always. But in more complicated games, it can be hard to know when you’re making the right moves. How can we quantify the strength of our strategy?

Regret Minimization

Regret is defined as the value lost by making your choice as opposed to your other choices. If both you and your opponent play Rock, you have no regret for playing Rock, since you DID pick Rock. If you had played Paper, though, you would have won. We don’t like making plays that lead to regret. Your regret = play value − potential value = 0 − 1 = −1. The regret relative to playing Scissors would be 0 − (−1) = +1. Let’s ignore positive regret for now and focus on negative regret, the times we wished we’d done something else.

In any given game of RPS, you will either win or have some regret. In the long run, though, our best strategy will minimize average regret. Let’s look at the average regret of each move when playing randomly against a 40/30/30% (Rock/Paper/Scissors) player. Our opponent plays more Rock, so we should have some regrets here about not taking advantage of them.
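Here is a sketch of that average-regret calculation, using the article’s sign convention (regret = play value − potential value, with the best alternative move as the comparison point) and the usual +1/0/−1 payoffs:

```python
# Expected regret of each move against a 40/30/30 (R/P/S) opponent.
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
OPP = {"rock": 0.4, "paper": 0.3, "scissors": 0.3}

def payoff(mine, theirs):
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def move_ev(m):
    """Expected payoff of always playing m against the opponent's mix."""
    return sum(p * payoff(m, t) for t, p in OPP.items())

def expected_regret(m):
    """Regret = play value - potential value (best alternative's EV)."""
    best_alternative = max(move_ev(a) for a in MOVES if a != m)
    return move_ev(m) - best_alternative

for m in MOVES:
    print(m, round(expected_regret(m), 2))
# rock -0.1, paper 0.1, scissors -0.2: Scissors is the play we regret most.
```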

Just as we thought, playing Scissors leaves us with regret more often than not, since our opponent is playing more Rock. Instead of fixing one of the player’s strategies, let’s consider two real people changing their minds as they play.

We can make a simple simulation to try and optimize our strategy against this stubborn Rock-thrower. Regret minimization has been written about extensively; you can read this well-written paper on how to create a simulation that minimizes regret instead of maximizing expected value, but the method is pretty simple: any time a strategy has negative regret, we make that play less often in the next game, until we cannot make our regret any smaller. By iterating this process many times, we converge toward a “least-bad” strategy, even in the case of imperfect information.
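A minimal version of such a simulation is the regret-matching loop, sketched below for two players adjusting simultaneously (the iteration count and seed are arbitrary choices of mine). One note: the code tracks regret with the usual sign convention, positive = “the alternative would have done better”, which is the mirror image of the article’s negative regret:

```python
import random

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def strategy(regret):
    """Play each move in proportion to its accumulated positive regret."""
    pos = [max(r, 0.0) for r in regret]
    total = sum(pos)
    return [x / total for x in pos] if total > 0 else [1 / 3] * 3

def train(iterations=100000, seed=7):
    rng = random.Random(seed)
    regrets = [[0.0] * 3, [0.0] * 3]     # one regret table per player
    strat_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strats = [strategy(r) for r in regrets]
        picks = [rng.choices(range(3), weights=s)[0] for s in strats]
        for p in range(2):
            got = payoff(MOVES[picks[p]], MOVES[picks[1 - p]])
            for a in range(3):
                # How much better would alternative `a` have done?
                regrets[p][a] += payoff(MOVES[a], MOVES[picks[1 - p]]) - got
                strat_sums[p][a] += strats[p][a]
    # The AVERAGE strategy over all iterations converges to equilibrium.
    return [[s / iterations for s in sums] for sums in strat_sums]

avg = train()
print([round(x, 2) for x in avg[0]])  # drifts toward [0.33, 0.33, 0.33]
```

With two adaptive players, the averaged strategies settle near the uniform Nash equilibrium, exactly as the endless-cycle argument above suggested.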

Finally, some Cod

A game mode like Search and Destroy (SND) in Call of Duty can sometimes feel a bit like Rock, Paper, Scissors. On some maps, the defense and offense must each allocate forces to 1 of the 2 bombsites. If the defense picks correctly, it can drastically decrease the potency of an attack. Obviously, SND is much more complicated than this, but in games of incomplete information, it can be helpful to abstract the game to decrease its complexity, as long as the conclusions remain somewhat useful. Consider a poker solver trying to figure out what to bet with a specific hand. If you have $1 million in chips, you have 1 million + 1 betting options, and your opponent has just as many possible responses. But what if you only considered bets in $100 increments? The solution would surely be close to optimal, despite reducing the complexity 100-fold.

Make a complicated game more simple, solve it, and try to apply the answer to the complicated game (Sandholm, 2010)

Express SND

Playing Express SND in Cold War, I think, felt the most like RPS. Defenses would often stack B, and if the offense also attacked B, the result was a bloodbath 5 seconds into the round. Attacking A was more difficult because of its location, but if the defense stacked B while the offense hit A, the attacking team could easily plant the bomb, leading to a higher win percentage.

Win Percentage based on what the offense and defense choose to do

Based on @KarsenCLS data from the first 2 stages with Express SND, teams won 42% of rounds attacking B and 56% of rounds attacking A. A and B hits made up the majority of offensive rounds; teams played a neutral setup on offense only 15% of the time. Using these numbers, I created an RPS-esque win probability chart based on what the offense and defense choose to do.

This is just an estimation but will work for our purposes. This is basically the same as an RPS simulation if, instead of Rock always beating Scissors, Rock beat Scissors 80% of the time, etc. Let’s try to find the optimal strategy for offense and defense, allowing both teams to adjust each iteration.
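That iteration can be run with the same regret-matching idea applied to a 2×2 “site-selection” game, this time with expected payoffs so no sampling is needed. The win-probability matrix below is a hypothetical placeholder with the right shape (stacking the attacked site hurts the offense), NOT the measured values behind the article’s chart, so its equilibrium won’t exactly match the frequencies reported here:

```python
# Hypothetical: WIN[i][j] = P(offense wins | offense hits site i,
# defense stacks site j), with index 0 = A and 1 = B.
WIN = [[0.35, 0.70],   # offense hits A: vs defense stacking A, stacking B
       [0.60, 0.36]]   # offense hits B: vs defense stacking A, stacking B

def strategy(regret):
    """Mix in proportion to accumulated positive regret."""
    pos = [max(r, 0.0) for r in regret]
    total = sum(pos)
    return [x / total for x in pos] if total > 0 else [0.5, 0.5]

def solve(iterations=200000):
    off_reg, def_reg = [0.0, 0.0], [0.0, 0.0]
    off_sum, def_sum = [0.0, 0.0], [0.0, 0.0]
    for _ in range(iterations):
        off, dfn = strategy(off_reg), strategy(def_reg)
        # Expected offense win% of each pure choice vs the other's mix.
        off_ev = [sum(WIN[i][j] * dfn[j] for j in range(2)) for i in range(2)]
        def_ev = [sum((1 - WIN[i][j]) * off[i] for i in range(2)) for j in range(2)]
        got_off = sum(off[i] * off_ev[i] for i in range(2))
        got_def = sum(dfn[j] * def_ev[j] for j in range(2))
        for k in range(2):
            off_reg[k] += off_ev[k] - got_off
            def_reg[k] += def_ev[k] - got_def
            off_sum[k] += off[k]
            def_sum[k] += dfn[k]
    return ([s / iterations for s in off_sum],
            [s / iterations for s in def_sum])

off, dfn = solve()
print("offense A/B:", [round(x, 2) for x in off])
print("defense A/B:", [round(x, 2) for x in dfn])
```

Plugging the real measured win percentages into WIN, the same loop should recover the kind of offense/defense mixes reported next.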

Optimal Offense = 40/60% A/B; Defense = 59/41% A/B

Let’s look at the results in a table like before. The only difference is that instead of a win/loss/tie system, each pair of choices results in a win% for the offense (refer to the above table).

Freq = Optimal Offense; Opp Freq = Defense. Both playing optimal, the offense would win 49.7% of attacking rounds.

The graph begins to even out at the offense attacking A 40% of the time, B the other 60% of the time. This leads to the offense winning 49.7% of the rounds.

Takeaways

If the “optimal” strategy for the offense is to attack A 40% of the time, what did teams actually do? This season, teams attacked A 46% and B 54% of non-neutral rounds, and won 47% of rounds overall.

Caveats: Defense setups weren’t tracked here, so it’s hard to estimate their true frequencies. It’s also important to note that this is not the *best* scenario for the offense. If the defense played 59/41% no matter what, the offense should just go B every time, leading to a win% of 55.9% (you can do that math yourself). BUT the defense strategy is not fixed: if you went B every round, the defense would certainly do something about it. A dynamic equilibrium is just doing the best with what you can control.

I think there are two cool things about running a simulation like this:

  1. Optimal moves under uncertainty can be counter-intuitive and abstracting the game (thinking of SND like Rock, Paper, Scissors) can help us reach better strategies.
  2. The best strategy in any two-player zero-sum game depends just as much on your opponent as on you. There is NO fixed strategy that will beat adjusting to your opponent, and vice versa.

Follow me on Twitter if you enjoyed and want to see more articles! Cheers
