How To Create A Random Dice Simulator In Excel (Simulating Catan)
Are you ready for a really really (really) nerdy post? Yeah, I thought so.
So recently, on this blog, we took a look at probability in the board game Catan. The outcome was to say that, although 7 is statistically speaking the most likely single number to come out in a game of Catan, there is a high chance that it will not be the most likely result to come out across a series of dice rolls. This was simulated by me rolling 240x 2D6 and recording the results. There are, on average (according to some sources), around 80 rolls of the dice in a game of Catan, so that was a simulation of three games.
In that blog, I said that this is an incredibly fallible way of doing things as ultimately the dice results would even out and become more of a bell curve the more results we have.
Chatting with a couple of ace bloggers in the comments, Dave (from the blog Faith&Steel) and Faust (from the blog Double Down Dice), they suggested that I actually use something like Excel to simulate dice instead. This eliminates any wonky dice, as well as saves time, effort, and provides more accurate results. It allows for the generation of thousands of results rather than hundreds.
So, with that in mind, I took their advice and created a random dice generator in Excel. As we all know, there are a hundred thousand ways of doing things in Excel, so I thought I would show you my simulator, how I made it, what the formulas are, and what the results are. This is how I simulated 500 games of Catan, in a matter of minutes, using Excel. Coincidentally, by writing this article we are going to cover how to create a random dice simulator in Excel, so feel free to build along if you so wish.
How To Create A Random Dice Simulator In Excel
Step 1: Decide How Many Games To Simulate
Using the premise that a Catan game is around 80 rolls of the dice (I’m assuming 4 players, and 20 rounds looking at this as a secondary source and erring on the side of potential mathematics) and wanting to simulate 100x games to begin with we will use Columns B through CW. This is 100 cells across and then rows 2 through 81. This highlights 8000 cells across the middle. What we are looking for is figuring out the most rolled numbers across the games.
Step 2: Using A Basic 2D6 Formula
In cell B2 we are going to use the following formula:
To explain that a bit – Randbetween generates a random number between the two numbers in the brackets. In this case it is 1 and 6 since we are simulating a D6. The Randbetween is added to itself to simulate 2D6.
It is worth noting that, if you ever want to add on an additional dice, so you want to simulate 3D6 for example, you just add another “+(RANDBETWEEN(1,6))” on the end. If you ever want to simulate a different type of dice, you turn it to “RANDBETWEEN(1,x)” where “x” is the number of sides on the dice.
This is dragged down to cover B2 through B81, and then highlighted and dragged across to CW81, filling in all the space in between.
Why start at B2? Well, I was debating if I would need a column at the side, so started with B2. You can do A2 to CV81 instead if you so wish.
Just be warned that, due to the =randbetween, the cells will randomly change whenever you edit a cell from now on. You can get over this, if you want to, by copying and pasting the cells onto themselves; however, I wouldn’t recommend it at this moment in time.
Step 3: Finding the Mode
Since, in this case, we are wanting to know the most common number in each set, to get a modal average, we need to look at creating a mode calculator for each column. This means applying the following formula to space beneath each “simulated game”. I put it in the 84th row.
This provides the most commonly rolled number across the game in that cell. Dragging it across will generate the mode for each set of games.
For me, those modal formulas were written in B84 to CW84. This is important for later.
Step 4: Collate the Modes
The next step is to collate the modes together to find how many there are of each dice roll to find out what number comes up the most in each game. For this, we will need 11 different cells and formulas, one for each combo there is when rolling 2D6 (you can’t roll a 1 on 2D6). These are laid out in a spare space like as shown on the right.
To explain this, there were, for instance, 2x games out of the first 100 where the most rolled number was 4. There were 3x games where the most rolled number was 5.
In the number column next to the Avg. 2, you need to write the following formula:
This is going to take some explaining.
Effectively, the COUNTIF function in Excel will count the number of results. The way to think of it is COUNTIF(range,result), so, next to the Avg. 2 we want to count the modal results that are of 2. The messy looking $B$84 (etc.) just says that, when dragging those cells, keep that part of the formula the same. That is what the $ does in Excel.
So, for next to the Avg. 3 we want to count the number of times 3 comes up in the modal row (B84 to CW84 in this case). Thus the formula is:
You can see the digit on the end has now changed to 3. Depending on your version of Excel, you may be able to drag this down and have the formula change automatically, or you may need to do it by hand.
Side Note: Why the Mode?
When we talk about averages we usually talk about three different types. These are Mean, Median, and Mode.
Mean is where we take all the rolls, add them together and divide by the number of rolls. This gives an average roll value; however, in most cases, it will be a decimal. This isn’t hugely what we want.
The Median gives us the middle roll so it will tell us, if we sorted all the rolls in numerical order and lined them up, what the roll around position 40 is.
The Mode, instead, gives us the most rolled number. This is what we are interested in as it gives us a full, tangible, and real number to use.
Step 5: Graph the Results
Take the results, of which you should now have 11, one for each roll possible on 2D6, into a line or bar graph. Personally, I prefer a line graph/chart in this case, but it is up to you.
The graph will be a representative of 100x games of Catan and the modal rolls for each one. Save the graph as a picture or take a snapshot or whatever you want to do to keep a record.
Step 6: Re-Enter the Formula
As mentioned near the top of this guide, the RANDBETWEEN will change every time you change something in the form. Now you want to do it intentionally, so take one box and drag it onto another. The formula is identical, but the results for all 8000 simulated rolls will change. This will change everything automatically, including the chart. Clever, eh?
So, I did it five times, giving me 40,000 simulated rolls in total, which kicks the ass of the 240 I did manually before. This is approximately 500 games (ish, depending on how long you take a game to be).
These are the results.
Simulating Catan – The Results
So, for those who read my other article (you can here) you will know that the determined result was that 80 rolls may not be enough for any probability pattern to take hold. Over the course of one game, seven may not be the most common number. This led to a graph like this, showing the results of 240 different rolls across three games.
Game 1, for instance, barely had a 7 come up at all (only 6 times). Across the three games we actually saw 7 be the third most common result, with 8 being most common, followed by 6. This isn’t all that unusual.
That being said, when we start collating results across games, rolling a lot more games, and taking the mode result rather than each dice result, we tend to see a different story.
We start to see more of a pattern that keeps arising.
And the pattern keeps on and on, only varying slightly each time.
In fact, we start to see some incredibly interesting results.
So what we are seeing is two-fold. The first is that 7 is indeed the most common number to come up. That is point number one. There is no doubt about it – across enough rolls it is the most likely of all the numbers.
That being said, across all the 500 games there are also games where 4 has been the most rolled number. There are games where 10 has been. Where statistically unlikely, it is not impossible. Curiously, there appears a need for the number to have at least a 3/36 possibility of coming up for it to really stand a chance (using 500 games sample) of dominating a game.
This means that although 7 is the most likely, according to those 500 games, it still only has under a 50% of being the most dominant number in any given game. This means it is more likely that the mode will not be 7 than that it will be.
Over 10,000 Games (or 800,000 Rolls)
So…never one to back down from a challenge, let’s look at 10,000 games (which goes up to Column NTQ, if we leave A blank, in Excel). This almost crashed my computer…
Needless to say, I needed to alter the final set of formulas to incorporate row NTQ into account.
Interestingly, when looking at really large numbers of games (no, I am not going to go higher than 10,000 simulated games in this article) we start seeing games where the most rolled number is 3 or 11. Part of me is curious to see how many games we would need to simulate to have two or twelve come up as the most likely number, but to be honest, I’m not sure my computer will take it.
To put all of this into a graph then it will look like this.
That feels fairly definitive, and a much better method than rolling each individually. Judging how long it took me to simulate three games, it would have taken me around 104 days to simulate 10,000.
So there you have it – an incredibly geeky and nerdy and mathy article about probability in Catan. Weirdly, I really enjoyed writing this and had to work out a lot of Excel functions to write this post, so now I’m wondering what else I can simulate to this level.
Let’s find out!
Once again, a special thanks to Dave (you can read his latest article here) and Faust (you can read his latest article here) for their suggestions and in schooling me in how I should be doing things. Part of what this blog is about is learning as we go along, so if you have any ideas on how to improve this analysis please let me know in the comments below.
There we go – now you know how to create a random dice simulator in Excel. Now you too can simulate 10,000 games of Catan. I don’t know why you should unless you want to do analysis like this, but the important thing is you can.
Anyway, for the time being, what are your thoughts. Is there a more efficient way? How would you calculate the rolls? Let me know in the comments below.