__Introduction:__

For a very long time, many users have questioned the integrity of the game's RNG, claiming that it is broken in one way or another. These complaints range from broken coin flips, to excessive mulligans, to bad shuffling. For the most part these claims have come with little to no data, which means that the only proof we had of the RNG working correctly was the word of the Staff and common sense. Of course, some people simply refuse to believe the authorities for some reason, so I've decided to gather a significant amount of data myself to shed some actual light on this perceived issue.

I'm going to be frank: I intend to eventually bury the issue of the RNG once and for all. For now I will begin with a small experiment on the simplest form of RNG (coin flips) to get this rolling, but I intend to tackle the other forms of RNG when I have time, which might not be anytime soon.

Mulligans, deck shuffling and other types of random events are more complex to investigate for two main reasons: 1) Investigating them often requires knowing what ended up in your deck and what ended up in your prizes. That is possible and not too hard to do, but it is time consuming to do repeatedly, and I do need significant data pools to make a valid study. 2) Perhaps more importantly, the math behind calculating some of these probabilities is an absolute nightmare, and I do not have the time to spend on those calculations right now.

This experiment will cover 2 parts mainly:

- A comparison of 100 coin flips each in AI games and PvP games, in order to show there is no significant difference between them. Most importantly, this is to show anyone who might be skeptical of data collected in AI games that you do not get worse coin flips in PvP than you would against the AI. (Using AI games is, to be honest, a necessity, given that collecting hundreds or thousands of coin flips in PvP in a short time span is tough and very time consuming.)
- A collection of 500 coin flips in AI games, including the initial 100. This is to show that the spread of data is in fact more or less even at larger amounts of data.

__Some Concepts I'll Use:__

Before starting I'd like to point out some things that are necessary to understand my mini study.

**Sampling Error and Acceptable Range**

While the values should tend toward an average in the long run, there DOES exist a way of telling whether small data pools correspond to a working RNG. As you'd expect, the more data we have the closer we should get to a 50-50 split, and the less data we have the further off we can be. There is a mathematical way of determining whether the deviation from a 50-50 split is within the expectations of a normal, functioning RNG: the Error of a sample. The Error gives us a range of values where the distribution of heads and tails is considered acceptable (meaning it is consistent with an RNG that works correctly).

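To determine the error of a sample where the outcome is binary (yes or no, heads or tails, etc.), we divide 1 by the square root of the sample size. The result is the acceptable error as a decimal fraction (so 0.10 means 10 percentage points), and it is the usual rule-of-thumb approximation for a 95% margin of error.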

**Error = 1 / √(Sample Size)**

The range of acceptable heads/tails distributions is the expected average (meaning 50%, which I will be referring to in its decimal form, 0.50) ± the error.

**Range: Lower End = 0.50 - Error; Upper End = 0.50 + Error**

For the most part I took samples in the AI games in groups of 10 or similar, and in the PvP games in groups of 5 or similar. To give everyone an idea, the ranges from sample sizes between 4 and 12 are the following:

- 4: ±0.50 [0.00-1.00]
- 5: ±0.45 [0.05-0.95]
- 6: ±0.41 [0.09-0.91]
- 7: ±0.38 [0.12-0.88]
- 8: ±0.35 [0.15-0.85]
- 9: ±0.33 [0.17-0.83]
- 10: ±0.32 [0.18-0.82]
- 11: ±0.30 [0.20-0.80]
- 12: ±0.29 [0.21-0.79]

As you can see, this means that any group of 4 (or fewer) consecutive coin flips can have absolutely any result at all and it would still be within expectations. For this reason, it's extremely hard to spot any anomalies in the short run with any sample size of 4 or lower.

While the 5% trimmed off each end of the range at a sample size of 5 might not seem like a lot, it really does make a visible difference. For any sample size that has a range other than 0% to 100%, you can expect that the vast majority of the samples will not exceed this mathematically determined range.

For the 2 overall sample sizes, 100 and 500, the ranges are as follows:

- 100: ±0.10 [0.40-0.60]
- 500: ±0.04 [0.46-0.54]
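For anyone who wants to double check the ranges listed above, here is a minimal Python sketch that applies the Error formula to each sample size. The function names (`sampling_error`, `acceptable_range`, `format_range`) are just names I made up for this illustration, and the values are rounded to two decimals the same way the lists are.

```python
import math


def sampling_error(sample_size):
    """Error = 1 / sqrt(sample size), as defined above."""
    return 1 / math.sqrt(sample_size)


def acceptable_range(sample_size):
    """Lower and upper end of the acceptable share of heads (0.50 ± Error)."""
    error = sampling_error(sample_size)
    return 0.50 - error, 0.50 + error


def format_range(sample_size):
    """Format one line in the same style as the lists above."""
    error = sampling_error(sample_size)
    low, high = acceptable_range(sample_size)
    return f"{sample_size}: ±{error:.2f} [{low:.2f}-{high:.2f}]"


# Small group sizes (4 to 12) plus the two overall sample sizes.
for size in list(range(4, 13)) + [100, 500]:
    print(format_range(size))
```

Running it prints one line per sample size, for example `10: ±0.32 [0.18-0.82]` and `500: ±0.04 [0.46-0.54]`.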

**Aberrants**

What I will call an aberrant is any sample that does not fall within the range that I established using the previous formulas. I want to note that all of my samples are of 5 consecutive coin flips or more, meaning that I will always have a small probability of obtaining an aberrant.
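As a rough sketch, the aberrant check can be written as a small Python function. The name `is_aberrant` is hypothetical, and I am assuming that a share of heads landing exactly on the edge of the range still counts as within expectations.

```python
import math


def is_aberrant(heads, tails):
    """Return True if the share of heads falls outside 0.50 ± 1/sqrt(flips)."""
    flips = heads + tails
    error = 1 / math.sqrt(flips)
    share_of_heads = heads / flips
    # Assumption: a share exactly on the boundary is treated as within range.
    return share_of_heads < 0.50 - error or share_of_heads > 0.50 + error


print(is_aberrant(0, 7))  # 0 heads in 7 flips is below the 0.12 lower end
print(is_aberrant(6, 1))  # 6 heads in 7 flips is about 0.86, under the 0.88 upper end
```

With this definition a 0-7 sample is flagged as an aberrant, while a 6-1 sample is close to the limit but still inside the range.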

Basically, I have 2 checks for telling if the RNG is working properly or not:

> __Long Run:__ The overall sample must be close to a 50-50 spread and within the expected range.

> __Short Run:__ The vast majority of the small samples must be within expectations. If the aberrants are very few and far between, chances are that the RNG is perfectly normal. Sadly I do not have a formula to determine how many aberrants would constitute a "broken RNG" and how many would constitute "exceptionally good/bad luck" which could be written off by a small sample size, so this is slightly subjective. However, if the total number of aberrants is blatantly negligible, then we can all conclude that the RNG is fine.

__The Decklist and the Collection Method:__

I determined that the easiest method for collecting coin flips was using AOR 20/98 Gyarados. For this purpose, I assembled a clone of my Archie's Blastoise deck with some modifications, the idea being to charge Gyarados up with Blastoise. I want to note that this decklist is not optimized for either data collection or actually playing (trying to win), and it has remnants of the original deck which do not serve any purpose here.

* 2 Victini

* 2 Exeggcute

* 1 Jirachi-EX

* 3 Magikarp

* 3 Gyarados

* 1 Manaphy-EX

* 2 Blastoise

* 1 Computer Search

* 1 Rough Seas

* 4 Trainers' Mail

* 3 Acro Bike

* 1 Startling Megaphone

* 4 Puzzle of Time

* 4 Battle Compressor Team Flare Gear

* 4 Ultra Ball

* 2 Archie's Ace in the Hole

* 4 VS Seeker

* 3 Superior Energy Retrieval

__Part 1: AI RNG vs. PvP RNG__

I will be listing the samples (numbered in the order they were registered), the spread of each one and the total at the end. If there are any aberrants I will explicitly mark them with an **ABERRANT** note. I want to note as well that for PvP I couldn't get a perfect 100 flips, because it's harder to control how many energies I can attach and how many times I can attack when I'm playing against actual people, assuming that they are trying to win (most of them are). For this reason I ended up with 104 flips in PvP instead of 100, but it doesn't make too much of a difference.

__AI Games Data:__

1) H:6; T:4

2) H:4; T:6

3) H:6; T:4

4) H:5; T:5

5) H:5; T:5

6) H:2; T:8

7) H:4; T:6

8) H:3; T:7

9) H:5; T:5

10) H:6; T:4

**Total Flips: 100**

**Total Heads and Tails: H:46; T:54**

**Number of Aberrants: 0**

**Range: 0.40-0.60**

**Total Within Expectation: Yes**

__PvP Games Data:__

1) H:3; T:2

2) H:2; T:3

3) H:2; T:5

4) H:6; T:1

5) H:4; T:3

6) H:4; T:3

7) H:6; T:2

8) H:6; T:2

9) H:4; T:1

10) H:3; T:2

11) H:2; T:4

12) H:2; T:4

13) H:4; T:3

14) H:3; T:2

15) H:3; T:2

16) H:3; T:4

17) H:6; T:1

18) H:0; T:7 **ABERRANT**

**Total Flips: 104**

**Total Heads and Tails: H:63; T:41**

**Number of Aberrants: 1**

**Range: 0.40-0.60**

**Total Within Expectation: Yes**

As you can see, in both scenarios the total is within expectations (AI was in the lower quarter while PvP was in the upper end), and there was exactly 1 aberrant sample across all of them, although there were a couple more that cut it close (6 and 1 is close to the limit but still within expectations).

From these samples I can conclude two things:

- RNG in AI and PvP should be about the same. In fact, my luck in PvP was better than in AI, which is not what I am trying to prove but it does serve to show that people aren't getting screwed over in their games at least. While some may still claim they do, I have mathematically proven that any streaks of good and bad luck I may have gotten (with exactly 1 exception) are within expectations, and both averages are also within expectations. This means that any data I collect in AI will be considered as a valid representation of what happens in PvP.
- Initially, it would seem that the RNG is completely fine. To further prove this I proceeded to collect 400 more flips in AI to gather a total of 500 coin flips.

__Part 2: Collection of 500 Coin Flips__

For the second part, I took the initial 100 flips and added 400 more. This took me a total of 50 different samples of varying sizes. The following is the list of the samples (note, once more, that the first 10 samples are the same 10 I posted in the first part):

1) H:6; T:4

2) H:4; T:6

3) H:6; T:4

4) H:5; T:5

5) H:5; T:5

6) H:2; T:8

7) H:4; T:6

8) H:3; T:7

9) H:5; T:5

10) H:6; T:4

11) H:7; T:3

12) H:6; T:4

13) H:2; T:8

14) H:4; T:6

15) H:6; T:4

16) H:6; T:4

17) H:5; T:5

18) H:7; T:3

19) H:6; T:4

20) H:4; T:6

21) H:4; T:4

22) H:6; T:2

23) H:5; T:4

24) H:2; T:7

25) H:5; T:4

26) H:3; T:6

27) H:7; T:4

28) H:5; T:6

29) H:7; T:4

30) H:8; T:3

31) H:6; T:5

32) H:7; T:4

33) H:6; T:3

34) H:3; T:6

35) H:5; T:5

36) H:6; T:4

37) H:5; T:5

38) H:4; T:6

39) H:6; T:4

40) H:4; T:6

41) H:5; T:5

42) H:4; T:6

43) H:5; T:5

44) H:4; T:6

45) H:3; T:7

46) H:5; T:5

47) H:2; T:9 **ABERRANT**

48) H:8; T:3

49) H:6; T:5

50) H:5; T:6

**Total Flips: 500**

**Total Heads and Tails: H:250; T:250**

**Number of Aberrants: 1**

**Range: 0.46-0.54**

**Total Within Expectation: Yes**

Much to my surprise, the final result was an exact 50-50 split. Anything between 230 heads and 270 heads would have been within expectations, but this time the total was dead in the middle of the range. I think it's also important to note that there was exactly 1 aberrant in 50 samples, which I think could be considered blatantly negligible compared to the total.
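One possible way to put a number on "how many aberrants is too many" is to work out, for each sample size, the exact chance that a perfectly fair coin produces an aberrant, and then add those chances up over all 50 samples. The Python sketch below does that for the list above. It is only an illustration: the names `is_aberrant` and `aberrant_probability` are made up for this example, I am assuming every flip is an independent 50-50 event, and I am treating a share of heads exactly on the edge of the range as within expectations.

```python
import math

# (heads, tails) for the 50 AI samples, copied from the list above.
SAMPLES = [
    (6, 4), (4, 6), (6, 4), (5, 5), (5, 5), (2, 8), (4, 6), (3, 7), (5, 5), (6, 4),
    (7, 3), (6, 4), (2, 8), (4, 6), (6, 4), (6, 4), (5, 5), (7, 3), (6, 4), (4, 6),
    (4, 4), (6, 2), (5, 4), (2, 7), (5, 4), (3, 6), (7, 4), (5, 6), (7, 4), (8, 3),
    (6, 5), (7, 4), (6, 3), (3, 6), (5, 5), (6, 4), (5, 5), (4, 6), (6, 4), (4, 6),
    (5, 5), (4, 6), (5, 5), (4, 6), (3, 7), (5, 5), (2, 9), (8, 3), (6, 5), (5, 6),
]


def is_aberrant(heads, tails):
    """True if the share of heads falls outside 0.50 ± 1/sqrt(flips)."""
    flips = heads + tails
    error = 1 / math.sqrt(flips)
    share_of_heads = heads / flips
    return share_of_heads < 0.50 - error or share_of_heads > 0.50 + error


def aberrant_probability(flips):
    """Exact chance that a fair coin gives an aberrant sample of this size."""
    aberrant_outcomes = sum(
        math.comb(flips, heads)
        for heads in range(flips + 1)
        if is_aberrant(heads, flips - heads)
    )
    return aberrant_outcomes / 2 ** flips


total_heads = sum(heads for heads, _ in SAMPLES)
total_tails = sum(tails for _, tails in SAMPLES)
observed = sum(is_aberrant(heads, tails) for heads, tails in SAMPLES)
expected = sum(aberrant_probability(heads + tails) for heads, tails in SAMPLES)

print(f"H:{total_heads}; T:{total_tails}")
print(f"Observed aberrants: {observed}")
print(f"Expected aberrants for a fair coin: {expected:.2f}")
```

Under these assumptions a fair coin would be expected to produce about 1.7 aberrants across these 50 samples, so the single aberrant I observed is right in line with a working RNG.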

__Conclusions:__

With this, I believe I have proven once and for all that coin flips are nowhere near as broken as some people claim. I would like to explore some more cards that require coin flips, as well as the start-of-game flip, but I'd like to point out that many people have claimed (incorrectly) that their coin flips for attacks and similar things are way off, which this small study completely disproves.

While we may never know the RNG algorithm (and would probably not be able to interpret it even if we did), I have now shown empirically that coin flips give absolutely no indication at all of being broken.

**Thus, the ultimate conclusion of my experiment is that the RNG is working as expected.**