GWENT COMMENTARY
RNG
Introduction:
RNG (literally “random number generator”) is gaming shorthand for random, randomness, highly random, and the like. It is a feature of most games, and probably all card games. On the positive side, it creates excitement, requires deeper analysis, demands adaptation, and creates variety. On the negative side, it reduces the impact of player decisions, creates frustration, and can even solely determine the outcome of a match, trivializing strategy. In this article, I look at RNG in Gwent: where it arises, how it can be controlled by players, situations where it improves the game, and situations where it is destroying the game. Ultimately, I want to try to quantitatively answer the question, “Is there too little, too much, or just the right amount of RNG in Gwent?”
First, I can think of four places where RNG arises: the order of cards in the decks, variable effects of cards (which divides into random targets/choices and random values), the start of game coinflip, and the opponent match-up. Let me look at each of these in turn.
Order of Cards in the Deck.
A core feature of most card games (including Gwent) is the process of shuffling cards to draw a random hand from the deck. It is the variety of possible hands that makes each game unique, and this variety creates strategic challenge as players must decide an optimal order of play based upon which cards are held. There are a few games (such as chess) which hold players’ interest through strategy alone (with no random influences), but these games are few and generally only interest a small proportion of the gaming population. On the other hand, when the cards drawn are so imbalanced in quality between players that no reasonable play decisions can impact the outcome of the match, the game becomes frustrating, boring, and pointless.
Gwent does offer at least four ways that players can mitigate the effects of draw order: mulligans, tutors, thinning, and deck manipulation.
Mulligans allow you to replace undesired cards in hand with newly drawn cards. This gives you an opportunity to replace “bad” cards with something you hope will be better. They allow you to include cards that tech against certain matchups and that can be mulliganed away in matchups where they are useless. Moreover, they encourage certain deck-building strategies like polarization. (With polarization, a player deliberately chooses several very low provision cards with the intention of never playing those cards from hand, but always mulliganing them. This allows a higher average provision allocation to the cards one does intend to play from hand. On the negative side, it also augments the impact of bad draws.)
Tutors help to call highly desired cards from a deck. This allows players to build a deck around a card (or a small collection of cards) with substantially reduced risk of not drawing those cards. It also significantly increases the chance of being able to use extremely high provision cards. And some tutors allow a bit of flexibility in response options to opponent plays. On the other hand, tutors reduce the need for players to adapt play to account for cards drawn.
Thinning removes cards from a deck, thereby increasing the probability of drawing the remaining cards. It can be used either to remove low provision cards (giving higher average provision value on the remaining cards), or to utilize more cards (and hence, more total provisions) from the deck.
If all cards were identical, card order would be irrelevant. When all cards are different, but tend to play for similar values, draw-order RNG keeps the need for players to adapt to the draw without giving chance a large role in determining the match winner. But if all cards played for similar value, the strategy of the provision system would be lost. Moreover, all cards playing for roughly equal value does eliminate most of the significance of the strategic manipulation of deck order. Do note that much of this is at the deck-building level rather than the play level, however. When many players appear to basically copy from a few well-designed decks built by others, one might question the true value of this strategic impact on the game.
On the other hand, when typical playing value differs too significantly between high and low provision cards, player impact on deciding the winner of the game is essentially eliminated. Let me look at some mathematics to determine how significant this can be.
To be able to reasonably make the necessary probability calculations, I will assume an unrealistically simplified situation. Although simplified, for reasons that become clear after the analysis, I don’t think the simplifying assumptions change the essence of the conclusion. Suppose a deck is completely polarized between 12 four-provision cards and 13 nine-provision cards. (Note that this is a total of 165 provisions, a typical allotment for many leader abilities.) A typical Gwent game consists of playing 16 cards from a 25-card deck. Assuming no draw manipulations (like mulligans), the number of nine-provision cards a player draws follows what is known as a hypergeometric distribution. For these parameters, the following can be computed:
- probability of 6 or fewer nine-provision cards is about 0.06
- probability of 7 nine-provision cards is about 0.18
- probability of 8 nine-provision cards is about 0.31
- probability of 9 nine-provision cards is about 0.28
- probability of 10 nine-provision cards is about 0.13
- probability of 11 or more nine-provision cards is about 0.03
This means the average difference in the number of 9-provision cards drawn between the players is about 1.28. Moreover, the probability of a difference of 2 or more is about 0.36, and the probability of a difference of 3 or more is about 0.13. If you want the average game to be decided by player skill and not luck of the draw, you need the difference in value between 9-provision cards and 4-provision cards, multiplied by 1.28, to be less than the difference that good versus bad play makes. To have fewer than about 36% of games decided by “luck of the draw”, you need that difference in value multiplied by 2 to be less than the difference good versus bad play makes. For fewer than about 13% of games decided by “luck of the draw”, you need that difference in value multiplied by 3 to be less than the difference good versus bad play makes.
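For readers who want to check figures like these, here is a minimal Python sketch of the calculation. It assumes, as I do above, that both players use the same fully polarized deck and that their draws are independent; the function names are my own illustrations. Exact values may differ slightly from the rounded figures quoted in the text.

```python
from math import comb

def draw_pmf(deck_size, good_cards, cards_drawn):
    """Hypergeometric pmf: P(exactly k good cards among the cards drawn)."""
    bad_cards = deck_size - good_cards
    total = comb(deck_size, cards_drawn)
    lowest = max(0, cards_drawn - bad_cards)   # cannot draw more bad cards than exist
    highest = min(good_cards, cards_drawn)
    return {
        k: comb(good_cards, k) * comb(bad_cards, cards_drawn - k) / total
        for k in range(lowest, highest + 1)
    }

def difference_stats(pmf, thresholds=(2, 3)):
    """Mean absolute gap in good cards between two independent players,
    plus P(gap >= t) for each threshold t."""
    mean_gap = 0.0
    tails = {t: 0.0 for t in thresholds}
    for a, pa in pmf.items():
        for b, pb in pmf.items():
            gap = abs(a - b)
            mean_gap += gap * pa * pb
            for t in thresholds:
                if gap >= t:
                    tails[t] += pa * pb
    return mean_gap, tails

# 25-card deck, 13 nine-provision cards, 16 cards seen per game
pmf = draw_pmf(25, 13, 16)
mean_gap, tails = difference_stats(pmf)
for k, p in pmf.items():
    print(f"P({k} nine-provision cards) = {p:.3f}")
print(f"average gap = {mean_gap:.2f}, P(gap>=2) = {tails[2]:.2f}, P(gap>=3) = {tails[3]:.2f}")
```

Calling draw_pmf(25, 8, 16) instead gives the corresponding figures for a deck with 8 high-provision cards.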
So let’s look at some cards and see how the current design is working by this standard. Typical “good” 4-provision cards probably have an average value of about 7. It is certainly possible to get 13 value from 9-provision cards. If we accept a difference in value of 6, player agency must make about 7.7 points of difference (6 × 1.28) through a game if we want the average game decided by skill. If we want 2/3 of all games decided by skill, we need 12 points determined by player choices. This increases to 18 points if we want 90% decided by player choices. If we consider an average turn to be worth about 10 to 11 points, the difference in play in the last two cases must be more significant than squandering an entire turn! Examined another way, if a player loses one point of value every turn by bad play, they will still win over 13% of their games through lucky draws. This level of misplay is not uncommon with weak or inexperienced players. But how often do good players make this level of bad (or outstanding) plays? To me, this is convincing evidence that the point gap between low and high provision cards is too great, creating bad levels of draw RNG. And 9-provision cards are NOT really what I would consider high-end.
Let me redo this computation on a deck with 17 four-provision cards and 8 twelve-provision cards (164 total provisions). Omitting details, the average difference in number of 12 provision cards drawn is 1.25, the probability that one player draws at least 2 more 12-provision cards than the other is about 0.37, and the probability one player draws at least three more 12-provision cards is about 0.12. If 12 provision cards tend to play for 16 points, the situation is significantly worse. In 12% of all matches, draw luck would be worth 27+ points – almost 3 full turns. This is not acceptable at any level of play.
Now, in these calculations, I did ignore the tools players have to reduce draw RNG (accounting for them makes the probability calculations horribly complex and conditional). For the same reason, I assumed decks completely polarized between two different provision levels. But based upon the relatively small difference between the 4-provision / 9-provision split and the 4-provision / 12-provision split, I think it is reasonably safe to say that, on average, one player will draw roughly 1¼ more “good” cards than the other. In about 1/3 of all matches, one player draws at least 2 more “good” cards. And in about 1/10 of all matches, one player draws at least 3 more “good” cards. It may be more controversial to state that, at the present average difference in value between top-level and bottom-level cards, this presents more point swing than can be made up by quality of play, but it is certainly my belief (and one that I am by no means alone in holding). This can be fixed only by significantly reducing the impact of drawing poorer cards by chance, by reducing the typical difference in quality of draws, or by increasing the effect of good vs. bad plays. I think it is almost impossible to make player choices at top levels of play have more significance. Substantially reducing the probability of one player drawing significantly better than another would almost require rewriting the game rules and might eliminate the strategically interesting aspects of a random draw. Thus, I think a high priority of developers should be narrowing the play-value gap between high-provision and low-provision cards.
Variable Card Effects:
Many cards in Gwent have random effects. For instance, Wheel of Fortune deals 1 to 10 points of damage, Runestones offer a choice of playing three randomly chosen bronze units, and Mage Assassins, when moved to the top of a deck, are summoned to the board and deal two damage to a random target. In all cases, this randomness can have significant value difference depending upon circumstances.
This randomness is also not always bad. If strategy can be used to favorably influence the likelihood of a favorable outcome, this randomness adds richness to the game. For instance, some cards give random results that are influenced by player choices. Portal summons random 4-provision units from the deck; thus, limiting the types of 4-provision units in the deck can guarantee a good outcome. Pyrotechnician has an order to damage itself and a random enemy unit by 4. Timing its play so that it is likely to hit desirable targets is part of the game strategy, as is playing a poor target in response. Sometimes randomness just adds strategic richness to a game. Suppose an engine has 5 power. If you have a card that does 5 damage, the only strategic considerations in playing that card would be whether you have a better play, or whether that damage card will have a better target in the future. Now suppose your card randomly does 4, 5, or 6 damage. Your decision making now needs to account for the 1/3 chance you fail to destroy the target. Certainly, more thought will be required in deciding to play the card. But some players enjoy taking chances. And there is excitement in seeing the unusual.
But, again, it is highly undesirable to have randomness like this determine the outcome of a match too often. So, the main question becomes, “How much random fluctuation is too much in a card?” A second question is, “Are there types of random outcomes that are inappropriate?”
Regarding the main question, two mathematical concepts are critical: expected value and variability. Expected value simply means the average value obtained by the randomness. Variability refers to the extent to which the value of randomness can vary from instance to instance. For example, a card which does anything from 1 to 10 damage has the same expected value as a card which does either 1 damage or 10 damage with equal probability. But the latter always swings wildly (either 1 or 10) while the former will often do four or seven. The first card is less variable. Higher variability gives randomness more impact.
Another relevant mathematical principle is that when a random result is rolled multiple times, the average of those rolls has the same expected value as a single roll, but the variability (standard deviation) of that average is reduced by a factor of the square root of the number of rolls made. Thus, if randomness occurs over many trials, the average result is not highly variable, even though the individual results are. Thus, randomness will be far less likely to determine the match when lots of random decisions are made than when one random decision is made.
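The square-root principle can be illustrated with a small simulation. This is only a sketch: it assumes a hypothetical effect that rolls a whole number from 1 to 10 with equal probability, and the trial count and seed are arbitrary choices of mine.

```python
import random
import statistics

def sd_of_average(n_rolls, trials=10000, seed=1):
    """Simulated standard deviation of the average of n_rolls rolls of a 1-10 effect."""
    rng = random.Random(seed)
    averages = [
        statistics.fmean([rng.randint(1, 10) for _ in range(n_rolls)])
        for _ in range(trials)
    ]
    return statistics.pstdev(averages)

# Standard deviation of one roll, computed exactly from the ten equally likely outcomes
single_sd = statistics.pstdev(range(1, 11))

for n in (1, 4, 16):
    simulated = sd_of_average(n)
    predicted = single_sd / n ** 0.5   # the square-root rule
    print(f"{n:2d} rolls: simulated {simulated:.2f}, predicted {predicted:.2f}")
```

The simulated and predicted columns should agree closely, and both shrink as the number of rolls grows.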
Finally, it should be noted that high variability always favors the weaker player. Suppose the stronger player always scores 100 points, while the weaker player always scores 90. The stronger player always wins. If we add variability – the weaker player now scores from 70 to 110 points (with equal probability). On the average, the weaker player still scores 90 points. But that player now wins 25% of the time. And if variability is increased to the point where the weaker player scores 0 to 180 points, that player’s chance of winning increases to 44.4%. If we want matches determined by skill and not chance, either variability must be small, or highly variable cards must have a cost (in expected value) to help offset that variability.
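The upset probabilities in this example follow directly from the uniform distribution. A minimal sketch, assuming the stronger player always scores exactly 100 and the weaker player's score is spread evenly (continuously) over a range; the function name is my own:

```python
def upset_probability(low, high, target=100.0):
    """P(score > target) when the score is uniformly distributed on [low, high]."""
    if high <= target:
        return 0.0          # the weaker player can never exceed the target
    if low >= target:
        return 1.0
    return (high - target) / (high - low)

print(upset_probability(90, 90))    # no variability
print(upset_probability(70, 110))   # moderate variability
print(upset_probability(0, 180))    # extreme variability
```

In all three cases the weaker player's average score is 90; only the spread changes.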
Further mathematical analysis requires use of two parameters: the average difference in points generated by “good” vs. “bad” play, and our tolerance for randomness to affect the match outcome (i.e., the probability that the poorer player wins the match). The first significantly depends upon the quality of the players and is hard to measure. At top levels of play, from observation, I would estimate it at about 10 points per match – but this is definitely arguable. The second is subjective – I am willing to accept 20%.
Now I must turn to some tedious statistics. For those not interested in the details, simply skip the next few paragraphs.
First, for simplicity, I am going to assume that the difference in points with equal quality play and balanced cards/decks is normally distributed with mean zero. (This assumption is reasonable if most plays have roughly equal variability. It will likely underestimate the role of variability if there are a few highly variable plays, since individual plays tend to have either uniform distributions or distributions with one of two possible outcomes, and both of these distributions have higher probabilities of extreme results.)
If a weaker player is to win 20% or less of the time, I need the standard normal z-score below which a proportion 0.8 of scores fall. This score (from a standard normal table) is z = 0.84. Because, under my assumptions, the 10-point skill gap must be at least 0.84 standard deviations of the random swing, I need 10/sigma > 0.84, or sigma to be less than 11.9.
In a standard game of Gwent, 32 cards are played (16 by each player). If each were to have equal variability, each card would need to have standard deviation less than 2.1 (since the standard deviation of a sum of n equally variable random numbers is square root n times the variability of one such number). If 16 cards vary, and the rest are fixed, the standard deviation of each must be less than 2.975.
I will take 2.975 points of value as the maximal allowed standard deviation for a card, as that would prevent any one player from being able to select cards with sufficient random effect to sway a match. Thus, since the standard deviation of a uniform distribution is given by (b – a) / Sqrt (12), where b – a is the difference between the largest and smallest possible values, the top and bottom values of a single card can reasonably vary by slightly more than 10 points. The standard deviation of a binary random variable that takes on one of two values with probability p (and the other value with probability 1 – p) is given by difference in values * Sqrt (p (1 – p)). So a card equally likely to have a good and a bad outcome can make a difference of slightly less than 6 points and stay in this range, whereas a card where the probability of a good outcome is only .1 can have a difference between good and bad outcomes of slightly less than 10.
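The chain of calculations in the last few paragraphs can be reproduced with Python's standard library. This is a sketch under my stated assumptions (a 10-point skill gap, a 20% tolerance for upsets, normally distributed swings); the variable names are my own. Results should land very close to the 11.9, 2.1, and 2.975 used in the text, with small differences because I rounded z to 0.84.

```python
from math import sqrt
from statistics import NormalDist

skill_gap = 10.0        # assumed points separating good from bad play
upset_tolerance = 0.20  # acceptable chance that the weaker player wins

# z-score below which 80% of a standard normal distribution falls
z = NormalDist().inv_cdf(1 - upset_tolerance)

# Largest tolerable standard deviation of the whole match's random swing
sigma_match = skill_gap / z

# Per-card limits: all 32 cards equally variable, or only 16 variable cards
sigma_per_card_32 = sigma_match / sqrt(32)
sigma_per_card_16 = sigma_match / sqrt(16)

# Largest range (b - a) of a uniformly distributed card within the 16-card limit
max_uniform_range = sigma_per_card_16 * sqrt(12)

def max_binary_gap(p):
    """Largest good-vs-bad gap for a two-outcome card with P(good) = p."""
    return sigma_per_card_16 / sqrt(p * (1 - p))

print(f"z = {z:.3f}, match sigma = {sigma_match:.2f}")
print(f"per-card sigma: {sigma_per_card_32:.2f} (32 cards), {sigma_per_card_16:.2f} (16 cards)")
print(f"uniform range = {max_uniform_range:.1f}")
print(f"binary gap: {max_binary_gap(0.5):.1f} (p = 0.5), {max_binary_gap(0.1):.1f} (p = 0.1)")
```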
Let’s look at certain Gwent cards to see how they fare on this basis. The previously mentioned Wheel of Fortune card (one that is rarely played) does damage from 1 to 10. This is within a reasonable variability range. And at 5 provisions for an average of 5.5 points of damage, it is slightly above average in performance for removal cards, which probably average one point of damage per provision. (But I think its variability is unattractive even to players who like risky cards.)
Golden Nekker is frequently criticized as being either too powerful or too random. Let us examine these complaints. Note that I am not accounting for all implications of its deck building restriction. I am also assuming that it does not brick, that is, that it can draw a card of each allowed type. Because it draws three cards, and its net effect is the sum of the three, it follows neither of the distributions I described earlier, but I think we can still work with it. Let us suppose that each of the three cards drawn has a provision cost uniformly distributed from 4 to 9, and that each card plays for roughly provision cost + 3 points of value. (This is a figure I have found to be fairly in line with Gwent cards.) It would then play for an average of 3 × (6.5 + 3) = 28.5 points, which is way out of line for a 9-provision card. The standard deviation of its value would be Sqrt (3) * 5 / Sqrt (12) = 2.5. On the surface, this is reasonable variability. But in computing value, I have not discounted the value of playing cards that are not chosen but randomly assigned. And in computing variability, I have ignored the variability in randomly picking cards that may or may not be appropriate for the situation, not to mention the randomness that might already be present in those cards. I don’t think these assumptions are significant enough to change an appraisal that Golden Nekker is badly over-powered. They may be significant enough to make me question whether the card is too variable.
Kingslayer is another hated card. For 6 provisions, its average value is 4 plus the average value of the card destroyed minus the average value of the card that would be played in its place. Since players generally mulligan lower provision cards, it is probable that the cards in deck are of lower average value than the cards in hand. Thus, the average value of a card destroyed is likely to be less than the average value of a card in hand. There is no justification for claiming that Kingslayer is over-powered. Some do complain about its variability. Let’s take a polarized deck, with 8 twelve-provision cards and 17 four-provision cards. Again, assuming cards play for 3 points over their provision cost, we can figure 12-provision cards are worth 15 and 4-provision cards are worth 7. However, we can assume that a 4-provision card destroyed would have been mulliganed, so Kingslayer onto this card is worth essentially no additional points, while Kingslayer onto a 12-provision card will deprive an opponent of 15 – (15*7/24 + 7*17/24) = 5.67 points of value. This card is binary, with p = 8/25 of a good result. The variability for such a card can be computed as 5.67*Sqrt(8/25*17/25) = 2.64, which is within the tolerance for card variability of 2.975. I conclude that Kingslayer does not introduce excessive variability. Moreover, I would argue that a card which plays for substantially under the average value can be granted more variability.
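The Kingslayer arithmetic can be written out as a short sketch. It uses the same simplifying assumptions as the paragraph above (a fully polarized deck, cards worth provision cost + 3, and no value gained from destroying a low card); the function and parameter names are illustrative only.

```python
from math import sqrt

def kingslayer_swing(high_count=8, low_count=17, high_value=15.0, low_value=7.0):
    """Value denied by destroying a high card, the chance of doing so, and the
    standard deviation of the resulting two-outcome swing."""
    deck = high_count + low_count
    remaining = deck - 1
    # Average value of the card the opponent draws instead, from the remaining deck
    replacement = (high_value * (high_count - 1) + low_value * low_count) / remaining
    denied = high_value - replacement
    p_high = high_count / deck
    sd = denied * sqrt(p_high * (1 - p_high))
    return denied, p_high, sd

denied, p_high, sd = kingslayer_swing()
print(f"value denied = {denied:.2f}, P(hit high card) = {p_high:.2f}, sd = {sd:.2f}")
```

Changing the counts or values shows how sensitive the conclusion is to how polarized the opposing deck is.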
Another card worth mentioning here is Aerondight. But the randomness of Aerondight is really due to the card obtaining far more value on blue coin than on red coin. Although my methodology may be similar, I would prefer to deal with that card under coin-flip variability.
Finally, I will mention Ornate Censer. Censer is a card that can vary widely in value. But since that variance is due more to the matchup than the card itself, I will discuss it there.
Although there may be rare exceptions, I believe the developers have kept individual card variability within reasonable bounds.
Regarding the question of types of variability that are unacceptable, I think we would all agree that a card which read, “Win the match with 50% probability, otherwise lose” would be horrible. Even if it read, “Win the match with 10% probability, otherwise lose.” Even if it cost 50 provisions. Thus, one additional reasonable condition on randomness is: no single random result should determine the outcome of a match. I cannot think of any others, but I could be overlooking something.
To recap, this section yields two important principles:
- The maximal standard deviation from random effects over a match should not exceed about 12. The maximal standard deviation of an individual card should not exceed 3.
- No single random effect should be so powerful as to win the match regardless of other considerations.
I will use these principles in the following sections.
Coin-Flip Variability:
At the start of any game of Gwent, one player must go first. That player is chosen by coinflip, which is random. Based both upon playing and (presumed) developer statistics, the stratagems accurately compensate the player who goes first for the disadvantages associated with that role. So, when I refer to coin-flip RNG, I refer to the ability of decks to take advantage of coinflip – to play either better on red or on blue coin. Because players have control over the design of their own decks (and could create decks equally good on either coin), I will consider the randomness of getting the wrong coin against a player who has deliberately teched to take advantage of a certain coin-flip.
Because deck design is very wide open, it is difficult to discuss every case – I will group cases – often making general, crude estimations (in most cases, these estimates are good enough).
First, why does coin flip matter in the performance of a deck? There are three possibilities: 1. The deck has a weak first round. Usually, these decks have some form of carryover that helps with later rounds, or the deck’s bronze units are particularly weak and gold units must be conserved. 2. The deck strongly prefers either first or final say. 3. Particular cards in the deck gain more power under one coin than under the other.
If you have a weak first round, on red coin, you can always simply pass, giving up round control, but never card advantage. But on blue coin, you risk losing on even. I will simplify this case by supposing the RNG is so strong that you always lose on even on blue coin and have no ill-effects on red coin. I have earlier observed that a turn is generally worth about 10 to 11 points. Losing on even effectively costs you one turn. A coin flip thus has a 50% chance of costing nothing and a 50% chance of costing, say, 11 points. The 11 points is borderline, but within a range that can be reasonably made up by quality play. I would not consider this a problematic level of RNG – especially since you don’t have to play a deck that has a weak first round.
Decks that strongly prefer first or final say tend to prefer blue coin. Winning the first round with reasonable commitment is essential to gaining round control to enable final say. The five-point average stratagem value allows those decks a better chance of winning the first round with equal commitment. If your deck strongly favors either first or final say, and you don’t like the coin-flip RNG, again, I say simply change your deck. But here you could be facing an opponent whose deck does. Still, the 5-point stratagem value is well within reach of the points possible through superior play. Thus, again, I don’t see the coin-flip RNG as problematic.
Finally, certain cards can be chosen to exploit coinflip. Most striking here is Aerondight, an echo special card that begins at 0 damage and adds one point of damage each time you end your turn with a lead. On blue coin, at turn end during round 1, you will typically have played 1 card and 1 stratagem more than your opponent. That should almost always allow you to be ahead. On red coin, at your turn end, you will typically have played 1 stratagem less than your opponent. If your deck is high tempo (which Aerondight decks should be), but your opponent’s is not, it is reasonable to catch a 5-point stratagem in 2 turns. If both decks are high tempo, you might never catch up. Thus, on blue coin, Aerondight will gain value equal to the number of turns in round 1. Let’s assume that to be four. Generally it is to an opponent’s disadvantage to push an unwinnable first round deep. That makes the difference between a good and a bad Aerondight about 4 points after round 1. But if you win round 1 (it is safe to expect the blue-coin Aerondight player will always do so and a red-coin Aerondight player will do so only if the opponent plays a low tempo deck), you can push round 2 as deep as you wish (I would assume about another 7 turns). Because you go first this round as well, I would assume a high tempo deck can maintain a lead (having played an extra card) for about 5 turns. That gives another 5 points of added value to Aerondight. Thus, your Aerondight plays for about 9 points by the end of round 2. If you go second on a 7-turn round 2 against another high tempo deck, your odds of holding a lead at the end of a turn are roughly even. So, a red coin Aerondight against another high tempo deck only gains another 3.5 points of value, after as little as zero in round 1. This makes a 5.5-point difference by the end of round two (an amount likely to close a little by the end of round three, as you would go first in round 3).
Thus, with echo, Aerondight under blue coin will play for about 11 points more than Aerondight under red coin, if it is playing another high tempo deck. Against low tempo decks, the difference will be significantly less. As argued in the previous section, an individual card should have standard deviation no more than about 3. To obtain an estimate of the variability of Aerondight, we need to know the statistical distribution of the difference between its value on blue vs. red coin. And this is very complex. In situations like this, a three-estimate approach is often used: a maximum, a minimum, and a most likely value. Only the maximum and minimum values are used to estimate the standard deviation, which is the difference between these values divided by 6. From our previous analysis, 11 would be the difference between ideal (blue coin) conditions and awful (red coin) conditions. 11/6 is only 1.83. This could very well be an underestimate (I’m not convinced that a Beta distribution, which the three-point approach assumes, is appropriate here), but it does not suggest that coin-flip makes Aerondight too random.
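The three-estimate approach is simple enough to sketch in a few lines. The standard deviation formula is the one described above; the mean formula, (minimum + 4 × most likely + maximum) / 6, is the usual companion to it. The function names and the per-card limit of 3 are my own shorthand for this article's rule of thumb.

```python
def three_point_mean(minimum, most_likely, maximum):
    """Three-point estimate of the average value."""
    return (minimum + 4 * most_likely + maximum) / 6

def three_point_sd(minimum, maximum):
    """Three-point estimate of the standard deviation (range divided by 6)."""
    return (maximum - minimum) / 6

CARD_SD_LIMIT = 3.0  # per-card tolerance argued for earlier in this article

# Aerondight: blue-coin vs red-coin value differs by somewhere between 0 and 11 points
aerondight_sd = three_point_sd(0, 11)
print(f"Aerondight sd estimate = {aerondight_sd:.2f}, within limit: {aerondight_sd <= CARD_SD_LIMIT}")
```

The same two functions can be applied to any card for which a worst case, best case, and most likely value can be guessed.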
Again, with coinflip variability, I believe the developers have done a good job.
Match-up Variability:
Rock-paper-scissors is a perfectly balanced game. It is also no fun. The outcome is totally determined by the matchup (paper always beats rock, etc.).
Matchup is another very complex variable. I am OK with a good deck defeating a bad deck 100% of the time. I am not OK with good deck #1 defeating good deck #2, good deck #2 defeating good deck #3, and good deck #3 defeating good deck #1, all with 100% certainty.
Match-up RNG can be caused by the general deck, or by a specific card. Discussing general decks at this point seems very involved, and this article is already much longer than I intended – I will look only at individual cards – in much the same context as I did Aerondight in the previous section. I have chosen three cards that I believe (from playing) are particularly bad.
Let me start with my original, despised card: Cahir. Most good players do not include him in decks because he is easily countered. But that assumes either a deck able to get through a defender and able to dish out tall punish, or a deck that does not rely upon boost. If immediately countered (or if no boosts are played), Cahir is worth 5 points. If uncountered (e.g., against something like a Viy deck), Cahir could easily play for well over 65 points. Let’s suppose these are the only two options, and the probability of 65 points is 0.05. Note that Cahir’s expected value is (0.95)(5) + (0.05)(65) = 8, which is not very good for a 9-provision card. But the standard deviation of Cahir’s value is 60 Sqrt((0.05)(0.95)) = 13.1. If, as I have previously argued, the variability of a single card should not exceed 3, this is outrageous. One must either argue that any deck unable to counter Cahir is a bad deck (and I think this is a difficult argument to make because certain very successful decks such as Viy or Priestesses cannot do so), or one must acknowledge that Cahir is a horribly binary card capable of single-handedly winning matches against good decks.
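The Cahir figures come from the standard formulas for a card with only two possible outcomes. A minimal sketch, using my assumed values of 5 and 65 points and my guessed 5% chance of the high outcome; the function name is illustrative:

```python
from math import sqrt

def two_outcome_card(low_value, high_value, p_high):
    """Expected value and standard deviation of a card with two possible outcomes."""
    mean = (1 - p_high) * low_value + p_high * high_value
    sd = (high_value - low_value) * sqrt(p_high * (1 - p_high))
    return mean, sd

cahir_mean, cahir_sd = two_outcome_card(5, 65, 0.05)
print(f"Cahir: expected value = {cahir_mean:.1f}, standard deviation = {cahir_sd:.1f}")
```

Raising p_high toward 0.5 increases the standard deviation further, so the conclusion does not depend on the 5% guess being too high.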
Another card I believe to be very binary is Sihil. Sihil can be held to 1 point of value if it can never hit a one-point target. Decks with self-damage or consume can often manage this. Decks with substantial boost can manage this. Other decks may have no recourse against decks able to spawn one-point units on their side of the board. Against these decks, the value of Sihil is often only limited by the number of bronze cards brought to hand. If 12 bronzes can be used, Sihil has value 1 + 2 + 3 + … + 13 = 91. Worst case, it has value 1. Here I think a Beta distribution is not a horrible approximation to Sihil’s point value distribution. The standard deviation is estimated to be 90/6 = 15. This argues that Sihil is even more binary than Cahir. But there is an important difference. Let’s suppose that it’s most likely that Sihil gets used 4 times (which I think is an underestimate). Its most likely value is then 10. Using the three-estimate approach, its expected value is (1 + 4(10) + 91) / 6 = 22. This is well above the expected value of typical 12-provision cards. I admit, Sihil requires considerable deck building sacrifices, but unlike Cahir, one could argue that Sihil is OP as well as binary.
Finally, I want to examine a new card that I think has been overlooked in the flurry of complaints about the Forgotten Treasures card drop. Ornate Censer has insane potential value, at least in decks that can naturally use 1-power units and that generally don’t go tall. If a 26-power enemy unit trades power with a 1-power ally, the value of Censer is 50 points! Let’s consider that our optimistic estimate of value. Against an opponent that doesn’t go tall and that has its own 1-power units, Ornate Censer may need to be discarded for 0 value. Suppose a typical deck has a tallest unit that gets to 10 power, and call this the most likely estimate. Then, using the three-point estimate scheme, Censer’s average value is approximately (0 + 4(10) + 50) / 6 = 15, which is really too good for an 8-provision card with no setup. And its variability is 50 / 6 = 8.33, which is also far too large. I am led to the conclusion that Censer is also both OP and binary. And badly so.
Conclusion:
I realize that, as I wrote this article, it morphed into something far different than my original intent – and far longer. For that, I am sorry. But I think the analysis here is of significant value. It not only illustrates how statistics and probability can be used to analyze certain Gwent related questions, it provides a methodology for considering how much randomness is appropriate in a quantitative way. And despite reliance upon very crude estimates – I find certain conclusions very concerning. For those who have been vehemently complaining, they are no surprise. For those of us taking a more moderate stance, I hope they are enlightening.