Wordle has taken the world by storm, gaining popularity among people from all walks of life. For those who haven’t yet been introduced to this curious word-guessing game, there are just a few rules to know. Each player gets six attempts to guess a five-letter word. The first guess is unlikely to be correct, but it can shape the rest of the game, because every guess can uncover letters that help get to the correct answer.
Wordle's list of possible answers contains 2,315 five-letter words. So far, 418 of them have been played, which leaves 1,897 possible answers. The odds of getting the correct word on the first try are poor: only one in 1,897. That’s why we decided to raise our odds by relying on our knowledge of data.
Our method of acing the word-guessing game is somewhat similar to the card counting commonly used in Blackjack. The idea behind card counting is to keep track of the cards already played, which tells the player which cards are left in the deck. As the deck gets smaller, the player can predict the next card with more confidence. So the game might not start with a win, but it might finish with one.
We started by counting all the possible five-letter words and all the words that had already been played. This way, we evaluated which letters were under-represented and which were over-represented among the words played so far. We presume that the under-represented letters are more likely to appear in the upcoming Wordle puzzles.
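A minimal Python sketch of this counting step might look like the following. The function names `letter_share` and `representation_gap` and the two short word lists are assumptions made for illustration only; a real run would use the full answer list and the words already played.

```python
from collections import Counter
from string import ascii_lowercase


def letter_share(words):
    """Return the fraction of words that contain each letter at least once."""
    counts = Counter()
    for word in words:
        counts.update(set(word.lower()))  # set() counts a letter once per word
    total = len(words)
    return {letter: counts[letter] / total for letter in ascii_lowercase}


def representation_gap(all_answers, played):
    """Share among played words minus share among all answers, per letter.

    Negative values mean the letter is under-represented so far,
    positive values mean it is over-represented.
    """
    overall = letter_share(all_answers)
    so_far = letter_share(played)
    return {letter: so_far[letter] - overall[letter] for letter in ascii_lowercase}


# Placeholder word lists for illustration only, not the real Wordle data.
all_answers = ["crane", "slate", "noisy", "sunny", "trust", "tiger", "rally", "mount"]
played = ["crane", "trust", "tiger", "rally"]

gap = representation_gap(all_answers, played)
under_represented = [letter for letter in ascii_lowercase if gap[letter] < 0]
over_represented = [letter for letter in ascii_lowercase if gap[letter] > 0]
```

In this toy sample, N and S come out under-represented and R over-represented, but only because the placeholder lists were chosen that way.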
The analysis showed that, after 418 puzzles, the consonants N and S were the most under-represented. Every time these letters fail to appear in a puzzle, the probability that the next puzzle will contain them rises. In comparison, the consonants R and T were over-represented, which could mean that they are less likely to appear in the games to come.
The letters I, U and Y have been under-represented throughout the game. The vowel I appears in 28% of possible answers but has occurred in only 26% of the words played so far, while the semi-vowel Y is found in 18% of all possible answers but has so far occurred in only 16% of the played words. This is a good reason to expect both I and Y to appear in the upcoming games.
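The card-counting reasoning can be checked with a little arithmetic. The sketch below assumes that every answer is used exactly once, like cards drawn from a deck without replacement, and it works from the rounded percentages quoted above, so the results are approximate. The function name `share_in_remaining` is made up for this example.

```python
def share_in_remaining(total_words, played_words, overall_share, played_share):
    """Share of the not-yet-played answers that contain a given letter.

    Assumes each answer is used exactly once (drawn without replacement).
    Shares are fractions between 0 and 1.
    """
    with_letter_total = overall_share * total_words    # answers containing the letter
    with_letter_played = played_share * played_words   # of those, already played
    remaining_words = total_words - played_words
    return (with_letter_total - with_letter_played) / remaining_words


# Figures from the text: 2,315 answers, 418 played.
i_remaining = share_in_remaining(2315, 418, overall_share=0.28, played_share=0.26)
y_remaining = share_in_remaining(2315, 418, overall_share=0.18, played_share=0.16)

print(f"I in remaining answers: {i_remaining:.1%}")
print(f"Y in remaining answers: {y_remaining:.1%}")
```

With these rounded inputs, the share of unplayed answers containing I comes out at about 28.4% and the share containing Y at about 18.4%, both slightly above the overall figures.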
However, there is more nuance when picking the right vowels at the start of the game. The English alphabet has five vowels, A, E, I, O and U, and one semi-vowel, Y. For simplicity, we will count Y with the vowels. All 2,315 five-letter words have at least one vowel, but most have two or more. The most common vowel to appear in any of the five-letter words is E. It appears in 1,056 words, meaning that around 46% of the five-letter words have at least one letter E, and 167 (about 7%) of the words have a double E. The second most common is A: it appears in 909 words, nearly 40%. The least common vowels among five-letter words are U, at only 20%, and Y, which occurs in 18% of the possible answers. If numbers are scary and you only wish to guess letters, the quick takeaway is that having the letter E in your first guess is a good strategy, as you would have a likelihood of roughly 46% of getting at least one letter correct.
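The percentages follow directly from the word counts. As a rough check, the snippet below converts the counts quoted above into shares of the 2,315 answers, and shows one way such counts could be obtained from a word list. The helper names and the four-word sample list are placeholders for illustration, and "double" is read here as "at least two occurrences".

```python
TOTAL_ANSWERS = 2315


def percent_of_answers(count, total=TOTAL_ANSWERS):
    """Convert a word count into a percentage of all possible answers."""
    return 100 * count / total


def count_with_letter(words, letter, min_occurrences=1):
    """Count the words that contain `letter` at least `min_occurrences` times."""
    return sum(1 for word in words if word.lower().count(letter) >= min_occurrences)


# Counts taken from the text.
e_percent = percent_of_answers(1056)        # words with at least one E
double_e_percent = percent_of_answers(167)  # words with a double E
a_percent = percent_of_answers(909)         # words with at least one A

print(f"E: {e_percent:.1f}%, double E: {double_e_percent:.1f}%, A: {a_percent:.1f}%")

# Placeholder sample, not the real answer list.
sample = ["eerie", "crane", "tepee", "mount"]
sample_with_e = count_with_letter(sample, "e")
sample_with_double_e = count_with_letter(sample, "e", min_occurrences=2)
```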
The data shows that in the first Wordle puzzles, the letters that were over-represented were more likely to occur in subsequent puzzles. However, as more words are used, the gap between the likelihood of under-represented and over-represented letters is narrowing. With more words being used, the probability of under-represented letters appearing in the next puzzle will only increase.
As with card counting in Blackjack, it is impossible to predict with precision which cards will be dealt at the start of the deck, but the fewer cards are left in the deck, the better the predictions. Currently, only 418 five-letter words have been played in Wordle puzzles, which is only 18% of all possible answers. As the remaining options are used up, counting under-represented letters will increase the odds of getting the correct answer.
The chart shows which letters are currently under-represented (red) and over-represented (green), and by how much: