Every professional sport has its analytics and statistics department because player and team analysis translates into money. Curiously, professional rugby has resisted applying data science to performance, despite the efforts of myself and some others. The owners and coaches simply do not believe that deep analytics has a part to play; in their view, rugby is essentially a stochastic game that cannot be predicted on a macro level. In the olden days, the coach used to take a seat, sometimes in the stands, once the game began. Of course, they are wrong if they believe that performance on every level can't be measured and predicted.
Even the simplest of metrics and analyses can benefit a team, or benefit someone betting on a team. I have developed a software package called RugbyMetrics to digitize a game from video so that analytics and data mining can be run on the game record. Here is a screen shot of the capture mode:
The degree of granularity goes right down to the individual player and event. One of the biggest advantages of using analytics is that it can set the level of compensation based on ability relative to a player's peers. But the current crop of owners and coaches is leery of using deep-dive analytics.
The field of analytics and its mathematical underpinnings have evolved greatly in the past twenty years. As an example, which doubles as a mini-tutorial and a prediction of the final standings for the 2015-2016 campaign, I will demonstrate the Pythagorean Analytics Won Loss Formula for evaluating team performance and projecting the standings at the end of the season.
It is early in the Aviva Premiership Rugby season, and only 8 matches have been played. The top team, the Saracens, have 36 points and the bottom team, the London Irish, have 4 points. While this seems like a lot, each win garners 4 points, so the chasm between the bottom and the top doesn't seem that drastic to the casual observer. Let me quote the league rules on how points are amassed:
During Aviva Premiership Rugby points will be awarded as follows:
• 4 points will be awarded for a win
• 2 points will be awarded for a draw
• 1 point will be awarded to a team that loses a match by 7 points or less
• 1 point will be awarded to a team scoring 4 tries or more in a match
In the case of equality at any stage of the Season, positions at that stage of the season shall be determined firstly by the number of wins achieved and then on the basis of match points differential. A Club with a larger number of wins shall be placed higher than a Club with the same number of league points but fewer wins.
If Clubs have equal league points and equal number of wins then a Club with a larger difference between match points "for" and match points "against" shall be placed higher in the Premiership League than a Club with a smaller difference between match points "for" and match points "against".
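The quoted rules are mechanical enough to sketch in a few lines of Python. This is only an illustration of the rules as quoted above: the `Club` class, the function names, and the sample clubs in the usage lines are my own hypothetical choices, not part of RugbyMetrics or any league software.

```python
from dataclasses import dataclass


@dataclass
class Club:
    """Hypothetical running record for one club (names are illustrative)."""
    name: str
    wins: int = 0
    league_points: int = 0
    points_for: int = 0
    points_against: int = 0


def league_points_for_match(own_score, opp_score, own_tries):
    """League points one club earns from a single match under the quoted rules."""
    if own_score > opp_score:
        pts = 4                                  # win
    elif own_score == opp_score:
        pts = 2                                  # draw
    else:
        # losing bonus: defeat by 7 match points or fewer
        pts = 1 if opp_score - own_score <= 7 else 0
    if own_tries >= 4:
        pts += 1                                 # try bonus: 4 or more tries
    return pts


def record_match(club, own_score, opp_score, own_tries):
    """Update a club's record with one match result."""
    club.league_points += league_points_for_match(own_score, opp_score, own_tries)
    club.wins += own_score > opp_score
    club.points_for += own_score
    club.points_against += opp_score


def table_order(clubs):
    """Sort by league points, then wins, then match points differential."""
    return sorted(
        clubs,
        key=lambda c: (c.league_points, c.wins, c.points_for - c.points_against),
        reverse=True,
    )


# Usage with two made-up clubs that end level on league points.
a, b = Club("Club A"), Club("Club B")
record_match(a, 30, 10, 4)   # win plus try bonus: 5 points
record_match(b, 20, 20, 4)   # draw plus try bonus: 3 points
record_match(b, 18, 20, 4)   # narrow loss plus try bonus: 2 points
print([c.name for c in table_order([b, a])])
```

Both clubs finish on 5 league points here, and Club A is placed higher because it has more wins, which is the first tie-breaker in the rules.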
OK, so you see the way that a team amasses points after the game is played. During the game, the chief method of scoring is called a try, where the ball is grounded behind the goal line. That gives you five points, and as in football, if you kick a conversion, you get 2 extra points. You can also kick a penalty or a drop goal, and those are worth three points each. In the past, the points for these various scoring methods have varied.
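As a quick arithmetic check of those values, here is a minimal Python sketch. The function name and the example numbers are hypothetical, chosen only to show how a match score is built up.

```python
# Point values as described in the text above.
TRY, CONVERSION, PENALTY, DROP_GOAL = 5, 2, 3, 3


def match_score(tries, conversions, penalties=0, drop_goals=0):
    """Total match points for one team from its scoring events."""
    if conversions > tries:
        raise ValueError("a conversion can only follow a try")
    return (tries * TRY + conversions * CONVERSION
            + penalties * PENALTY + drop_goals * DROP_GOAL)


# Three tries, two of them converted, plus one penalty.
print(match_score(tries=3, conversions=2, penalties=1))  # 15 + 4 + 3 = 22
```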
The current league table looks like this:
As you can see, each team has played 8 games. The Saracens have won them all and the London Irish have won only one. The Sarries, in first place, have amassed a total of 218 points for themselves across the 8 games and have yielded 81, for a difference of 151. In terms of the major scoring method, tries, they have scored 24 and conceded 5. Impressive. The Chiefs have scored more tries but are in second place. This would indicate that their defensive capabilities don't match their offense, and the 12 tries scored against them (compared to the 5 against the Sarries) bear that out.
So one would look at the table and say that a few teams still have a shot at winning or getting into the top 4. For example, I admire the Saracens, but I like the Tigers very much, and RugbyMetrics was developed with Bath in mind. So what are their chances? How will it finish when it is all said and done? That's where the predictive power of the Pythagorean Analytics Won Loss Formula comes in.
So what does this Pythago-thing-a-majig do? I am going to talk technical for a minute. If your eyes glaze over, skip this paragraph. Mathematically, the points that a team scores and the points scored against a team are drawn from what are known as independent translated Weibull distributions. In statistics, the Weibull distribution is a continuous probability distribution. It is named after the Swedish mathematician Waloddi Weibull. What is a probability distribution? A probability distribution assigns a probability to each measurable subset of the possible outcomes of a procedure of statistical inference. And we are going to infer the win ratio based on how often the team scores and how often they are scored against. That is the end goal of what we are doing. Here is an infographic of the proof that Goals (tries) For and Goals Against can be modeled as Weibull distributions. Through my math trials I have found a proprietary exponent to use in the Pythagorean Won Loss formula.
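For readers who want to see the shape of the calculation, the Pythagorean won-loss expectation is commonly written as PF^g / (PF^g + PA^g), where PF is points for, PA is points against, and g is the exponent. The sketch below assumes that general form. The proprietary exponent is not given here, so the default of 2.0 is only a placeholder assumption, and the function name is hypothetical.

```python
def pythagorean_expectation(points_for, points_against, exponent=2.0):
    """Expected winning fraction from points scored and conceded.

    exponent=2.0 is an illustrative placeholder, NOT the proprietary
    exponent mentioned in the text.
    """
    if points_for < 0 or points_against < 0:
        raise ValueError("points must be non-negative")
    pf = points_for ** exponent
    pa = points_against ** exponent
    if pf + pa == 0:
        raise ValueError("no points scored by either side")
    return pf / (pf + pa)


# The Saracens' totals as quoted above: 218 for, 81 against.
saracens = pythagorean_expectation(218, 81)
print(f"{saracens:.3f}")
```

With the placeholder exponent the result will not match the figures reported in this article exactly; a different exponent moves the expectation up or down for the same scoring record.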
OK, so all of the theory aside, what do the results look like? Who will rise to the top and who will sink to the bottom when it is all said and done?
Currently the standings, early in the season look like this:
When we extrapolate winning performance based on points scored for and against using the Pythagorean Won Loss Formula, the predicted outcome changes the order. The Won/Loss numbers are shown as decimals. Using their scoring record, the Sarries are expected to win 86.9 times out of a hundred against their Premiership rivals.
Note that the Northampton Saints jump from 5th place to 3rd, dropping the Harlequins by one, and the Tigers drop to 5th, while Bath rises to 6th. The top two and the bottom three teams are where they should be. Poor old Gloucester drops to 8th place.
Let's take this analysis a little deeper, though. The above Pythagorean Won Loss formula is based on total points. The main offensive ability results in tries worth either 5 or 7 points, depending on whether the conversion is made. Luckily, we have the stats for tries for and against, which eliminates the points for drop kicks and penalties. This reflects the upper boundary of where a team would end up based on try-scoring ability. Calculating the Pythagorean Won Loss formula using the try data alone again gives hope to certain fans.
The top three remain the same as in the total points analysis. Based on tries alone, Bath Rugby, currently in 8th, has the ability to finish in 4th place. Worcester climbs up one over Sale, and the bottom two remain the same.
So if you are a betting man, you have a decent chance of taking this list to Ladbrokes or some other wagering shop and making a couple of quid on the good old Pythagorean Won Loss formula.
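The try-only version is the same calculation fed with tries instead of total points, followed by a sort. Here is a rough Python sketch under stated assumptions: the exponent of 2.0 is a placeholder rather than the proprietary value, the helper names are hypothetical, and apart from the Saracens' 24 tries for and 5 against quoted earlier, the club records are made up purely for illustration.

```python
def try_expectation(tries_for, tries_against, exponent=2.0):
    """Pythagorean winning fraction from tries alone (placeholder exponent)."""
    tf = tries_for ** exponent
    ta = tries_against ** exponent
    if tf + ta == 0:
        raise ValueError("no tries recorded for either side")
    return tf / (tf + ta)


def rank_by_tries(records, exponent=2.0):
    """Return (club, expectation) pairs sorted from best to worst."""
    scored = [(club, try_expectation(tf, ta, exponent))
              for club, (tf, ta) in records.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Saracens' 24 and 5 come from the text; the other two rows are invented.
records = {
    "Saracens": (24, 5),
    "Hypothetical Club X": (15, 15),
    "Hypothetical Club Y": (8, 20),
}
for club, expectation in rank_by_tries(records):
    print(f"{club:<20} {expectation:.3f}")
```

Keeping the exponent as a parameter makes it easy to see how sensitive the projected order is to that one choice.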
Most of the rest of RugbyMetrics is player-centric for the benefit of measuring individual performance. I predict that the first team that adopts this, and brings home the silverware, will open the flood gates for performance analytics in professional rugby.