All Things Techie With Huge, Unstructured, Intuitive Leaps

Performance Analytics Guide To Aviva Premiership Rugby Standings


Every professional sport has its analytics and statistics department, because player and team analysis translates into money. Curiously, professional rugby has resisted applying data science to performance, despite the efforts of myself and some others. The owners and coaches simply do not believe that extreme analytics have a part to play, because in their view rugby is essentially a stochastic game that cannot be predicted on a macro level. In the olden days, the coach used to take a seat once the game began, sometimes in the stands. Of course, if they believe that performance can't be measured and predicted on every level, they are wrong.

Even the simplest of metrics and analyses can benefit a team, or benefit someone betting on a team.  I have developed a software package called RugbyMetrics to digitize a game from video so that analytics and data mining can be run on the game record.  Here is a screen shot of the capture mode:


The degree of granularity goes right down to individual players and events. One of the biggest advantages of analytics is using it to set a player's level of compensation based on his ability relative to his peers. But the current crop of owners and coaches are leery of deep-dive analytics.

The field of analytics and its mathematical underpinnings have evolved greatly in the past twenty years. As a mini-tutorial, and as a prediction of the final standings at the end of the 2015-2016 campaign, I will demonstrate the Pythagorean Analytics Won Loss Formula for evaluating team performance and projecting the standings at the end of the season.

We are still early in the Aviva Premiership Rugby season: only 8 matches have been played. The top team, Saracens, have 36 points and the bottom team, London Irish, have 4 points. While that gap looks large, each win garners 4 points, so to the casual observer the chasm between the bottom and the top doesn't seem that drastic. Let me quote the league rules on how points are amassed:

During Aviva Premiership Rugby points will be awarded as follows:
• 4 points will be awarded for a win
• 2 points will be awarded for a draw
• 1 point will be awarded to a team that loses a match by 7 points or less
• 1 point will be awarded to a team scoring 4 tries or more in a match
In the case of equality at any stage of the Season, positions at that stage of the season shall be determined firstly by the number of wins achieved and then on the basis of match points differential. A Club with a larger number of wins shall be placed higher than a Club with the same number of league points but fewer wins.

If Clubs have equal league points and equal number of wins then a Club with a larger difference between match points "for" and match points "against" shall be placed higher in the Premiership League than a Club with a smaller difference between match points "for" and match points "against".
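The quoted rules are mechanical enough to express directly in code. Below is a minimal Python sketch, assuming a hypothetical `MatchResult` record for one team's side of one match; it awards league points per the four rules above and orders a table by the quoted tie-breakers (league points, then wins, then points difference). It is an illustration, not the league's official calculation.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    """One team's side of one match (hypothetical record type)."""
    points_for: int
    points_against: int
    tries_for: int


def league_points(result: MatchResult) -> int:
    """League table points earned from a single match under the quoted rules."""
    margin = result.points_for - result.points_against
    if margin > 0:
        points = 4  # win
    elif margin == 0:
        points = 2  # draw
    elif margin >= -7:
        points = 1  # losing bonus: lost by 7 points or less
    else:
        points = 0
    if result.tries_for >= 4:
        points += 1  # try bonus: 4 or more tries, win or lose
    return points


def standings(rows):
    """Sort (name, league_points, wins, points_for, points_against) rows.

    Ties on league points are broken by wins, then by points difference.
    """
    return sorted(rows, key=lambda r: (-r[1], -r[2], -(r[3] - r[4])))


# Toy example with made-up teams and numbers.
table = standings([
    ("Team A", 20, 4, 150, 120),
    ("Team B", 20, 5, 140, 130),
    ("Team C", 22, 5, 160, 100),
])
print([name for name, *_ in table])
```

Team B finishes above Team A on the same league points because it has more wins, exactly as the second tie-break rule requires.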

OK, so you see how a team amasses league points once the game is played. During the game, the chief method of scoring is the try, where the ball is grounded behind the goal line. A try gives you five points, and, like football, if you kick the conversion you get 2 extra points. You can also score with a penalty kick or a drop goal, each worth three points. In the past, the points for these various scoring methods have varied.
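The in-game scoring values above (try 5, conversion 2, penalty or drop goal 3) simply add up. A small sketch of that arithmetic, using a hypothetical `match_score` helper; it ignores penalty tries and the historical changes to the values mentioned above.

```python
TRY, CONVERSION, PENALTY, DROP_GOAL = 5, 2, 3, 3


def match_score(tries: int, conversions: int, penalties: int, drop_goals: int) -> int:
    """Total match points from counts of each scoring method (current values)."""
    if conversions > tries:
        raise ValueError("a conversion can only follow a try")
    return (TRY * tries + CONVERSION * conversions
            + PENALTY * penalties + DROP_GOAL * drop_goals)


# Two converted tries plus three penalties: 2 * 7 + 3 * 3 = 23.
print(match_score(tries=2, conversions=2, penalties=3, drop_goals=0))
```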

The current league table looks like this:

As you can see, each team has played 8 games. Saracens have won them all and London Irish have won only one. The Sarries, in first place, have amassed a total of 218 points across the 8 games and conceded 81, for a difference of 137. In terms of the major scoring method, tries, they have scored 24 and conceded only 5. Impressive. The Chiefs have scored more tries but are in second place. This would indicate that their defensive capabilities don't match their offense, and the 12 tries they have conceded (compared to the Sarries' 5) prove it.

So, looking at the table, one would say that a few teams still have a shot at winning or getting into the top 4. For example, I admire Saracens, but I like the Tigers very much, and RugbyMetrics was developed with Bath in mind. So what are their chances? How will it finish when all is said and done? That's where the predictive power of the Pythagorean Analytics Won Loss Formula comes in.

So what does this Pythago-thing-a-majig do? I am going to talk technical for a minute; if your eyes glaze over, skip this paragraph. Mathematically, the points that a team scores and the points scored against it are modelled as draws from independent translated Weibull distributions. In statistics, the Weibull distribution is a continuous probability distribution, named after the Swedish mathematician Waloddi Weibull. What is a probability distribution? A probability distribution assigns a probability to each measurable subset of the possible outcomes of a random experiment. We are going to infer the win ratio from how often the team scores and how often it is scored against; that is the end goal of what we are doing. Here is an infographic of the proof that goals (tries) for and goals against can be modelled as Weibull distributions. Through my math trials I have found a proprietary exponent to use in the Pythagorean Won Loss formula.
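The proprietary exponent is not published here, so the following is only a minimal Python sketch of the general form of the formula: expected winning fraction = PF^k / (PF^k + PA^k), where PF and PA are points for and against and k is the exponent (in the Weibull-based derivation, k comes from the shape parameter of the two distributions). The helper `implied_exponent` is a hypothetical addition that back-solves k from an observed winning fraction, one simple way to calibrate an exponent against past results.

```python
import math


def pythagorean_win_fraction(points_for: float, points_against: float, k: float) -> float:
    """Expected fraction of matches won: PF^k / (PF^k + PA^k)."""
    if k <= 0:
        raise ValueError("exponent must be positive")
    if points_for < 0 or points_against < 0 or points_for + points_against == 0:
        raise ValueError("points must be non-negative and not both zero")
    pf_k = points_for ** k
    pa_k = points_against ** k
    return pf_k / (pf_k + pa_k)


def implied_exponent(win_fraction: float, points_for: float, points_against: float) -> float:
    """Solve PF^k / (PF^k + PA^k) = p for k.

    Rearranging gives (PF / PA)^k = p / (1 - p), so k = ln(p / (1 - p)) / ln(PF / PA).
    """
    if not 0 < win_fraction < 1 or points_for == points_against:
        raise ValueError("need 0 < p < 1 and unequal points for and against")
    return math.log(win_fraction / (1 - win_fraction)) / math.log(points_for / points_against)


# Saracens' 218 for and 81 against from the league table, with an illustrative k = 2
# (the exponent Bill James used for baseball, not the proprietary rugby value).
print(round(pythagorean_win_fraction(218, 81, 2.0), 3))
```

The exponent controls how strongly a scoring margin translates into wins; a larger k pushes good teams closer to 1.0 and poor teams closer to 0.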


OK, so all of the theory aside, what do the results look like?  Who will rise to the top and who will sink to the bottom when it is all said and done?

Currently the standings, early in the season look like this:

1 Saracens
2 Exeter Chiefs
3 Harlequins
4 Leicester Tigers
5 Northampton Saints
6 Gloucester Rugby
7 Sale Sharks
8 Bath Rugby
9 Wasps
10 Worcester Warriors
11 Newcastle Falcons
12 London Irish

When we extrapolate winning performance from points scored for and against using the Pythagorean Won Loss Formula, the predicted order changes. The decimal figures are the expected winning fractions. Based on their scoring record, the Sarries are expected to win 86.9 times out of a hundred against their Premiership rivals.

1 Saracens 0.869224
2 Exeter Chiefs 0.756876
3 Northampton Saints 0.609814
4 Harlequins 0.60633
5 Leicester Tigers 0.563826
6 Bath Rugby 0.517414
7 Wasps 0.505625
8 Gloucester Rugby 0.492264
9 Sale Sharks 0.399586
10 Worcester Warriors 0.357453
11 Newcastle Falcons 0.206142
12 London Irish 0.193901

Note that the Northampton Saints jump from 5th place to 3rd, dropping Harlequins by one place, and the Tigers drop to 5th, while Bath rise to 6th. The top two and the bottom three teams are where they should be. Poor old Gloucester drop to 8th place.
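Reading rank changes off two lists by eye is error-prone, so here is a small Python sketch that does it mechanically. The two orderings are copied from the lists above; `rank_changes` is a hypothetical helper name.

```python
current = [
    "Saracens", "Exeter Chiefs", "Harlequins", "Leicester Tigers",
    "Northampton Saints", "Gloucester Rugby", "Sale Sharks", "Bath Rugby",
    "Wasps", "Worcester Warriors", "Newcastle Falcons", "London Irish",
]

# Pythagorean expected win fractions from total points, as listed above.
pythagorean = {
    "Saracens": 0.869224, "Exeter Chiefs": 0.756876, "Northampton Saints": 0.609814,
    "Harlequins": 0.60633, "Leicester Tigers": 0.563826, "Bath Rugby": 0.517414,
    "Wasps": 0.505625, "Gloucester Rugby": 0.492264, "Sale Sharks": 0.399586,
    "Worcester Warriors": 0.357453, "Newcastle Falcons": 0.206142, "London Irish": 0.193901,
}


def rank_changes(current_order, expected):
    """Map each team to (current position, predicted position), both 1-based."""
    predicted_order = sorted(expected, key=expected.get, reverse=True)
    now = {team: i for i, team in enumerate(current_order, start=1)}
    return {team: (now[team], i) for i, team in enumerate(predicted_order, start=1)}


for team, (was, predicted) in rank_changes(current, pythagorean).items():
    if was != predicted:
        print(f"{team}: {was} -> {predicted}")
```

The same helper works for the try-based figures by swapping in those numbers.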

Let's take this analysis a little deeper, though. The Pythagorean Won Loss formula above is based on total points. The main offensive weapon is the try, worth either 5 or 7 points depending on whether the conversion is made. Luckily we have the stats for tries for and against, which eliminate the points from drop goals and penalties. This reflects the upper boundary of where a team would end up based on try-scoring ability. Calculating the Pythagorean Won Loss formula using the try data alone again gives hope to certain fans. In the list below, the number before each team is its current league position.

1 Saracens 0.943933
2 Exeter Chiefs 0.811483
5 Northampton Saints 0.639505
8 Bath Rugby 0.636054
3 Harlequins 0.585137
4 Leicester Tigers 0.535957
9 Wasps 0.5
6 Gloucester Rugby 0.418684
10 Worcester Warriors 0.349619
7 Sale Sharks 0.32523
12 London Irish 0.212348
11 Newcastle Falcons 0.1612
The top three remain the same as in the total-points analysis. Based on tries alone, Bath Rugby, currently in 8th, have the ability to finish in 4th place. Worcester climb one place above Sale, and the bottom two are the same two teams, although London Irish and Newcastle swap places.

So if you are a betting man, you have a decent chance of taking this list to Ladbrokes or some other wagering shop and making a couple of quid on the good old Pythagorean Won Loss formula.

Most of the rest of RugbyMetrics is player-centric, for the benefit of measuring individual performance. I predict that the first team that adopts this and brings home the silverware will open the floodgates for performance analytics in professional rugby.

The Nate Silver of Rugby and Bitcoin


Gotta say that I scored a fairly big one in the predictions department with RugbyMetrics. I did a running analysis of the teams' play, scoring and defensive stats and came up with a regression formula. In the latest Aviva Premiership fixtures, Bath, the number three team, were playing Saracens. My analysis showed that the Sarries would triumph over Bath by a score of 23-16. I tweeted that prediction a day before.

On game day, the Saracens did prevail with a score of 23-10.  That is an amazingly accurate prediction and my regression formula warrants a trip to the bookie shop.

On another tack, in a previous post I put up a video (made by someone else) saying that Bitcoin was in a bubble. Well, Mt. Gox failed a few days later. I also predicted a Super Bowl win for the Seattle Seahawks earlier this year, and they upset the Broncos big time.

I should start backing my predictions with money.

RugbyMetrics Queries

I have been getting some queries via comment postings about RugbyMetrics. Some people have even been trying to find a trial download. I will be posting some sample results and white papers here shortly. In the meantime, if you have any queries, please drop me a line at:

rugbymetrics-at-gmx.com (substitute "@" for "-at-").

Line Formation Elasticity -- RugbyMetrics

A lot of objective information is falling out of the results of my software tool called RugbyMetrics. While I was doing extensive statistical data mining on actual professional rugby games in the Aviva Premiership, an incredible statistic fell out of the exercise: line formation elasticity.

Rugby is a game where the defense lines up across the field to defend against a similar line of offence. When a player carrying the egg finds that his forward progress is blocked, he passes left or right down the line to his team mates. If there is a hole in the line somewhere on either the defense or offence, then there is a problem.

So I decided to measure line elasticity: how quickly the line forms or reforms after it is distorted by a play. This analysis fell out of another analysis in which I computed the ratio of jersey counts between attackers and defenders at the time of the tackle, which had a very interesting result.

What the line elasticity measure showed was that the more efficiently the line reformed, the more successful the play (in both offence and defense). This is especially evident when the team in possession grinds away for a long time with very little ground gained: in that case the opposing defensive line is very elastic at reforming and very efficient.

Frame-by-frame video also showed the laggards who were late in taking up their positions, thus leaving holes in the line. It was very interesting.

From there, once we could identify the defensive laggards, we saw that we could assign a numeric coefficient of line efficiency, both at the team level and at the player level.
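How the coefficient is computed isn't stated, so this is a sketch under a stated assumption: for each defensive phase we have the seconds each defender took to get back into the line (a hypothetical input digitized from the video). The team coefficient here is the average time for the whole line to reform (its slowest defender), a player's coefficient is his average reform time, and laggards are players whose average exceeds a chosen threshold. Lower numbers mean a more elastic line.

```python
from statistics import mean


def reform_time(phase):
    """Seconds until the whole line is back in shape: the slowest defender's time."""
    return max(phase.values())


def team_elasticity(phases):
    """Average line reform time across phases (lower = more elastic line)."""
    return mean(reform_time(p) for p in phases)


def player_coefficients(phases):
    """Average reform time per defender across all phases he took part in."""
    times = {}
    for phase in phases:
        for player, seconds in phase.items():
            times.setdefault(player, []).append(seconds)
    return {player: mean(ts) for player, ts in times.items()}


def laggards(phases, threshold_seconds):
    """Defenders whose average reform time is slower than the threshold."""
    return sorted(p for p, avg in player_coefficients(phases).items() if avg > threshold_seconds)


# Hypothetical phases keyed by jersey number: seconds to regain position after the breakdown.
phases = [
    {10: 1.2, 12: 1.5, 13: 3.0},
    {10: 1.0, 12: 1.7, 13: 2.6},
]
print(team_elasticity(phases), laggards(phases, threshold_seconds=2.0))
```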

From there, it was a short step to rating a team's roster and letting the results settle into a hierarchy of the best players. Many measures of a player's worth are coming out of RugbyMetrics. It struck me that if a player is negotiating a raise in his contract, one of his bargaining chips could be a RugbyMetrics analysis showing that he is in the company of the best of the breed in the Premiership. Conversely, a team could use RugbyMetrics to show that a player asking for a raise is more of a journeyman than a star.

It's all fascinating stuff, and it is data mining and performance analysis that open the doors to it.

Toby Flood Reduced To An Equation

Toby Flood is a fly-half for the Leicester Tigers, and a rugby star in the Aviva Premiership. This is his photograph from Wikipedia:


It's almost sad, but true, that Toby's running game can be reduced to a mathematical equation. If you had to describe Toby's running game mathematically, you would do it this way:

Obviously I am not going to tell you what x and y stand for, because it came from digitizing and sifting through mounds of data to come up with the mathematical model using predictive analytics and linear regression.

However, if you wanted to choose a player with Toby's prowess, this formula would be incredibly helpful. It was derived using my software package RugbyMetrics, which adds objective knowledge of the game through data mining and sifting through mounds of statistics.

Click on the video below to watch Toby kick a conversion after a Tiger try. The fly-half is really good!!

Regress to Success -- RugbyMetrics

So let's suppose that you run a rugby team in the Aviva Premiership or any other professional rugby league. You haven't qualified for the Heineken Cup, your team is full of journeymen players and you consistently sit in the cellar of the standings table. And let's suppose that you don't have a Daddy Warbucks owner who can buy you a Dan Carter, and you want to create a competitive team.

So what are you going to do? You have to find young, untried players who will eventually turn into Thomas Waldrom, Schalk Brits, Chris Ashton or Tom Wood. How are you going to identify them when they haven't had a chance to prove themselves and amass the statistics to show that they have the stuff of the egg-chasing gods?

You turn to the geeks, that's how. How so? You regress your way to success. You would use my RugbyMetrics tool. Then you would take game film of your targeted acquisition and, using the tool, digitize that player's performance. From there you would use advanced statistics to create a mathematical model (using regression and Bayesian inference) to determine if your player has the right stuff.
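One simple form the Bayesian side could take (an illustration, not necessarily the RugbyMetrics model) is a Beta-Binomial update: start from a prior for a success rate, such as tackle completion, that reflects what is typical for the position, then update it with the handful of attempts digitized from the prospect's game film. With little film the estimate stays close to the prior; with more, the prospect's own numbers dominate. The prior values and counts below are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def update(prior_alpha: float, prior_beta: float, successes: int, attempts: int) -> BetaPosterior:
    """Conjugate Beta-Binomial update: add successes to alpha and failures to beta."""
    if not 0 <= successes <= attempts:
        raise ValueError("successes must be between 0 and attempts")
    return BetaPosterior(prior_alpha + successes, prior_beta + attempts - successes)


def prob_rate_exceeds(post: BetaPosterior, threshold: float, draws: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(true rate > threshold) under the posterior."""
    rng = random.Random(seed)
    hits = sum(rng.betavariate(post.alpha, post.beta) > threshold for _ in range(draws))
    return hits / draws


# Hypothetical prior worth about 20 tackles at 85% completion; the prospect made 18 of 20 on film.
post = update(prior_alpha=17, prior_beta=3, successes=18, attempts=20)
print(round(post.mean, 3), prob_rate_exceeds(post, 0.85))
```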

How does it work? The seeds of athletic greatness are sown early. However, they may not become manifest because the player is not on a team that enhances his skill set, or he is blindside oriented on a team that is predominantly openside oriented. There are many, many reasons, but such a player will still demonstrate the subtle qualities that show he has the key performance indicators that tend to greatness.

So what are these KPIs, or key performance indicators? They are a new set of statistics gleaned from data mining every aspect of the game. They are proprietary knowledge for the users of the system. But as a trivial example, one finds that an Olly Barkley will average x carries, gaining y yards, in a certain ratio to the yards gained by the opposition. This is objective, scientific knowledge of the game of rugby that comes from the field of predictive analytics.
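As a trivial illustration of turning digitized events into KPIs like the ones just described, here is a Python sketch. The `Event` format (team, player, kind, yards) is a hypothetical stand-in for whatever the capture tool actually records.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    team: str
    player: str
    kind: str     # e.g. "carry", "tackle", "pass"
    yards: float  # ground gained on the event (0 if not applicable)


def carry_kpis(events, team):
    """Per-player carries, yards gained, and yards relative to the opposition's total."""
    carries = defaultdict(int)
    yards = defaultdict(float)
    opposition_yards = 0.0
    for e in events:
        if e.kind != "carry":
            continue
        if e.team == team:
            carries[e.player] += 1
            yards[e.player] += e.yards
        else:
            opposition_yards += e.yards
    return {
        p: {
            "carries": carries[p],
            "yards": yards[p],
            "yards_per_carry": yards[p] / carries[p],
            "ratio_to_opposition": yards[p] / opposition_yards if opposition_yards else None,
        }
        for p in carries
    }


# Made-up events from one match.
events = [
    Event("Home", "12", "carry", 3.0),
    Event("Home", "12", "carry", 5.0),
    Event("Home", "9", "pass", 0.0),
    Event("Away", "8", "carry", 10.0),
]
print(carry_kpis(events, "Home"))
```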

So once you have the three mathematical formulas gleaned from going through mountains of statistics, you can eliminate the pretenders and give yourself a roster of possible stars. This is not meant to replace the years of coaching and scouting, but rather it is meant to give the teams a scientific, valid starting point when scouting for new team members.

The interesting aspect is that the front eight will have different formulas from the back seven, and each position will have different regression parameters in the models. Style of play comes into effect as well. If you like a Tom Wood style of play, you would determine the mathematical model by analyzing his performance and then look for players whose numbers are similar to his. It sure beats the shot-in-the-dark method of picking a player who "looks good".
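Looking for players whose numbers are similar to a template player can be sketched as a nearest-neighbour search. This assumes each player is summarized by a vector of KPIs (names and values below are hypothetical). Each KPI is standardized across the pool so that no single unit dominates, and the other players are ranked by Euclidean distance to the template.

```python
import math
from statistics import mean, pstdev


def standardize(profiles):
    """Convert raw KPI values to z-scores computed across all players in the pool."""
    kpis = list(next(iter(profiles.values())).keys())
    stats = {}
    for k in kpis:
        values = [p[k] for p in profiles.values()]
        stats[k] = (mean(values), pstdev(values) or 1.0)  # avoid dividing by zero
    return {name: {k: (p[k] - stats[k][0]) / stats[k][1] for k in kpis}
            for name, p in profiles.items()}


def most_similar(profiles, template):
    """Rank every other player by Euclidean distance to the template in z-score space."""
    z = standardize(profiles)
    target = z[template]
    keys = list(target)
    distances = {
        name: math.dist([vec[k] for k in keys], [target[k] for k in keys])
        for name, vec in z.items() if name != template
    }
    return sorted(distances, key=distances.get)


pool = {
    "Template flanker": {"carries": 12, "tackles": 15, "turnovers": 3},
    "Prospect A": {"carries": 11, "tackles": 14, "turnovers": 3},
    "Prospect B": {"carries": 4, "tackles": 6, "turnovers": 0},
    "Prospect C": {"carries": 8, "tackles": 20, "turnovers": 1},
}
print(most_similar(pool, "Template flanker"))
```

Per-position pools (forwards versus backs) would keep, say, a winger's numbers from distorting the standardization for flankers.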

If you have any questions, please leave a comment and I will answer them.

RugbyMetrics Performance Analysis Themes


Data mining and predictive analytics is a wonderful thing. It gives objective insight into whatever comes under the data microscope. In this case it is rugby. The insights are fascinating. For example, take a look at this equation:



It's called a Hurst exponent. It is named after the hydrologist Harold Edwin Hurst, who developed it to determine how big a dam to build on the Nile River, and it was later taken up by that mathematics god Benoit Mandelbrot as an element of fractal geometry. What does it have to do with rugby?

Let's look at the analogy of the Nile River. In the case of the Nile, one expects to find ebbs and flows in the amount of water flowing through the river, depending on rain and drought. The series is infinite as long as the Nile does not run dry. The Hurst exponent is used to characterize the variability of the flow over time.

In rugby there are ebbs and flows during the game in terms of meters gained on the pitch by a particular team. Of course, the time is not infinite; it is 80 minutes. During those 80 minutes the game flows back and forth. If I calculate the Hurst exponent of the meters gained per play during a game, it suggests different things about the teams.

The Hurst exponent is defined in terms of the asymptotic behaviour of the rescaled range as a function of the time span of a time series.

Let's suppose that I analyze a video of a rugby game and, just for fun, determine the Hurst exponent of the opposing team. Let's suppose that their variability in meters gained on the field is higher than my team's. There are a few reasons why a team might be highly variable in meters gained in play, and finding that reason exposes a vulnerability for the opposition to exploit.

If I apply this same concept at a finer degree of granularity, at the player level, I can determine by comparative analysis whether a player is ready to play or still not up to snuff after an injury.

Analysis and number crunching of this kind yields an amazing amount of objective knowledge about the game that was previously unknown. And this is the type of knowledge that gives teams an incredible advantage over mere human coaching.

As for software, the neat thing about this stuff is that SQL stored procedures and views deliver the input from the data mart. The game needs to be dissected very finely, and the data returned in cursors from the data cubes must be non-jagged for the math transforms to operate on it. Data can be made to spill its guts.
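SQLite has views but no stored procedures, so this Python sketch uses a view only; the `plays` table and `team_metres` view are hypothetical. The view fills uncaptured values with 0 so that every play yields a number (one way to get non-jagged data; dropping or interpolating are alternatives), and the query returns the series in play order, ready for a math transform.

```python
import sqlite3

SCHEMA = """
CREATE TABLE plays (
    game_id INTEGER NOT NULL,
    team TEXT NOT NULL,
    seq INTEGER NOT NULL,      -- order of the play within the game
    metres_gained REAL         -- NULL when the capture missed it
);
CREATE VIEW team_metres AS
SELECT game_id, team, seq, COALESCE(metres_gained, 0.0) AS metres
FROM plays;
"""


def metres_series(conn: sqlite3.Connection, game_id: int, team: str) -> list:
    """Return one team's metres gained per play, in play order, with gaps filled."""
    rows = conn.execute(
        "SELECT metres FROM team_metres WHERE game_id = ? AND team = ? ORDER BY seq",
        (game_id, team),
    ).fetchall()
    return [metres for (metres,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO plays VALUES (?, ?, ?, ?)",
    [(1, "Home", 2, None), (1, "Home", 1, 3.5), (1, "Away", 1, 8.0), (1, "Home", 3, 7.0)],
)
print(metres_series(conn, 1, "Home"))
```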

RugbyMetrics Performance Analysis Software


I am about to test my new rugby video and performance analysis software on an Aviva Premiership professional rugby team.

This entire exercise is akin to Sabermetrics in North American professional baseball, where one uses data mining and intense statistical analysis to discover scientific, objective knowledge about the game that was previously unknown.

For example, in baseball the Oakland Athletics, with a limited budget of $40 million, had to compete with a team like the New York Yankees, who could afford marquee players with a salary budget of $125 million. It was found that conventional wisdom about assembling a team of good players was neither conventional nor wisdom.

In the old convention, all of the scouts looked for a young pitcher with a blistering fastball and a high strikeout count. Through data analysis, it was found that not only was a pitcher with a high ground out ratio more valuable, but he was also cheaper on the open market.

This intensive analysis has been done with baseball, football and basketball. No one has yet done it with rugby. Rugby is a very dynamic game and stats abound, but they are all conventional stats that really do not measure what a player does minute by minute and apply that to the greater context of the play. Some software packages give you a minute by minute report, but it is meaningless unless you take it in the context of inputs, outputs and results, both on a micro and macro level. There is tons of minute-by-minute data, but very little pattern knowledge of what that data means.

I developed a piece of video analysis and annotation software that models and digitizes the entire game second by second. Everything is put into a database in a proprietary format that allows extensive data mining and statistical analysis of every second of every game.

There are a few software packages that give you a real wow-factor re-simulation of the game, and a whole pile of individual statistics about every facet of the game, but these are not very helpful in real life. What is helpful is garnering new knowledge: objective knowledge about the sport and about your team. The way it is done is by taking statistics, facts, data and factors, and integrating that data into knowledge by making inferences, testing those inferences and endlessly cutting and re-dimensioning the data until statistically significant knowledge about the team and the game pops out. Nobody is doing this out there, and I will be the first. Like Sabermetrics, this will add objective knowledge about the sport of rugby on the macro level.

The most important aspect of this is that the tool can isolate and chart characteristic on-field behaviour of the opposition, thus allowing the planning of a defense against a formidable opponent. Teams that are well coached consist of human beings with ingrained play patterns, and knowing those patterns is a valuable tactical advantage for the opposing team. If you know the attack vector that is coming, you can plan for it. But it is not enough to know the attack vector; you have to know the probabilities, how it will play out, and how it changes with different parameters.
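Knowing the probabilities of an opponent's attack vectors can start with something as simple as an empirical conditional frequency table. This sketch assumes the opponent's plays have been digitized as (situation, play) pairs; the situation labels and play names are hypothetical. It estimates P(play | situation) by counting, so tendencies can be compared as a parameter such as field zone changes.

```python
from collections import Counter, defaultdict


def play_probabilities(observations):
    """Estimate P(play | situation) from observed (situation, play) pairs."""
    counts = defaultdict(Counter)
    for situation, play in observations:
        counts[situation][play] += 1
    return {
        situation: {play: n / sum(c.values()) for play, n in c.most_common()}
        for situation, c in counts.items()
    }


# Hypothetical observations of one opponent.
observed = [
    ("own 22", "box kick"), ("own 22", "box kick"), ("own 22", "carry"),
    ("opposition 22", "pick and go"), ("opposition 22", "pick and go"),
    ("opposition 22", "wide pass"), ("opposition 22", "pick and go"),
]
for situation, probs in play_probabilities(observed).items():
    print(situation, probs)
```

With more situations and more games, the same counts become the raw material for the Bayesian and regression models described above.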


A model is being developed using Bayesian Predictive Analytics that will predict and spot the future Dan Carter superstars of rugby while they are pimply-faced teenagers.

This is a quantum leap forward compared to something like Opta stats, which dissects every game into component parts but just posts flat statistics that do not show why these events that they have dissected happen the way they do. RugbyMetrics is taking rugby performance analysis to the next level.