For all you stat nerds...

Postby acesinc

...and maybe re-ignite the never ending war of the Ronnie/Stephen/Judd fanboys. And let me add, if you don't wish to read all my drivel below, just click on the document linked at the end and look at pages 15 and 19, and perhaps read the Conclusion section to understand it better.

I rarely originate a thread, usually just poke my nose in probably often where it's not wanted. But this one is something of a source of pride for me that I had no idea about until just a couple weeks ago. Any who have read my posts may remember that my main playing partner is my son, who isn't home much these days as he is off at his fourth year at University. His major at school is Statistical Analysis, which is mostly like a foreign language to me, in that I don't understand it all that well. As he is getting closer to finishing Uni and going out to make his mark in the world, he decided he wanted to do a project for his CV, something to demonstrate his talent, but unique, separate from his schoolwork. As I said, I had no idea about this, but he decided this project would involve snooker.

My (limited) understanding of statistical analysis is that it is the collection and manipulation of data in order to understand current trends and potentially make accurate predictions for future trends. Working with a lot of numbers, and understanding what numbers are important, and why those numbers are important. This field is already huge and growing with practically unlimited applications. I am sure he will do well after school, but he is currently deciding if he wants to stay in school a while longer to pursue a Masters Degree.

So for anyone who understands this sort of thing (and I am guessing there are a few of you), he developed a modelling program using Python (I think that is accurate....I don't really know), and his model looks at available stats for the players going back as far as he could find records. The big problem is that record keeping is very limited, especially as we go further back in time. But using this extremely limited data, he has developed a surprisingly accurate model (in my humble, subjective opinion at least).

For the paper I will direct you to, he just went back as far as 1988 and looked at the stats of all of the qualifiers leading up to the World Championship in order to rank the likelihood of each of the players to win it all that year. In my opinion, the most interesting thing in this whole document (20 pages total) is a chart on page 15 where you will see a bunch of yellow and blue highlights. What this chart shows is the model's predictions in order of the top 5 players statistically most likely to win the World Championship in that particular year. Three takeaways of this chart are:

1) The incredible dominance of Steve Davis, Stephen Hendry, and Ronnie O'Sullivan as being the number one seed every single year except one. We all instinctively knew this, but here is numerical proof.

2) The model's pick for the top seed to be winner was correct in 14 out of 32 years (yea, okay, makes sense.....those of course are the summation of ROS, SH, and SD wins, but again, here is numerical proof for it). But more impressive I think is that the model had the actual winner placed within the top five most likely to win in 27 out of the 32 years, good information to have if you're a gambler. Yes, many winners are obvious in hindsight, but this model picked out for instance the likes of Ken Doherty, Peter Ebdon, and Shaun Murphy as being likely to win the WC.

3) Strange how the number 2 and 3 seeds rarely end up winning the tournament; only 3 times in 32 years. It's nearly always the favourite or one of the underdogs to win, not often one of the "wannabe" close contenders. Why is that? Probably don't want to bet on numbers 2 or 3 to win it.
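
For anyone who wants the takeaways above as percentages, here is a small worked calculation in Python. It uses only the counts quoted in this post (14, 27 and 3 out of 32) and assumes that the 27 top-five hits include the 14 top-seed hits and the 3 wins by seeds 2 and 3; the variable names are made up for illustration and are not taken from the paper.

```python
# Counts quoted in the post (chart on page 15 of the paper).
years = 32
top_seed_wins = 14       # model's number 1 pick won the title
top_five_wins = 27       # winner was somewhere in the model's top 5
second_third_wins = 3    # winner was the model's number 2 or 3 pick

# Derived counts, assuming the top-five figure includes the other two.
fourth_fifth_wins = top_five_wins - top_seed_wins - second_third_wins
outside_top_five = years - top_five_wins

rows = [
    ("number 1 pick", top_seed_wins),
    ("number 2 or 3 pick", second_third_wins),
    ("number 4 or 5 pick", fourth_fifth_wins),
    ("outside the top 5", outside_top_five),
]
for label, count in rows:
    print(f"{label:20s} {count:2d} of {years}  ({count / years:.1%})")
```

Under that assumption, the picks ranked 4 and 5 account for 10 of the 32 winners, which is what makes the gap at numbers 2 and 3 stand out.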

Of course, most people will not want to read through this whole document, but there is a surprise twist at the end that I think a lot of people on Snooker Island will be interested in and will have an opinion about. Essentially, the model allows for the players from different years to "play against each other" in a manner of speaking by using their known stats to calculate the statistical likelihood of the outcomes of head-to-head match ups. And so the model runs a hypothetical "Super Championship" and so ranks the players across the generations in the order of their likelihood to win it. That winners' chart is on page 19 but to prepare for it, you may wish to start reading at the beginning of the section on page 17.

The actual paper can be found here: ... alysis.pdf

Feel free to offer any opinions you have about the hypothetical result.

Re: For all you stat nerds...

Postby chengdufan

I really enjoyed reading this. Thanks for sharing! The analysis is clear and easy to follow. The strength of the essay I'd say is in the quality of the writing as much as the actual stats and analysis.

This takes me back to my uni days when statistical analysis was my main focus. I wish I'd been able to do something half as good as this back then :emb:

Re: For all you stat nerds...

Postby lhpirnie

Yes, it's a fine piece of work. I've also been involved in maths and stats for nearly 30 years, although I mainly do algorithm/quant stuff for banks these days.

The problem is that many people will interpret stats results subjectively. Let's just make one thing clear: you cannot use stats to definitively compare players from different eras by using result data. You might be able to achieve partial success by analysing shots, perhaps even biomechanical data. So the 'GOAT' debates (in any sport really) can't be answered by stats.

It might be possible to do it in something like darts, but even then I would argue that without direct matches between the strongest players, it's not possible to see who handles the pressure better.

Aces: if your son wants a snooker-related project, he could implement an Elo rating system for snooker, to replace the stupid 'money list' nonsense we currently are stuck with. There are just about enough results available via's API to do this for a wide class of players (amateurs and professionals). I haven't ever found the time to go deep enough into it myself. A global ranking system would completely transform the amateur/pro-am circuit and potentially save the game of snooker, even if it were 'unofficial'.
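
As a rough illustration of the Elo idea, here is a minimal sketch in Python. It uses the standard Elo expected-score formula with a 400-point scale; the starting rating of 1500, the K-factor of 32, and the function names are assumptions chosen for illustration, and the match list is a hypothetical placeholder rather than real results.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (standard Elo formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one match."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating pool stays constant.
    return rating_a + delta, rating_b - delta


def rate_matches(matches, start: float = 1500.0, k: float = 32.0) -> dict:
    """Process (winner, loser) pairs in chronological order."""
    ratings = {}
    for winner, loser in matches:
        r_win = ratings.get(winner, start)
        r_lose = ratings.get(loser, start)
        ratings[winner], ratings[loser] = update_ratings(r_win, r_lose, True, k)
    return ratings


# Hypothetical results, purely to show the call.
example = [("Player A", "Player B"), ("Player A", "Player C"), ("Player C", "Player B")]
for name, rating in sorted(rate_matches(example).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

A real system would still have to decide things this sketch ignores, such as whether to rate by match or by frame, and how to treat long gaps between a player's results.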

Anyway, best of luck to your talented son!

Re: For all you stat nerds...

Postby acesinc

Thanks for your feedback guys, I will pass it on. Especially thanks for your input LHP even though I don't understand the technical side of it at all, but Sam will. I had to look up what an Elo rating system is and I found a James Grime video on it (love that guy!).


On a different note, a personal interest of mine in starting this thread was to see about the agreement/disagreement with the final rankings in the Super Champion list. Now, I don't understand exactly how this modelling thing works, but Sam has promised to show me the details when he next gets home. As I understand it, he ran the model by randomly selecting a draw of the players from the 32 year sample against each other (humorously, two "Stephen Hendry"s knocking out two other "Stephen Hendry"s in the first round). And the model plays out until we determine the pinnacle of a performance within the sample. Of course, I do watch a bit of professional snooker, but that is mainly to incorporate what I learn into my own game. And that is the genesis of my own technical writing with Provisional Colours and Club 74. But I am not a great spectator of the game; I don't have the names and years and performances memorized like so many here on the Island do.

So it seems to me that in running the model multiple times, it should in theory rank the performances in order from the greatest to the "least greatest" (I could not bring myself to say "worst" when speaking of a WC winner). So here we have the first run of the model and I wonder if the expert spectators agree with the results. So to save the trouble of having to look up the list in the paper, I will repeat it here and people can chime in if they agree or disagree with the top five, or the bottom five, or any other input. So the list of top performances from 1988 to 2019 per the model:

Year Name
2008 Ronnie O'Sullivan
2012 Ronnie O'Sullivan
2019 Judd Trump
2004 Ronnie O'Sullivan
2013 Ronnie O'Sullivan
1995 Stephen Hendry
2017 Mark Selby
1994 Stephen Hendry
1996 Stephen Hendry
1993 Stephen Hendry
1992 Stephen Hendry
2016 Mark Selby
2001 Ronnie O'Sullivan
2005 Shaun Murphy
1989 Steve Davis
2014 Mark Selby
1990 Stephen Hendry
2011 John Higgins
1998 John Higgins
2009 John Higgins
2000 Mark Williams
1999 Stephen Hendry
1988 Steve Davis
2015 Stuart Bingham
2007 John Higgins
2010 Neil Robertson
2002 Peter Ebdon
1997 Ken Doherty
2003 Mark Williams
2018 Mark Williams
2006 Graeme Dott
1991 John Parrott

This list is not in any way anyone's "opinion" of the best; it is the results generated by the computer model. And me, here in the USA, I rarely get the opportunity to watch snooker live and only just began watching some maybe 7 or 8 years ago when the internet started finally getting better for this, so I have no opinion with any authority. So I ask: is it the general consensus that Ronnie O'Sullivan in 2008 was indeed the pinnacle of a World Championship showing? If yes, I will need to check it out on YouTube I suppose.

Re: For all you stat nerds...

Postby Andre147

Definitely not 2008. He did play great v Hendry in the Semis.

But he was a lot better in 2004, which I believe to be his best one.

Re: For all you stat nerds...

Postby chengdufan

I think rather than doing a random draw and then playing it out (wasn't sure from reading the essay if the draw was played out once or multiple times), I'd be interested in seeing the proportion of wins for each 'player' after running say 100 iterations of each possible draw.
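
A rough Python sketch of that idea, under stated assumptions: it samples many random draws rather than enumerating every possible draw (far too many for a 32-entrant field), and it decides each match with a simple strength-ratio win probability that is only a stand-in, not the model from the paper. The strengths and entrant names are hypothetical placeholders.

```python
import random
from collections import Counter


def win_probability(strength_a: float, strength_b: float) -> float:
    """Stand-in match model: chance that A beats B from relative strength."""
    return strength_a / (strength_a + strength_b)


def play_knockout(entrants, strengths, rng):
    """Make one random draw and play it out; return the champion."""
    field = list(entrants)
    if len(field) & (len(field) - 1):
        raise ValueError("field size must be a power of two")
    rng.shuffle(field)  # the random draw
    while len(field) > 1:
        next_round = []
        for i in range(0, len(field), 2):
            a, b = field[i], field[i + 1]
            p = win_probability(strengths[a], strengths[b])
            next_round.append(a if rng.random() < p else b)
        field = next_round
    return field[0]


def title_shares(strengths, n_draws=10_000, seed=0):
    """Proportion of simulated championships won by each entrant."""
    rng = random.Random(seed)
    entrants = list(strengths)
    wins = Counter(play_knockout(entrants, strengths, rng) for _ in range(n_draws))
    return {name: wins[name] / n_draws for name in entrants}


# Hypothetical strengths for an 8-entrant field, purely to show the call.
demo = {f"entrant_{i}": float(i) for i in range(1, 9)}
for name, share in sorted(title_shares(demo).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.3f}")
```

Ranking by share of simulated titles smooths out the luck of any single draw, which a one-off bracket cannot do.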

Re: For all you stat nerds...

Postby chengdufan

The kind of analysis I suggest is used on ESPN in predicting likely NBA champions. Your son is probably aware of that. I think Kevin Pelton is the stats guy who runs those numbers.