For all you stat nerds...
...and maybe re-ignite the never-ending war of the Ronnie/Stephen/Judd fanboys. And let me add, if you don't wish to read all my drivel below, just click on the document linked at the end and look at pages 15 and 19, and perhaps read the Conclusion section to understand it better.
I rarely originate a thread; I usually just poke my nose in, probably often where it's not wanted. But this one is something of a source of pride for me that I had no idea about until just a couple of weeks ago. Anyone who has read my posts may remember that my main playing partner is my son, who isn't home much these days as he is off at his fourth year at University. His major at school is Statistical Analysis, which is mostly like a foreign language to me, in that I don't understand it all that well. As he is getting closer to finishing Uni and going out to make his mark in the world, he decided he wanted to do a project for his CV, something to demonstrate his talent, but unique, separate from his schoolwork. As I said, I had no idea about this, but he decided this project would involve snooker.
My (limited) understanding of statistical analysis is that it is the collection and manipulation of data in order to understand current trends and potentially make accurate predictions of future trends. It means working with a lot of numbers, and understanding which numbers are important and why. This field is already huge and growing, with practically unlimited applications. I am sure he will do well after school, but he is currently deciding if he wants to stay in school a while longer to pursue a Master's degree.
So for anyone who understands this sort of thing (and I am guessing there are a few of you), he developed a modelling program using Python (I think that is accurate....I don't really know), and his model looks at available stats for the players going back as far as he could find records. The big problem is that record keeping is very limited, especially as we go further back in time. But using this extremely limited data, he has developed a surprisingly accurate model (in my humble, subjective opinion at least). For the paper I will direct you to, he just went back as far as 1988 and looked at the stats of all of the qualifiers leading up to the World Championship in order to rank the likelihood of each of the players to win it all that year.
In my opinion, the most interesting thing in this whole document (20 pages total) is a chart on page 15 where you will see a bunch of yellow and blue highlights. What this chart shows is the model's predictions, in order, of the top 5 players statistically most likely to win the World Championship in that particular year. Three takeaways from this chart are:
1) The incredible dominance of Steve Davis, Stephen Hendry, and Ronnie O'Sullivan as being the number one seed every single year except one. We all instinctively knew this, but here is numerical proof.
2) The model's pick for the top seed to be winner was correct in 14 out of 32 years (yeah, okay, makes sense.....those of course are the summation of ROS, SH, and SD wins, but again, here is numerical proof for it). But more impressive, I think, is that the model had the actual winner placed within the top five most likely to win in 27 out of the 32 years, good information to have if you're a gambler. Yes, many winners are obvious in hindsight, but this model picked out, for instance, the likes of Ken Doherty, Peter Ebdon, and Shaun Murphy as being likely to win the WC.
3) Strange how the number 2 and 3 seeds rarely end up winning the tournament; only 3 times in 32 years. It's nearly always the favourite or one of the underdogs who wins, not often one of the "wannabe" close contenders. Why is that? You probably don't want to bet on numbers 2 or 3 to win it.
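For anyone who wants to check the arithmetic behind those takeaways, here is a minimal Python sketch. It uses only the counts quoted above (14, 3 and 27 out of 32) and works out the rest by subtraction. The blind-guess baseline assumes a 32-player field, which is my assumption for illustration and not something taken from the paper; the variable and function names are likewise made up for this example.

```python
from fractions import Fraction

YEARS = 32  # number of championships covered by the chart, per the post

# Counts quoted in the post: where the model had ranked the eventual champion.
top1_hits = 14     # champion was the model's number 1 pick
rank2_3_hits = 3   # champion was the model's number 2 or number 3 pick
top5_hits = 27     # champion was anywhere in the model's top 5

# Derived by subtraction (not stated directly in the post).
rank4_5_hits = top5_hits - top1_hits - rank2_3_hits
outside_top5 = YEARS - top5_hits


def rate(hits, total=YEARS):
    """Return hits/total as a float between 0 and 1."""
    return float(Fraction(hits, total))


# ASSUMPTION: a 32-player field, used only to build a blind-guess baseline.
FIELD_SIZE = 32
blind_top1 = rate(1, FIELD_SIZE)  # chance a random single pick is the champion
blind_top5 = rate(5, FIELD_SIZE)  # chance the champion is among 5 random picks

if __name__ == "__main__":
    print(f"Model top-1 hit rate:  {rate(top1_hits):.1%}")
    print(f"Model top-5 hit rate:  {rate(top5_hits):.1%}")
    print(f"Ranks 2-3 win rate:    {rate(rank2_3_hits):.1%}")
    print(f"Ranks 4-5 wins:        {rank4_5_hits}")
    print(f"Wins from outside 5:   {outside_top5}")
    print(f"Blind-guess top-1:     {blind_top1:.1%}")
    print(f"Blind-guess top-5:     {blind_top5:.1%}")
```

By subtraction, that leaves 10 champions who were the model's number 4 or 5 pick and 5 who were outside its top five altogether, which fits the "favourite or underdog" pattern noted above.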
Of course, most people will not want to read through this whole document, but there is a surprise twist at the end that I think a lot of people on Snooker Island will be interested in and will have an opinion about. Essentially, the model allows players from different years to "play against each other", in a manner of speaking, by using their known stats to calculate the statistical likelihood of the outcomes of head-to-head matchups. The model then runs a hypothetical "Super Championship" and ranks the players across the generations in order of their likelihood to win it. That winners' chart is on page 19, but to prepare for it you may wish to start reading at the beginning of the section on page 17.
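To give a rough idea of how a simulated "Super Championship" of this kind could work, here is a minimal Python sketch. It is not the method from the paper, which I can't speak for. It assumes each player has a single strength rating, that the chance of winning a frame follows a simple Bradley-Terry style ratio of those ratings, and that the event is a random-draw knockout of best-of-19 matches. The player names, ratings and function names are all invented for illustration.

```python
import random
from collections import Counter

# HYPOTHETICAL ratings, invented for illustration only; not taken from the paper.
RATINGS = {
    "Player A": 1.30,
    "Player B": 1.20,
    "Player C": 1.00,
    "Player D": 0.90,
}


def frame_win_prob(a, b, ratings=RATINGS):
    """Bradley-Terry style chance that player a beats player b in one frame."""
    return ratings[a] / (ratings[a] + ratings[b])


def play_match(a, b, rng, best_of=19):
    """Simulate frames until one player has a majority; return the winner."""
    target = best_of // 2 + 1
    p = frame_win_prob(a, b)
    wins_a = wins_b = 0
    while wins_a < target and wins_b < target:
        if rng.random() < p:
            wins_a += 1
        else:
            wins_b += 1
    return a if wins_a == target else b


def play_knockout(players, rng):
    """Single-elimination event with a random draw (field must be a power of two)."""
    field = list(players)
    rng.shuffle(field)
    while len(field) > 1:
        field = [play_match(field[i], field[i + 1], rng)
                 for i in range(0, len(field), 2)]
    return field[0]


def title_shares(players, n_sims=10_000, seed=0):
    """Estimate each player's share of titles over many simulated events."""
    rng = random.Random(seed)
    titles = Counter(play_knockout(players, rng) for _ in range(n_sims))
    return {p: titles[p] / n_sims for p in players}


if __name__ == "__main__":
    shares = title_shares(RATINGS)
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {share:.1%}")
```

Sorting the players by their simulated share of titles gives a ranking by likelihood to win. Longer matches tend to favour the stronger player, so the choice of match length matters a lot in a setup like this.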
The actual paper can be found here: http://www.acesmachinery.com/league/Sno ... alysis.pdf
Feel free to offer any opinions you have about the hypothetical result.
- acesinc
- Posts: 538
- Joined: 20 October 2014
- Location: Crystal Lake, IL USA
- Snooker Idol: Alex Higgins [on table]
- Highest Break: 67
- Walk-On: Ripple-Grateful Dead https://www.youtube.com/watch?v=QmMjY6tXaEo