Two of my favorite books of all time are Baseball Between the Numbers: Why Everything You Know About the Game Is Wrong and The Book: Playing the Percentages in Baseball. I actually went into your book expecting something much closer to those (especially since the title of one of them is so similar). Both of those books take one aspect of the game or topic in each chapter and then develop a mathematical model to either prove or disprove the idea. Your book is structured differently and doesn’t go into as much mathematical detail or focus as narrowly on specific problems. It does seem to have a much stronger narrative arc, though. I was wondering about that decision.
Did you avoid going into much deeper discussions and more intricate problems in order to give a more high level look at the game and avoid turning off readers with too much math?
Was that something you would have liked to do, but the complexity of the data that does exist, plus the simple lack of available data, kept you from doing it?
Or am I just reading way too much into this?
You're exactly right: the technical level of the book was something that we thought long and hard about and were very conscious of. In particular, we had to decide who our target reader was and what scope to give the book. Other than the furrow of Soccernomics, the public field of soccer analytics is largely unplowed, in marked contrast to baseball, which has had years of statistical tilling, so we thought it was important to limit the technical material and increase the narrative. The larger point was to convince our target reader, an English, 30ish football fan, that numbers have always been an integral part of the game and that analytics can shed light on what happens on the pitch. That being said, there is a lot of deep analysis behind many chapters in the book (even if it's not explicit), and we hope that the most sophisticated soccer fans in the UK, US, Germany, Holland, etc. will be engaged and entertained and learn some things.
The team I write about (Fulham) has been pretty much a mid-table club (other than 07/08) for their entire stay in the Premier League. Their wage bill seems to accurately reflect their position. Their new owner seems to favor analytics and numbers, but I'm unsure of the effect that will have on a club in their situation.
Given how much of a team's league performance is accounted for by wages, and the massive jump in payroll between the mid-table clubs and the top six, how much difference do you believe analytics can make for a team?
Would it be possible for a top-flight team to become the Tampa Bay Rays of England and consistently outperform the big spenders, or will the gains be limited to the margins, finishing one or two places higher than the payroll would dictate?
We absolutely believe that analytics can make a huge difference for a club, and the Tampa Bay Rays are a great analogy. As Jonah Keri detailed in The Extra 2%, the Rays seek to accumulate small advantages in every area and in each decision the club makes. This approach is even more necessary in soccer because of the role of luck in the sport and the relatively small sample size that a season represents. Unlike baseball, where these 2% advantages can be realized over 162 games and 500 at bats, the advantages in soccer are more likely to remain slices of unseen probability. As a result, it is a much bigger challenge and demands more patience and more commitment from owners and management. The flip side, however, is that the club that gets it right (and no one quite has yet) is likely to enjoy an advantage for much longer than a baseball team would, since it will be much harder for the competition to mimic.
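To put the sample-size point in rough numbers, here is a minimal Python sketch under loudly simplified assumptions. It treats a club's edge as a hypothetical 2-percentage-point increase in per-game win probability (an illustrative stand-in, not what "The Extra 2%" literally measures) and models each game as an independent win/not-win trial around p = 0.5, ignoring draws entirely. It then expresses that edge in binomial standard errors of a season's win percentage. The function name is our own.

```python
import math


def edge_in_standard_errors(edge: float, games: int, p: float = 0.5) -> float:
    """Size of a per-game win-probability edge, in standard errors of season win %.

    Hypothetical simplification: each game is an independent win/not-win
    trial with probability p, so draws are ignored.
    """
    se = math.sqrt(p * (1 - p) / games)  # binomial standard error of win %
    return edge / se


# Hypothetical 2-percentage-point edge over an MLB vs. a Premier League season.
for label, games in [("MLB season", 162), ("EPL season", 38)]:
    print(f"{label}: {edge_in_standard_errors(0.02, games):.2f} standard errors")
```

Because the standard error shrinks with the square root of the number of games, the same edge stands out roughly sqrt(162/38), or about 2.1 times, more clearly over a 162-game season than over a 38-match one, which is one way to see why it stays a slice of unseen probability in soccer.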
One of the most fascinating things I read in the book was the idea that passing skill is nearly identical between players. I’m not sure if you are familiar with the work that Voros McCracken did with DIPS in baseball, but this struck me as a very similar idea. Neither makes sense intuitively, but both are borne out by the data. One of the things about DIPS is that while pitchers have little control over what happens to the ball after it is hit in MLB, in the minor leagues pitchers do seem to exert control. The common wisdom is that any pitcher without this skill is selected against as they make their way through the system, so only players with this skill ever make the major leagues.
Do you think that passing skill in soccer would be similar, i.e., the poor passers simply don't make it to a high level?
If a study similar to the one mentioned in the book were done for weaker leagues, say MLS, League 2, or something like USL-Pro, do you think that passing skill would remain constant, or would there be much more variation between players?
Great points. Selection plays a huge role in this finding. Players who can't kick the ball twenty yards to a small stationary target have only two options: goalkeeper or the midfield at Craven Cottage! (Sorry, had to get one jab in against the Whites. Or are we calling them Jaguars East?) Our instinct is the same as yours: variance would increase as you move down the pyramid to lower-quality leagues. One caution is that you might have to go down quite a distance, since the original study by Jaeson Rosenfeld of StatDNA was done with data from Brazil's Serie A, already a few notches below the EPL. One thing that American sports fans have to get their heads around is that soccer is truly global and that its talent pool is measured in the thousands of millions, so the raw counts in the right-hand tail of the skill distribution are staggering. The US's favorite sport, football, is sourced from maybe a handful of millions, with 1.1 million high school students playing the game as of a couple of years ago. The world is far more full of great midfielders than it is of great quarterbacks.
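The right-hand-tail argument can be sketched with a normal distribution. This is a rough illustration under stated assumptions, not anything from the book: skill is assumed to be standard-normal within each sport's pool, the 1.1 million figure is the high school count quoted above, and the one-billion soccer pool and the five-standard-deviation "elite" cutoff are hypothetical round numbers. The helper names are our own.

```python
import math


def tail_probability(z: float) -> float:
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def expected_count_above(pool_size: int, z: float) -> float:
    """Expected number of players in a pool whose skill exceeds z standard deviations.

    Hypothetical assumption: skill is standard-normal within the pool.
    """
    return pool_size * tail_probability(z)


# Hypothetical comparison: 1.1 million US high school football players
# versus a round one billion for soccer's global talent pool.
THRESHOLD = 5.0  # hypothetical elite cutoff, in standard deviations
for label, pool in [("US football pool", 1_100_000), ("global soccer pool", 1_000_000_000)]:
    count = expected_count_above(pool, THRESHOLD)
    print(f"{label}: {count:.1f} expected players above {THRESHOLD} SD")
```

Under these assumptions the expected count scales linearly with the pool, so a pool roughly 900 times larger yields roughly 900 times as many players past any fixed cutoff: well under one in the smaller pool versus a few hundred in the larger one at the 5 SD mark.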
Finally, what drew me to soccer analytics in the first place was its newness. There seems to be so little known and so much that can be discovered. I quickly came to realize just how hard the problem is. So much data is generated during a game, and so much of it remains locked in proprietary systems, that it seems daunting even to know how to tackle a problem, much less to find one to tackle.
Is there anything you believe a lay analyst could do to help push the understanding of the sport forward, or are the clubs already so far ahead that individuals or groups of interested fans are never going to be able to catch up?
The availability of data is a real problem at present, in two ways. First, lay analysts are starved for numbers, and second, club analysts are drowning in them. The truth is that most clubs do not have the skills, capacity, and systems to handle the millions of data points they are collecting. They have a long way to go, and they are not nearly as far ahead as you're guessing. On the other hand, the general public has almost no access to this data hoard. What may open the floodgates are advances in computerized video coding that might produce fairly reliable x-y data for the players and the ball from a match's public broadcast feed at very low cost. There would be a sampling bias toward the ball, since the cameras of Sky, ESPN, et al. focus on it rather than on the whole pitch, but there might still be loads of inexpensive data to play with.