22 Aug 2010

A good example of sample selection bias

In recent years Spain has achieved great success in sports. In fact, I am very proud of, and a great supporter of, the national football team [European (2008) and World (2010) champion], the national basketball team [European (2009) and World (2006) champion], Rafa Nadal [5 Roland Garros (2005-2008, 2010), 2 Wimbledon (2008, 2010) and 1 Australian Open (2009)], Fernando Alonso [F1 world champion (2005, 2006)] and Alberto Contador [3 Tour de France (2007, 2009, 2010), 1 Giro d'Italia (2008) and 1 Vuelta a España (2008)], among others.

Alfredo Relaño is a renowned Spanish journalist who writes a daily editorial in the Spanish sports newspaper AS. On July 25th, after Alberto Contador's latest win in the Tour de France, he tried to answer the following question: "Why are we (Spain) so good at sports?"

Isn't that an interesting research question? To answer it, he effectively builds a cross-country dataset in which the dependent variable is sporting performance. His research is qualitative and rests on essentially five observations: Football, Basketball, Tennis, Formula 1 and Cycling. After some analysis (based, I would say, on expert opinion) he identifies three explanatory variables behind Spain's outstanding level in sports: (1) public expenditure on sports, (2) the improvement of local sports facilities around the country and (3) the good weather.

If I were the referee of this article, I would simply say that it suffers from a tremendous bias in the selection of the sample. The observations were not chosen at random: they are precisely the sports in which Spain has recently excelled. One could replace Football with Rugby, Basketball with Baseball, Tennis with Skiing, Cycling with Swimming and F1 with Athletics. Is Spain still outstanding in this sample? I would say not!!

To conclude, it is true that Spain currently has the best national football team, but it is also true that Canada has the best national ice hockey team. Neither of these facts implies that one country is superior to the other in other sports. Here you have a nice example of what economists mean by sample selection bias!!
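To make the selection argument concrete, here is a minimal sketch in Python using purely synthetic, randomly generated scores (not real sports data). It assumes a hypothetical country whose true skill is the same in every sport, so all variation is noise. Averaging over the five sports where it happened to do best gives a higher figure than averaging over five sports picked at random, even though nothing about the country changes. The function name `simulate` and its parameters are illustrative assumptions.

```python
import random
import statistics


def simulate(n_sports=50, k=5, seed=42):
    """Compare the mean score over k hand-picked (best) sports with the
    mean over k randomly chosen sports and over all sports.

    Hypothetical setup: the country's true skill is 0 in every sport,
    and each observed score is pure Gaussian noise around that level.
    """
    rng = random.Random(seed)
    scores = [rng.gauss(0.0, 1.0) for _ in range(n_sports)]
    # Non-random selection: keep only the sports with the best results.
    hand_picked = sorted(scores, reverse=True)[:k]
    # Random selection: any k sports, regardless of results.
    random_pick = rng.sample(scores, k)
    return (
        statistics.mean(hand_picked),
        statistics.mean(random_pick),
        statistics.mean(scores),
    )


sel, rnd, full = simulate()
print(f"hand-picked: {sel:.2f}  random: {rnd:.2f}  all sports: {full:.2f}")
```

The hand-picked average is always at least as high as the random one, because the mean of the top k scores is the largest mean any k-element subset can have. That gap is produced entirely by how the sample was chosen.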

Fede said...

Ferran, I don't entirely agree with your approach. The ratio of Olympic medals per million inhabitants is often used as a measure of the quality of a country's (competitive) sport. However, as you point out, this kind of measure needs to be corrected, though I don't think in the way you suggest. To assess sporting success, results should be weighted by the number of registered athletes worldwide (other alternatives are economic or social impact) and by whether or not there are several disciplines within the same sport. If Spain wins, for example, a medal in football, that is at least as good an indicator of sporting success as Cuba winning 20 boxing medals across the different weight classes or the United States winning 15 medals in athletics. In Spain's case, the results are good because we win in the most relevant sports and competitions, not because we win many times.