Sunday 24 August 2008

Games about the Olympic Games

The 2008 Olympic Games are over and, as expected, China has been the country with the most gold medals (but curiously, not the country with the most medals!). It is easy to think that, statistically, a country with over 1.3 billion people has to have at least some good athletes. There seems to be an unsurprising correlation between population and results in the Olympics.

The idea: Good, we have the following belief

There is a strong correlation between the population of a country and the results in the Olympic Games


Is that something real, or am I fooling myself?

The material: To do this, I didn't need many things: I just took the medal count and the population data from Wikipedia.

The set-up: I calculated the following score for each country:


  • gold medal = 1
  • silver medal = 0.5
  • bronze medal = 0.25


and then I got the list you can find at the end of this post. If we order it by total score (I preferred to order it by score per million people, which is more interesting), the list is not very different from the original medal count, which is ordered by gold medals, then by silver medals, and then by bronze medals. China is still number one, but the United States is a bit closer.
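The scoring can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet I actually used: the function names are made up for the example, and the three rows are copied from the table at the end of this post.

```python
# Weights from the list above: gold = 1, silver = 0.5, bronze = 0.25.
def score(gold, silver, bronze):
    """Weighted medal score of one country."""
    return gold * 1.0 + silver * 0.5 + bronze * 0.25


def score_per_million(gold, silver, bronze, population_millions):
    """Score divided by the population, given in millions of people."""
    return score(gold, silver, bronze) / population_millions


# (name, population in millions, gold, silver, bronze), from the table below.
rows = [
    ("China", 1325.544, 51, 21, 28),
    ("United States", 304.875, 36, 38, 36),
    ("Jamaica", 2.714, 6, 3, 2),
]

# Order by score per million people, highest first.
ranked = sorted(
    rows,
    key=lambda r: score_per_million(r[2], r[3], r[4], r[1]),
    reverse=True,
)

for name, pop, g, s, b in ranked:
    print(f"{name}: score={score(g, s, b)}, "
          f"per million={score_per_million(g, s, b, pop):.3f}")
```

Sorting by `score(...)` instead of `score_per_million(...)` gives the other ordering mentioned above.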

Ordered by score per million people, we see that... Jamaica, with a score of almost 3, is leading! (It pays to have good sprinters.) Then we have the Bahamas, Iceland... The first big country is Australia (which has been amazing in every sport related to water).

I have made the following graph showing the relation between my score and the population:



Clicking on the graph you will see the original size, with the name of each country for each cross. But anyway, you will not see too much, as most of the countries are piled up in one corner. Maybe you can see better in logarithmic scale:



I have added the linear fit calculated using Origin. If we are to trust Origin, the correlation coefficient is 0.398.

This means that there is a weak, direct correlation between results and population. It is what I expected... but not as much as I expected (personally I thought it would have been a correlation around 0.8).
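For anyone without Origin, the correlation coefficient can be computed by hand. The sketch below is a plain Pearson correlation on the logarithms of population and score; whether Origin's number was obtained on the logarithmic or the linear values is an assumption here, and the handful of rows (copied from the table at the end) is only a sample, so it will not reproduce 0.398.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# (population in millions, score): a small sample from the table below.
sample = [
    (2.714, 8),        # Jamaica
    (21.394, 25.75),   # Australia
    (60.587, 29.25),   # Great Britain
    (304.875, 64),     # United States
    (1325.544, 68.5),  # China
    (1136.75, 1.5),    # India
    (6.585, 0.25),     # Togo
]

# Assumption: work on log10 values, as in the logarithmic graph.
log_pop = [math.log10(p) for p, _ in sample]
log_score = [math.log10(s) for _, s in sample]

r = pearson(log_pop, log_score)
print(f"r = {r:.3f}")
```

Feeding the full table into `sample` is the obvious next step if you want to compare with the value above.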

Why is the correlation so weak? I think one reason could be that most countries won just one or two medals, which is not enough to get good statistics: fluctuations can change the score a lot, so we have "good" information only about countries that have at least ~10 medals.

The other reason could be the fact that population is not the only important factor here. For example, the economy of a country is also important.

It's late enough now, so I will not make a graph showing the score versus the GDP, but if somebody does, I am interested in seeing the result. But let's make just one more graph. I chose only the countries that won at least ten medals (not because they are more important than the others, but because the statistics are more accurate), and here is the result:



Interesting, isn't it? The bigger the country, the smaller the score. And here the correlation is rather strong: -0.807. It seems that a single athlete has more chances to win a medal if he is from a smaller country (probably because there is less internal competition to qualify for the Olympics). That makes me think of the half-Togolese, half-French kayaker Benjamin Boukpeti, who preferred to compete under the Togolese flag instead of the French one, because it was much more difficult to qualify as part of the French team. After all, he got the first medal for Togo!
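The selection behind this last graph can be sketched as follows. It is a rough illustration under assumptions: I take the plotted quantity to be score per million people against population, both on a logarithmic scale, and the nine rows (copied from the table at the end) are just a sample, so the result will not be exactly -0.807.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# (name, population in millions, gold, silver, bronze), from the table below.
rows = [
    ("Jamaica", 2.714, 6, 3, 2),
    ("Bahamas", 0.331, 0, 1, 1),
    ("Norway", 4.778, 3, 5, 2),
    ("Australia", 21.394, 14, 15, 17),
    ("Great Britain", 60.587, 19, 13, 15),
    ("United States", 304.875, 36, 38, 36),
    ("China", 1325.544, 51, 21, 28),
    ("Togo", 6.585, 0, 0, 1),
    ("India", 1136.75, 1, 0, 2),
]

# Keep only the countries with at least ten medals in total.
big = [r for r in rows if r[2] + r[3] + r[4] >= 10]

# Assumption: compare log10(population) with log10(score per million).
log_pop = [math.log10(pop) for _, pop, _, _, _ in big]
log_spm = [
    math.log10((g + 0.5 * s + 0.25 * b) / pop) for _, pop, g, s, b in big
]

r = pearson(log_pop, log_spm)
print(f"{len(big)} countries kept, r = {r:.3f}")
```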

Conclusion: So it comes out that

1) big countries have more chances to get medals, because they have more people to select and train, but the correlation is much weaker than expected.
2) for the individual, being in a big country can be counterproductive, probably because there is more internal competition to qualify for the Olympics.

Hm...

OK, here is the promised table (as I am European, I wanted to see what are the results of my "bigger country", so I have added the data for the European Union below).

For space reasons, the medals are shown in the format gold/silver/bronze=total. Population is in millions. And remember, score = #gold + #silver/2 + #bronze/4. (I have also made a map that you can see on Wikipedia.)



Rank. Country Pop.(millions) Medals(gold/silver/bronze=total) Score Score/pop.
1. Jamaica 2.714 6/3/2=11 8 2.948
2. Bahamas 0.331 0/1/1=2 0.75 2.266
3. Iceland 0.316 0/1/0=1 0.5 1.582
4. Bahrain 0.76 1/0/0=1 1 1.316
5. Norway 4.778 3/5/2=10 6 1.256
6. Slovenia 2.029 1/2/2=5 2.5 1.232
7. Australia 21.394 14/15/17=46 25.75 1.204
8. Mongolia 2.629 2/2/0=4 3 1.141
9. Estonia 1.341 1/1/0=2 1.5 1.119
10. New Zealand 4.274 3/1/5=9 4.75 1.111
11. Belarus 9.69 4/5/10=19 9 0.929
12. Cuba 11.268 2/11/11=24 10.25 0.91
13. Georgia 4.395 3/0/3=6 3.75 0.853
14. Slovakia 5.402 3/2/1=6 4.25 0.787
15. Latvia 2.268 1/1/1=3 1.75 0.772
16. Trinidad and Tobago 1.333 0/2/0=2 1 0.75
17. Denmark 5.489 2/2/3=7 3.75 0.683
18. Netherlands 16.445 7/5/4=16 10.5 0.638
19. Hungary 10.043 3/5/2=10 6 0.597
20. Lithuania 3.361 0/2/3=5 1.75 0.521
21. Armenia 3.002 0/0/6=6 1.5 0.5
22. Great Britain 60.587 19/13/15=47 29.25 0.483
23. Czech Republic 10.403 3/3/0=6 4.5 0.433
24. South Korea 48.224 13/10/8=31 20 0.415
25. Switzerland 7.637 2/0/4=6 3 0.393
26. Croatia 4.555 0/2/3=5 1.75 0.384
27. Finland 5.317 1/1/2=4 2 0.376
28. Kazakhstan 15.422 2/4/7=13 5.75 0.373
29. Azerbaijan 8.467 1/2/4=7 3 0.354
-. (European Union) 498.248 87/101/92=280 160.5 0.322
30. Germany 82.218 16/10/15=41 24.75 0.301
31. Panama 3.343 1/0/0=1 1 0.299
32. France 64.473 7/16/17=40 19.25 0.299
33. Bulgaria 7.64 1/1/3=5 2.25 0.295
34. Ukraine 46.059 7/5/15=27 13.25 0.288
35. Russia 141.889 23/21/28=72 40.5 0.285
36. Canada 33.347 3/9/6=18 9 0.27
37. Italy 59.619 8/10/10=28 15.5 0.26
38. Romania 21.438 4/1/3=8 5.25 0.245
39. Sweden 9.215 0/4/1=5 2.25 0.244
40. Spain 46.063 5/10/3=18 10.75 0.233
41. Ireland 4.339 0/1/2=3 1 0.23
42. Kenya 37.538 5/5/4=14 8.5 0.226
43. United States 304.875 36/38/36=110 64 0.21
44. Mauritius 1.262 0/0/1=1 0.25 0.198
45. Zimbabwe 13.349 1/3/0=4 2.5 0.187
46. Poland 38.116 3/6/1=10 6.25 0.164
47. Dominican Republic 9.76 1/1/0=2 1.5 0.154
48. Belgium 10.585 1/1/0=2 1.5 0.142
49. Portugal 10.623 1/1/0=2 1.5 0.141
50. Kyrgyzstan 5.317 0/1/1=2 0.75 0.141
51. North Korea 23.79 2/1/3=6 3.25 0.137
52. Greece 11.147 0/2/2=4 1.5 0.135
53. Austria 8.341 0/1/2=3 1 0.12
54. Japan 127.69 9/6/10=25 14.5 0.114
55. Tajikistan 6.736 0/1/1=2 0.75 0.111
56. Singapore 4.589 0/1/0=1 0.5 0.109
57. Serbia 9.858 0/1/2=3 1 0.101
58. Uzbekistan 27.372 1/2/3=6 2.75 0.1
59. Tunisia 10.327 1/0/0=1 1 0.097
60. Argentina 40.302 2/0/4=6 3 0.074
61. Moldova 3.794 0/0/1=1 0.25 0.066
62. Ethiopia 79.221 4/1/2=7 5 0.063
63. Cameroon 18.549 1/0/0=1 1 0.054
64. Turkey 70.586 1/4/3=8 3.75 0.053
65. China 1325.544 51/21/28=100 68.5 0.052
66. Thailand 63.038 2/2/0=4 3 0.048
67. Chinese Taipei 22.99 0/0/4=4 1 0.043
68. Togo 6.585 0/0/1=1 0.25 0.038
69. Ecuador 13.341 0/1/0=1 0.5 0.037
70. Brazil 187.474 3/4/8=15 7 0.037
71. Israel 7.303 0/0/1=1 0.25 0.034
72. Chile 16.763 0/1/0=1 0.5 0.03
73. Morocco 31.224 0/1/1=2 0.75 0.024
74. Algeria 33.858 0/1/1=2 0.75 0.022
75. Mexico 106.683 2/0/1=3 2.25 0.021
76. Malaysia 27.17 0/1/0=1 0.5 0.018
77. Iran 70.496 1/0/1=2 1.25 0.018
78. Colombia 44.513 0/1/1=2 0.75 0.017
79. Sudan 38.56 0/1/0=1 0.5 0.013
80. South Africa 47.851 0/1/0=1 0.5 0.01
81. Indonesia 231.627 1/1/3=5 2.25 0.01
82. Afghanistan 27.145 0/0/1=1 0.25 0.009
83. Venezuela 27.954 0/0/1=1 0.25 0.009
84. Nigeria 148.093 0/1/3=4 1.25 0.008
85. Vietnam 87.375 0/1/0=1 0.5 0.006
86. Egypt 75.201 0/0/1=1 0.25 0.003
87. India 1136.75 1/0/2=3 1.5 0.001


PS: I have seen that a wikipedian has made some interesting maps showing:

Sunday 10 August 2008

Between twelve and five

Two months ago or so I spent some hours at Hodges Figgis (a bookshop I love, on Dawson Street, here in Dublin). I love going up to the second floor and leafing through the books in the science section. It is the closest thing to the Casa del Libro in Madrid. There I found a book, The Maths Gene, by Keith Devlin. On page 19 (of that edition) there is the following test:



You have to answer as quickly as you can:
1 - 1 = ?
4 - 1 = ?
8 - 7 = ?
15 - 12 = ?

And now, quickly, choose a number between 12 and 5!



Supposedly you have chosen seven (the full story can also be found here). I can't remember now what I chose, but it wasn't seven. So the first thing I thought was "hm, this is not quite serious". But of course, you cannot say something like that and believe it straight away. An experiment is needed... hooray!

The idea: Let's test this idea:

When you ask somebody to make subtractions and then you ask him/her for a number between two numbers, he/she unconsciously keeps subtracting.


The "material": 36 people took part in my experiment (all of them friends, relatives and colleagues).

The set-up: To do this experiment, I spent one week asking people I know for subtractions and numbers. Obviously, I didn't tell them anything beforehand, as otherwise they would have been conditioned to give an "interesting" answer. I always asked them in the office corridors, at the end of a phone call and so on. This way, they didn't have a lot of time to think.

Firstly I considered two groups, twelve people each:

Group A: I made the experiment exactly like in Keith Devlin's book, with the same subtractions to do.

Group B: I told them "give me a number between 12 and 5" (not asking them for any subtractions).

The measurements: I made some mistakes, like asking for a number between 7 and 12, but obviously I excluded these cases.

Here are the results for the group A:

answer 5 --> 0 people.
answer 6 --> 2 people.
answer 7 --> 7 people.
answer 8 --> 1 person.
answer 9 --> 1 person.
answer 10 --> 0 people.
answer 11 --> 0 people.
answer 12 --> 1 person.

And here are the results for the group B:

answer 5 --> 1 person.
answer 6 --> 1 person.
answer 7 --> 6 people.
answer 8 --> 2 people.
answer 9 --> 2 people.
answer 10 --> 0 people.
answer 11 --> 0 people.
answer 12 --> 0 people.

Maybe it is clearer with a couple of graphs:

Group A:



OK, maybe it is not 90% of people, but slightly more than one half picked seven. This was surprising to me.

Group B:



I would say both graphs are very similar! With subtractions seven people answered seven, and without subtractions it was six people. Yes, if you say we cannot expect great statistics out of twelve people, you are right. But anyway, this suggests that asking for subtractions is not the key point here.
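To put a rough number on "not great statistics", here is a small sketch that computes the share of sevens in each group together with a simple binomial standard error, sqrt(p(1-p)/n). The counts are the ones listed above; the error formula is my choice for the illustration and is only approximate for groups this small.

```python
import math

# Answer -> number of people, copied from the two lists above.
group_a = {5: 0, 6: 2, 7: 7, 8: 1, 9: 1, 10: 0, 11: 0, 12: 1}
group_b = {5: 1, 6: 1, 7: 6, 8: 2, 9: 2, 10: 0, 11: 0, 12: 0}


def share_with_error(counts, answer):
    """Fraction of people giving `answer`, with a binomial standard error."""
    n = sum(counts.values())
    p = counts[answer] / n
    return p, math.sqrt(p * (1 - p) / n)


share_a, err_a = share_with_error(group_a, 7)
share_b, err_b = share_with_error(group_b, 7)

print(f"Group A: {share_a:.2f} +/- {err_a:.2f}")
print(f"Group B: {share_b:.2f} +/- {err_b:.2f}")
```

The difference between the two shares (7/12 against 6/12) is smaller than either error bar, which is consistent with saying that the subtractions make no visible difference at this sample size.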

What is the real reason? I would say lots of people like number seven, the "lucky number", and would have answered that even if I had asked for a number between 1 and 100. There is a group of "fans of number seven", but how large is this group?

What if the main point is the order in which we ask for the numbers? I kept asking all the time for a number "between twelve and five", but it would be more natural to ask for a number "between five and twelve". So, I extended the experiment and considered a third group:

Group C: I told them "give me a number between 5 and 12" (in this more natural order, not asking for any subtractions).

This sub-experiment was to check the following sub-idea:

If you ask for a number "between twelve and five" most of the time you get seven; if you ask for a number "between five and twelve" you keep getting seven.


And this was the result:

answer 5 --> 1 person.
answer 6 --> 1 person.
answer 7 --> 3 people.
answer 8 --> 2 people.
answer 9 --> 4 people.
answer 10 --> 1 person.
answer 11 --> 0 people.
answer 12 --> 0 people.

Or, more graphically,

Group C:



Funny, isn't it? Seven is no longer the preferred number! It would be interesting to check this with a larger group of people, to make sure that fluctuations are not fooling us.
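One way to judge whether fluctuations could be fooling us is Fisher's exact test on "seven" versus "not seven" for groups B and C (6 out of 12 against 3 out of 12). This test was not part of the original experiment; it is a minimal sketch, and `fisher_two_sided` is a helper written for this example, not a library function.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c

    def weight(k):
        # Number of ways to get k "hits" in row 1 with all margins fixed.
        return comb(row1, k) * comb(row2, col1 - k)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    # Sum the tables that are at most as likely as the observed one.
    # Integer weights keep the comparison exact.
    extreme = sum(
        weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed
    )
    return extreme / comb(row1 + row2, col1)


# Group B: 6 sevens, 6 other answers. Group C: 3 sevens, 9 other answers.
p_value = fisher_two_sided(6, 6, 3, 9)
print(f"p = {p_value:.3f}")
```

For these counts the p-value comes out at about 0.40, so a difference like this could easily appear by chance with twelve people per group.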

Conclusion: Recognising that groups of 12 people choosing between eight numbers are not much to go on, we can provisionally say that


  • asking for subtractions before the "important" question is not the key factor in which number people pick when asked for a number between twelve and five (though there could be some influence smaller than the error of this experiment)

  • the order in which we ask for the numbers seems to be more important here.


Maybe this is because we are used to subtracting when a larger number precedes a smaller one. Of course, to check this idea a bit better... an experiment with different numbers (other than 12 and 5) is needed.

Of course!