Lies, Damned Lies, and Statistics
|
Having been on both Apollo Reed and Twinkie, I can definitely say Apollo Reed is the harder of the two. That probably has more to do with the fact that Apollo Reed starts off overhanging and stays that way the entire way up, whereas Twinkie is slab a third of the way up. |
|
Holy crap! You guys are thinking really hard about this. |
|
Brian Adzima wrote: "When I started on this analysis I was not certain that the data was nominal. Nominal data would imply the difference between a 5.10a and a 5.10b is the same as a 5.12a and a 5.12b." No, that would be interval data. Nominal data means the values are just a bunch of names; in that case, it makes sense to use the mode. Climbing grades are ordered (5.10a < 5.10b < 5.10c), in which case it makes sense to use the median (a few of us lobbied mp.com to use that as the consensus grade, but mostly to counter one person significantly fluffing or sandbagging a route). They are not necessarily interval (although that was the intention, and if true, one would use the mean). |
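To make the mode-versus-median distinction concrete, here is a minimal sketch. It assumes a short, hypothetical grade scale and made-up vote lists; the names `GRADES`, `consensus_mode`, and `consensus_median` are illustrative and are not anything mp.com actually uses.

```python
from collections import Counter

# Ordered grade scale (a hypothetical subset, for illustration only).
GRADES = ["5.10a", "5.10b", "5.10c", "5.10d", "5.11a", "5.11b"]
RANK = {grade: i for i, grade in enumerate(GRADES)}

def consensus_mode(votes):
    """Most common grade; meaningful even if grades were purely nominal."""
    return Counter(votes).most_common(1)[0][0]

def consensus_median(votes):
    """Middle grade by rank; needs only an ordering, not equal spacing.

    With an even number of votes this returns the lower of the two
    middle grades, so the result is always a real grade.
    """
    ranks = sorted(RANK[v] for v in votes)
    return GRADES[ranks[(len(ranks) - 1) // 2]]

# Made-up suggestions: four near-consensus votes plus one outlier.
votes = ["5.10b", "5.10c", "5.10c", "5.10c", "5.11b"]
print(consensus_mode(votes), consensus_median(votes))
```

The single outlying vote does not move the median, which is the property that makes it attractive as a consensus grade.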
|
The point was that the routes selected must be a *representative sample.* Which basically means, chosen at random. Bigger is better, but a sample size of 30 is kinda bare minimum. But if subdividing, I'd go with more. |
|
The bias introduced in route selection is a problem. |
|
Brian Adzima wrote: "Hence, the data is highly skewed towards routes I have climbed, or at least longingly stared at." Unfortunately, that is not a statistically valid selection process. Results that come from a process similar to this can't be considered meaningful. There is not enough emphasis on the sampling process when teaching statistics. Most classes use datasets that have already been gathered and are basically without flaw. That is a real problem. I apologize if I sound pessimistic. I am optimistic that statistics can be taught better. |
|
Brian Adzima wrote: "However, I am not sure how to test for the significance of the skew, so I won't draw any firm conclusions here." The sampling distribution of the sample skewness (assuming normality) can be found, for instance, here. Then you can just use a Wald test. |
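A minimal sketch of such a Wald test follows. The linked formula is not reproduced in this thread, so the sketch assumes the standard large-sample result that, under normality, the sample skewness has standard error of roughly sqrt(6/n). The function names are illustrative, and the input would need to be numeric (for example, grades mapped to ranks).

```python
import math

def sample_skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def skewness_wald_test(xs):
    """Wald test of H0: skewness = 0.

    Assumes normality and a large sample, so that the standard error
    of g1 is approximately sqrt(6 / n). Returns (g1, z, two-sided p).
    """
    n = len(xs)
    g1 = sample_skewness(xs)
    z = g1 / math.sqrt(6.0 / n)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return g1, z, p
```

For small samples the sqrt(6/n) approximation is rough, so the p-value should be read as indicative only.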
|
Sumbit wrote: "It may be a typo, but Twinkie is 12a. If it's not a typo, your data might be off." It's a typo. That is one of the few routes on the list I have not been on. Brian |
|
reboot wrote: "No, that would be interval data. Nominal data means the values are just a bunch of names; in that case, it makes sense to use the mode. Climbing grades are ordered (5.10a < 5.10b < 5.10c), in which case it makes sense to use the median (a few of us lobbied mp.com to use that as the consensus grade, but mostly to counter one person significantly fluffing or sandbagging a route). They are not necessarily interval (although that was the intention, and if true, one would use the mean)." Yep, another mix-up on my part. |
|
michael s... wrote: "Unfortunately that is not a statistically valid selection process. Results that come from a process similar to this can't be considered meaningful. There is not enough emphasis on the sampling process when teaching statistics. Most classes use datasets that have already been gathered and are basically without flaw. That is a real problem. I apologize if I sound pessimistic. I am optimistic that statistics can be taught better." So how would you parse the Mountain Project database (manually) for routes? Randomly pull routes from the front page, or randomly try to click on areas? |
|
When I took a quick look at this, I chose a state at random, then searched for routes between .10c and .13c, then manually kept only routes that have at least 20 feedback items. The only problem is that the feedback is ticks, but the rationale is that routes with enough feedback should also have grade suggestions. |
|
Buff Johnson wrote: "What I took a quick look at, I chose a state at random, then entered search for routes between .10c to .13c, then manual selection criteria for routes that only have 20 feedback items. The only problem is that the feedback is ticks, but the rationale is that those routes that have enough feedback will also have grade suggestions." That works real well now that you can filter by the number of star ratings. Thanks. |
|
Just remember that whatever data you pull from a site like this is wrong and needs to be lowered by a few grades. It is human nature: most people are a lot more likely to claim on a site like this that they did something than to have really done it. |
|
I know 12 people personally that have things ticked they either want to do or have not yet or have never done and don't plan on it. |
|
Jake Jones wrote: "Since this is a statistics thread, can you break down the word 'plenty' into a number?" This is certainly possible with a bit of effort and scientifically valid data collection and analysis. Any good estimate of a population parameter should be accompanied by an interval indicating how close you believe your estimate is to the true value. (The ± percentage points you see on political polls are an example of this.) Jake Jones wrote: "How about a percentage?" Good call! Much more useful than the raw count. Jake Jones wrote: "What percentage of climbers that you know that use MP tick routes that they haven't done?" Certainly a sample of convenience. A good conversation starter! Jake Jones wrote: "More importantly, how do you get all these people to admit to you that they have lied?" A fabulous point! It hints at the idea that some quantities might be unknowable. Humbling. Sampling through surveys is a serious endeavor in itself. There are all sorts of biases that a statistician needs to be on the lookout for, and people not telling the truth is definitely one of them! |
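As a rough illustration of the "± percentage points" idea, the sketch below computes a normal-approximation (Wald) interval for a proportion. The counts are hypothetical, not taken from this thread, and the formula assumes a genuinely random sample, which a sample of convenience is not.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion.

    z = 1.96 corresponds to roughly 95% confidence. Returns
    (estimate, margin), so the interval is estimate +/- margin.
    """
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin

# Hypothetical survey: 12 of 60 randomly chosen users admit to false ticks.
p_hat, margin = proportion_interval(12, 60)
print(f"{p_hat:.0%} +/- {margin:.1%}")
```

The margin shrinks with the square root of the sample size, but no sample size corrects for respondents who do not tell the truth.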
|
Brian Adzima wrote: "So how would you parse the mountain project database (manually) for routes? Randomly pull routes from the front page, randomly try to click on areas?" Well, there are lots of sampling techniques. A "simple random sample" is the most basic. Basically, every route of the type you are interested in studying (say, routes with over 20 ratings) needs to have an equal chance of being chosen. For example, say there were 100 routes on MP with over 20 ratings. You need a way to select from these routes that is actually random. Using your mouse to click on routes and "try" to be random isn't scientific. You could do something like assign all these routes a number, get a 100-sided die (or a computer that will generate random numbers), and then roll the die (or use the computer) to select the next route to be included. Proper sampling is not easy! |
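The numbered-routes procedure can be sketched in a few lines. This assumes the full list of eligible routes has already been compiled and numbered 1 to 100, as in the example; the function name and the sample size of 30 are illustrative.

```python
import random

def simple_random_sample(route_ids, k, seed=None):
    """Draw k distinct routes so that every route is equally likely.

    Passing a seed makes the draw reproducible, which lets others
    verify that the selection was not hand-picked.
    """
    rng = random.Random(seed)
    return rng.sample(route_ids, k)

eligible = list(range(1, 101))  # the 100 numbered routes from the example
chosen = simple_random_sample(eligible, 30, seed=42)
print(sorted(chosen))
```

Sampling without replacement, as here, plays the role of re-rolling the die whenever a route comes up twice.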
|
I think the best way to do this is to get all the ratings. Doing this by hand is going to be really hard. Doing it by scraping all of Mountain Project will also be hard, but not unreasonable (I think). |
|
How about correlating users' height and ape index with the grades they assign? This would reveal a lot about the grading system itself. Especially with boulder problems. |
|
Reggie Pawle wrote: "See, no sampling necessary! Just get all the damn data." Fuck. Yes. |
|
michael s... wrote: "Fuck. Yes." It's still a sample (proof: today, there are more ratings on mp.com than there were yesterday). And it's still a biased sample (the original question was about people, not just people who suggest ratings on mp.com). Increasing the sample size isn't going to help the bias. Of course, I'd be interested to see the results, but I was interested in the original post as well. Has the history of science ever actually seen an SRS taken? Thanks, Brian. |