For the data nerds
|
As part of a school project, I scraped 117,000 climbing routes (trad and sport only), as well as everything from the "Climbing Gear Discussion" and "Climbing Gear Review" forums, and did some analysis on what I found. The goal of my project was to train a deep learning model on the (labeled) route data and use it to analyze climbers' sentiment about climbing gear in the (unlabeled) forums. Ultimately, I was able to label the sentiment of forum posts as positive, neutral, or negative with 81.6% accuracy.

I have published the data here: https://www.kaggle.com/pdegner/mountain-project-rotues-and-forums

Information about how I scraped the data and created the sentiment labels, as well as a short report, is on my GitHub: https://github.com/pdegner/DL_final_project

A few interesting results:
2. If, for example, 5.11-, 5.11a/b, 5.11a, 5.11a R, 5.11a A3, 5.11a WI4, etc. are all considered different ratings, then there are 1070 different ratings for routes on Mountain Project. Below is a simplified histogram of route ratings. The large jump at 5.10 occurs because 5.10 encompasses 5.10a, 5.10b, 5.10c, 5.10d, and all their variants, whereas 5.9 only counts 5.9 and its variants. Interestingly, while there are 172 routes labeled 5.0 and 378 labeled 5.2, there are only 17 labeled 5.1.

Side note, skip if bored: one drawback of how the data was gathered is that I used the "export routes as CSV" function that is built into Mountain Project. I went state by state and selected all routes from 3rd class to 5.15d, sorted by difficulty, then name. Unfortunately, I could only get 1000 results at a time. So, to get the rest of the data, I looked at the max difficulty in each export, then started the search again at that difficulty to get the next 1000 routes. However, some large states like Colorado have more than 1000 routes at a given grade, and restarting the search at that grade cannot reach the routes past the first 1000. That means more popular grades may be underrepresented here. I suspect that 5.8-5.10c are slightly underrepresented in this dataset.

3. When talking about gear, the model treats the words "parallel cracks" as a bad thing. I suspect this is because when people discuss gear in parallel cracks, it is often to say that the gear does not work well. Score for "parallel cracks": 0.31476843 for negative, 0.15041038 for neutral, 0.00031077862 for positive.

4. All of the climbing brands that I tested have an overall rating of neutral. This could be because if a sale was mentioned with no other information (e.g. "Black Diamond cams on sale at website.com"), then I labeled the example as neutral.

If you download and play with this dataset, I'd be curious to see your results.

SHAMELESS PLUG: I am about to graduate from UC Berkeley with a Master's degree in information and data science.
If you have a job opening in the Denver/Boulder area, or remote, please send me a message. |
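For anyone who wants to rebuild the simplified histogram from the raw ratings, here is a minimal sketch of one way to collapse rating variants into their base grade. It is not the code from the project; the `base_grade` helper, the regular expression, and the sample ratings are assumptions made up for illustration.

```python
import re
from collections import Counter

# Matches the number after "5." in a YDS rating,
# e.g. "10" in "5.10b/c" or "9" in "5.9+".
YDS = re.compile(r"5\.(\d{1,2})")

def base_grade(rating):
    """Collapse a rating such as '5.11a R' to '5.11'; None if no 5.x grade is found."""
    match = YDS.search(rating)
    return f"5.{match.group(1)}" if match else None

# Made-up sample ratings, not taken from the dataset.
ratings = ["5.10a", "5.10d R", "5.10b/c", "5.9+", "5.9", "5.11a WI4", "3rd"]

# Count routes per base grade, skipping ratings without a 5.x grade.
histogram = Counter(g for g in map(base_grade, ratings) if g is not None)
print(histogram)
```

With this bucketing, 5.10a through 5.10d and their variants all land in the 5.10 bin, which is why that bin is so much larger than 5.9.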
|
This is so cool, thanks for sharing! Any chance you might be willing to re-run the mp route scraper for bouldering problems? |
|
Not Hobo Greg wrote: I think this is a good example of "absence of evidence is not evidence of absence". Sure, nuts and hexes suck in parallel-sided cracks. That's what made cams so revolutionary. They made it possible to protect the best kind of cracks: parallel-sided cracks. But why bother commenting on perfect gear? It's boring. |
|
Ryan, if you send me CSVs of the bouldering problems that you are interested in I'd be happy to run the scraper for you. You can get CSVs through the route finder, but it will only give you 1000 at a time, so if you want more than that you will have to export multiple CSVs. |
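If you end up with several overlapping exports, something like the sketch below could stitch them together. It assumes each export has a column that uniquely identifies a route (called `URL` here, which is a guess and not a confirmed column name), and the two tiny exports are made up for illustration.

```python
import csv
import io

def read_export(text):
    """Parse one CSV export (given as text) into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def merge_exports(batches, key="URL"):
    """Concatenate batches of rows, keeping the first row seen for each key value."""
    seen = set()
    merged = []
    for batch in batches:
        for row in batch:
            if row[key] in seen:
                continue  # duplicate from an overlapping export
            seen.add(row[key])
            merged.append(row)
    return merged

# Two made-up exports that overlap at the boundary grade.
first = "Route,Rating,URL\nA,V2,u1\nB,V3,u2\n"
second = "Route,Rating,URL\nB,V3,u2\nC,V3,u3\n"

merged = merge_exports([read_export(first), read_export(second)])
print([row["Route"] for row in merged])
```

For real files, you would read each export from disk instead of from a string; the de-duplication step stays the same.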
|
Hi Patti, I know it's been a while. Are you still willing to run your scraper on a list of URLs for boulder problems? If so, here is a list of 56k URLs: https://www.dropbox.com/s/ccqg07d5c018vlx/boulderUrls.csv?dl=0 Thanks! Ryan |
|
Hi Ryan, this CSV is missing some info that my previous scraper started with (location, avg stars, route type, etc.). I'll have to tweak the scraper to get that info, which will take some time, but it sounds like a fun puzzle so I'll try to do it for you. |
|
Hi Patti, no worries if it's a pain! Thanks for looking into it... |