
For the data nerds

Original Post
P Degner · · anywhere · Joined Nov 2015 · Points: 242

As part of a school project I scraped 117,000 climbing routes (trad and sport only), as well as everything from the "Climbing Gear Discussion" and "Climbing Gear Review" forums, and did some analysis on what I found. The goal of my project was to train a deep learning model on the (labeled) route data, and use it to analyze climbers' sentiment about climbing gear in the (unlabeled) forums. Ultimately, I was able to label the sentiment of forum posts as positive, neutral, or negative with 81.6% accuracy.

I have published the data here: https://www.kaggle.com/pdegner/mountain-project-rotues-and-forums

Information about how I scraped the data and created the sentiment labels, as well as a short report, is on my GitHub: https://github.com/pdegner/DL_final_project 

A few interesting results:

1. There are a lot of mixed feelings about hexes (labels: 0=negative, 1=neutral, 2=positive). "Hex" and "hexes" are the most commonly mislabeled climbing-specific words in my dataset, even though the topic is not that popular.

2. If, for example, 5.11-, 5.11a/b, 5.11a, 5.11a R, 5.11a A3, 5.11a WI4, etc. are all considered different ratings, then there are 1070 different ratings for routes on Mountain Project. Below is a simplified histogram of route ratings. The large jump at 5.10 is due to the fact that 5.10 encompasses 5.10a, 5.10b, 5.10c, 5.10d, and all their variants, whereas 5.9 only counts 5.9 and its variants. Interestingly, while there are 172 routes labeled 5.0 and 378 labeled 5.2, there are only 17 labeled 5.1.

Side note, skip if bored: one drawback of how the data was gathered is that I used the "export routes as CSV" function built into Mountain Project. I went state by state and selected all routes from 3rd class to 5.15d, sorted by difficulty, then name. Unfortunately, I could only get 1000 results at a time. So, to get the rest of the data, I looked at the max difficulty in each export, then started the search again from that difficulty to get the next 1000 routes. However, some large states like Colorado have more than 1000 routes at a single grade, and restarting the search at that grade presumably returns the same first 1000 again, so the rest are never exported. That means the more popular grades may be underrepresented here. I suspect that 5.8-5.10c are slightly underrepresented in this dataset.

3. In gear discussions, the model scores the phrase "parallel cracks" as negative. I suspect this is because posts that mention gear in parallel cracks are usually describing gear that does not work well there. Scores for "parallel cracks": .31476843 for negative, .15041038 for neutral, .00031077862 for positive.

4. All of the climbing brands that I tested have an overall rating of neutral. This could be because if a sale was mentioned with no other information (e.g. "Black Diamond cams on sale at website.com") then I labeled the example as neutral. 
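The paging workaround from the side note above can be sketched roughly as follows. This is a toy simulation, not the actual scraper: `make_search` is a hypothetical stand-in for the CSV export, difficulty is reduced to an integer rank, and the cap is shrunk from 1000 to 3 so the effect is easy to see. The step that skips ahead when a whole page is a single grade is an assumption added here to keep the loop from repeating; the original post does not say how that case was handled.

```python
from typing import Callable, List, Set, Tuple

Route = Tuple[int, str]  # (difficulty rank, route name); hypothetical shape


def make_search(routes: List[Route], cap: int) -> Callable[[int], List[Route]]:
    """Stand-in for the export: at most `cap` routes with difficulty >= min_rank,
    sorted by difficulty and then name."""
    ordered = sorted(routes)

    def search(min_rank: int) -> List[Route]:
        return [r for r in ordered if r[0] >= min_rank][:cap]

    return search


def collect_all(search: Callable[[int], List[Route]], cap: int) -> Set[Route]:
    """Repeat the search, restarting from the hardest grade on each full page."""
    seen: Set[Route] = set()
    min_rank = 0
    while True:
        page = search(min_rank)
        seen.update(page)
        if len(page) < cap:
            break  # a short page means nothing is left
        top = page[-1][0]
        # Assumption: if the full page never got past the starting grade,
        # restarting there would return the same page, so move to the next grade.
        min_rank = top if top > min_rank else top + 1
    return seen


# Five routes share rank 1, which is more than the cap of 3.
routes = [(0, "a"), (0, "b"), (1, "c"), (1, "d"),
          (1, "e"), (1, "f"), (1, "g"), (2, "h")]
found = collect_all(make_search(routes, cap=3), cap=3)
missing = sorted(set(routes) - found)
```

In this toy run `found` holds 6 of the 8 routes, and `missing` contains the two rank-1 routes that sort last by name. That is the same mechanism that could leave crowded grades underrepresented in the real export.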

If you download and play with this dataset, I'd be curious to see your results.
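For anyone who wants to reproduce something like the simplified grade histogram, here is one possible way to collapse rating variants into base grades. It is only a sketch: the regular expression and the `base_grade` helper are illustrative assumptions rather than the method used in the project, and the ratings are passed in as plain strings because the dataset's column names are not given in this thread.

```python
import re
from collections import Counter
from typing import Optional

# Matches the numeric part of a YDS grade, e.g. "5.11" in "5.11a R".
_YDS = re.compile(r"^5\.(\d{1,2})")


def base_grade(rating: str) -> Optional[str]:
    """Return the base grade ("5.9", "5.10", ...) or None for non-5.x ratings."""
    m = _YDS.match(rating.strip())
    return f"5.{m.group(1)}" if m else None


ratings = ["5.11-", "5.11a/b", "5.11a", "5.11a R", "5.11a A3",
           "5.11a WI4", "5.10d", "5.9+", "5.9", "3rd"]
counts = Counter(base_grade(r) for r in ratings)
```

Here every 5.11 variant lands in the "5.11" bucket, while "3rd" maps to `None` and can be filtered out or counted separately.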

SHAMELESS PLUG: I am about to graduate from UC Berkeley with a Master's degree in information and data science. If you have a job opening in the Denver/Boulder area, or remote, please send me a message.

Ryan Angus · · Unknown Hometown · Joined Jan 2021 · Points: 0

This is so cool, thanks for sharing! Any chance you might be willing to re-run the mp route scraper for bouldering problems?

Petsfed 00 · · Snohomish, WA · Joined Mar 2002 · Points: 989
Not Hobo Greg wrote:

How can a parallel crack have bad gear? Are people placing nuts at the creek? Perhaps this is a great example of why you can’t quantify certain things in life.

I think this is a good example of "absence of evidence is not evidence of absence".

Sure, nuts and hexes suck in parallel-sided cracks. That's what made cams so revolutionary. They made it possible to protect the best kind of cracks: parallel-sided cracks.

But why bother commenting on perfect gear? It's boring.

P Degner · · anywhere · Joined Nov 2015 · Points: 242

Ryan, if you send me CSVs of the bouldering problems that you are interested in I'd be happy to run the scraper for you. You can get CSVs through the route finder, but it will only give you 1000 at a time, so if you want more than that you will have to export multiple CSVs.

Ryan Angus · · Unknown Hometown · Joined Jan 2021 · Points: 0

Hi Patti,

I know it's been a while. Are you still willing to run your scraper on a list of urls for boulder problems? If so, here is a list of 56k urls: https://www.dropbox.com/s/ccqg07d5c018vlx/boulderUrls.csv?dl=0

Thanks!

Ryan

P Degner · · anywhere · Joined Nov 2015 · Points: 242

Hi Ryan, this CSV is missing some info that my previous scraper started with (location, avg stars, route type, etc.). I'll have to tweak the scraper to get that info, which will take some time, but it sounds like a fun puzzle so I'll try to do it for you.

Ryan Angus · · Unknown Hometown · Joined Jan 2021 · Points: 0

Hi Patti, no worries if it's a pain! Thanks for looking into it...
