National Park trails

I love exploring the National Parks. Different parks contain different collections of hikes. I typically choose which park to visit based on a combination of sights I want to see and logistical constraints (e.g., travel time, cost, availability of campsites).

Once I’ve chosen a park to visit, I choose a set of trails to hike based on how much time and fitness I have (choosing hikes with distance and elevation gain within a certain range), with the goal of fitting in as many hikes as possible during my stay. You could also take the reverse approach – based on your time and fitness, choose the park with the most suitable collection of hikes.

In this post, my goal is to explore the characteristics of the hikes available across all National Parks. To do this, I scraped crowd-sourced data on trails in National Parks from AllTrails.com in April 2018. This was a challenge – because the AllTrails.com search engine renders content dynamically, I had to use an RSelenium remote browser to access the results.

This data contains all trails the average visitor might reasonably choose to hike. The dataset contains 2,789 trails, with reviews/ratings/check-ins from 81,056 users. Some limitations of this dataset are:

Despite these limitations, there is still enough information in the dataset to answer a few questions National Park enthusiasts may have about the hiking in each park. I’ve merged this dataset with visitation data I scraped from NPS.gov.

Where are the crowds?

How crowded a trail is depends on 1) its popularity, and 2) its length. Comparing two trails with equal popularity, you are more likely to see people on the shorter trail than the longer one. The following graph plots the relationship between the popularity of each park and the total distance of its trails.

There is a positive relationship between the total trail length and the number of visitors in a park (r = 0.74), which suggests that people may be choosing which parks to visit based in part on how many miles of trail are available (although this is probably because total trail length is correlated with “sights to see”).
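For readers who want to see how a correlation like this is computed, here is a minimal sketch of the Pearson coefficient in Python. It is not necessarily the code used for this post, and the per-park numbers below are hypothetical placeholders, not values from the dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-park values for illustration only (NOT from the dataset)
total_trail_miles = [120.0, 450.0, 300.0, 800.0, 60.0]
annual_visitors_millions = [0.5, 3.1, 1.2, 4.0, 0.9]
r = pearson_r(total_trail_miles, annual_visitors_millions)
```

A value near 1 means parks with more trail mileage also tend to have more visitors. It says nothing about which one causes the other.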

Once people visit a park, which trails are the most popular? The following graph plots each trail’s length against its number of reviews. I have excluded trails with more than 500 reviews, as well as those longer than 50 miles (primarily for interpretability, but also because those trails are not typical of the hikes the average visitor might be interested in). There is a negative relationship – shorter trails are generally more popular. You are more likely to hike among a crowd on shorter hikes, both because they are more popular and because there is less distance for people to spread out over.
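The exclusion rule described above can be sketched as a simple filter. The `Trail` record and its field names are hypothetical, chosen for illustration, and do not reflect the actual column names in the scraped data.

```python
from dataclasses import dataclass

@dataclass
class Trail:
    # Hypothetical fields; the real scraped dataset may use other names
    name: str
    length_miles: float
    n_reviews: int

def trails_for_plot(trails, max_reviews=500, max_miles=50.0):
    """Drop trails with more than max_reviews reviews or longer than max_miles."""
    return [
        t for t in trails
        if t.n_reviews <= max_reviews and t.length_miles <= max_miles
    ]

# Made-up example records
sample = [
    Trail("Short Loop", 1.2, 340),
    Trail("Famous Overlook", 2.0, 1200),   # too many reviews, excluded
    Trail("Long Traverse", 72.0, 15),      # too long, excluded
]
kept = trails_for_plot(sample)
```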

This relationship makes sense – longer trails are more difficult to hike, which reduces the number of visitors who are capable enough to hike a particular trail. However, distance is an incomplete measure of difficulty. The AllTrails.com data also contains the elevation gain of each trail, which factors into a trail’s difficulty.

Which trails are the most difficult?

A trail’s difficulty is a function of its distance and elevation gain (also terrain, but the dataset does not contain that information). The following figure displays each trail’s distance and elevation gain. While there is a positive relationship (longer trails tend to have more elevation gain), the distributions are subtly different. The elevation gains of the trails are more positively skewed, with a larger proportion having minimal to no elevation gain. In contrast, the distances of trails are more spread out. Which are the most difficult trails on the graph?

Short trails with extreme elevation gain may be more difficult than much longer trails with only minimal elevation gain. To get a standardized measure of hiking difficulty, I computed the expected time taken to complete each trail (without stops) based on Tobler’s hiking function, which accounts for the hiking speed someone is likely to reach on terrain of different gradients. Unsurprisingly, there appears to be a negative relationship between expected hiking time and popularity – easier hikes (those that can be completed in less time) are more popular. However, there are still a number of difficult hikes that have a lot of reviews.
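To make the hiking-time estimate concrete, here is a minimal sketch of Tobler’s hiking function, which gives walking speed as 6·exp(−3.5·|slope + 0.05|) km/h, where slope is rise over run. The trail model is an assumption for illustration and may differ from how the numbers in this post were computed: it treats every trail as half uphill and half downhill at a constant gradient, with distance in miles and elevation gain in feet.

```python
import math

KM_PER_MILE = 1.609344
M_PER_FOOT = 0.3048

def tobler_speed_kmh(slope):
    """Tobler's hiking function: walking speed in km/h for a slope (rise/run)."""
    return 6.0 * math.exp(-3.5 * abs(slope + 0.05))

def hiking_time_hours(distance_miles, gain_feet):
    """Estimated moving time in hours.

    Assumption (not from the original analysis): the first half of the
    distance climbs the full elevation gain at a constant gradient and the
    second half descends it again.
    """
    if distance_miles <= 0:
        raise ValueError("distance must be positive")
    half_km = distance_miles * KM_PER_MILE / 2.0
    gain_km = gain_feet * M_PER_FOOT / 1000.0
    slope = gain_km / half_km
    uphill_hours = half_km / tobler_speed_kmh(slope)
    downhill_hours = half_km / tobler_speed_kmh(-slope)
    return uphill_hours + downhill_hours

# A short, steep trail versus a longer, flat one (made-up numbers)
steep_short = hiking_time_hours(3.0, 3000.0)
flat_long = hiking_time_hours(6.0, 0.0)
```

Under these assumptions the 3-mile trail with 3,000 feet of gain takes longer than the flat 6-mile trail, which is exactly the case that distance alone misses.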

If you’re trying to choose a National Park to visit based on the trails you could reasonably hike, the following figure may be helpful – it displays the relative distribution of trails by difficulty (measured by hiking time required based on Tobler’s hiking function). Again, I have included only hikes that can be completed in under 8 hours.

Some parks have trails that are similarly difficult (e.g. most of Acadia’s trails can be completed in between 1 and 2 hours), while others have a greater variety (e.g. in the Great Smoky Mountains, you can hop on a trail of almost any length). Knowing how long an average trail in a park takes to hike can help you estimate how many hikes you can feasibly fit into a trip.
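If the goal is simply to fit in as many hikes as possible, one rough way to estimate the count is to pick the shortest hikes first until the available time runs out. The sketch below assumes a single total time budget and ignores travel between trailheads, rest, and daylight limits. The hiking times are made up.

```python
def max_hikes_within_budget(hike_hours, budget_hours):
    """Greedily pick the shortest hikes first until the time budget is used up.

    Taking the shortest hikes first maximizes the number of hikes that fit
    within one total time budget.
    """
    chosen = []
    remaining = budget_hours
    for hours in sorted(hike_hours):
        if hours > remaining:
            break  # every later hike is at least this long
        chosen.append(hours)
        remaining -= hours
    return chosen

# Hypothetical estimated hiking times (hours) and a 5-hour budget
plan = max_hikes_within_budget([3.0, 1.0, 2.5, 1.5], 5.0)
```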

Get outside

Obviously, the dataset needs to be examined in more detail to be helpful for planning a particular National Park trip and choosing the trails you intend to hike (I’m doing this prior to my trip to Sequoia National Park this summer). I’m hoping to create an interactive visualization that could help with this, but these visualizations at least show in principle that there are systematic patterns in the hiking and popularity of America’s National Parks.