Machine Learning Approach to Estimate Daily Bike Traffic (DBT) Using Emerging Data Source STRAVA
Abstract
Daily Bicycle Traffic (DBT) is an indicator of bicycle activity and demand. Agencies widely use DBT to prioritize bicycle infrastructure investments, to evaluate bicycle safety, and to identify the health benefits to cyclists. However, obtaining bicycle count data that captures a wide variety of bicycle uses and riders remains challenging because most regions have limited resources (e.g., cameras and inductive loop detectors) for collecting continuous bicycle counts. Emerging data collection methods that capture the spatial and temporal variations in bicycle counts may help solve this sparse-data problem. With the growing use of smartphones and wearable devices, smartphone apps such as Strava now provide aggregated bicycle activity data from anonymous cyclists. Strava appears particularly advantageous for link-level bicycle volume estimation because the app records the individual trajectories of bicycle trips. Unlike conventional data collection methods (e.g., manual or automatic counts from short-term count stations, or surveys), which remain labor-intensive and costly, Strava offers an economical alternative data source owing to the proliferation of smartphones and low-cost GPS.
However, these emerging data sources suffer from sampling bias because only app users report their bicycle activities, which may introduce spatial and temporal bias into the bicycle volumes. This research estimates network-wide DBT using four machine learning regression methods (Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN)) and two conventional approaches (negative binomial regression and regularized linear ridge regression). The study fuses Strava data with bike-share data collected in Portland, Oregon, from 2017 to 2019. To overcome the sampling bias, the study integrates other data sources and Census variables into the DBT estimation. The models use a comprehensive set of explanatory variables, including socio-economic characteristics, weather conditions, bicycle facilities, and land use. The proposed models learn the inherent associations among the independent variables from the spatial and temporal patterns of bicycle activity and other variables, without predetermined assumptions. The study also applies K-means clustering to group network links that share similar bicycle activity characteristics; this grouping is intended to minimize the sampling bias in Strava counts and provides a representative grouping measure for network links. Finally, the study tests the models' temporal transferability by applying a model developed with 2017 data to 2018 and 2019.
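The link-grouping step can be illustrated with a minimal K-means (Lloyd's algorithm) sketch. This is not the study's implementation: the per-link features (hypothetical average daily Strava and bike-share trip counts), the choice of k = 2, and the first-k-points initialization are assumptions made to keep the example small and deterministic. In practice the features would normally be standardized first, and a library implementation such as scikit-learn's `KMeans` would typically be used.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's K-means: alternate nearest-centroid assignment and centroid updates.

    Centroids start at the first k points (a simplifying assumption that keeps the
    sketch deterministic; k-means++ seeding is the usual choice in practice).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each link joins the cluster with the nearest centroid
        # (ties go to the lower cluster index).
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no link changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member links.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical per-link features: (mean daily Strava trips, mean daily bike-share trips).
links: Dict[str, Point] = {
    "link_a": (2.0, 1.0),
    "link_b": (2.5, 1.5),
    "link_c": (30.0, 20.0),
    "link_d": (28.0, 22.0),
    "link_e": (3.0, 0.5),
}
labels, centroids = kmeans(list(links.values()), k=2)
for name, label in zip(links, labels):
    print(name, "-> cluster", label)
```

Here the low-activity links (a, b, e) and the high-activity links (c, d) land in separate groups. A cluster label like this could then enter the regression models as a categorical explanatory variable, which is one plausible reading of the "representative grouping measure" described above.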
Results indicate that the tree-based ensemble model (RF) performs best in estimating daily bike volumes, with a maximum absolute percentage error (APE) of 12%. The transferability test shows a 22% APE when the model is applied to 2019. All the other machine learning models show transferability risks because they lack the capacity to handle the complex underlying relationships in high-dimensional, variable, and biased data from year to year. This study also finds that fusing static, bike-share, and Strava data significantly improves bike volume estimates. The developed model offers stakeholders insight into how emerging technologies and advanced modeling techniques can be used for bicycle data collection and DBT estimation.
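The APE figures can be read with a minimal sketch of the metric, assuming the common definition APE = |predicted - observed| / observed x 100 for each link-day observation. The abstract does not state exactly how per-observation errors are aggregated, so the sketch reports both the mean and the maximum. The counts below are hypothetical and are not Portland data; the function names are illustrative only.

```python
from typing import Dict, List, Sequence


def absolute_percentage_errors(observed: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Per-observation APE in percent: |predicted - observed| / observed * 100."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    errors = []
    for obs, pred in zip(observed, predicted):
        if obs <= 0:
            # APE is undefined for zero counts (and meaningless for negative ones).
            raise ValueError("APE requires strictly positive observed counts")
        errors.append(abs(pred - obs) * 100.0 / obs)
    return errors


def summarize_ape(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Mean and maximum APE over a set of link-day observations."""
    errors = absolute_percentage_errors(observed, predicted)
    return {"mean": sum(errors) / len(errors), "max": max(errors)}


# Hypothetical daily counts on three links in a held-out year, compared with
# predictions from a model fit on an earlier year (a temporal transferability check).
observed_2019 = [200, 150, 400]
predicted_2019 = [220, 135, 400]
print(summarize_ape(observed_2019, predicted_2019))
```

Because APE divides by the observed count, links with very low daily volumes can dominate the maximum, so reporting the mean alongside it gives a fuller picture of model accuracy.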
Category
New Mobility Services
Description
Presenter: Stephen P. Mattingly
Agency Affiliation: Professor, Department of Civil Engineering, University of Texas at Arlington
Session: Technical Session D3: Planning Through Change: Transportation Demand Management Applications
Date: 6/2/2022, 10:30 AM - 12:00 PM
Presenter Biographical Statement: Dr. Mattingly is a Professor in Civil Engineering at the University of Texas at Arlington (UTA). He previously held positions at the University of Alaska Fairbanks (UAF) and the University of Southern California. He has authored more than 135 technical papers, conference proceedings, research reports, and book chapters. While at UAF (2000-2002), he served as the PI or co-PI on six projects. Since joining UTA, he has served as the PI or co-PI on more than fifty-five projects, including more than twenty federal awards and fifteen state Department of Transportation (DOT) projects. His most recent research projects address a variety of interdisciplinary topics, including transportation equity, mobility for the transportation disadvantaged, developing an app for crowd-sourcing bicycle and pedestrian conflict data, transportation public health performance measures, data fusion of bicycle count data, using technology to nudge older adults to increase physical activity, developing planning and transit performance measures for access to opportunities, integrating sustainability into the engineering curriculum, and creating an engineering sustainability minor.