Predicting future crime poses a particularly interesting data challenge because it has both geospatial and temporal dimensions and may be affected by many different kinds of factors, such as weather, city infrastructure, population demographics, public events, and government policy.
In September 2016, the National Institute of Justice launched a Real-Time Crime Forecasting Challenge to predict crime hotspots in the city of Portland, Oregon. Our team (Maxime and I) made a submission to the challenge. Our goal was to use both geospatial and temporal data to understand the underlying factors of crime and predict future hotspots. All of the data are openly available, making the project fully reproducible. In the end, we were very excited to be announced as one of the winners of the challenge!
How did we do it? In a series of two blog posts, I will walk through our approach to the challenge, which was ultimately a combination of machine learning, time-series modeling, and geostatistics (a combination that was more effective at predicting future crime hotspots than any one of these techniques on its own). This first post will focus on the data we used, and the next post (coming soon) will delve into the analysis of that data.