Public Policy Analytics is a new book by Ken Steif, Ph.D., that teaches at the intersection of data science and public policy. The book is available online and will eventually be available in print. Designed for students studying City Planning and related disciplines, the book teaches both code and context toward improved public-sector decision making. Readers can expect an introduction to R, geospatial data science, and machine learning, conveyed through real-world use cases of data science in government.
All of the book's data is free and open source, compiled from across the web. Each chapter includes API calls that read data directly into R. For posterity, however, the DATA folder in this repo holds all the data, organized by chapter. The sections below describe each dataset and its original source, when applicable.
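As a rough sketch of reading a local copy from the DATA folder rather than calling an API, the snippet below builds a file path from the chapter, dataset name, and file type listed in the tables. The helper `data_path()` is hypothetical, and the assumption that each file is named `<Dataset>.<file type>` is inferred from the tables rather than confirmed by the repo, so check the folder contents before relying on it.

```r
# Hypothetical helper (not part of the book): build the path to a dataset
# stored in the DATA folder. Assumes files are named <Dataset>.<ext>,
# which the tables suggest but do not state outright.
data_path <- function(chapter, dataset, ext, root = "DATA") {
  file.path(root, chapter, paste0(dataset, ".", ext))
}

septa_path <- data_path("Chapter1", "SEPTA_Broad", "geojson")

# Read the file only if the sf package is installed and the file exists
# in the current working directory (i.e. the repo has been cloned).
if (requireNamespace("sf", quietly = TRUE) && file.exists(septa_path)) {
  septa_broad <- sf::st_read(septa_path, quiet = TRUE)
  print(head(septa_broad))
}
```

The same path helper would apply to the csv files in later chapters, which could be read with `read.csv()` instead of `sf::st_read()`.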
Following the Introduction, Chapter 1 introduces indicators as an important tool for simplifying and communicating complex processes to non-technical decision makers. Introducing the `tidyverse`, `tidycensus`, and `sf` packages, this chapter analyzes whether Philadelphia renters are willing to pay a premium for transit amenities.
Dataset | Description | Open Data URL | File Type | Location |
---|---|---|---|---|
SEPTA_Broad | Stations on the Broad Street line | http://septaopendata-septa.opendata.arcgis.com/datasets/septa-broad-street-line-stations | geojson | DATA/Chapter1 |
SEPTA_El | Stations on the Market Frankford (El) line | http://septaopendata-septa.opendata.arcgis.com/datasets/septa-market-frankford-line-stations | geojson | DATA/Chapter1 |
PHL_CT00 | Philadelphia Census Tracts with data on the total population, number of white residents, educational attainment, median household income, median rent, and poverty for the year 2000 | collected with tidycensus | geojson | DATA/Chapter1 |
Chapter 2 explores the discontinuous nature of boundaries to understand how an Urban Growth Area in Lancaster County, PA affects suburban sprawl.
Dataset | Description | Open Data URL | File Type | Location |
---|---|---|---|---|
studyAreaTowns | Towns inside of the Lancaster County study area | http://www.pasda.psu.edu/uci/DataSummary.aspx?dataset=1267 | geojson | DATA/Chapter2 |
Urban_Growth_Boundary | Lancaster County's Urban Growth Area | http://www.pasda.psu.edu/uci/DataSummary.aspx?dataset=1274 | geojson | DATA/Chapter2 |
LancasterCountyBuildings | Footprints for a sample of 60% of buildings in the study area | http://www.pasda.psu.edu/uci/DataSummary.aspx?dataset=1257 | geojson | DATA/Chapter2 |
LancasterCountyBoundary | Spatial extent of Lancaster County | http://www.pasda.psu.edu/uci/DataSummary.aspx?dataset=1260 | geojson | DATA/Chapter2 |
LancasterGreenSpace | Non-developed land cover in the study area | http://www.pasda.psu.edu/uci/DataSummary.aspx?dataset=3154 | geojson | DATA/Chapter2 |
Chapters 3 and 4 provide a first look at geospatial predictive modeling, forecasting home prices in Boston, MA. Chapter 3 introduces linear regression, goodness of fit metrics, and cross-validation, with the goal of assessing model accuracy and generalizability. Chapter 4 builds on the initial analysis to account for the 'spatial process' or pattern of home prices.
Dataset | Description | Open Data URL | File Type | Location |
---|---|---|---|---|
Boston_Nhoods | Neighborhoods in Boston, MA | https://data.boston.gov/dataset/boston-neighborhoods | shapefile | DATA/Chapter3_4/Boston_Nhoods |
bostonHousePriceData_clean | Sale price and housing characteristics of homes sold in Boston between August 2015 and August 2016 | https://data.boston.gov/dataset/property-assessment | csv | DATA/Chapter3_4 |
bostonCrimes | Crimes in Boston that occurred between August 2015 and August 2016 | https://data.boston.gov/dataset/crime-incident-reports-august-2015-to-date-source-new-system | csv | DATA/Chapter3_4 |
boston_sf_Ch1_wrangled | All data wrangled in Chapter 3, which can be used in Chapter 4 if the analysis is not completed all at once | | geojson | DATA/Chapter3_4 |
Chapter 5 tackles the controversial topic of Predictive Policing, forecasting burglary risk in Chicago. The chapter argues that converting Broken Windows theory into Broken Windows policing can bake bias directly into a predictive model and lead to a discriminatory resource allocation tool. The concept of generalizability remains key.
Chapter 6 introduces the use of machine learning in estimating risk/opportunity for individuals. The resulting intelligence is then used to develop a cost/benefit analysis for Bounce to Work!, a pogo-transit start-up. The goal is to predict the probability that a client will 'churn', or not re-up their membership. This is valuable for public-sector data scientists working with individuals and families.
Dataset | Description | Open Data URL | File Type | Location |
---|---|---|---|---|
churnBounce | A churn-related dataset published by IBM. Field names have been modified to better fit the use case | https://www.kaggle.com/blastchar/telco-customer-churn/home | csv | DATA/Chapter6 |
housingSubsidy | Adapted from Moro & Rita, this dataset is provided for the homework assignment | http://archive.ics.uci.edu/ml/datasets/Bank+Marketing | csv | DATA/Chapter6 |
Chapter 7 evaluates people-based algorithms for 'disparate impact' - the idea that even if an algorithm is not designed to discriminate on its face, it may still have a discriminatory effect. This chapter returns to a criminal justice use case, estimating the social costs and benefits.
Dataset | Description | Open Data URL | File Type | Location |
---|---|---|---|---|
compas-scores-two-years | Dataset of defendants in Broward County, FL screened by COMPAS over two years (2013 and 2014) | https://github.com/propublica/compas-analysis/blob/master/compas-scores-two-years.csv | csv | DATA/Chapter7 |
Chapter 8 builds a space/time predictive model of ride share demand in Chicago. New R functionality is introduced along with functions unique to time series data.
Dataset | Description | Open Data URL | File Type | Location |
---|---|---|---|---|
chicago_rideshare_trips_nov_dec_18_clean_sample | A 20% sample of rideshare trips taken in Chicago for November and December 2018 | https://data.cityofchicago.org/Transportation/Transportation-Network-Providers-Trips/m6dm-c72p | csv | DATA/Chapter8 |