Envista Mapping App

A data viasualization app to map the City of San Francisco's Envista construction projects using data from their Open Data portal.

Note: this is a work in progress.

Two databases exist, which contain Envista projects:
1) Envista Projects for Utility Excavation and Paving
API Endpoint: https://data.sfgov.org/resource/dsie-4zfb.json

2) Envista Projects for Paving
API Endpoint: https://data.sfgov.org/resource/vad7-rtjc.json

Start by importing needed Python libraries:

In [1]:
%matplotlib inline
import datetime
import requests
import json
import pandas as pd
import seaborn as sns

Query the utility API and store response in a dictionary:

In [2]:
url_util = ('https://data.sfgov.org/resource/dsie-4zfb.json')
r_util = requests.get(url_util)
response_dict_util = json.loads(r_util.text)

Look at the first record in the response dictionary:

In [3]:
response_dict_util[0]
Out[3]:
{u':@computed_region_bh8s_q3mv': u'28861',
 u':@computed_region_fyvs_ahh9': u'5',
 u':@computed_region_p5aj_wyqh': u'7',
 u':@computed_region_rxqg_mtj9': u'6',
 u':@computed_region_yftq_j783': u'9',
 u'cnn': u'21411000',
 u'create_date': u'2016-08-07T04:15:13.000',
 u'dpw_project_class': u'Paving',
 u'duration': u'499',
 u'end_date': u'2016-12-14T00:00:00.000',
 u'facility_indicator': u'Roadway Surface',
 u'facility_type': u'Roadway',
 u'facility_type_description': u'Roadway Surface / Paving-Roads Only',
 u'group_code': u'2489J',
 u'latitude': u'37.716189190995',
 u'limits': u'NAPLES ST intersection',
 u'location': {u'coordinates': [-122.434732, 37.716189], u'type': u'Point'},
 u'longitude': u'-122.434732259092',
 u'modify_date': u'2016-08-07T04:15:13.000',
 u'owner': u'SF DPW IDC - Streets & Highways',
 u'project_id': u'DPW0099_PR_37',
 u'project_name_full': u'2489J - Various Locations Pavement Renovation No 19',
 u'project_status': u'Started',
 u'project_type': u'Paving-Roads Only',
 u'start_date': u'2015-08-03T00:00:00.000',
 u'street_name': u'ITALY AVE',
 u'x': u'6002084.75879',
 u'y': u'2089027.25273'}

Load into a Pandas dataframe, df_util, and delete the first 5 columns which are not needed:

In [4]:
df_util = pd.DataFrame.from_dict(response_dict_util)
df_util = df_util.drop(df_util.columns[[0, 1, 2, 3, 4]], axis=1)

Look at the new dataframe:

In [5]:
df_util.head()
Out[5]:
cnn create_date description_full dpw_project_class duration end_date facility_indicator facility_subindicator facility_type facility_type_description ... modify_date owner project_id project_name_full project_status project_type start_date street_name x y
0 21411000 2016-08-07T04:15:13.000 NaN Paving 499 2016-12-14T00:00:00.000 Roadway Surface NaN Roadway Roadway Surface / Paving-Roads Only ... 2016-08-07T04:15:13.000 SF DPW IDC - Streets & Highways DPW0099_PR_37 2489J - Various Locations Pavement Renovation ... Started Paving-Roads Only 2015-08-03T00:00:00.000 ITALY AVE 6002084.75879 2089027.25273
1 11412000 2016-08-07T04:15:13.000 NaN Paving 184 2018-11-01T00:00:00.000 Roadway Surface NaN Roadway Roadway Surface / Paving-Roads Only ... 2016-08-07T04:15:13.000 SF DPW IDC - Project Management DPW0045_PR_36 _NA - Various Locations Pavement Renovation No 45 Committed Paving-Roads Only 2018-05-01T00:00:00.000 SAN FERNANDO WAY 5992110.88968 2096250.69885
2 23276000 2016-02-10T19:40:00.000 NaN Gas 90 2016-07-30T23:59:00.000 NaN NaN Gas NaN ... 2016-06-07T01:33:00.000 PG&E 31217760 Replace V8006 Started NaN 2016-05-01T00:00:00.000 27TH AVE 5987965.3937 2098968.32098
3 8953000 2016-08-07T04:15:13.000 NaN Paving 364 2017-06-30T00:00:00.000 Roadway Surface NaN Roadway Roadway Surface / Paving-Microsurfacing ... 2016-08-07T04:15:13.000 SF DPW IDC - Project Management DPW0585_PP_35 2832J - Street Resurfacing Program – BSSR Pave... Started Paving-Microsurfacing 2016-07-01T00:00:00.000 MELROSE AVE 5996876.0104 2095407.32118
4 8732000 2016-08-07T04:15:13.000 NaN Paving 729 2017-05-02T00:00:00.000 Roadway Surface NaN Roadway Roadway Surface / Paving-Roads Only ... 2016-08-07T04:15:13.000 SF DPW IDC - Streets & Highways DPW0394_PR_67 2534J - As Needed Paving Contract No 11 Started Paving-Roads Only 2015-05-04T00:00:00.000 MARIPOSA ST 6009654.12858 2105941.05884

5 rows × 25 columns

In [6]:
df_util.tail()
Out[6]:
cnn create_date description_full dpw_project_class duration end_date facility_indicator facility_subindicator facility_type facility_type_description ... modify_date owner project_id project_name_full project_status project_type start_date street_name x y
16088 738000 2015-01-15T18:56:00.000 Possible joint project with 22 Fillmore Muni F... Water 730 2020-01-01T23:59:00.000 NaN NaN Water NaN ... 2016-05-31T17:39:00.000 SF PUC Water PLN-1415-29 WD-2824: 8, 12, and 16-inch Ductile Iron Water... Planned NaN 2018-01-01T00:00:00.000 16TH ST 6004361.73741 2106589.81999
16089 12394000 2014-09-30T20:14:00.000 Track Replacement & Overhead Rehabilitation Transit 669 2019-05-01T23:59:00.000 NaN NaN Transit / Replacement ... 2016-04-19T20:17:00.000 SF MTA Capital Programs and Construction FG0122_L L-Line Transit Improvement Project Planned Replacement 2017-07-01T00:00:00.000 TARAVAL ST 5990607.64287 2099058.5005
16090 23377000 2014-09-30T20:14:00.000 Track Replacement & Overhead Rehabilitation Transit 669 2019-05-01T23:59:00.000 NaN NaN Transit / Replacement ... 2016-04-19T20:17:00.000 SF MTA Capital Programs and Construction FG0122_L L-Line Transit Improvement Project Planned Replacement 2017-07-01T00:00:00.000 36TH AVE 5985174.35992 2098873.06313
16091 12410000 2014-09-30T20:14:00.000 Track Replacement & Overhead Rehabilitation Transit 669 2019-05-01T23:59:00.000 NaN NaN Transit / Replacement ... 2016-04-19T20:17:00.000 SF MTA Capital Programs and Construction FG0122_L L-Line Transit Improvement Project Planned Replacement 2017-07-01T00:00:00.000 TARAVAL ST 5985638.77669 2098888.91368
16092 23385000 2014-09-30T20:14:00.000 Track Replacement & Overhead Rehabilitation Transit 669 2019-05-01T23:59:00.000 NaN NaN Transit / Replacement ... 2016-04-19T20:17:00.000 SF MTA Capital Programs and Construction FG0122_L L-Line Transit Improvement Project Planned Replacement 2017-07-01T00:00:00.000 37TH AVE 5984863.96092 2098862.24906

5 rows × 25 columns

Key fields:
create_date
description_full (incomplete data)
dwp_project_class (StreetScape, Sewer, Gas, Paving, ...)
owner (PG&E, SF DPW IDC - Hydraulic, SF DPW IDC - Streets & Highways, ...)
project_id
project_name_full
project_status (Planned, Committed, Started, Completed)
start_date
end_date
duration
street_name
latitude
longitude

Perform the same steps for the other Envista database:

In [7]:
url = ('https://data.sfgov.org/resource/vad7-rtjc.json')
r = requests.get(url)
response_dict = json.loads(r.text)

Look at the first record of response_dict to see if it has the same structure as the utility response dictionary:

In [8]:
response_dict[0]
Out[8]:
{u':@computed_region_bh8s_q3mv': u'28857',
 u':@computed_region_fyvs_ahh9': u'4',
 u':@computed_region_p5aj_wyqh': u'1',
 u':@computed_region_rxqg_mtj9': u'10',
 u':@computed_region_yftq_j783': u'3',
 u'cnn': u'12050000',
 u'create_date': u'2015-03-02T12:38:00.000',
 u'description_full': u'The improvements will include new concrete paving, new unit pavers, bench seating, planters, green infrastructure drainage systems, and new raised sidewalks at both north and south entries to Spofford Alley.',
 u'dpw_project_class': u'Paving',
 u'duration': u'1005',
 u'end_date': u'2017-12-01T23:59:00.000',
 u'facility_indicator': u'Roadway Surface',
 u'facility_subindicator': u'Paved Road-Other',
 u'facility_type': u'Roadway',
 u'facility_type_description': u'Roadway Surface / Paved Road-Other / Construction',
 u'latitude': u'37.794576879609',
 u'limits': u'CLAY ST to OLD CHINATOWN LN \\ WASHINGTON ST (1 - 99)',
 u'location': {u'coordinates': [-122.40739, 37.794577], u'type': u'Point'},
 u'longitude': u'-122.40738960074',
 u'modify_date': u'2015-05-27T14:47:00.000',
 u'owner': u'SF DPW BDC - Landscape Architecture',
 u'project_id': u'2294J',
 u'project_name_full': u'Spofford Chinatown Living Alley',
 u'project_status': u'Planned',
 u'project_type': u'Construction',
 u'start_date': u'2015-03-02T00:00:00.000',
 u'street_name': u'SPOFFORD ST',
 u'x': u'6010573.6578',
 u'y': u'2117401.51325'}

It does, so can delete the first 5 columns as part of the conversion to a dataframe as before:

In [9]:
df = pd.DataFrame.from_dict(response_dict)
df = df.drop(df.columns[[0, 1, 2, 3, 4]], axis=1)
In [10]:
df.head()
Out[10]:
cnn create_date description_full dpw_project_class duration end_date facility_indicator facility_subindicator facility_type facility_type_description ... modify_date owner project_id project_name_full project_status project_type start_date street_name x y
0 12050000 2015-03-02T12:38:00.000 The improvements will include new concrete pav... Paving 1005 2017-12-01T23:59:00.000 Roadway Surface Paved Road-Other Roadway Roadway Surface / Paved Road-Other / Construction ... 2015-05-27T14:47:00.000 SF DPW BDC - Landscape Architecture 2294J Spofford Chinatown Living Alley Planned Construction 2015-03-02T00:00:00.000 SPOFFORD ST 6010573.6578 2117401.51325
1 26040000 2015-02-10T13:48:00.000 ADA Improvements to Existing RestroomNew Acces... Paving 1031 2016-12-15T23:59:00.000 Roadway Surface Paved Road-Asphalt Roadway Roadway Surface / Paved Road-Asphalt ... 2015-05-27T14:41:00.000 SF DPW BDC - Landscape Architecture 3202V Alamo Square Planned NaN 2014-02-18T00:00:00.000 GROVE ST 6003031.40874 2110990.96485
2 6597000 2015-04-01T12:56:00.000 The Department of Public Works intends to ente... Paving 582 2016-10-31T10:00:00.000 Roadway Surface Paved Road-Asphalt Roadway Roadway Surface / Paved Road-Asphalt / Constru... ... 2015-10-29T02:42:00.000 SF DPW BDC - Landscape Architecture 3239V Guy Place Mini Park Started Construction 2015-03-29T10:00:00.000 GUY PL 6014245.46028 2114355.02283
3 25022000 2015-03-02T12:38:00.000 The improvements will include new concrete pav... Paving 1005 2017-12-01T23:59:00.000 Roadway Surface Paved Road-Other Roadway Roadway Surface / Paved Road-Other / Construction ... 2015-05-27T14:47:00.000 SF DPW BDC - Landscape Architecture 2294J Spofford Chinatown Living Alley Planned Construction 2015-03-02T00:00:00.000 OLD CHINATOWN LN 6010549.29985 2117561.92251
4 12126000 2015-02-10T13:48:00.000 ADA Improvements to Existing RestroomNew Acces... Paving 1031 2016-12-15T23:59:00.000 Roadway Surface Paved Road-Asphalt Roadway Roadway Surface / Paved Road-Asphalt ... 2015-05-27T14:41:00.000 SF DPW BDC - Landscape Architecture 3202V Alamo Square Planned NaN 2014-02-18T00:00:00.000 STEINER ST 6003007.45952 2111162.67772

5 rows × 25 columns

In [11]:
df.tail()
Out[11]:
cnn create_date description_full dpw_project_class duration end_date facility_indicator facility_subindicator facility_type facility_type_description ... modify_date owner project_id project_name_full project_status project_type start_date street_name x y
5349 21878000 2016-08-07T04:15:13.000 NaN Paving 729 2018-05-01T00:00:00.000 Roadway Surface NaN Roadway Roadway Surface / Paving-Roads Only ... 2016-08-07T04:15:13.000 SF DPW IDC - Streets & Highways DPW0559_PR_41 2766J - As Needed Paving Contract No 12 Started Paving-Roads Only 2016-05-02T00:00:00.000 RANDALL ST 6004308.30805 2097532.38708
5350 27913000 2016-08-07T04:15:13.000 NaN Paving 304 2018-01-30T00:00:00.000 Roadway Surface NaN Roadway Roadway Surface / Paving-Roads Only ... 2016-08-07T04:15:13.000 SF DPW IDC - Streets & Highways DPW0518_PR_33 2708J - Joint - SFWD Lead - Geary Blvd Pavemen... Committed Paving-Roads Only 2017-04-01T00:00:00.000 41ST AVE 5983157.66512 2112488.40812
5351 1919000 2016-08-07T04:15:13.000 NaN Paving 122 2019-01-17T00:00:00.000 Roadway Surface NaN Roadway Roadway Surface / Paving-Roads Only ... 2016-08-07T04:15:13.000 SF DPW IDC - Project Management DPW0001_PR_7 _NA - 41st Ave and 44th Ave Pavement Renovation Committed Paving-Roads Only 2018-09-17T00:00:00.000 41ST AVE 5983565.98251 2100518.31951
5352 8536000 2016-08-07T04:15:13.000 NaN Paving 731 2018-12-02T00:00:00.000 Roadway Surface NaN Roadway Roadway Surface / Paving-Microsurfacing ... 2016-08-07T04:15:13.000 SF DPW IDC - Project Management DPW0566_PP_239 2807J - As Needed Pavement Preservation Contra... Committed Paving-Microsurfacing 2016-12-01T00:00:00.000 LUNADO CT 5991683.16569 2092749.69716
5353 23764000 2016-08-07T04:15:13.000 NaN Paving 212 2018-04-07T00:00:00.000 Roadway Surface NaN Roadway Roadway Surface / Paving-Roads Only ... 2016-08-07T04:15:13.000 SF DPW IDC - Project Management DPW0088_PR_47 2792J - Various Locations Pavement Renovation ... Committed Paving-Roads Only 2017-09-07T00:00:00.000 19TH ST 6011264.03929 2105098.35991

5 rows × 25 columns

For testing purposes, turn the "df" dataframe into a groups of geojson objects to plot on a map (since it has fewer records than "df_util"). For now only 6 columns of data sent to the geojson file: "project_id", "owner", "dpw_project_class", "project_status", "start_date", "end_date".

In [12]:
def df_2_geojson(df, properties, lat='latitude', lon='longitude'):
    geojson = {"type":"FeatureCollection", "features":[]}
    for _, row in df.iterrows():
        feature = {"type":"Feature",
                   "properties":{},
                   "geometry":{"type":"Point",
                               "coordinates":[]}}
        feature["geometry"]["coordinates"] = [row[lon],row[lat]]
        for prop in properties:
            feature["properties"][prop] = row[prop]
        geojson['features'].append(feature)
    return geojson
    
cols = ["project_id", "owner", "dpw_project_class", "project_status", "start_date", "end_date"]
geojson = df_2_geojson(df, cols)

output_filename = 'envista.geojson'
with open(output_filename, 'wb') as output_file:
    output_file.write('')
    json.dump(geojson, output_file, indent=2)

The map that plots the geojson points is located here: https://hdavis.github.io/Envista/envista.html

Now that a preliminary map works, let's check for duplicates between two databases. A quick visual inspection shows many duplicates. For now, let's work with the Utility Excavation and Paving database ("df_util" dataframe) since it appears to be more comprehensive.

A more detailed check will need to be done to confirm that everything in the "Envista Projects for Paving" database ("df" dataframe) with 5354 records is in the "Envista Projects for Utility Excavation and Paving" database ("df_util" dataframe) with 16093 records (as of 8/7/2016).

Working with the "df_util" dataframe, let's subset it into slices based on the values/categories in the "dpw_project_class" field. These categories are:

In [13]:
df_util.dpw_project_class.value_counts()
Out[13]:
Paving                 5323
Sewer                  2639
Water                  2570
Transit                1899
Curb Ramps             1273
StreetScape             728
Gas                     705
Roadway                 464
Electric                215
Pedestrian Safety       123
Traffic Improvement     108
Land-Use                 37
Storm Drain               8
Communication             1
Name: dpw_project_class, dtype: int64

Creating the slices based on these categories:

In [14]:
df_comm = df_util[df_util.dpw_project_class == 'Communication']
df_curb = df_util[df_util.dpw_project_class == 'Curb Ramps']
df_electric = df_util[df_util.dpw_project_class == 'Electric']
df_gas = df_util[df_util.dpw_project_class == 'Gas']
df_landUse = df_util[df_util.dpw_project_class == 'Land-Use']
df_paving = df_util[df_util.dpw_project_class == 'Paving']
df_ped = df_util[df_util.dpw_project_class == 'Pedestrian Safety']
df_roadway = df_util[df_util.dpw_project_class == 'Roadway']
df_sewer = df_util[df_util.dpw_project_class == 'Sewer']
df_storm = df_util[df_util.dpw_project_class == 'Storm Drain']
df_streetScape = df_util[df_util.dpw_project_class == 'StreetScape']
df_trafficImprove = df_util[df_util.dpw_project_class == 'Traffic Improvement']
df_transit = df_util[df_util.dpw_project_class == 'Transit']
df_water = df_util[df_util.dpw_project_class == 'Water']

Checking to make sure there are no missing/stray records or rows:

In [15]:
numRows_df_util = len(df_util.index) # total number of records in "df_util"
numRows_df_util
Out[15]:
16093
In [16]:
numRows_all_slices = len(df_comm.index) + len(df_curb.index) + \
    len(df_electric.index) + len(df_gas.index) + len(df_landUse.index) + \
    len(df_paving.index) + len(df_ped.index) + len(df_roadway.index) + \
    len(df_sewer.index) + len(df_storm.index) + len(df_streetScape.index) + \
    len(df_trafficImprove.index) + len(df_transit.index) + len(df_water.index)
numRows_all_slices 
Out[16]:
16093

Can convert these different slices based on project class into geoJSON files, though might be best to load that data for the map directly from the SF City API and display the different categories on the map using gueries.

Looking at the breakdown of project owners to see if that might be a good way to display the map data:

In [17]:
df_util.owner.value_counts()
Out[17]:
SF DPW IDC - Streets & Highways                     4147
SF DPW IDC - Project Management                     2931
SF DPW IDC - Hydraulic                              2631
SF PUC Water                                        2448
SF MTA SS - Transportation Engineering (Capital)    1295
PG&E                                                 920
SF MTA Capital Programs and Construction             550
SF MTA SS - Livable Streets                          497
SF DPW IDC - Mechanical\Electrical                   245
Transbay                                             184
SF DPW IDC - Construction Management                 159
SF DPW BDC - Landscape Architecture                   37
SF PUC Green Infrastructure                           31
SF PUC Project Management Bureau                       8
SF DPW IDC - Structural                                7
AT&T - Construction and Engineering                    1
Name: owner, dtype: int64

There are 15 different owners, including the individual departments within a given owner (e.g. SF DPW IDC). The project class field has 14 categories. Using different colors to display the status of the project (started, completed, etc.) is the easiest way to divide up the data on the map and a good place to start.

Looking at the distribution of the 4 categories in the project_status field:

In [18]:
df_util.project_status.value_counts()
Out[18]:
Committed    8947
Started      4316
Planned      2755
Completed      75
Name: project_status, dtype: int64