A data visualization app to map the City of San Francisco's Envista construction projects using data from the city's Open Data portal.
Note: this is a work in progress.
Two databases on the portal contain Envista projects:
1) Envista Projects for Utility Excavation and Paving
API Endpoint: https://data.sfgov.org/resource/dsie-4zfb.json
2) Envista Projects for Paving
API Endpoint: https://data.sfgov.org/resource/vad7-rtjc.json
Start by importing needed Python libraries:
%matplotlib inline
import datetime
import requests
import json
import pandas as pd
import seaborn as sns
Query the utility API and decode the JSON response (a list of record dictionaries):
url_util = 'https://data.sfgov.org/resource/dsie-4zfb.json'
r_util = requests.get(url_util)
response_dict_util = json.loads(r_util.text)
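Note that Socrata endpoints return a limited page of rows per request by default, so a plain GET may not pull the full dataset. A minimal sketch using Socrata's $limit parameter (the 50,000 cap is an arbitrary assumption; $offset can be added for paging):
# Request up to 50,000 rows in one call via Socrata's $limit parameter
r_util = requests.get(url_util, params={'$limit': 50000})
response_dict_util = r_util.json()  # decodes the JSON body to a list of dicts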
Look at the first record in the response:
response_dict_util[0]
Load the records into a pandas DataFrame, df_util, and drop the first five columns, which are not needed:
df_util = pd.DataFrame.from_dict(response_dict_util)
df_util = df_util.drop(df_util.columns[[0, 1, 2, 3, 4]], axis=1)
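Since the drop is positional, it may be worth confirming which columns sit in those first five slots; a quick check (run before the drop):
# Inspect the names of the first five columns before dropping them
print(df_util.columns[:5].tolist())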
Look at the new dataframe:
df_util.head()
df_util.tail()
Key fields:
create_date
description_full (incomplete data)
dpw_project_class (StreetScape, Sewer, Gas, Paving, ...)
owner (PG&E, SF DPW IDC - Hydraulic, SF DPW IDC - Streets & Highways, ...)
project_id
project_name_full
project_status (Planned, Committed, Started, Completed)
start_date
end_date
duration
street_name
latitude
longitude
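The date fields listed above arrive from the API as strings; converting them to pandas datetimes makes them usable for filtering and plotting. A sketch, assuming ISO-formatted date strings:
# Coerce unparseable values to NaT rather than raising
df_util['start_date'] = pd.to_datetime(df_util['start_date'], errors='coerce')
df_util['end_date'] = pd.to_datetime(df_util['end_date'], errors='coerce')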
Perform the same steps for the other Envista database:
url = 'https://data.sfgov.org/resource/vad7-rtjc.json'
r = requests.get(url)
response_dict = json.loads(r.text)
Look at the first record of response_dict to see if it has the same structure as the utility response:
response_dict[0]
It does, so we can drop the first five columns after converting to a DataFrame, as before:
df = pd.DataFrame.from_dict(response_dict)
df = df.drop(df.columns[[0, 1, 2, 3, 4]], axis=1)
df.head()
df.tail()
For testing purposes, turn the "df" dataframe into a GeoJSON FeatureCollection to plot on a map (since it has fewer records than "df_util"). For now, only six columns of data are sent to the GeoJSON file: "project_id", "owner", "dpw_project_class", "project_status", "start_date", "end_date".
def df_2_geojson(df, properties, lat='latitude', lon='longitude'):
    geojson = {"type": "FeatureCollection", "features": []}
    for _, row in df.iterrows():
        feature = {"type": "Feature",
                   "properties": {},
                   "geometry": {"type": "Point",
                                "coordinates": []}}
        feature["geometry"]["coordinates"] = [row[lon], row[lat]]
        for prop in properties:
            feature["properties"][prop] = row[prop]
        geojson['features'].append(feature)
    return geojson
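One caveat before calling df_2_geojson: Socrata often returns latitude/longitude as strings, and some records may lack coordinates entirely, which would yield invalid Point geometries. A sketch of a cleanup step (an assumed safeguard, not part of the original flow):
# Drop rows with missing coordinates and cast the rest to floats
df = df.dropna(subset=['latitude', 'longitude']).copy()
df[['latitude', 'longitude']] = df[['latitude', 'longitude']].astype(float)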
cols = ["project_id", "owner", "dpw_project_class", "project_status", "start_date", "end_date"]
geojson = df_2_geojson(df, cols)
output_filename = 'envista.geojson'
with open(output_filename, 'w') as output_file:
    json.dump(geojson, output_file, indent=2)
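As a quick sanity check, the file can be read back and the feature count compared with the dataframe's row count:
# Re-read the GeoJSON and verify the feature count matches the row count
with open(output_filename) as f:
    fc = json.load(f)
print(len(fc['features']), len(df))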
The map that plots the geojson points is located here: https://hdavis.github.io/Envista/envista.html
Now that a preliminary map works, let's check for duplicates between the two databases. A quick visual inspection shows many duplicates. For now, let's work with the Utility Excavation and Paving database (the "df_util" dataframe) since it appears to be more comprehensive.
A more detailed check will need to be done to confirm that everything in the "Envista Projects for Paving" database ("df" dataframe) with 5354 records is in the "Envista Projects for Utility Excavation and Paving" database ("df_util" dataframe) with 16093 records (as of 8/7/2016).
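A sketch of that check, assuming project_id identifies the same project in both datasets:
# Which paving project_ids are absent from the utility dataset?
paving_ids = set(df['project_id'])
util_ids = set(df_util['project_id'])
missing_ids = paving_ids - util_ids
print(len(missing_ids), "paving project_ids not found in the utility dataset")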
Working with the "df_util" dataframe, let's subset it into slices based on the values/categories in the "dpw_project_class" field. These categories are:
df_util.dpw_project_class.value_counts()
Creating the slices based on these categories:
df_comm = df_util[df_util.dpw_project_class == 'Communication']
df_curb = df_util[df_util.dpw_project_class == 'Curb Ramps']
df_electric = df_util[df_util.dpw_project_class == 'Electric']
df_gas = df_util[df_util.dpw_project_class == 'Gas']
df_landUse = df_util[df_util.dpw_project_class == 'Land-Use']
df_paving = df_util[df_util.dpw_project_class == 'Paving']
df_ped = df_util[df_util.dpw_project_class == 'Pedestrian Safety']
df_roadway = df_util[df_util.dpw_project_class == 'Roadway']
df_sewer = df_util[df_util.dpw_project_class == 'Sewer']
df_storm = df_util[df_util.dpw_project_class == 'Storm Drain']
df_streetScape = df_util[df_util.dpw_project_class == 'StreetScape']
df_trafficImprove = df_util[df_util.dpw_project_class == 'Traffic Improvement']
df_transit = df_util[df_util.dpw_project_class == 'Transit']
df_water = df_util[df_util.dpw_project_class == 'Water']
Checking to make sure there are no missing/stray records or rows:
numRows_df_util = len(df_util.index) # total number of records in "df_util"
numRows_df_util
numRows_all_slices = len(df_comm.index) + len(df_curb.index) + \
    len(df_electric.index) + len(df_gas.index) + len(df_landUse.index) + \
    len(df_paving.index) + len(df_ped.index) + len(df_roadway.index) + \
    len(df_sewer.index) + len(df_storm.index) + len(df_streetScape.index) + \
    len(df_trafficImprove.index) + len(df_transit.index) + len(df_water.index)
numRows_all_slices
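An equivalent check with less typing is a pandas groupby (a sketch; not the approach above). Rows with a missing dpw_project_class would be excluded from the groups, so a mismatch with the total row count flags stray records:
# Group once instead of slicing fourteen times
class_counts = df_util.groupby('dpw_project_class').size()
print(class_counts.sum() == numRows_df_util)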
We can convert these different slices based on project class into GeoJSON files, though it might be best to load the map data directly from the SF City API and display the different categories on the map using queries.
Looking at the breakdown of project owners to see if that might be a good way to display the map data:
df_util.owner.value_counts()
There are 15 different owners, counting the individual departments within a given owner (e.g., SF DPW IDC) separately. The project class field has 14 categories. Using different colors to display the status of each project (started, completed, etc.) is the easiest way to divide up the data on the map and a good place to start.
Looking at the distribution of the 4 categories in the project_status field:
df_util.project_status.value_counts()
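A sketch of how the status-to-color mapping might look (the colors and the marker_color column name are illustrative assumptions, not part of the current map):
# Hypothetical mapping from project_status to a marker color
status_colors = {'Planned': '#999999', 'Committed': '#f0ad4e',
                 'Started': '#d9534f', 'Completed': '#5cb85c'}
df_util['marker_color'] = df_util['project_status'].map(status_colors)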