---
addenda:
- '[code](https://github.com/alexklapheke/trees)'
date: 1595378969
title: Cambridge street trees
---
This project now features an [interactive
map](https://alexklapheke.github.io/treemap/).
::: {.epigraph}
I think that I shall never see\
A billboard lovely as a tree.\
Indeed, unless the billboards fall\
I'll never see a tree at all.
:::
# Introduction
Every spring, the Callery pear down the street from me displays a
magnificent panoply of white blossoms, and releases... let's say a
*distinct* odor. A few months later, the tree of heaven in my backyard
drops an incredible torrent of flowers onto my porch, and within a few
months, another torrent of seed pods (this tree has its own smell, which
is marginally less unpleasant). It's a subtle difference between a
street lined with, say, honeylocusts and one lined with sycamores, but
trees really do go a long way in defining the character of a place,
making them a crucial consideration in urban planning.
Like many cities, Cambridge has long maintained a [dataset of all
municipal
trees](https://data.cambridgema.gov/Public-Works/Street-Trees/ni4i-5bnn),
including the species, location, date planted, and
[diameter](https://en.wikipedia.org/wiki/Diameter_at_breast_height).[^1]
By exploring this data, we can see what trends and patterns emerge, and
learn something about the arboreal life of our cities.
# Cleaning
Although the dataset does seem to cover every public tree, much of the
data is missing. For example, 83% of trees are missing the date they
were planted, presumably at least in part because they predate the
record system (a few of those that do list dates are from the 1970s).
There are many spurious values as well. Besides misaligned species names
(to be discussed in @sec:var), there are several misspelled genera
(e.g., "Planatus" for *Platanus*). The
diameter of each tree is listed, but the highest values are 154, 715,
745, 915, and 945 inches, corresponding respectively to about 13, 60,
62, 76, and 79 feet (the [stoutest tree in the
world](https://en.wikipedia.org/wiki/%C3%81rbol_del_Tule) is only 46
feet in diameter), casting all values into some suspicion. The dataset
also lists properties such as number of trunks---one [katsura
tree](https://goo.gl/maps/fzSmAtdHHqaXnHv5A) has ten, and two, a
[Japanese tree lilac](https://goo.gl/maps/WjFi3ZdhdnzqxiG38) and a
[serviceberry](https://goo.gl/maps/Ma7ErKrC9TMTSWmu8), have eleven---and
things of municipal importance, such as whether there are overhead
wires, or whether the roots are emerging through the sidewalk.
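The diameter check is simple arithmetic: divide inches by 12 and compare against the roughly 46-foot record. Here is a minimal sketch using the five values quoted above; the cutoff and the function name `flag_suspect_diameters` are illustrative choices, not part of the original analysis.

```python
# Largest diameters quoted above, in inches.
LARGEST_DIAMETERS_IN = [154, 715, 745, 915, 945]

# Diameter of the stoutest known tree, in feet (figure cited above).
RECORD_DIAMETER_FT = 46


def inches_to_feet(inches):
    """Convert a length in inches to feet."""
    return inches / 12


def flag_suspect_diameters(diameters_in, limit_ft=RECORD_DIAMETER_FT):
    """Return the diameters (in inches) that exceed limit_ft once converted."""
    return [d for d in diameters_in if inches_to_feet(d) > limit_ft]


feet = [round(inches_to_feet(d)) for d in LARGEST_DIAMETERS_IN]
suspect = flag_suspect_diameters(LARGEST_DIAMETERS_IN)
print(feet)
print(suspect)
```

The 154-inch entry passes this particular cutoff even though a street tree nearly 13 feet across is hardly plausible, so a real cleaning pass would want a much tighter limit.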
I also included data about invasiveness, by grabbing the
[list](https://www.mass.gov/doc/prohibited-plant-list-sorted-by-scientific-name/download){.pdf}
of species considered invasive in Massachusetts, copying the scientific
names to a [text
file](https://github.com/alexklapheke/trees/blob/master/prohibited-species.txt),
and merging.
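``` {.python}
import pandas as pd
# `trees` is the street-tree DataFrame loaded earlier.
# The species list is expected to carry "Genus" and "species" columns.
invasive_species = pd.read_csv("prohibited-species.txt")
invasive_species["Invasive"] = True
# Left merge keeps every tree; matching rows pick up Invasive=True
trees = pd.merge(
    left=trees,
    right=invasive_species,
    how="left",
    on=["Genus", "species"]
)
# Trees with no match in the list are not invasive
trees["Invasive"] = trees["Invasive"].fillna(False)
```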
The `SiteType` column helpfully tells us whether the tree is, in fact, a
tree at all, or a stump, a plot being prepared for a tree, etc. After
cleaning, I removed all of the latter cases---although really, the
cleaning was an ongoing process until the end.
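The two cleaning steps described in this section, correcting misspelled genera and dropping non-tree sites, can be sketched as follows. This uses plain Python records rather than the real dataset so that it runs on its own; the `SiteType` values and the sample rows are made up for illustration, and only the "Planatus" correction comes from the text above.

```python
# Known misspellings -> corrected genus. Only "Planatus" is taken from
# the text; extend this mapping as further typos turn up.
GENUS_FIXES = {"Planatus": "Platanus"}

# Hypothetical sample rows; the real SiteType values may differ.
rows = [
    {"Genus": "Planatus", "SiteType": "Tree"},
    {"Genus": "Acer", "SiteType": "Stump"},
    {"Genus": "Quercus", "SiteType": "Tree"},
    {"Genus": "", "SiteType": "Planting Site"},
]


def clean(records, tree_label="Tree"):
    """Keep only rows that are actual trees, fixing known genus typos."""
    cleaned = []
    for record in records:
        if record["SiteType"] != tree_label:
            continue  # skip stumps, empty plots, etc.
        fixed = dict(record)  # copy so the input is left untouched
        fixed["Genus"] = GENUS_FIXES.get(fixed["Genus"], fixed["Genus"])
        cleaned.append(fixed)
    return cleaned


trees_only = clean(rows)
print([r["Genus"] for r in trees_only])
```

The same logic would presumably map onto a value replacement and a boolean filter in pandas; the explicit loop is only here to keep the example self-contained.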
# Varieties {#sec:var}
The obvious first thing to do after cleaning is look at what varieties
line the city's streets. @fig:gen shows the most popular genera as
recorded in the dataset (their corresponding common names are in
[Appendix A](#appendix-a-common-names-of-trees)).
![The top 20 genera of street tree in
Cambridge](images/da3f95eb33086953a47a0eb4da5320ef1f195493.svg){#fig:gen}
However, just knowing that the streets are lined with maples
(*Acer* spp.) doesn't tell us if they are
towering, stately silver maples (*A.
saccharinum*), or splendiferous red maples
(*A. rubrum*), or invasive Norway maples
(*A. platanoides*), nor does knowing
there are over 3,000 oaks (*Quercus* spp.) cast
light on which of the [over 600
species](https://en.wikipedia.org/wiki/List_of_Quercus_species) they
comprise. We can break this graph down further by species (@fig:spec).
![The top 20 genera of street tree in Cambridge, broken down by species
(hover over each bar to see the specific
name)](images/c3b8cbbaf16380c0cd1dda95a99cc6148d2d6918.svg){#fig:spec}
There are some interesting takeaways here. For instance, not a single
public apple tree (genus *Malus*) has been
identified for species. This indicates that the data was not logged at
the time of planting; possibly, the species were identified by sight in
a later survey (ornamental apple trees are typically hard-to-identify
hybrids). Some of the "unknown species" labels are clearly in error.
*Liquidambar*, for example, contains only
four species, only one of which (*L.
styraciflua*) is grown ornamentally in the U.S., and is easily
distinguished from its relatives by its five-pointed leaves.[^2] More
alarmingly, *Ginkgo biloba* is the only
extant species in its entire class (a sister clade to the conifers), but
79 specimens are missing species information, and one is apparently of
*Ginkgo triacanthos*! Presumably, city arborists have not discovered a
new living fossil, but rather this row has gotten mixed up somehow
(maybe during merging) with a honeylocust,
*Gleditsia triacanthos*. We also see
such botanical novelties as *Ulmus cordata*, *Tilia calleryana*, and,
amusingly, *Acer acerifolia*---literally, a maple-leafed maple.
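Mismatches like *Ginkgo triacanthos* can be surfaced mechanically by checking each genus and species pair against a reference list and, for those that fail, looking up which genera the epithet does belong to. The sketch below assumes a small hand-written reference set; a real check would need a full checklist, and the suggested genera are only candidates, not confirmed corrections.

```python
from collections import defaultdict

# Small hand-written reference list of valid (genus, species) pairs.
# Illustrative only; a real check would use a complete checklist.
VALID_BINOMIALS = {
    ("Ginkgo", "biloba"),
    ("Gleditsia", "triacanthos"),
    ("Tilia", "cordata"),
    ("Pyrus", "calleryana"),
    ("Ulmus", "americana"),
    ("Acer", "platanoides"),
}

# Index: epithet -> genera in which that epithet is valid.
GENERA_BY_EPITHET = defaultdict(set)
for genus, species in VALID_BINOMIALS:
    GENERA_BY_EPITHET[species].add(genus)


def check_binomial(genus, species):
    """Return (is_valid, candidate_genera) for a recorded name."""
    if (genus, species) in VALID_BINOMIALS:
        return True, set()
    # Not a known pair: suggest genera where this epithet does occur.
    return False, set(GENERA_BY_EPITHET.get(species, set()))


for name in [("Ginkgo", "triacanthos"), ("Ulmus", "cordata"),
             ("Tilia", "calleryana"), ("Ginkgo", "biloba")]:
    print(name, check_binomial(*name))
```

An empty candidate set for an invalid name means the epithet is unknown to the reference list, which points to a typo rather than a misaligned row.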
Besides the missing and spurious data, we see that over a third of
city-planted maples are Norway maples, which are now
[illegal](https://www.mass.gov/doc/prohibited-plant-list-sorted-by-scientific-name/download){.pdf}
in Massachusetts due to their propensity to crowd out native species. In
all, 2,282 trees considered invasive, representing five species, are
present in the dataset.
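These totals can be cross-checked against figures given elsewhere in this post: the per-species counts in the legend of @fig:invmap and the *Acer* population in Appendix A. A quick sketch, with arbitrary variable names:

```python
# Per-species counts of invasive trees, from the legend of the
# invasive-species map in the Mapping section.
invasive_counts = {
    "Acer platanoides": 2121,
    "Robinia pseudoacacia": 55,
    "Phellodendron amurense": 54,
    "Ailanthus altissima": 50,
    "Rhamnus cathartica": 2,
}

# Total number of maples (genus Acer), from Appendix A.
TOTAL_MAPLES = 4875

total_invasive = sum(invasive_counts.values())
norway_share = invasive_counts["Acer platanoides"] / TOTAL_MAPLES

print(total_invasive)
print(f"{norway_share:.1%} of maples are Norway maples")
```

The five species sum to the 2,282 quoted above, and Norway maples come to a little under 44% of all maples, comfortably over a third.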
Invasive plants are not the only menace to biodiversity. We are lucky to
have over a thousand elms in Cambridge---after Dutch elm disease ravaged
populations in the 1960s and '70s, disease-resistant cultivars of
*Ulmus americana* were developed
which, along with fungicide, managed the destruction. The chestnut
blight fungus, introduced from East Asia around the turn of the 20th
century, has made the American chestnut
(*Castanea dentata*) all but
extinct, with a onetime population of several billion dwindling to a few
hundred today. So I was surprised to see four trees in the genus
*Castanea* in the dataset, but less
surprised when three turned out to be misidentified horse chestnuts
(*Aesculus hippocastanum*),[^3] and
the fourth was... wait for it... [a Norway
maple](https://goo.gl/maps/2Jar5K1Qrc5mFW5RA).
# Mapping
Every tree in the dataset has an associated latitude and longitude.
Indeed, the coordinates given for each tree display [incredible
precision](https://xkcd.com/2170/), giving no fewer than 15 decimal
places, making them [precise](http://gis.stackexchange.com/a/8674) to
about 100 picometers, or approximately an atomic radius (I have no idea
how they chose which atom of the tree to use as a reference). In order
to map trees against Cambridge's [basemap
shapefiles](https://www.cambridgema.gov/GIS/gisdatadictionary/Basemap/BASEMAP_Roads),
I used [PyShp](https://pypi.org/project/pyshp/) to parse the shapefiles
and [PyCRS](https://pypi.org/project/PyCRS/) and
[PyProj](https://pypi.org/project/pyproj/) to convert from
[WGS84](https://epsg.io/4326) (i.e., standard latitude and longitude) to
[NAD 1983](https://epsg.io/102686), the coordinate system used in the
shapefiles.
``` {.python}
import matplotlib.pyplot as plt
from pyproj import Transformer
# Convert between coordinate systems
transformer = Transformer.from_crs(
    # WGS84 (EPSG:4326)
    "+proj=longlat +a=6378137.0 +rf=298.257223563 +pm=0 +no_defs",
    # NAD 1983 StatePlane Massachusetts Mainland FIPS 2001 Feet
    "+proj=lcc +lat_1=41.71666666666667 +lat_2=42.68333333333333 " +
    "+lat_0=41 +lon_0=-71.5 +x_0=200000 +y_0=750000.0000000001 " +
    "+ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs"
)
# Convert coordinates of invasive trees and plot on basemap
# (`invasives` is the subset of `trees` flagged as invasive)
for label in invasives["label"].unique():
    subset = invasives[invasives["label"] == label]
    # A PROJ-string longlat CRS takes longitude first, then latitude
    plt.scatter(
        *transformer.transform(
            subset["lon"].to_list(),
            subset["lat"].to_list()
        ),
    )
```
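Returning briefly to the 15-decimal-place coordinates mentioned at the start of this section, the picometer figure is easy to check by hand. This is a rough sketch that assumes a spherical Earth with a mean radius of 6,371 km; the true length of a degree varies slightly with latitude.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius; spherical approximation

# Length of one degree of latitude along a meridian, in meters.
meters_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360

# The 15th decimal place of a degree corresponds to 1e-15 degrees.
resolution_m = meters_per_degree * 1e-15
resolution_pm = resolution_m * 1e12  # meters -> picometers

print(f"{meters_per_degree / 1000:.1f} km per degree")
print(f"{resolution_pm:.0f} pm at 15 decimal places")
```

Under that approximation a degree is about 111 km, so the last decimal place resolves roughly 111 picometers, in line with the "about 100 picometers" above.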
In @fig:invmap, we can see the locations of invasive species of tree.
Norway maples (*Acer platanoides*) are
predictably strewn across the city, but we see clusters of trees of
heaven (*Ailanthus altissima*), and
a few sizeable black locusts (*Robinia
pseudoacacia*), their impressive diameters indicating older
trees. Of course, many more specimens of these noxious species exist on
private land, and many of the worst invasive species are not trees at
all, such as reeds (*Phragmites
australis*) and garlic mustard
(*Alliaria petiolata*).
![Locations of public trees in Cambridge considered invasive. Marker
size is proportional to tree diameter.\
\
*Acer
platanoides* (2,121)\
*Robinia
pseudoacacia* (55)\
*Phellodendron
amurense* (54)\
*Ailanthus
altissima* (50)\
*Rhamnus
cathartica*
(2)](images/789518acbec0030f9ca769c2bd1425d7020ff6b6.svg){#fig:invmap}
To see what trees are popular in different areas, I downloaded the
[neighborhood boundary
shapefiles](https://www.cambridgema.gov/GIS/gisdatadictionary/Boundary/BOUNDARY_CDDNeighborhoods),
and used [Shapely](https://pypi.org/project/Shapely/) to categorize each
latitude--longitude pair as being in one of the resulting polygons
(dismayingly, although the dataset has a `Neighborhood` column, it
doesn't contain a single data point). By filtering, we can see what the
most popular species is in each neighborhood.
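To make the neighborhood assignment concrete, here is a dependency-free sketch of the same idea: a ray-casting point-in-polygon test standing in for the containment check that Shapely provides, followed by a per-neighborhood tally. The square "neighborhoods", the coordinates, and the function names are all invented for illustration.

```python
from collections import Counter, defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices; points exactly on an edge
    are not handled specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def most_popular_by_area(trees, areas):
    """Map each area name to its most common species among trees."""
    counts = defaultdict(Counter)
    for x, y, species in trees:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name][species] += 1
                break  # areas are assumed not to overlap
    return {name: c.most_common(1)[0][0] for name, c in counts.items()}


# Two made-up square neighborhoods and a handful of made-up trees.
areas = {
    "West": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "East": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
trees = [
    (0.2, 0.5, "Gleditsia triacanthos"),
    (0.7, 0.3, "Gleditsia triacanthos"),
    (0.4, 0.9, "Acer rubrum"),
    (1.5, 0.5, "Acer rubrum"),
    (1.8, 0.2, "Acer rubrum"),
    (1.2, 0.8, "Fraxinus americana"),
]

popular = most_popular_by_area(trees, areas)
print(popular)
```

With real boundaries, a geometry library is the safer choice, since it deals with edge cases (points on boundaries, holes, multi-part polygons) that this sketch ignores.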
![Most popular tree per neighborhood.\
\
*Gleditsia
triacanthos*\
*Acer*
(unknown species)\
*Acer
rubrum*\
*Fraxinus
americana*\
*Acer
platanoides*](images/66ffa614bb35b3a5299db00ebbc21343f8fb4791.svg){#fig:neighmap}
We can get an even more granular look by plotting each individual tree:
![Distributions of each neighborhood's most popular tree. Marker size is
proportional to tree diameter.\
\
*Gleditsia
triacanthos*\
*Acer*
(unknown species)\
*Acer
rubrum*\
*Fraxinus
americana*\
*Acer
platanoides*](images/1364a7eecb3d2e9dc8c53b6b1f6a933e83c610d8.svg){#fig:neighmapdots}
Here we see some interesting patterns. The unidentified maples so
popular around MIT almost exclusively line the Charles River
embankment.[^4] The white ashes (*Fraxinus
americana*) in the Cambridge Highlands are concentrated in the
[Lusitania
Woods](https://www.google.com/maps/place/Lusitania+Field/@42.3879328,-71.1463062,17z)
by Fresh Pond. The other neighborhoods, though, are more chaotic, and
favor generally popular choices such as honeylocust. Mapping trees by
census block brings this wonderful chaos to life:
![Most popular tree per [census
block](https://en.wikipedia.org/wiki/Census_block).](images/aaf055b3f1d38d8ff81ceba1ba6ab2e5499fd4ba.svg){#fig:blockmap}
# Planting dates
As mentioned above, only 17% of trees have a plant date listed, making
it difficult to find trends. Of course, it is still interesting to
examine the data we have---@fig:dates shows recorded plantings for the
genera in @fig:gen starting in 2007, since there are only a handful of
data points before then.
![Planting rate per year of 20 most popular genera since 2007, for those
recorded.](images/bb9014a71a6b7a803aa64954ad3f1c068d957971.svg){#fig:dates}
In particular, it is difficult to determine when invasives were planted---of the
2,282 invasive trees in the dataset, only five Norway maples have a date
listed. This is unsurprising---the "invasiveness" of invasives is
precisely their ability to aggressively reproduce. However, even of
these, we can see in @fig:acerdates that two were planted in 2013 and
2014, respectively, well after the [2006
law](https://www.mass.gov/service-details/prohibited-plant-list-background)
prohibiting such species took effect.
![Planting dates of *Acer
platanoides*, for those recorded.\
\
Law
banning invasive species in
effect](images/6f25db9880909aa2c085b5cfc027f4e5d5d541f1.svg){#fig:acerdates}
There are some interesting trends visible in @fig:dates---a recent fad
for magnolias, a steady increase in serviceberry plantings
(*Amelanchier* spp.), and spikes in
pear tree plantings (nearly all of which are Callery pears,
*Pyrus calleryana*) in 2009 and
2017---although given the paucity of data, it is impossible to say if
they are real.
We can get another sense of trends by looking at location, rather than
genus.
![Planting dates of trees by location since 2007, for those with a date
listed. Marker size is proportional to tree diameter.\
\
2007\
\
2020](images/921f4433c60b16c9687e3c112f42bb36cc06f8fd.svg){#fig:mapdates}
We can see several strings of orange and yellow where a whole row of
trees was planted all at once. In addition, unsurprisingly, the largest
trees tend to be the oldest, shown in purple.
There is much more I would like to do with this data---and much more,
and cleaner, data I'd like to have---but I've learned a lot about
Cambridge's trees, especially about the prevalence of invasive species
on my local streets.
# Appendix A: Common names of trees {#appendix-a-common-names-of-trees .unnumbered}
Genus              Common name           Family (APG IV [@apg4])     Population
------------------ --------------------- ------------------------- ------------
*Abies*            fir                   Pinaceae                            46
*Acer*             maple                 Sapindaceae                      4,875
*Aesculus*         horse chestnut        Sapindaceae                         84
*Ailanthus*        tree of heaven        Simaroubaceae                       67
*Amelanchier*      serviceberry          Rosaceae                           499
*Betula*           birch                 Betulaceae                         374
*Carpinus*         hornbeam              Betulaceae                         205
*Carya*            hickory               Juglandaceae                         7
*Castanea*         chestnut              Fagaceae                             4
*Catalpa*          catalpa               Bignoniaceae                        27
*Cedrus*           cedar                 Pinaceae                             2
*Celtis*           hackberry             Cannabaceae                        155
*Cercidiphyllum*   katsura               Cercidiphyllaceae                   79
*Cercis*           redbud                Fabaceae                           169
*Chionanthus*      fringetree            Oleaceae                             5
*Cladrastis*       yellowwood            Fabaceae                           122
*Cornus*           dogwood               Cornaceae                          504
*Corylus*          hazel                 Betulaceae                           3
*Cotinus*          smoketree             Anacardiaceae                       16
*Crataegus*        hawthorn              Rosaceae                            82
*Enkianthus*       enkianthus            Ericaceae                            1
*Eucommia*         Chinese rubber tree   Eucommiaceae                        51
*Fagus*            beech                 Fagaceae                           130
*Fraxinus*         ash                   Oleaceae                         1,427
*Ginkgo*           ginkgo                Ginkgoaceae                        467
*Gleditsia*        honeylocust           Fabaceae                         2,637
*Gymnocladus*      coffeetree            Fabaceae                           171
*Halesia*          silverbell            Styracaceae                          9
*Hamamelis*        witch hazel           Hamamelidaceae                      46
*Ilex*             holly                 Aquifoliaceae                       33
*Juglans*          walnut                Juglandaceae                        19
*Juniperus*        juniper               Cupressaceae                       104
*Koelreuteria*     golden rain tree      Sapindaceae                        106
*Laburnum*         golden rain tree      Fabaceae                             1
*Larix*            larch                 Pinaceae                            17
*Liquidambar*      sweetgum              Altingiaceae                       377
*Liriodendron*     tuliptree             Magnoliaceae                       177
*Maackia*          maackia               Fabaceae                            26
*Magnolia*         magnolia              Magnoliaceae                       244
*Malus*            apple                 Rosaceae                           762
*Metasequoia*      dawn redwood          Cupressaceae                        49
*Morus*            mulberry              Moraceae                            33
*Nyssa*            tupelo                Nyssaceae                           25
*Ostrya*           hophornbeam           Betulaceae                          40
*Oxydendrum*       sourwood              Ericaceae                           15
*Parrotia*         ironwood              Hamamelidaceae                       8
*Phellodendron*    cork tree             Rutaceae                            59
*Picea*            spruce                Pinaceae                           105
*Pinus*            pine                  Pinaceae                           964
*Platanus*         sycamore              Platanaceae                      1,244
*Populus*          poplar/aspen          Salicaceae                         192
*Prunus*           cherry/plum           Rosaceae                         1,092
*Pseudotsuga*      Douglas fir           Pinaceae                            11
*Ptelea*           hoptree               Rutaceae                             1
*Pyrus*            pear                  Rosaceae                         1,072
*Quercus*          oak                   Fagaceae                         3,015
*Rhamnus*          buckthorn             Rhamnaceae                           4
*Robinia*          black locust          Fabaceae                            59
*Salix*            willow                Salicaceae                          38
*Sassafras*        sassafras             Lauraceae                            1
*Sciadopitys*      umbrella pine         Sciadopityaceae                      1
*Sorbus*           rowan                 Rosaceae                             1
*Stewartia*        tea tree              Theaceae                            19
*Styphnolobium*    pagoda tree           Fabaceae                           357
*Styrax*           storax                Styracaceae                          1
*Syringa*          lilac                 Oleaceae                           422
*Taxus*            yew                   Taxaceae                            26
*Thuja*            arborvitae            Cupressaceae                       142
*Tilia*            linden                Malvaceae                        1,544
*Tsuga*            hemlock               Pinaceae                           101
*Ulmus*            elm                   Ulmaceae                         1,204
*Viburnum*         viburnum              Adoxaceae                           20
*Zelkova*          zelkova               Ulmaceae                           627
: Common names of the tree genera found in Cambridge. Note that, in
general, there is not a one-to-one correspondence between common and
scientific names, which is why the latter are generally preferred.
{#tbl:latin}
[^1]: They've also compiled it into quite a nice [browsable
map](https://gis.cambridgema.gov/dpw/trees/trees_walk.html).
[OpenTrees.org](https://opentrees.org/) maintains a collection of
such datasets (possible fodder for future projects) as well as a
browsable map of all of them.
[^2]: Like so
([source](https://commons.wikimedia.org/wiki/File:NAS-062-c_Liquidambar_styraciflua.png))
![](images/0673b4f24fdb0df10c541165d0e50cb6e8a31635.jpg)
[^3]: An unrelated tree whose toxic fruit (left) looks very similar to
that of a true chestnut, *C.
sativa* (right) (sources:
[1](https://commons.wikimedia.org/wiki/File:Illustration_Aesculus_hippocastanum0.jpg),
[2](https://commons.wikimedia.org/wiki/File:Illustration_Castanea_sativa0.jpg)).
![](images/ca21e35e6c5bc350acb32186f4e14916d5096c2e.jpg)
[^4]: Although they are unidentified, we can presume from their placement
that these are all of the same species.