# Quickstart ## Load packages Once packages are installed, we are ready to load them. ```{julia} using MissingLinks, GeoDataFrames, Plots, Mmap, Graphs, StatsBase, GeoFormatTypes, DataFrames ``` ## Loading data Next, we will read our data. To keep things fast for this example, we are working with a tiny segment of the Charlotte pedestrian network, in the [Northlake area](https://www.openstreetmap.org/#map=15/35.34618/-80.85734). We need data on the pedestrian network, and origins and destinations. In this case we are using residential parcels as origins, and a selection of commercial destinations in the Northlake area digitized from OpenStreetMap as destinations. The data are included with the MissingLinks package; running ```{julia} MissingLinks.get_example_data() ``` will create a folder `example_data` in the current working directory, containing the data. If you are using your own data, you won't need to do this - but I recommend starting with the example data to ensure that the software is working correctly. All of our layers are already projected to EPSG:32119 (State Plane North Carolina, meters). I recommend using a meter-based coordinate system. The system will not work correctly with a geographic coordinate system, and has not been tested with feet. Next, we can load the data. In most cases, you'll have three files: the network itself, as well as data on origins and destinations for the access metric. ```{julia} sidewalks = GeoDataFrames.read("example_data/Northlake.gpkg") parcels = GeoDataFrames.read("example_data/Northlake_Parcels.gpkg") destinations = GeoDataFrames.read("example_data/Northlake_Destinations.gpkg") ``` We can plot what we read in: ```{julia} # sidewalks have an elevation component that we need to remove to plot sidewalks.geom = MissingLinks.remove_elevation!.(sidewalks.geom) plot(parcels.geom, color="#00f", legend=false, aspect_ratio=:equal) plot!(sidewalks.geom, color="#f00") plot!(destinations.geometry, color="#f00") ``` ## Building the network Next, we need to convert our network GIS layer into a mathematical "graph"—a data structure of nodes and edges representing intersections and streets. This network dataset is already "fully noded"—i.e. all intersections occur at the ends of line segments. The tool can also work with "semi-noded" data—where all intersections are at the end of one line segment, but may be in the middle of another. To use that kind of data, run the preprocessing function [`semi_to_fully_noded`](@ref) before building the graph. To build the graph, we will use the [`graph_from_gdal`](@ref) function, which will convert GeoDataFrames into a graph. ```{julia} graph = graph_from_gdal(sidewalks) ``` It is possible to specify more than one layer (for example, if you have sidewalks and crosswalks in separate layers) by separating the layers with commas. It is also possible to set a tolerance for when nodes are considered coincident. However, I recommend leaving this at the default small value and using [`add_short_edges!`](@ref) to close such gaps, to avoid issues that are described in the linked documentation. Similarly, [`remove_tiny_islands`](@ref) can be used to remove small "islands" in the graph that may be data errors. There are also tools for troubleshooting the graph, namely [`find_dead_ends`](@ref) to find locations that are dead ends, and [`find_disconnected_crossings`](@ref) to find places where sidewalks cross without intersecting. While both of these things will be present in a correct graph, they may also represent data errors. ## Assigning weights to nodes Next, we need to assign weights to the nodes. We will create two sets of weights, for origins and destinations. For origins, the weights are just the number of units on the parcel. The weight of each parcel will get divided among the graph edges within 20 meters, and then the weight for each edge will be evenly divided between the nodes it connects. `[:,1]` retrieves the first column of the weights (if multiple weight columns are specified instead of just units, there will be one column in the output for each column specified. This avoids re-doing computationally-intensive geographic calculations if there are more weight columns—for instance, if both our origins and our destinations were coded in the parcel layer, or if we wanted to measure accessibility from multiple types of origin). ```{julia} origin_weights = create_graph_weights(graph, parcels, [:units], 20)[:,1] ``` We can look at what proportion of the units are served by sidewalks and therefore were assigned to the network. Some units will not be assigned to any edge and therefore will not count towards origin_weights. ```{julia} sum(origin_weights) / sum(parcels.units) ``` We can similarly create our destination weights. There is no weight column here—all of the destinations are equally weighted. However, we need a weight column so we just create a column of all ones. Since destinations are further from the sidewalk network due to parking lots, we use a 100 meter distance threshold. In the paper we use parcels and a 20 meter threshold, on the assumption that the edges of parcels will be close to the road, but here we have point data on destinations. ```{julia} destinations.one .= 1 destination_weights = create_graph_weights(graph, destinations, [:one], 100)[:, 1] sum(destination_weights) / sum(destinations.one) ``` ## Creating the distance matrix A point-to-point distance matrix is used extensively throughout the algorithm for identify and scoring links. We store distances in this matrix as UInt16 integer meters, to save memory (each UInt16 can represent values from 0-65535, and takes only two bytes of memory, as opposed to four or eight for traditional integers and longs). For a network as small as the one we're working with here, it would be possible to store this distance matrix in memory. However, in real-world applications, often the matrix will be larger than available memory, so storing it on disk and using memory-mapping of the disk file is recommended. Memory-mapping lets the computer treat a portion of the disk as if it were memory, and the CPU and the operating system work to make sure that the data are available quickly when needed. ```{julia} matrix_file = open("quickstart_distances.bin", "w+") matrix = Mmap.mmap(matrix_file, Matrix{UInt16}, (nv(graph), nv(graph))) # this just makes sure the matrix is filled with 0s before we begin. This should not make any real difference as all values should be overwritten but is just a good practice. fill!(matrix, 0) fill_distance_matrix!(graph, matrix) ``` ## Link identification We are now ready to identify missing links. The [`identify_potential_missing_links`](@ref) will do this and return a vector of places in the network that meet the specified criteria (in this case, no longer than 100 meters, and with a network distance of 1000 meters or more; you can change these values in the code below). ```{julia} all_links = identify_potential_missing_links(graph, matrix, 100, 1000) ``` We can plot the links identified (shown here in blue): ```{julia} all_links_geo = links_to_gis(graph, all_links) plot(sidewalks.geom, color="#000", legend=false, aspect_ratio=:equal) plot!(all_links_geo.geometry, color="#4B9CD3") ``` ## Link deduplication Clearly, most of these links are essentially duplicates. The next step is to deduplicate them, by merging links where both ends are within 100m of one another (configurable by changing the number below). ```{julia} links = deduplicate_links(graph, all_links, matrix, 100) ``` We can now plot these deduplicated links: ```{julia} links_geo = links_to_gis(graph, links) plot(sidewalks.geom, color="#000", legend=false, aspect_ratio=:equal) # set line width wider to make the links easier to see plot!(links_geo.geometry, color="#4B9CD3", lw=2) ``` ## Link scoring The final step is to score the identified links, using the [`score_links`](@ref) function. For this we need to specify what our origin and destination weights are, as well as our distance decay function. Here, I use a 1-mile (1609 meter) cutoff. This will return a vector with the score for each link. We have to specify both the distance decay function (first argument) and the point at which that function goes to zero (last argument); for a smooth decay function, I recommend instead specifying a piecewise function that goes to zero when the result is small enough that additional destinations do not materially affect accessibility. ```{julia} link_scores = score_links(d -> d < 1609, graph, links, matrix, origin_weights, destination_weights, 1609) ``` ## Saving links to GIS Generally, you will want to export your results to GIS for further analysis, including the scores. You can create a GeoDataFrame including the scores like this: ```{julia} links_geo = links_to_gis(graph, links, :score=>link_scores) ``` You can then write this to GIS file like this: ```julia GeoDataFrames.write("links.gpkg", links_geo) ``` We can also use it in further plotting. For example, this plot shows the top five links (the `competerank` function just ranks values largest to smallest). ```{julia} plot(sidewalks.geom, color="#000", legend=false, aspect_ratio=:equal) # set line width wider to make the links easier to see plot!(links_geo[competerank(links_geo.score, rev=true) .<= 5, :geometry], color="#4B9CD3", lw=2) ``` The new links that cross the road on the east are most valuable from an accessibility standpoint. This is likely in part due to the accessibility metric we chose; the links in the west are simply too far from the destinations to be meaningful for increasing access within 1 mile. ## Link-level analyses To understand the impact of an individual link, MissingLinks provides two functions: routing and regional access calculations (isochrone support is planned). Each one is a higher level of aggregation than the last. First, we'll find the highest scoring link, though these same procedures would work with any link. We also plot it so we can see what we're working with: ```{julia} best_link = links[argmax(link_scores)] plot(sidewalks.geom, color="#000", legend=false, aspect_ratio=:equal) # set line width wider to make the link easier to see plot!(links_geo.geometry[argmax(link_scores)], color="#4B9CD3", lw=4) ``` The link is a crossing in the southeast part of the graph. ### One-to-one routes The lowest level of aggregation is the one-to-one route. This allows us to visualize the impact of the link on a particular trip from one location to another. Next, we'll create a route from one point to another (specified as latitude/longitude): ```{julia} # all routing functions require an edge index # we can build it once and cache it; this is optional # but vastly improves performance with large graphs index = MissingLinks.index_graph_edges(graph) # now, we can do the route without the link route_without_link = route_one_to_one( graph, (35.3393, -80.8620), # origin (35.3408, -80.8555), # destination crs=GeoFormatTypes.EPSG(32119), # the projection of your graph index=index # the prebuilt index ) ``` #### Including the link `route_one_to_one` always routes only on the edges in the graph. To use a potential link, we need to create a new graph that has the link in it. We can use `realize_graph` to create a new graph with the link added (we can also specify multiple links, by separating them with commas inside the `[]`): ```{julia} graph_with_link = realize_graph(graph, [best_link]) ``` And then we can use that to create the route. If you are only doing one route, you could leave off the line that creates the index, and remove the `index` parameter to `route_one_to_one`; the index will then be automatically created. ```{julia} index_with_link = MissingLinks.index_graph_edges(graph_with_link) route_with_link = route_one_to_one( graph_with_link, (35.3393, -80.8620), # origin (35.3408, -80.8555), # destination crs=GeoFormatTypes.EPSG(32119), # the projection of your graph index=index_with_link # the prebuilt index ) ``` And we can plot the result, with the original route in blue and the new one in red: ```{julia} plot(sidewalks.geom, color="#000", legend=false, aspect_ratio=:equal) plot!(route_without_link.geom, color="#4B9CD3", lw=4) plot!(route_with_link.geom, color="#a24040", lw=4) ``` We can also write the routes out for use in GIS (in this case, to `routes.gpkg`), by converting to a DataFrame and saving to a GIS file: ```{julia} routes = DataFrame([route_without_link, route_with_link]) metadata!(routes, "geometrycolumns", (:geom,)) # Change the CRS code to the CRS of your graph so the routes display properly in GIS GeoDataFrames.write("routes.gpkg", routes, crs=GeoFormatTypes.EPSG(32119)) ``` ### Regional access The regional access metric shows for every point around a link, how much the access from that point increases due to the link. Let's see an example: ```{julia} access = regional_access( d -> d < 1609, # decay function, we consider locations accessible that are within 1 mile (1609 meters) graph, matrix, best_link, destination_weights, # using origin_weights here would map how many people can get to each location 1609 # point at which our decay function is (essentially) zero - in this case, 1609 meters ) ``` ```{julia} # xlim and ylim just set the map extents, adjust or remove to map different areas plot(access, xlim=(439200, 441500), ylim=(177500, 179500)) # add sidewalks on top of the map plot!(sidewalks.geom, color="white") # and the link on top of that plot!(links_geo.geometry[argmax(link_scores)], color="#e84646ff", lw=4) ``` The largest increases in access happen on the west side of the link, presumably because most destinations are on the east side. We can also save this to GIS (GeoTIFF format): ```{julia} write("access.tif", access) ```