\n", "\n", "**REMEMBER:**
\n", "\n", "* A `GeoDataFrame` allows you to perform typical tabular data analysis together with spatial operations\n", "* A `GeoDataFrame` (or *Feature Collection*) consists of:\n", " * **Geometries** or **features**: the spatial objects\n", " * **Attributes** or **properties**: columns with information about each spatial object\n", "\n", "
\n", "\n", "**REMEMBER**:
\n", "\n", "Single geometries are represented by `shapely` objects:\n", "\n", "* If you access a single geometry of a GeoDataFrame, you get a shapely geometry object\n", "* Those objects have functionality similar to geopandas objects (GeoDataFrame/GeoSeries). For example:\n", " * `single_shapely_object.distance(other_point)` -> distance between two points\n", " * `geodataframe.distance(other_point)` -> distance from each point in the geodataframe to the other point\n", "\n", "
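As a minimal sketch of the two points above, the snippet below builds a tiny GeoDataFrame by hand and compares the shapely and geopandas versions of `distance()`. The names and coordinates are made up for illustration and are not taken from the Paris datasets.

```python
import geopandas
from shapely.geometry import Point

# Hypothetical toy data: names and coordinates are invented for this sketch.
gdf = geopandas.GeoDataFrame(
    {'name': ['A', 'B', 'C']},
    geometry=[Point(0, 0), Point(3, 4), Point(6, 8)],
)

other_point = Point(0, 0)

# Accessing a single geometry returns a shapely object.
single = gdf.geometry.iloc[1]
print(type(single))

# On the shapely object, distance() returns one number.
print(single.distance(other_point))

# On the GeoDataFrame, distance() works element-wise and returns a Series.
distances = gdf.distance(other_point)
print(distances)
```

The same pattern applies to many other methods and attributes: the shapely object gives a single result, the GeoDataFrame or GeoSeries gives one result per row.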
\n", "\n", "**EXERCISE**:\n", "\n", "We will start by exploring the bicycle station dataset (available as a GeoPackage file: `data/paris_bike_stations_mercator.gpkg`)\n", " \n", "* Read the stations dataset into a GeoDataFrame called `stations`.\n", "* Check the type of the returned object (with `type(..)`)\n", "* Check the first rows of the dataframe. What kind of geometries does this dataset contain?\n", "* How many features are there in the dataset? (hint: use the `.shape` attribute)\n", " \n", "
Hints\n", "\n", "* The `geopandas.read_file()` function can read different geospatial file formats. You pass the file name as the first argument.\n", "\n", "
\n", " \n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true }, "outputs": [], "source": [ "# %load _solved/solutions/01-introduction-geospatial-data1.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true }, "outputs": [], "source": [ "# %load _solved/solutions/01-introduction-geospatial-data2.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true }, "outputs": [], "source": [ "# %load _solved/solutions/01-introduction-geospatial-data3.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true }, "outputs": [], "source": [ "# %load _solved/solutions/01-introduction-geospatial-data4.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**EXERCISE**:\n", "\n", "* Make a quick plot of the `stations` dataset.\n", "* Make the plot a bit larger by setting the figure size to (12, 6) (hint: the `plot` method accepts a `figsize` keyword).\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true }, "outputs": [], "source": [ "# %load _solved/solutions/01-introduction-geospatial-data5.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A plot with just some points can be hard to interpret without any spatial context. Therefore, in the next exercise we will learn how to add a background map.\n", "\n", "We are going to make use of the [contextily](https://github.com/darribas/contextily) package. The `add_basemap()` function of this package makes it easy to add a background web map to our plot. We begin by plotting our data first, and then pass the matplotlib axes object (returned by dataframe's `plot()` method) to the `add_basemap()` function. `contextily` will then download the web tiles needed for the geographical extent of your plot.\n", "\n", "\n", "\n", "\n", "
\n", "\n", "**EXERCISE**:\n", "\n", "* Import `contextily`.\n", "* Re-do the figure of the previous exercise: make a plot of all the points in `stations`, but assign the result to an `ax` variable.\n", "* Set the marker size equal to 5 to reduce the size of the points (use the `markersize` keyword of the `plot()` method for this).\n", "* Use the `add_basemap()` function of `contextily` to add a background map: the first argument is the matplotlib axes object `ax`.\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true }, "outputs": [], "source": [ "# %load _solved/solutions/01-introduction-geospatial-data6.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true }, "outputs": [], "source": [ "# %load _solved/solutions/01-introduction-geospatial-data7.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**EXERCISE**:\n", "\n", "* Make a histogram showing the distribution of the number of bike stands in the stations.\n", "\n", "
\n", " Hints\n", "\n", "* Selecting a column can be done with the square brackets: `df['col_name']`\n", "* A single column has a `hist()` method to plot a histogram of its values.\n", " \n", "
\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true }, "outputs": [], "source": [ "# %load _solved/solutions/01-introduction-geospatial-data8.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**EXERCISE**:\n", "\n", "Let's now visualize where the available bikes are actually stationed:\n", " \n", "* Make a plot of the `stations` dataset (also with a (12, 6) figsize).\n", "* Use the `'available_bikes'` column to determine the color of the points. For this, use the `column=` keyword.\n", "* Use the `legend=True` keyword to show a color bar.\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true }, "outputs": [], "source": [ "# %load _solved/solutions/01-introduction-geospatial-data9.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**EXERCISE**:\n", "\n", "Next, we will explore the dataset on the administrative districts of Paris (available as a GeoJSON file: \"data/paris_districts_utm.geojson\")\n", "\n", "* Read the dataset into a GeoDataFrame called `districts`.\n", "* Check the first rows of the dataframe. What kind of geometries does this dataset contain?\n", "* How many features are there in the dataset? (hint: use the `.shape` attribute)\n", "* Make a quick plot of the `districts` dataset (set the figure size to (12, 6)).\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true }, "outputs": [], "source": [ "# %load _solved/solutions/01-introduction-geospatial-data10.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true }, "outputs": [], "source": [ "# %load _solved/solutions/01-introduction-geospatial-data11.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true }, "outputs": [], "source": [ "# %load _solved/solutions/01-introduction-geospatial-data12.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true }, "outputs": [], "source": [ "# %load _solved/solutions/01-introduction-geospatial-data13.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**EXERCISE**:\n", " \n", "What are the largest districts (biggest area)?\n", "\n", "* Calculate the area of each district.\n", "* Add this area as a new column to the `districts` dataframe.\n", "* Sort the dataframe by this area column from largest to smallest values (descending).\n", "\n", "
Hints\n", "\n", "* Adding a column can be done by assigning values to a column using the same square brackets syntax: `df['new_col'] = values`\n", "* To sort the rows of a DataFrame, use the `sort_values()` method, specifying the column to sort on with the `by='col_name'` keyword. Check the help of this method to see how to sort ascending or descending.\n", "\n", "
\n", "\n", "
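The two hints above can be tried out on a small throwaway table before applying them to `districts`. This is only a sketch: the table, its column names and its values are hypothetical and unrelated to the exercise data.

```python
import pandas as pd

# Hypothetical toy table, invented for this sketch.
df = pd.DataFrame({
    'name': ['a', 'b', 'c'],
    'width': [2.0, 5.0, 3.0],
    'height': [1.0, 2.0, 4.0],
})

# Add a new column by assigning values with the square brackets syntax.
df['size'] = df['width'] * df['height']

# Sort the rows on the new column, from largest to smallest.
df_sorted = df.sort_values(by='size', ascending=False)
print(df_sorted)
```

Note that `sort_values()` returns a new, sorted DataFrame and leaves `df` itself unchanged.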
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true }, "outputs": [], "source": [ "# %load _solved/solutions/01-introduction-geospatial-data14.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true }, "outputs": [], "source": [ "# %load _solved/solutions/01-introduction-geospatial-data15.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "clear_cell": true }, "outputs": [], "source": [ "# %load _solved/solutions/01-introduction-geospatial-data16.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**EXERCISE**:\n", "\n", "* Add a column `'population_density'` representing the number of inhabitants per square kilometer (Note: The area is given in square meters, so you will need to multiply the result by `10**6`).\n", "* Plot the districts using the `'population_density'` column to color the polygons. For this, use the `column=` keyword.\n", "* Use the `legend=True` keyword to show a color bar.\n", "\n", "
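The unit conversion in the note can be checked with plain numbers first. The values below are hypothetical and chosen only to keep the arithmetic easy to follow; they do not come from the `districts` dataset.

```python
# Hypothetical numbers, invented for this sketch.
population = 50_000      # inhabitants
area_m2 = 2_500_000      # area in square meters (2.5 square kilometers)

# Dividing by the area gives inhabitants per square meter.
# One square kilometer is 10**6 square meters, hence the factor 10**6.
density_per_km2 = population / area_m2 * 10**6
print(density_per_km2)
```

The same expression works column-wise on a dataframe, since arithmetic on columns is applied row by row.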