{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Exercise: Geospatial Data Structures in Python"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"In discussions about Python you will often read statements like this:\n",
"\n",
"\n",
"\n",
"[Source: GitHub Issue - rasterio vs python gdal #11](https://github.com/inbo/niche_vlaanderen/issues/11)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"### In this notebook, you will ...\n",
"1. write a class representing a Polygon in Python,\n",
"2. profile the execution time of your class and compare it to shapely and ogr implementations,\n",
"3. draw conclusions about the advantages and disadvantages of using ogr, shapely or your own implementation,\n",
"4. and finally, explain what _[Pythonic](https://docs.python-guide.org/writing/style/)_ means."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Comparing ogr, shapely and pure Python"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"We will compare three different classes to **represent a polygon** in Python and how to **calculate its bounding box.**\n",
"1. The **`Geometry` class provided by `ogr` package**.\n",
"2. The **`Polygon` class provided by the `shapely` package.**\n",
"3. A simplified **Python implementation of a `Polygon` class** written by yourself."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"#### The three classes should be compared in regard to:\n",
"* Execution time\n",
"* Programming time (how long did it take to write the code)\n",
"* Readability\n",
"* Flexibility (how easily can you adapt it to your needs)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2024-07-24T13:36:01.853924Z",
"start_time": "2024-07-24T13:36:01.851040Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"import shapely\n",
"\n",
"test_coordinates = [(0,1.5),(0.5,1),(1,1.5),(0.5,2),(0,1.5)]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 1. OGR package"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"__E 1.1:__ Use the `ogr` package to create a polygon geometry object using the same `test_coordinates` as above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"__E 1.2:__ Calculate the envelope of the polygon. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 2. Shapely package"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__E 2.1:__ Use `shapely` package to create a polygon object using the same `test_coordinates` as above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"__E 2.2:__ Calculate the bounding box of the shapely polygon."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__E 2.3:__ Plot the two polygons using matplotlib."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 3. Do-It-Yourself `Polygon` class"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Theory: Classes and objects in Python"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class Person:\n",
" age = None\n",
" name = None\n",
" \n",
" def is_adult(self):\n",
" if self.age >= 18:\n",
" return True\n",
" else:\n",
" return False"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"person_a = Person()\n",
"person_a.is_adult()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__E 3.1:__ Write a class called `MyPolygon`. It should be created by passing a list of coordinates representing the nodes of the polygon, e.g. `MyPolygon(coordinates=[(1,1),(1,2),(2,2),(2,1),(1,1))])`. Calling `polygon.coordinates` should return the coordinates."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"my_poly = MyPolygon(coordinates=test_coordinates)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"Test whether the instance of class `Polygon` returns the right value by executing the cell below. If there is no error message, everything's fine. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"assert my_poly.coordinates == test_coordinates"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"__E 3.2:__ Add a method called `envelope` to the class which calculates the bounding box of the polygon. The method should return the bounding box as a list containing the minimum and maximum coordinates of the bounding box, i.e.`[minimum_x, minimum_y, maximum_x, maximum_y]`.\n",
"\n",
"__Note:__ A bounding box is often reffered to as the envelope of a geometry.\n",
"\n",
"Check if your results are the same as above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 4. Comparison of execution times\n",
"Compare the exectution times of the ogr, shapely and your own class using the magic command `%%timeit`. This function will execute the cell multiple times to get a good estimate of the execution time. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__E 4.1:__ Compare the execution times of the calculation of the bounding box of all three classes. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"my_poly = MyPolygon(test_coordinates)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"%%timeit\n",
"my_poly.envelope()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"__Question:__ What do you observe when you compre the execution times of all three methods? Can you explain the difference in execution times?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### Comparison bounding box creation and object creation\n",
"\n",
"When choosing the most efficient way to calculate something, we also need to consider the overhead of the calculation. The overhead contains all the processing steps that need to be taken as a preparation before the execution of the desired calculation. Depending on the implementation, this can change your decision. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"__E 4.2:__ Measure the execution times of all three implementations __including the object creation.__ "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"%%timeit\n",
"my_poly = MyPolygon(test_coordinates)\n",
"my_poly.envelope()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"__Question:__ What do you observe when you compare the execution times and the object creation of all three polygon implementations? Can you explain the difference in execution times?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"__Your Answer:__"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 5. Questions for Discussion"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"__Q1:__ Based on your results of the exercises, evaluate the three methods in the table below. (Double click the cell, to edit it)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"\n",
"| | Execution time
(fast - slow)| Programming time
(fast - slow)| Readability
(high - low) | Flexibility
(high - low) |\n",
"|------------|--------------|------------------|-------------|-------------------|\n",
"| __DIY Python__ | | | | |\n",
"| __OGR__ | | | | |\n",
"| __Shapely__ | | | | |"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"__Q2:__ What are the advantages and disadvantages of using ogr, shapely or your own implementation for vector data processing? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Your answers:__"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"jp-MarkdownHeadingCollapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"### OGR and GDAL Python bindings\n",
"* `ogr` and `gdal` are automatically generated Python bindings (using SWIG) to the C libraries GDAL and OGR \n",
"* So when you use `ogr` and `gdal` classes in Python you are actually executing C code. \n",
"* Still, the syntax or `ogr` and `gdal` doesn't really feel like Python (e.g. it needs many lines of code) \n",
"* `shapely` is a native Python package and is therefore closer to the way Python code is supposed to be written"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### But wait this is not the end.\n",
"\n",
"Why are we still mostly using shapely even though it is so slow?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**E.:** Create 1000 random polygons and calculate the bounding box of each polygon. Compare the execution times of all three classes. Do this separately using the 'ogr' implementation and the 'shapely' implementation. "
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {
"ExecuteTime": {
"end_time": "2024-07-24T13:40:38.380007Z",
"start_time": "2024-07-24T13:40:38.378196Z"
}
},
"outputs": [],
"source": [
"n = 1000000"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Shapely"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {
"ExecuteTime": {
"end_time": "2024-07-24T13:49:50.903043Z",
"start_time": "2024-07-24T13:49:37.023369Z"
}
},
"outputs": [],
"source": [
"# Create 1000 random polygons using shapely \n",
"import random \n",
"import shapely.geometry as sg\n",
"\n",
"random_shapely_polygons = [sg.Polygon([(random.uniform(0, 10), random.uniform(0, 10)) for i in range(5)]) for j in range(n)] "
]
},
{
"cell_type": "code",
"execution_count": 107,
"metadata": {
"ExecuteTime": {
"end_time": "2024-07-24T13:49:54.386483Z",
"start_time": "2024-07-24T13:49:50.904192Z"
}
},
"outputs": [],
"source": [
"# Loop over all polygons and calculate the area\n",
"# Shapely\n",
"areas_shapely = np.empty(n)\n",
"for i, poly in enumerate(random_shapely_polygons):\n",
" areas_shapely[i] = shapely.area(poly)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### OGR"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {
"ExecuteTime": {
"end_time": "2024-07-24T13:50:28.196661Z",
"start_time": "2024-07-24T13:50:22.197443Z"
}
},
"outputs": [],
"source": [
"# Create 1000 random polygons using shapely \n",
"import random \n",
"\n",
"random_ogr_polygons = []\n",
"for i in range(n):\n",
" start_point = (random.uniform(0, 10), random.uniform(0, 10))\n",
" coordinates = [start_point] + [(random.uniform(0, 10), random.uniform(0, 10)) for i in range(3)] + [start_point]\n",
" new_polygon = create_polygon(coordinates)\n",
" random_ogr_polygons.append(new_polygon)"
]
},
{
"cell_type": "code",
"execution_count": 112,
"metadata": {
"ExecuteTime": {
"end_time": "2024-07-24T13:50:28.515615Z",
"start_time": "2024-07-24T13:50:28.197560Z"
}
},
"outputs": [],
"source": [
"# Loop over all polygons and calculate the area\n",
"areas = np.empty(n)\n",
"for i, poly in enumerate(random_ogr_polygons):\n",
" areas[i] = poly.GetArea()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Geopandas"
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {
"ExecuteTime": {
"end_time": "2024-07-24T13:49:59.014303Z",
"start_time": "2024-07-24T13:49:57.082201Z"
}
},
"outputs": [],
"source": [
"# Create a geodataframe with n random polygons\n",
"import geopandas as gpd\n",
"random_polygons_df = gpd.GeoDataFrame({'geometry': random_shapely_polygons})"
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {
"ExecuteTime": {
"end_time": "2024-07-24T13:49:59.037585Z",
"start_time": "2024-07-24T13:49:59.015449Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 4.954251\n",
"1 8.478507\n",
"2 18.045323\n",
"3 6.314120\n",
"4 5.871313\n",
" ... \n",
"999995 6.119961\n",
"999996 16.395933\n",
"999997 12.813947\n",
"999998 12.657971\n",
"999999 5.360472\n",
"Length: 1000000, dtype: float64"
]
},
"execution_count": 110,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random_polygons_df.area"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What kind of geometry data type is GeoPandas using?"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {
"ExecuteTime": {
"end_time": "2024-07-24T13:52:58.182064Z",
"start_time": "2024-07-24T13:52:58.178537Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"shapely.geometry.polygon.Polygon"
]
},
"execution_count": 117,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(random_polygons_df.geometry[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It uses shapely but somehow the calculation is now faster than OGR. It uses a vectorization. We will learn tomorrow what this is and how it works.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## References\n",
"\n",
"If you would like to take a deeper look at object oriented programming in Python take a look at the following resources: \n",
"\n",
"* [Object Oriented Programming in Python](https://github.com/TheDigitalCatOnline/thedigitalcatonline.github.com/tree/master/notebooks)\n",
"* [Python Tutorial](https://docs.python.org/3/tutorial/classes.html)\n",
"* [Abstract Classes](https://docs.python.org/3/library/abc.html#module-abc)\n",
"* [Computational Geometry in Python: From Theory to Application](https://www.toptal.com/python/computational-geometry-in-python-from-theory-to-implementation\n",
")\n",
"\n",
" \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}