Glossary

Open data

The restaurants dataset we use in several lessons was provided by the DuDe team from the Hasso Plattner Institute, University of Potsdam. Many thanks for this great contribution.

We use OpenStreetMap data, licensed under the ODbL, obtained from the outputs of geocoders. We acknowledge the efforts of this awesome community of volunteers.

We use several datasets provided by the Database Group of the University of Leipzig under the Creative Commons License. We thank them for their contribution and invite learners to read the following two papers:

Open-source software

  • The GeoPandas Python package for parsing geographic data.

  • The RecordLinkage Python package for entity resolution.

  • The CatBoost Python package for classification/regression.

  • The PyOD Python package for outlier detection.

  • The scikit-surprise Python package for building recommender systems.

  • The Mimesis Python package for faking data.

  • The TextDistance Python package for a long list of distance functions for strings.

  • The zentity Elasticsearch plugin for real-time entity matching.

  • The photon geocoder package, which uses Elasticsearch as its backend, for real-time location matching.

Search engines

  • Try “entity resolution” or one of its many aliases in Google Dataset Search.

  • Try “entity resolution survey” on scholar.google.com to find several frequently cited surveys.
