Glossary
Open data
The restaurants dataset we use in several lessons has been provided by the DuDe team from the Hasso Plattner Institute, University of Potsdam. Many thanks for this great contribution.
We use OpenStreetMap data, obtained from the outputs of geocoders and licensed under the Open Database License (ODbL). We acknowledge the efforts of this awesome community of volunteers.
We use several datasets provided by the Database Group of the University of Leipzig under the Creative Commons License. We thank them for their contribution and invite learners to read the following two papers:
Evaluation of entity resolution approaches on real-world match problems by Köpcke, H., Thor, A., and Rahm, E.
Comparative Evaluation of Distributed Clustering Schemes for Multisource Entity Resolution by Saeedi, A., Peukert, E., and Rahm, E.
Open-source software
The GeoPandas Python package for parsing geographic data.
The RecordLinkage Python package for entity resolution.
The CatBoost Python package for classification/regression.
The PyOD Python package for outlier detection.
The scikit-surprise Python package for building recommender systems.
The Mimesis Python package for faking data.
The TextDistance Python package for computing a long list of string distance functions.
The zentity Elasticsearch plugin for real-time entity matching.
The photon Geocoder package using Elasticsearch as its backend for real-time location matching.
Search engines
Try “entity resolution” or one of its many aliases in Google Dataset Search.
Try “entity resolution survey” on Google Scholar (scholar.google.com) to find several frequently cited surveys.