In machine learning (ML), we use datasets for both research and applications. Many high-quality datasets are freely available online, and they can contain text, images, or speech data.
We can access some of the public datasets from the sources listed below:
Kaggle
Kaggle allows users to explore and access various datasets in different formats.
The Big Bad NLP Database
This source primarily contains datasets for natural language processing (NLP) tasks.
Google Dataset Search
This works similarly to Google Scholar: it is a search engine that provides detailed information about more than 25 million datasets.
Some of the popular datasets used in applications of machine learning, deep learning, and data science are listed below:
MNIST dataset
This is a dataset of handwritten digits containing 70,000 examples (60,000 for training and 10,000 for testing). We can use this dataset to learn image classification and simple pattern recognition.
The dataset can be found
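MNIST is commonly distributed as binary files in the IDX format, so a small parser can help in understanding what the data looks like before reaching for a library loader. The sketch below assumes the usual IDX image layout (a big-endian header holding a magic number of 2051, the image count, the row count, and the column count, followed by one unsigned byte per pixel). The function name `parse_idx_images` and the tiny sample bytes are hypothetical and only stand in for a real file.

```python
import struct

IDX_IMAGE_MAGIC = 2051  # magic number assumed to mark an IDX image file


def parse_idx_images(raw):
    """Decode IDX image bytes into a list of images (rows of pixel values 0-255)."""
    # Header: four big-endian unsigned 32-bit integers.
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != IDX_IMAGE_MAGIC:
        raise ValueError(f"unexpected magic number: {magic}")
    pixels = raw[16:]
    if len(pixels) != count * rows * cols:
        raise ValueError("pixel data does not match the header dimensions")
    size = rows * cols
    images = []
    for i in range(count):
        flat = pixels[i * size:(i + 1) * size]
        # Slice the flat pixel run into one list per image row.
        images.append([list(flat[r * cols:(r + 1) * cols]) for r in range(rows)])
    return images


# Tiny synthetic example: two 2x2 "images", not real MNIST data.
sample = struct.pack(">IIII", IDX_IMAGE_MAGIC, 2, 2, 2) + bytes([0, 255, 10, 20, 1, 2, 3, 4])
images = parse_idx_images(sample)
print(len(images), images[0])
```

With a real MNIST image file, the same function would be called on the decompressed file contents, for example the bytes returned by reading the file in binary mode.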
Sentiment140
This dataset contains tweets. It has 1.6 million records with six fields, and we can use it for sentiment analysis and other natural language processing tasks.
The dataset can be found
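As a rough illustration of working with the six fields, the sketch below reads rows in the Sentiment140 layout and attaches a readable sentiment label. It assumes the column order `target, ids, date, flag, user, text`, a file without a header row, and the polarity coding 0 = negative, 2 = neutral, 4 = positive. The helper `read_tweets` and the sample rows are made up for the example.

```python
import csv
import io
from collections import Counter

# Column order assumed from the dataset's description; the file has no header row.
COLUMNS = ["target", "ids", "date", "flag", "user", "text"]
POLARITY = {"0": "negative", "2": "neutral", "4": "positive"}


def read_tweets(handle):
    """Yield one dict per tweet with a readable 'sentiment' label added."""
    for row in csv.reader(handle):
        record = dict(zip(COLUMNS, row))
        record["sentiment"] = POLARITY.get(record["target"], "unknown")
        yield record


# Made-up rows in the same layout, used here instead of the real file.
sample = io.StringIO(
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","this is awful"\n'
    '"4","2","Mon Apr 06 22:20:00 PDT 2009","NO_QUERY","user_b","what a great day, really"\n'
)
tweets = list(read_tweets(sample))
label_counts = Counter(t["sentiment"] for t in tweets)
print(label_counts)
```

Because `read_tweets` is a generator, it can stream the full file row by row instead of loading all records into memory. The real file is commonly opened with `encoding="latin-1"`, which is worth checking if decoding errors appear.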
Credit card fraud detection
This dataset contains 284,807 credit card transactions with labels. We can use this dataset to build a model for detecting fraudulent activity.
The dataset can be found
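Fraud datasets are typically very imbalanced, so a useful first step is to measure how rare the positive class is. The sketch below assumes the label lives in a column named `Class`, with `1` marking a fraudulent transaction. The function `fraud_rate` and the sample transactions are hypothetical and use a reduced set of columns.

```python
import csv
import io


def fraud_rate(handle, label_column="Class"):
    """Return (fraud_count, total_count, fraud_fraction) for a labelled CSV."""
    fraud = total = 0
    for row in csv.DictReader(handle):
        total += 1
        if row[label_column] == "1":
            fraud += 1
    fraction = fraud / total if total else 0.0
    return fraud, total, fraction


# Made-up transactions with a reduced set of columns, not real data.
sample = io.StringIO(
    "Time,Amount,Class\n"
    "0,149.62,0\n"
    "1,2.69,0\n"
    "2,378.66,1\n"
    "3,123.50,0\n"
)
fraud, total, fraction = fraud_rate(sample)
print(f"{fraud} of {total} transactions are fraud ({fraction:.1%})")
```

When the fraud fraction is small, a model that always predicts "not fraud" still reaches high accuracy, so metrics such as precision and recall tend to be more informative for this task.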
Iris dataset
This dataset contains measurements of sepal length, sepal width, petal length, and petal width for iris flowers. It includes three classes (species) with 50 entries each. We use this dataset for learning classification and pattern recognition.
The dataset can be found
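A quick way to see why this dataset suits pattern recognition is to compare a feature across the three classes. The sketch below assumes header-less CSV rows in the order sepal length, sepal width, petal length, petal width, species. The function `mean_by_species`, the field names, and the sample rows are illustrative assumptions rather than real measurements.

```python
import csv
import io
from collections import defaultdict

# Assumed column order for header-less rows.
FIELDS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def mean_by_species(handle, feature):
    """Average one numeric feature per species from header-less CSV rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        record = dict(zip(FIELDS, row))
        totals[record["species"]] += float(record[feature])
        counts[record["species"]] += 1
    return {species: totals[species] / counts[species] for species in totals}


# A few made-up rows in that layout; the full dataset has 150 rows.
sample = io.StringIO(
    "5.0,3.5,1.5,0.2,setosa\n"
    "5.0,3.0,1.5,0.2,setosa\n"
    "6.5,3.0,4.5,1.5,versicolor\n"
    "6.0,3.0,5.5,2.0,virginica\n"
)
means = mean_by_species(sample, "petal_length")
print(means)
```

If the per-class averages of a feature are well separated, even a simple classifier can use that feature to tell the classes apart.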