Cross Validation

Cross-validation is a technique for building robust models. In this lesson, you'll discover how it works.

Train, test, and validation datasets

We divide the dataset at hand into a training dataset and a test dataset.

  • We train the model on the training dataset.

  • We evaluate the model on the test dataset (which the model has never seen during training) and report that performance.

  • Scikit-learn provides train_test_split, which returns the training and test datasets. The code snippets below are adapted from the scikit-learn documentation itself.

from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the Iris dataset: input columns in X, output column in y
X, y = datasets.load_iris(return_X_y=True)
print("Original shape of the input and output columns")
print(X.shape)
print(y.shape)

# Hold out 40% of the rows as the test dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)
print("Shape of the training dataset's input and output columns")
print(X_train.shape)
print(y_train.shape)
print("Shape of the test dataset's input and output columns")
print(X_test.shape)
print(y_test.shape)
  • datasets.load_iris loads the Iris dataset, saving the input columns in X and the output column in y. The first print calls show the shape of the full dataset.

  • train_test_split splits the dataset into the training and test datasets. test_size specifies the fraction of instances to keep in the test dataset; here, 40% of the rows go to the test dataset.

  • Finally, we print the shapes of the newly formed datasets.
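
Having split the data, the evaluation step itself looks like this. Below is a minimal sketch in the spirit of the scikit-learn documentation, which pairs this split with a support-vector classifier; the kernel and C values are illustrative, not prescriptive.

from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Recreate the 60/40 split from the snippet above
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

# Train only on the training dataset
clf = svm.SVC(kernel="linear", C=1).fit(X_train, y_train)

# Score on the test dataset -- rows the model has never seen
print("Test accuracy:", clf.score(X_test, y_test))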

Validation dataset

When evaluating different settings ("hyperparameters") for a model, such as the α (regularization strength) that must be set manually for Ridge Regression, there is still a risk of overfitting on the test set, because the hyperparameters can be tweaked until the model performs optimally. This way, knowledge about the test set can "leak" into the model, and the evaluation metrics no longer report on its generalization performance.
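
To reduce this risk, yet another part of the data can be held out as a validation dataset: we tune hyperparameters against the validation set and touch the test set only once, at the very end. Below is a minimal sketch of that three-way split, assuming the Diabetes regression dataset (a natural fit for Ridge Regression) and an illustrative grid of α values.

from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = datasets.load_diabetes(return_X_y=True)

# First split: hold out 20% of the rows as the final test dataset
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Second split: carve a validation dataset out of the remaining rows
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

# Tune alpha (the regularization strength) against the validation set only
best_alpha, best_score = None, -float("inf")
for alpha in [0.01, 0.1, 1.0, 10.0]:  # illustrative grid, not prescriptive
    score = Ridge(alpha=alpha).fit(X_train, y_train).score(X_val, y_val)
    if score > best_score:
        best_alpha, best_score = alpha, score

# Refit on train + validation, then touch the test dataset exactly once
final_model = Ridge(alpha=best_alpha).fit(X_trainval, y_trainval)
print("Best alpha:", best_alpha)
print("Test R^2:", final_model.score(X_test, y_test))

The drawback of partitioning the data three ways is that it further shrinks the number of rows available for training, which is exactly the problem cross-validation sets out to solve.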