Creating the Workspace - Jupyter Notebooks

We'll cover the following...

- - First Things First: Why Python?
- - Jupyter Notebook: What It Is and Why Data Scientists Love It
- - How to Install Jupyter Notebook
    - 1. Running Jupyter Notebooks With the Anaconda Python Distribution
    - 2. Getting Anaconda
- - Creating Your First Jupyter Notebook
- - Jupyter Notebook
- - Bonus Tip

First Things First: Why Python?

Of all the languages out there, why is Python the most popular choice in the machine learning and data science world? We like things that are simple and intuitive. Python is just that, simple and intuitive. It’s readable, it’s low in complexity, and it’s easy to learn.

As a data scientist or machine learning engineer you can use a quick implementation in Python to validate your ideas about some complex mathy concepts in a fast and hassle-free manner. It’s easily understandable for you and others.

As a Data Scientist, your life revolves around data. Outside the playground, you stumble upon reality. Data in real life is oftentimes raw, unstructured, incomplete, and large. Python comes with the promise of knowing how to handle these issues. But how does it do that? What is so special about Python?

All hail the mighty packages! What’s so special about Python are the great open source code repositories that are continuously being updated. These open source contributions give Python its superpowers and an edge over other languages. The best thing about using these packages is that they have a minimal learning curve. Once you have a basic understanding of Python, you can very easily import, use, and benefit from all of the packages out there without having to understand everything is going on under the hood. Last but not the least, these packages are completely free to use as well!

Since we have our data scientist hat on, let’s talk about data. If you still have doubts, I’ll let this survey from IBM convince you why we should learn data science in Python and not in R or any other language.

Jupyter Notebook: What It Is and Why Data Scientists Love It

This story begins with IPython. IPython is an interactive command-line terminal for Python. Command-line terminals are not everyone’s cup of tea, so in 2011 IPython introduced a new tool named the Notebook — a modern and powerful web interface to Python. In 2015 the Notebook project was re-branded as the Jupyter Notebook.

The Jupyter Notebook is an incredibly powerful and sleek tool for developing and presenting data science projects. It can integrate code and its output into a single document, combining visualizations, narrative text, mathematical equations, and other rich media. It’s simply awesome.

As if all these features weren’t sufficient enough, Jupyter Notebook can handle many other languages, like R, as well. Its intuitive workflows, ease of use, and zero-cost have made it THE tool at the heart of any data science project.

Essentially, Jupyter is a great interface to the Python language and a must-have for all Data Science projects.

How to Install Jupyter Notebook

1. Running Jupyter Notebooks With the Anaconda Python Distribution

The easiest and most straight forward way to get started with Jupyter Notebooks is by installing Anaconda. Anaconda is the most widely used Python distribution for data science. It comes pre-loaded with the most popular libraries and tools (e.g., NumPy, Pandas, and Matplotlib). What this means is that immediately get to real work, skipping the pain of managing tons of installations, dependencies, and OS-specific installation issues.

2. Getting Anaconda

Follow these simple steps:

Download the latest version of Anaconda for Python 3 (ignore Python 2.7).
Install Anaconda by following the instructions on the download page and/or in the executable.

Note: There are other ways of running the Jupyter Notebook as well, e.g., via pip and Docker, but we are going to keep it sweet and simple. We will stick with the most hassle-free approach.

Press + to interact

As a “Hello-World-step” you can rename your notebook to “Hello Data Science” by clicking Untitled [1] and typing the new name, as shown in the snapshot below.

A notebook contains a list of cells. Each cell can contain executable code or formatted text (Markdown). Right now, the notebook contains only one empty code cell. Try typing print(“Hello Data Science!”) in the cell[2], then click on the run button [3] (or press Shift-Enter). Hitting the run button sends the current cell to this notebook’s Python kernel, which runs it and returns the output. The result is displayed below the code cell:

Jupyter Notebook

You can try this here:

📝 How to Use a Jupyter NoteBook?

Click on “Click to Launch” 🚀 button to work and see the code running live in the notebook.

You can click to open the Jupyter Notebook in a new tab.

Go to File and click Download as and then choose the format of the file to download 📥. You can choose Notebook(.ipynb) to download the file and work locally or on your personal Jupyter Notebook.

⚠️ The notebook session expires after 15 minutes of inactivity. It will reset after 15 consecutive minutes.

Python Fundamentals for Data Science

The Fundamentals of Statistics

Machine Learning 101

End-to-End Machine Learning Project

The Real Talk