Docker Setup
Docker on the Educative Platform
We use Docker to let authors create their own custom environments for any language that our platform does not support natively.
Authors can create custom Docker images that are deployed on Google Cloud, giving end users direct access to these customized environments through our platform.
The Jargon
Base Image: The image a Dockerfile starts from, usually an OS userspace such as Ubuntu.
Image: A read-only template with instructions for creating a Docker container.
Dockerfile: A plain-text file named Dockerfile, with no extension, that is used to build a Docker image. It specifies everything that goes into the application’s environment setup.
Container: A running instance of an image. It contains your application environment, that is, the place where your application actually runs.
Tarball
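tarball
├── Dockerfile
├── config.py
└── helloworld.ipynb
The above shows the directory structure of our tarball. It must include config.py, because the Dockerfile copies that file into the image with an ADD instruction.
The command to create this tarball is as follows:
tar -czvf pyspark.tar.gz Dockerfile config.py helloworld.ipynb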
As you can see, the Dockerfile must sit directly at the top level of the tarball (not inside any sub-folder).
This allows Educative to access the Dockerfile directly and create a Docker container for our application’s environment.
The other file, helloworld.ipynb, contains code for testing. It is optional because the same notebook is provided in the lessons that follow; if you leave it out, also remove the ADD line that copies it, otherwise the image build will fail.
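If you want to double-check the layout before uploading, here is a minimal sketch of a shell helper that builds the tarball from a folder and confirms that Dockerfile is a top-level entry. The function name make_upload_tarball is our own (hypothetical), and the sketch assumes the folder holds all three files used in this lesson: Dockerfile, config.py, and helloworld.ipynb.

```shell
#!/bin/sh
# make_upload_tarball (hypothetical helper): create pyspark.tar.gz inside DIR
# from this lesson's three files, then check that Dockerfile is a top-level entry.
make_upload_tarball() {
  dir=$1
  # -C "$dir" stores bare names such as "Dockerfile" rather than "$dir/Dockerfile".
  tar -czf "$dir/pyspark.tar.gz" -C "$dir" Dockerfile config.py helloworld.ipynb || return 1
  # Only an entry named exactly "Dockerfile" is at the top level;
  # a nested path such as "project/Dockerfile" does not match.
  if tar -tzf "$dir/pyspark.tar.gz" | grep -qxF 'Dockerfile'; then
    echo "OK: Dockerfile is at the top level"
  else
    echo "ERROR: Dockerfile is missing or nested in a folder" >&2
    return 1
  fi
}
```

For example, running make_upload_tarball . from the folder that holds the files should print OK: Dockerfile is at the top level. The verbose -v flag from the command above is dropped here only to keep the output short.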
Points to Note:
- All files can be found for download in the Appendix of this lesson.
- Only ONE tarball with ONE Dockerfile can be uploaded and used at a time. Uploading a new tarball replaces the one uploaded previously.
Dockerfile
# Base container image: the Ubuntu 20.04 Linux distribution
FROM ubuntu:20.04

# Install the following packages
RUN apt-get update && apt-get install software-properties-common -y &&\
    add-apt-repository ppa:deadsnakes/ppa && apt-get update &&\
    apt-get install python3.6 -y && apt-get install python3-pip -y &&\
    pip3 install --upgrade pip && pip3 install jupyter &&\
    mkdir /usr/local/notebooks &&\
    apt-get install -y openjdk-8-jdk


# Install any Python modules required in the notebook
RUN pip3 install pandas && pip3 install numpy && pip3 install seaborn &&\
    pip3 install matplotlib && pip3 install scikit-learn &&\
    pip3 install python-dotenv &&\
    pip3 install pyspark && \
    pip3 install tqdm && \
    pip3 install pathlib && \
    pip3 install matplotlib && \
    pip3 install utils

# Add configuration file
ADD config.py /root/.jupyter/jupyter_notebook_config.py

# Add ipynb files
ADD helloworld.ipynb /usr/local/notebooks/
Line 2:
FROM ubuntu:20.04
The FROM instruction sets the base image for the rest of the instructions. It must be the first instruction in the Dockerfile; only comments, parser directives, and ARG instructions may appear before it.
In this example, we start from the ubuntu:20.04 base image, which gives us Ubuntu’s apt package manager for installing Python, Java, and the other packages this environment needs. Make sure you use the ubuntu:20.04 version in your Dockerfile if you are making a LiveVM.
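Because FROM has to come first, a quick local check can catch a misplaced instruction before you upload. The sketch below is a hypothetical helper, first_instruction, that prints the keyword of the first instruction while skipping blank lines and comment lines. It is a simplification: it does not treat ARG, which may legally precede FROM, as a special case.

```shell
#!/bin/sh
# first_instruction (hypothetical helper): print the keyword of the first
# instruction in a Dockerfile, skipping blank lines and comment lines.
first_instruction() {
  # NF > 0 skips blank lines; $1 !~ /^#/ skips comments and parser directives.
  # toupper() is used because Dockerfile instructions are case-insensitive.
  awk 'NF > 0 && $1 !~ /^#/ { print toupper($1); exit }' "$1"
}
```

For example, [ "$(first_instruction Dockerfile)" = "FROM" ] || echo "FROM must be the first instruction" prints a warning only when the Dockerfile in the current folder starts with something other than FROM.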
Lines 5-10:
RUN apt-get update && apt-get install software-properties-common -y &&\
    add-apt-repository ppa:deadsnakes/ppa && apt-get update &&\
    apt-get install python3.6 -y && apt-get install python3-pip -y &&\
    pip3 install --upgrade pip && pip3 install jupyter &&\
    mkdir /usr/local/notebooks &&\
    apt-get install -y openjdk-8-jdk
The RUN instruction executes a command while the Docker image is being built and saves the result as a new image layer.
In these commands, we install the packages needed to run PySpark: software-properties-common (which provides add-apt-repository) and the deadsnakes PPA for Python 3.6, pip and Jupyter, a /usr/local/notebooks folder for the notebooks, and OpenJDK 8, since Spark runs on the Java Virtual Machine.
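Docker runs a shell-form RUN instruction such as the one above with /bin/sh -c and stops the build if the command exits with a non-zero status. Because every step is chained with &&, the first failing command ends the whole chain. The hypothetical run_step helper below mimics that behavior locally so you can see it without Docker; it is an illustration only, not how Docker is implemented internally.

```shell
#!/bin/sh
# run_step (hypothetical helper): run a command string the way a shell-form
# RUN instruction is run, via /bin/sh -c, and report its exit status.
run_step() {
  /bin/sh -c "$1"
  status=$?
  if [ "$status" -ne 0 ]; then
    # A non-zero status is what makes `docker build` abort at this step.
    echo "step failed with exit status $status"
  else
    echo "step succeeded"
  fi
  return "$status"
}
```

For example, run_step 'echo installing && false && echo never printed' prints installing followed by step failed with exit status 1; the last echo never runs because false broke the && chain.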
Lines 14-21:
RUN pip3 install pandas && pip3 install numpy && pip3 install seaborn &&\
    pip3 install matplotlib && pip3 install scikit-learn &&\
    pip3 install python-dotenv &&\
    pip3 install pyspark && \
    pip3 install tqdm && \
    pip3 install pathlib && \
    pip3 install matplotlib && \
    pip3 install utils
These lines install the Python libraries the notebooks require. Note that scikit-learn is installed under that name but imported in Python as sklearn.
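Once the image is built, you may want to confirm inside the container that these libraries import correctly. The hypothetical check_modules helper below is a minimal sketch that tries to import each module with python3. Pass import names, which sometimes differ from the pip package names: scikit-learn is imported as sklearn and python-dotenv as dotenv.

```shell
#!/bin/sh
# check_modules (hypothetical helper): report which Python modules python3
# can and cannot import. Returns 0 only if every module imports.
check_modules() {
  missing=0
  for module in "$@"; do
    if python3 -c "import $module" > /dev/null 2>&1; then
      echo "ok: $module"
    else
      echo "missing: $module"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, inside the running container you could call check_modules pandas numpy seaborn matplotlib sklearn dotenv pyspark tqdm and look for any missing: lines.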
Line 24:
ADD config.py /root/.jupyter/jupyter_notebook_config.py
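This line copies the Jupyter configuration file, config.py, to /root/.jupyter/jupyter_notebook_config.py. Because the container runs as root, whose home directory is /root, this is where Jupyter Notebook looks for its configuration.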
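The actual config.py is included in the Appendix and is not reproduced here. Purely as an illustration, the sketch below writes a hypothetical config.py for the classic Jupyter Notebook server (version 6 or earlier, which reads c.NotebookApp settings). Every value in it is an assumption, not the lesson’s real configuration.

```shell
#!/bin/sh
# Write a HYPOTHETICAL config.py for classic Jupyter Notebook (version 6 or earlier).
# The values below are illustrative assumptions; use the Appendix file for the real setup.
cat > config.py <<'EOF'
c = get_config()  # get_config() is provided by Jupyter when it loads this file
c.NotebookApp.ip = '0.0.0.0'  # listen on all interfaces inside the container
c.NotebookApp.port = 8888  # Jupyter's default port
c.NotebookApp.open_browser = False  # there is no browser inside the container
c.NotebookApp.allow_root = True  # the image has no USER instruction, so it runs as root
c.NotebookApp.notebook_dir = '/usr/local/notebooks'  # folder created by the RUN step
EOF
```

Newer Jupyter releases built on jupyter_server use c.ServerApp settings instead, so check which version pip installs before relying on these names.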
Line 27:
ADD helloworld.ipynb /usr/local/notebooks/
This line copies helloworld.ipynb into /usr/local/notebooks, the folder created by the first RUN instruction.
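Both ADD lines need their source files inside the uploaded tarball; if config.py or helloworld.ipynb is missing, the build stops at that step. The hypothetical check_context helper below is a simplified sketch that reads the Dockerfile out of the tarball and confirms that each ADD or COPY source is present. It assumes one source per instruction, plain file names, and no flags such as --chown.

```shell
#!/bin/sh
# check_context (hypothetical helper): confirm that every source file named in an
# ADD or COPY line of the tarball's Dockerfile is also stored in the tarball.
# Simplified: assumes one source per instruction, plain file names, and no flags.
check_context() {
  tarball=$1
  status=0
  # tar -xzOf prints the archived Dockerfile to stdout; awk keeps the source argument.
  for src in $(tar -xzOf "$tarball" Dockerfile | awk 'toupper($1) == "ADD" || toupper($1) == "COPY" { print $2 }'); do
    if tar -tzf "$tarball" | grep -qxF "$src"; then
      echo "found: $src"
    else
      echo "missing: $src"
      status=1
    fi
  done
  return "$status"
}
```

For example, check_context pyspark.tar.gz prints found: config.py and found: helloworld.ipynb for a complete tarball, and returns a non-zero status if either file is absent.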
Uploading the tarball
Appendix
File to download
The file attached below can be used to set up the environment on Educative’s platform.