Coding Environments
Let’s explore coding environments for data science projects.
Options for writing Python code
There are a variety of options for writing Python code for data science. The best environment depends on what you are building, but notebook environments have become an increasingly common place to write Python code. The three types of coding environments I’ve worked with for Python are IDEs, text editors, and notebooks.
If you’re used to working with an IDE, tools like PyCharm and Rodeo are useful editors that provide richer debugging tools than the other options. It’s also possible to write code in a text editor such as Sublime and then run scripts via the command line. I find this works well for building web applications with Flask and Dash, where you need a long-running script that persists beyond the scope of running a cell in a notebook. I now perform the majority of my data science work in notebook environments, because they cover both exploratory analysis and productizing models.
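To illustrate what a long-running script means here, the sketch below shows a minimal Flask app that serves requests until you stop the process. The file name, route, and port are illustrative assumptions, not part of the original text; it is meant only to show why this pattern fits a text editor plus command line workflow better than a notebook cell.

```python
# minimal_app.py -- a hypothetical example of a long-running Flask script.
# Run from the command line: python minimal_app.py
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/predict")
def predict():
    # Placeholder response; a real service would load a model and score inputs.
    return jsonify({"prediction": 0.5})

if __name__ == "__main__":
    # app.run() blocks and serves requests until the process is stopped,
    # which is awkward to manage from inside a notebook cell.
    app.run(host="0.0.0.0", port=5000)
```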
I like to work in coding environments that make it trivial to share code and collaborate on projects. Databricks and Google Colab are two coding environments that provide truly collaborative notebooks, where multiple data scientists can work on a script simultaneously. Jupyter notebooks do not currently support this level of real-time collaboration. Still, it’s good practice to track notebooks in a version control system such as GitHub to share your work.
In this course, we’ll use only text editors and notebook environments for coding. To learn how to build scalable pipelines, I recommend working on a remote machine, such as an EC2 instance, to become more familiar with cloud environments and to build experience setting up Python environments outside of your local machine.