Course Expectations

Get a brief introduction to what you’ll learn in this course.

Why this course?

Welcome to this intermediate-level course on machine learning using R for aspiring data scientists. This course covers all the fundamentals needed to craft valuable machine learning models that tackle real-world problems.

This course uses the R programming language because R is uniquely positioned as the go-to tool for data scientists in government, healthcare, nonprofits, and business:

  • R was built from the ground up as a language for analyzing data.

  • Any data analysis technique commonly used by data scientists is available in R.

  • R’s tidyverse, a collection of packages, allows data scientists to do more with less code.

  • The data visualization capabilities of R are second to none.

  • R’s new tidymodels framework, a collection of packages, allows data scientists to build valuable machine learning models quickly while adhering to best practices.

By the end of this course, you’ll be able to apply machine learning with R to produce new insights that generate results for your organization.

What to expect?

To set the right expectations for this course, keep the following points in mind:

  • Machine learning is a vast topic, and this course does not attempt to cover all of it.

  • This course introduces three widely used machine learning techniques: decision trees, random forests, and XGBoost.

  • The course covers the mathematics and underlying logic of these three techniques.

  • The course uses tabular “small data” because this is the kind of data that data scientists most commonly work with in practice.

  • Advanced scenarios (e.g., weighted data and severe class imbalance) are outside the scope of this course.

  • Production deployment of machine learning models is also outside the scope of this course.

This course teaches the techniques commonly used by professional data scientists worldwide.

Prerequisites of the course

  • Experience in R programming—especially with the tidyverse (e.g., dplyr and ggplot2).

  • A desire to take data science skills to the next level with machine learning.

  • Comfort with mathematics.

The math required for most of the course is relatively basic and accessible to a broad audience. The last part of the course covers XGBoost, where the math requirements are higher.

Hiring managers commonly expect aspiring data scientists to know the math behind the machine learning techniques taught in this course.

Learning outcomes

By successfully completing this course, you will:

  • Develop an understanding of supervised learning and how to apply machine learning to problems.

  • Understand the differences between classification and regression problems.

  • Learn how CART (Classification and Regression Trees) decision trees work.

  • Explore data visually to judge whether machine learning is likely to be useful for a given problem.

  • Learn the random forest and XGBoost machine learning techniques.

  • Understand the importance of the bias-variance tradeoff.

  • Learn to tune machine learning models to combat overfitting.

  • Engineer features to produce the most valuable machine learning models.

  • Craft valuable machine learning models in R using tidyverse and tidymodels.

Have fun with the course!