Explore R for data science, from basics to machine learning. Learn data manipulation, visualization, version control, and workflow optimization for real-world challenges.

Embark on an exciting journey with R as your trusted ally for data science. This comprehensive course will equip you with essential skills to leverage R's power for data manipulation and analysis. The course is suitable even if you have limited R experience, empowering you to perform data science tasks effectively.

The journey starts with R fundamentals to lay a strong foundation. You'll master the tidyverse to write powerful, readable code. Then you will explore importing data from various sources, data visualization, and best practices before gaining hands-on machine learning experience. You'll also learn version control with Git and GitHub, and how to optimize your R code for efficient data science workflows.

At course completion, you'll emerge as a confident R data scientist, ready to tackle real-world challenges. You'll be well-equipped to advance your career in data science with R's extensive capabilities in your toolbelt.

Data Science in R: From Basics to Machine Learning

In data science work, we often read in some data and then need to add new columns derived from the existing ones. For instance, we might create an additional column that classifies fish as healthy, underweight, or overweight using a formula based on the weight and length data already present in the input. Or, if we're working with student grade data, we might need to add columns for each student's maximum and minimum grades.
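As a minimal sketch of the fish case, the code below assumes Fulton's condition factor, K = 100 * weight (g) / length (cm)^3, as the formula; the lesson does not say which formula it uses. The fish measurements and the 0.9 and 1.3 cutoffs are hypothetical values chosen only for illustration. Because arithmetic in R is vectorized, this particular calculation needs only `mutate`, not `rowwise`.

```r
library(dplyr)

# Hypothetical fish measurements (illustrative values, not course data)
fish <- tibble(
  fish_id   = 1:4,
  weight_g  = c(120, 95, 210, 100),
  length_cm = c(22, 23, 24, 21)
)

# Assumed formula: Fulton's condition factor K = 100 * W / L^3.
# The cutoffs 0.9 and 1.3 are hypothetical, for illustration only.
fish <- fish %>%
  mutate(
    condition_k = 100 * weight_g / length_cm^3,
    status = case_when(
      condition_k < 0.9 ~ "underweight",
      condition_k > 1.3 ~ "overweight",
      TRUE              ~ "healthy"
    )
  )

print(fish)
```

Here `mutate` computes `condition_k` for every row at once, and `case_when` assigns a label by checking the conditions in order, falling back to `"healthy"`.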

However, most R summary functions, like `mean` or `cor` (which come from base R rather than the tidyverse), are intended to aggregate across the rows of a column; they're column-wise aggregations. That design is consistent with tidy data: because each row is an observation and each column is a variable, most of our aggregations will run down a column, across observations, rather than across the variables within a single row.
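To make the column-wise direction concrete, here is a small sketch using a hypothetical grades tibble (the student IDs and grades are invented for illustration). Inside `summarise`, `mean` and `cor` each collapse the rows of whole columns into a single value, so the result has exactly one row.

```r
library(dplyr)

# Hypothetical grades: one row per student, one column per course
grades <- tibble(
  StudentID = c("S1", "S2", "S3"),
  MATH101   = c(90, 75, 60),
  PHYS101   = c(80, 85, 70)
)

# Column-wise: each summary collapses all rows of a column into one value
course_summary <- grades %>%
  summarise(
    math_mean     = mean(MATH101),
    phys_mean     = mean(PHYS101),
    math_phys_cor = cor(MATH101, PHYS101)
  )

print(course_summary)
```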

With tidy data, row-wise aggregations show up mostly during data cleaning and preparation rather than in the analysis itself. Because tidyverse functions are designed to work best when rows represent observations and columns represent variables, we'll typically use row-wise aggregations to create new columns (that is, new variables) that represent latent measurements: quantities we observe only indirectly, by directly measuring other variables. A student's average grade across several courses is one example.

For calculating additional columns this way, the tidyverse (specifically dplyr) offers two functions that together provide an elegant solution: `mutate` and `rowwise`. `mutate` adds new columns to an existing `tibble`, and `rowwise` groups the tibble so that each row is its own group. Together, they make summary functions inside `mutate` aggregate the values within a single row instead of down a whole column.
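The difference `rowwise` makes is easiest to see side by side. This sketch uses the same kind of hypothetical grades tibble (invented values). Without `rowwise`, `max` receives the entire columns, so every student gets the overall maximum. With `rowwise`, `max` only sees the current student's values.

```r
library(dplyr)

# Hypothetical grades: one row per student, one column per course
grades <- tibble(
  StudentID = c("S1", "S2", "S3"),
  MATH101   = c(90, 75, 60),
  PHYS101   = c(80, 85, 70)
)

# Without rowwise(): max() sees whole columns, so every row gets 90,
# the maximum over all students and courses
no_rowwise <- grades %>%
  mutate(best = max(MATH101, PHYS101))

# With rowwise(): each row is its own group, so max() sees one student
with_rowwise <- grades %>%
  rowwise() %>%
  mutate(best = max(MATH101, PHYS101)) %>%
  ungroup()  # drop the row-wise grouping once the per-row work is done

print(no_rowwise)
print(with_rowwise)
```

For a simple case like this, the vectorized base function `pmax(MATH101, PHYS101)` gives the same per-row result without `rowwise`; `rowwise` becomes more useful as the per-row logic grows more complex.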

Be aware that base R provides a function called `apply` that can perform the same row-wise operations: called with `MARGIN = 1`, it applies a function to each row of a matrix (or of a data frame, which it first converts to a matrix). Forum answers often use `apply`, mainly because it doesn't require the tidyverse. However, `apply` tends to make code harder to read, so using `rowwise` is preferable.
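For comparison, here is a base-R-only sketch of the same per-student maximum and minimum using `apply`, again on invented grade data. Because `apply` converts a data frame to a matrix, we select only the numeric grade columns first; including the character `StudentID` column would turn everything into a character matrix and `max` would compare strings.

```r
# Base R only: no tidyverse required
grades <- data.frame(
  StudentID = c("S1", "S2", "S3"),
  MATH101   = c(90, 75, 60),
  PHYS101   = c(80, 85, 70)
)

# Keep only the numeric grade columns so the matrix stays numeric;
# MARGIN = 1 applies FUN to each row
grade_cols   <- grades[, c("MATH101", "PHYS101")]
grades$best  <- apply(grade_cols, MARGIN = 1, FUN = max)
grades$worst <- apply(grade_cols, MARGIN = 1, FUN = min)

print(grades)
```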

So, what do `mutate` and `rowwise` look like in practice? Let's take a look at `GradeData.csv`, which has a `StudentID` column and several course columns, such as `MATH101`, that contain each student's grade in that course.
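The lesson's own `GradeData.csv` is not reproduced here, so the sketch below uses a small hypothetical tibble with the same shape: a `StudentID` column plus course columns. The courses other than `MATH101` and all grades are invented. `c_across` (dplyr 1.0.0 or later) collects the selected columns of the current row into a vector, so `max`, `min`, and `mean` aggregate within each student's row.

```r
library(dplyr)

# Hypothetical stand-in for GradeData.csv; with the real file you might use
# grades <- readr::read_csv("GradeData.csv")
grades <- tibble(
  StudentID = c("S1", "S2", "S3"),
  MATH101   = c(90, 75, 60),
  PHYS101   = c(80, 85, 70),
  CHEM101   = c(70, 95, 65)
)

grade_stats <- grades %>%
  rowwise() %>%
  mutate(
    # An explicit range (MATH101:CHEM101) keeps newly created columns such as
    # max_grade out of later selections; -StudentID would pick them up
    max_grade  = max(c_across(MATH101:CHEM101)),
    min_grade  = min(c_across(MATH101:CHEM101)),
    mean_grade = mean(c_across(MATH101:CHEM101))
  ) %>%
  ungroup()

print(grade_stats)
```

The explicit column range is a deliberate choice: `mutate` evaluates its arguments in order, so a negative selection like `-StudentID` would include `max_grade` and `min_grade` when computing `mean_grade`.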

Learn to perform complex row-by-row operations inside a tibble using rowwise and mutate.

Why R?

R Fundamentals

R Fundamentals Exercises

Readable Coding with tidyverse

Tidyverse Exercises

Importing More Data Sources

Data Visualization with ggplot2

Best Practices for Data Scientists

Statistical Analysis and Machine Learning with tidymodels

Exploring tidymodels through Exercises

Useful Libraries for Data Science

Git Integration

Getting The Most Out of R

Appendix

Credit Card Fraud Detection using the R Language

Applying Complex Row-by-Row Operations