Conclusion

Learn about interactive R tools and other areas to delve into next.

Congratulations on completing the “R for Data Scientists” course! Throughout the course, you have built a strong foundation in R programming for data science, honing your skills in data manipulation, visualization, and modeling. With this solid knowledge base, you are now well-equipped to embark on further explorations into the exciting world of data science and its various domains.

Where to go next?

As with any field, though, there's always more to learn. In this lesson, we'll introduce a few data science topics that can serve as a launching pad for further learning.

Interactive tools in R

One of the most exciting aspects of data science is the ability to create interactive data visualizations and dashboards that allow users to explore the data on their terms. R offers a variety of interactive tools to enhance data exploration and visualization, including shiny and plotly.

The shiny package enables the creation of interactive web applications without needing to know HTML, CSS, or JavaScript. With shiny, we can build dashboards, interactive data visualizations, and other web applications that let users interact with data, so non-technical stakeholders can explore it in an environment that's comfortable for them. Additionally, shiny applications can be as simple or complex as we like and can be styled to meet the requirements of any organization. The shiny package is worth considering whenever a project involves dashboarding, especially if the rest of the work is done primarily in R.
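
To make this concrete, here's a minimal sketch of a shiny app, assuming the shiny package is installed; the built-in mtcars dataset and the widget labels are purely illustrative:

library(shiny)

# UI: a slider controlling how many rows of the built-in mtcars data to display
ui <- fluidPage(
  titlePanel("A minimal shiny dashboard"),
  sliderInput("n", "Rows to display:", min = 1, max = nrow(mtcars), value = 10),
  tableOutput("preview")
)

# Server: reacts to the slider and renders the requested number of rows
server <- function(input, output) {
  output$preview <- renderTable(head(mtcars, input$n))
}

# Launch the app in a local browser
shinyApp(ui = ui, server = server)

Running this script opens a page where dragging the slider immediately updates the table, with no HTML, CSS, or JavaScript required.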

Similarly, plotly is an interactive plotting library for R that allows the creation of highly customizable and interactive visualizations. With plotly, we can create plots that respond to user input, such as zooming and panning, and embed these plots in web applications (such as those built with shiny) or in reports. This helps us create engaging visualizations that communicate findings more effectively. The difference between shiny and plotly is that shiny is a framework for building an entire web application, while plotly focuses specifically on plotting. As such, it's often beneficial to combine the two: shiny to construct the overall web application, and plotly to provide interactive plots within it.
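
As a quick illustration, here's a minimal plotly sketch, assuming the plotly package is installed; the mtcars variables shown are only an example:

library(plotly)

# An interactive scatter plot: users can zoom, pan, and hover over points
plot_ly(
  data = mtcars,
  x = ~wt,
  y = ~mpg,
  type = "scatter",
  mode = "markers",
  text = rownames(mtcars)   # car names shown on hover
)

The same kind of figure can also be returned from a shiny server function (via plotly's renderPlotly() and plotlyOutput()) to embed interactive plots in a dashboard.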

Bayesian models

Another exciting topic in data science is Bayesian modeling. Bayesian modeling is a statistical modeling technique that allows the incorporation of prior knowledge into the modeling process. In practice, this often translates to things like, “Based on a previous experiment, we strongly believe the coefficient for this parameter to be between zero and one.” In Bayesian modeling, we can start with this prior distribution for the parameter and then update the distribution based on the current dataset available. This can be a significant advantage because we often have new experiments that overlap with previous ones, allowing for more robust findings.

R has several packages for Bayesian modeling, including rstan, rstanarm, and brms. These packages provide robust tools for building and analyzing Bayesian models. They allow us to create models that incorporate prior knowledge and estimate the probability of different outcomes in ways that traditional frequentist statistics cannot.
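
As a sketch of what this looks like with rstanarm, suppose a previous experiment suggests a slope between zero and one; the simulated data and prior values below are purely illustrative:

library(rstanarm)

# Simulated data where the true slope is roughly 0.6
set.seed(123)
x <- rnorm(100)
y <- 0.6 * x + rnorm(100, sd = 0.5)

# Encode the prior belief that the slope lies between zero and one
# as a normal prior centred at 0.5 with a small standard deviation
fit <- stan_glm(
  y ~ x,
  data = data.frame(x = x, y = y),
  prior = normal(location = 0.5, scale = 0.25),
  chains = 2, iter = 1000, seed = 123
)

# The posterior combines the prior with the evidence in the data
summary(fit)

The resulting posterior distribution for the slope reflects both the prior belief and the new data, which is exactly the updating described above.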

High-performance computing

High-performance computing (HPC) techniques may be needed when working with large datasets or performing computationally intensive tasks. R has various tools for parallel and distributed computing that let us take advantage of multiple CPU cores or machines to speed up analyses.

Some of the tools for parallel computing include the parallel package, which ships with base R and supports parallel computing on a single machine, and the doParallel package, which provides a foreach backend for parallelizing R code across multiple cores. For distributed computing, R provides interfaces to popular distributed computing systems like Apache Spark (for example, through the sparklyr package), allowing analyses to scale up to even larger datasets.
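
For example, here's a minimal sketch using the parallel package; the number of workers and the toy computation are illustrative:

library(parallel)

# Create a cluster using one fewer than the available cores
cl <- makeCluster(max(1, detectCores() - 1))

# Run an expensive task (here, a stand-in for a bootstrap or simulation step)
# on the cluster's workers in parallel
results <- parLapply(cl, 1:8, function(i) mean(rnorm(1e6)))

# Always release the workers when finished
stopCluster(cl)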

Conclusions

As the R community grows, new packages and enhancements are continually added to CRAN. It’s worth staying on top of additions to the CRAN database because a new package that directly addresses the issues we’re currently tackling may have been released.

To sum up, this course has provided a solid foundation in R programming, data manipulation, visualization, and modeling. We learned about the use of R to explore and prepare data, create visualizations to understand trends and relationships, and build models to make predictions. Some exciting topics like interactive tools, Bayesian modeling, and high-performance computing were also introduced. Whether it’s high-level summary statistics or machine learning model implementation, we now have the tools to do what we need to do!

We hope you enjoyed this course! If you have any questions, comments, or concerns, please feel free to email us. We look forward to hearing from you!