Conclusion
Learn about interactive R tools and other areas to delve into next.
Congratulations on completing the “R for Data Scientists” course! Throughout the course, we have built a strong foundation in R programming for data science, honing our skills in data manipulation, visualization, and modeling. With this knowledge base, we are now well-equipped to explore the wider world of data science and its various domains.
Where to go next?
As with any field, there’s always more to learn. In this lesson, we’ll introduce some data science topics worth exploring further as a launching pad for future learning.
Interactive tools from R
One of the most exciting aspects of data science is the ability to create interactive data visualizations and dashboards that allow users to explore the data on their own terms. R offers a variety of interactive tools to enhance data exploration and visualization, including shiny and plotly.
The shiny package enables the creation of interactive web applications without needing to know HTML, CSS, or JavaScript. With shiny, we can create dashboards, interactive data visualizations, and other web applications that allow users to interact with data. Non-technical stakeholders can then explore data in an environment that’s comfortable for them. Additionally, shiny applications can be as simple or complex as we like and can be aesthetically customized to meet the requirements of any organization. The shiny package is worth considering if our projects involve any dashboarding, especially if the project is completed primarily in R.
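To make this concrete, here is a minimal sketch of a shiny application. It assumes the shiny package is installed; the input ID n, the output ID hist, and the simulated data are made up for illustration and aren't taken from the course.

```r
library(shiny)

# User interface: one slider input and one plot output
ui <- fluidPage(
  titlePanel("Histogram of simulated data"),
  sliderInput("n", "Number of observations:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Server logic: the histogram is redrawn whenever the slider value changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = NULL, xlab = "Value")
  })
}

# Bundle the UI and server into an app object
app <- shinyApp(ui = ui, server = server)

# In an interactive R session, the app can be launched with:
# runApp(app)
```

The whole application is written in R: the UI functions generate the HTML, and the server function reacts to input changes.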
Similarly, plotly is an interactive plotting library for R that allows the creation of highly customizable and interactive visualizations. With plotly, we can create plots that respond to user input, such as zooming and panning, and embed these plots in web applications, such as those built with shiny, or in other reports. This allows us to create engaging visualizations that help communicate findings more effectively. The difference between shiny and plotly is that shiny is for building an entire web application, while plotly is specifically for plotting. As such, it’s often beneficial to combine the two: shiny to construct the overall web application, and plotly to provide interactive plots within the application.
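As a rough illustration, the following sketch builds an interactive scatter plot with plotly. It assumes the plotly package is installed and uses the built-in mtcars dataset; the choice of variables and axis titles is ours.

```r
library(plotly)

# Interactive scatter plot: car weight vs. fuel efficiency,
# colored by the number of cylinders
p <- plot_ly(
  data = mtcars,
  x = ~wt,
  y = ~mpg,
  color = ~factor(cyl),
  type = "scatter",
  mode = "markers"
)

# Add axis titles
p <- layout(
  p,
  xaxis = list(title = "Weight (1000 lbs)"),
  yaxis = list(title = "Miles per gallon")
)

# Printing p in an interactive session opens the plot,
# which supports zooming, panning, and hover tooltips
```

To place a plot like this inside a shiny application, plotly provides plotlyOutput() for the UI and renderPlotly() for the server.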
Bayesian models
Another exciting topic in data science is Bayesian modeling. Bayesian modeling is a statistical modeling technique that allows the incorporation of prior knowledge into the modeling process. In practice, this often translates to things like, “Based on a previous experiment, we strongly believe the coefficient for this parameter to be between zero and one.” In Bayesian modeling, we can start with this prior distribution for the parameter and then update the distribution based on the current dataset available. This can be a significant advantage because we often have new experiments that overlap with previous ones, allowing for more robust findings.
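The idea of updating a prior with new data can be sketched in base R with a simple conjugate model. This example is an assumption-laden illustration: it uses a proportion rather than a regression coefficient to keep the arithmetic simple, and both the Beta(2, 2) prior and the data (7 successes in 10 trials) are made up.

```r
# Prior belief from an earlier experiment: Beta(2, 2), centered on 0.5
prior_a <- 2
prior_b <- 2

# New (hypothetical) data: 7 successes in 10 trials
successes <- 7
trials <- 10

# Conjugate update: add successes to a and failures to b
post_a <- prior_a + successes
post_b <- prior_b + (trials - successes)

# The posterior is Beta(9, 5)
prior_mean <- prior_a / (prior_a + prior_b)  # 0.5
post_mean <- post_a / (post_a + post_b)      # 9 / 14, about 0.643

# 90% credible interval for the proportion under the posterior
cred_int <- qbeta(c(0.05, 0.95), post_a, post_b)
```

The posterior mean sits between the prior mean (0.5) and the observed proportion (0.7), which is the sense in which the prior is updated rather than discarded.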
R has several packages for Bayesian modeling, including rstan, rstanarm, and brms. These packages provide robust tools for building and analyzing Bayesian models. They allow us to create models that consider previous knowledge and estimate the probability of different outcomes in ways that more traditional frequentist statistics cannot.
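As a hedged sketch of what this looks like in practice, the example below fits a linear regression with rstanarm and places an informative prior on the slope. It assumes the rstanarm package is installed. The data are simulated, and the prior (a normal distribution centered at 0.5 with a standard deviation of 0.25, which puts most of its mass between zero and one) is chosen purely for illustration.

```r
library(rstanarm)

# Simulated data with a true slope of 0.6 (an assumption for this example)
set.seed(42)
n <- 100
sim_data <- data.frame(x = rnorm(n))
sim_data$y <- 1 + 0.6 * sim_data$x + rnorm(n, sd = 0.5)

# Linear regression with an informative prior on the slope
fit <- stan_glm(
  y ~ x,
  data = sim_data,
  family = gaussian(),
  prior = normal(location = 0.5, scale = 0.25, autoscale = FALSE),
  chains = 2,
  iter = 1000,
  seed = 42,
  refresh = 0  # suppress sampling progress output
)

# 90% posterior credible interval for the slope
slope_interval <- posterior_interval(fit, prob = 0.9, pars = "x")
```

The interval describes where the slope plausibly lies given both the prior and the simulated data.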
High-performance computing
High-performance computing (HPC) techniques may be needed to work with large data sets or for computationally intensive tasks. R has various tools for parallel and distributed computing that allow us to take advantage of multiple CPUs or machines to speed up analyses.
Some of the tools for parallel computing include the parallel package, which ships with R and provides support for parallel computing on a single machine, and the doParallel package, which provides a parallel backend for the foreach package so that loops can run across multiple cores. For distributed computing, R provides interfaces to popular distributed computing systems like Apache Spark (for example, through the sparklyr package), allowing the scaling up of analyses to handle even larger datasets.
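A minimal sketch of single-machine parallelism with the parallel package is shown below. The helper boot_mean, the simulated data, and the choice of two worker processes are all assumptions made for this example.

```r
library(parallel)

# Hypothetical helper: the mean of one bootstrap resample.
# The first argument is the replicate index, which is unused.
boot_mean <- function(i, values) {
  mean(sample(values, replace = TRUE))
}

set.seed(1)
values <- rnorm(1000)

# Start a cluster of two worker processes
cl <- makeCluster(2)

# Give the workers reproducible, independent random number streams
clusterSetRNGStream(cl, iseed = 42)

# Run 200 bootstrap replicates, split across the workers
boot_means <- parSapply(cl, 1:200, boot_mean, values = values)

# Always shut the workers down when finished
stopCluster(cl)

# Bootstrap estimate of the standard error of the mean
boot_se <- sd(boot_means)
```

The same loop could instead be written with foreach and %dopar% after calling registerDoParallel() from the doParallel package. For a task this small, the overhead of starting workers likely outweighs the gain, so parallelism pays off mainly when each replicate is expensive.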
Conclusions
As the R community grows, new packages and enhancements are continually added to CRAN. It’s worth staying on top of additions to the CRAN database because a new package that directly addresses the issues we’re currently tackling may have been released.
To sum up, this course has provided a solid foundation in R programming, data manipulation, visualization, and modeling. We learned about the use of R to explore and prepare data, create visualizations to understand trends and relationships, and build models to make predictions. Some exciting topics like interactive tools, Bayesian modeling, and high-performance computing were also introduced. Whether it’s high-level summary statistics or machine learning model implementation, we now have the tools to do what we need to do!
We hope you enjoyed this course! If you have any questions, comments, or concerns, please feel free to email us. We look forward to hearing from you!