**Learn R and get comfortable with data science**

Excited by the endless possibilities offered by the fields of data science and data analysis? Let R set you on your way!

Data scientists, statisticians and analysts use R for statistical analysis, data visualization and predictive modeling. R gives aspiring analysts and data scientists the ability to represent complex sets of data in an impressive way.

Make yourself comfortable in R and get deep into data science using R with this Learning Path.

**About the Authors:**

*Selva Prabhakaran* is a data scientist with a large e-commerce organization. In his 7 years of experience in data science, he has tackled complex real-world data science problems and delivered production-grade solutions for top multinational companies. Selva lives in Bangalore with his wife. You can follow him on Twitter at http://www.twitter.com/r_programming, and he periodically writes at http://r-statistics.co.

*Richard Skeggs* is not new to big data, having over 15 years of experience in creating big data repositories and solutions for large multinational organizations in Europe. Having become a single father, he has changed his focus and is now working within the academic and research community. Richard has a special interest in big data and is currently undertaking research in the field. His research interests revolve around machine learning, data retrieval, and complex systems.

*Mykola Kolisnyk* has been working in test automation since 2004. He has been involved in various activities, including creating test automation solutions from scratch, leading test automation teams, and consulting on test automation processes. During his career, he has worked with test automation tools such as Mercury WinRunner, MicroFocus SilkTest, SmartBear TestComplete, Selenium-RC, WebDriver, Appium, SoapUI, and BDD frameworks, among many other engines and solutions. He has experience with multiple programming technologies based on Java, C#, Ruby, and so on, and with domain areas such as healthcare, mobile, telecoms, social networking, business process modelling, performance and talent management, multimedia, e-commerce, and investment banking.

*Romeo Kienzler* works as the chief data scientist in the IBM Watson IoT worldwide team, helping clients to apply advanced machine learning at scale on their IoT sensor data. He holds a Master's degree in computer science from the Swiss Federal Institute of Technology, Zurich, with a specialization in information systems, bioinformatics, and applied statistics.

*Fabio Veronesi* obtained a Ph.D. in digital soil mapping from Cranfield University and then moved to ETH Zurich, where he has been working for the past three years as a postdoc. In his career, Dr. Veronesi has worked on several topics related to environmental research: digital soil mapping, cartography and shaded relief, renewable energy, and transmission line siting. During this time, Dr. Veronesi specialized in the application of spatial statistical techniques to environmental data.

*Yu-Wei, Chiu (David Chiu)*

- Requires no programming knowledge - we're covering the basics of R too!

- Get to know the basic concepts of R: the data frame and data manipulation
- Get data from numerous sources such as files, databases, and even Twitter
- Understand how easily R can confront probability and statistics problems
- Work with complex data sets and understand how to process data sets
- Evaluate k-Means, Connectivity, Distribution, and Density-based clustering
- Create professional data visualizations and interactive reports
- Create a codebook so that the data can be presented in summary

This video provides an overview of the entire course.

The aim of this video is to show how to install R on our system.

- Visit the CRAN website
- Choose the download option based on your operating system
- Follow the standard procedure for installing R

To run and write code in R, we first need to focus on how to get and install the IDE.

- Go to RStudio.com
- Download the correct version of the software
- Install the latest version

We have installed R and RStudio. Now let’s check out how to install the packages.

- Explain what a package contains
- Check out the sources of packages
- Learn how to install them

The aim of this video is to teach you what data types and data structures in R are.

- Explain data types
- Explain data structures
- Show some examples in R

In this video, we will see how to work with vectors in R.

- Explain how to create a vector
- Create different types of vectors
- Access specific items within a vector
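A quick sketch of these vector operations in base R (the sample values are made up for illustration):

```r
# Create numeric, character, and logical vectors with c()
x <- c(10, 20, 30, 40)
fruits <- c("apple", "banana", "cherry")
flags <- c(TRUE, FALSE, TRUE)

# Access items by position, by range, or by negative (excluding) index
x[2]             # second element: 20
x[2:3]           # 20 30
x[-1]            # everything except the first element: 20 30 40
fruits[c(1, 3)]  # "apple" "cherry"
```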

The aim of this video is to show how to work with random numbers and do rounding and binning.

- Show how to create random numbers
- How to round
- How to bin numeric vectors
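The three steps above might look like this in base R (the seed and break points are arbitrary choices for the example):

```r
set.seed(42)                       # make the random draws reproducible
r <- runif(5, min = 0, max = 100)  # 5 uniform random numbers in [0, 100]
n <- rnorm(5, mean = 0, sd = 1)    # 5 standard normal random numbers

round(r, 1)                        # round to 1 decimal place
floor(r)                           # round down
ceiling(r)                         # round up

# Bin a numeric vector into intervals with cut()
bins <- cut(r, breaks = c(0, 25, 50, 75, 100))
table(bins)                        # frequency of each bin
```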

Taking vectors a step ahead, let’s see how we can handle missing values.

- Explain how to find missing values
- Explain how to omit the missing values
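In base R that boils down to `is.na()` and `na.omit()` (toy vector for illustration):

```r
v <- c(4, NA, 7, NA, 12)

is.na(v)               # TRUE where values are missing
which(is.na(v))        # positions of the missing values: 2 and 4
sum(is.na(v))          # count of missing values: 2

na.omit(v)             # drop the NAs: 4 7 12
mean(v, na.rm = TRUE)  # many functions can simply skip NAs
```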

We now know a lot about how vectors work, but how do we get specific items from a vector based on any condition? Let’s check out just that in this video.

- How to write conditions
- How to write complex conditions
- How to use the which() operator to get the required items
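A minimal sketch of conditional subsetting (example values invented):

```r
x <- c(3, 8, 15, 1, 22, 7)

x > 5               # the condition itself is a logical vector
x[x > 5]            # values meeting the condition: 8 15 22 7
x[x > 5 & x < 20]   # a complex condition: 8 15 7
which(x > 5)        # positions instead of values: 2 3 5 6
```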

This video will introduce a new data structure called list and how to work with it.

- Understand what lists are
- See when lists are used
- Learn how to perform data manipulation with lists

In this video, our goal is to understand how to perform set operations in R.

- Explain the syntax of important set operations such as union, intersect and so on
- Perform the set operations to grasp the usage
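The core set operations in base R, on two toy vectors:

```r
a <- c(1, 2, 3, 4, 5)
b <- c(4, 5, 6, 7)

union(a, b)      # 1 2 3 4 5 6 7
intersect(a, b)  # 4 5
setdiff(a, b)    # elements of a not in b: 1 2 3
4 %in% a         # TRUE: membership test
setequal(a, b)   # FALSE: do both contain the same elements?
```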

What is sampling and sorting and how to do it in R?

- Explain sampling
- Show how to do it in R
- Explain how to sort in R

Checking conditions is often a requirement for a programmer to write maintainable code. Let’s understand how we can check conditions in R.

- Show how to write if and else statements
- Grasp the correct usage of ifelse()
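The key distinction: `if`/`else` tests a single condition, while `ifelse()` is vectorized and checks every element at once (toy scores invented for the example):

```r
x <- 7
if (x > 5) {
  msg <- "big"
} else {
  msg <- "small"
}

# ifelse() applies the test element-wise across a whole vector
scores <- c(40, 85, 62, 90)
result <- ifelse(scores >= 60, "pass", "fail")
result  # "fail" "pass" "pass" "pass"
```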

You may have come across several instances whilst coding where you need to perform repetitive operations through loops, right? In this video, we’ll see how to do that in R using for loops.

- Explain the for loop’s syntax
- Show how to skip an iteration
- Check out how to break out from a loop
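A compact sketch of all three ideas: `next` skips an iteration, `break` exits the loop entirely:

```r
kept <- c()
for (i in 1:10) {
  if (i %% 2 == 0) next  # skip even iterations
  if (i > 7) break       # stop the loop entirely
  kept <- c(kept, i)     # collect the values that survive both checks
}
kept  # 1 3 5 7
```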

Let’s explore what data frames are and how to work with them.

- Learn how to create a data frame
- Access elements in a data.frame: select, filter, and so on
- Understand the functions related to data frames
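A minimal data frame example covering creation, selection, and filtering (the names and ages are made up):

```r
df <- data.frame(name = c("Ann", "Bob", "Cy"),
                 age  = c(28, 34, 25),
                 stringsAsFactors = FALSE)

df$age              # select a column
df[df$age > 26, ]   # filter rows by condition
nrow(df); ncol(df)  # dimensions
str(df)             # inspect the structure
summary(df)         # summary statistics per column
```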

In this video, we will check out how to import and export data in R.

- Show the function used to import different forms of data
- Check out the functions used to export various forms of data

The aim of this video is to check out how to work with matrices and frequency tables.

- Learn how to create a matrix
- Access data in a matrix
- Grasp how to generate frequency tables

Our goal in this video is to merge data frames in R.

- Learn different types of merges
- Understand how to use them in R
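With base R's `merge()`, the join type is controlled by the `all`, `all.x`, and `all.y` arguments (toy tables invented for the example):

```r
a <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
b <- data.frame(id = c(2, 3, 4), y = c(10, 20, 30))

merge(a, b)                # inner join on the common column "id": 2 rows
merge(a, b, all.x = TRUE)  # left join: keep every row of a
merge(a, b, all = TRUE)    # full outer join: keep all ids, 4 rows
```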

In this video, we will look at how to de-aggregate data frames and create cross tabulations.

- Show the melt function
- Show the dcast() function from the reshape2 package
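A round trip between wide and long formats, assuming the reshape2 package is installed (the sales table is invented for illustration):

```r
library(reshape2)

wide <- data.frame(id = 1:2, jan = c(10, 20), feb = c(30, 40))

# melt: wide -> long (de-aggregate one row per id/month pair)
long <- melt(wide, id.vars = "id",
             variable.name = "month", value.name = "sales")

# dcast: long -> wide again (cross-tabulate id against month)
dcast(long, id ~ month, value.var = "sales")
```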

In this video, we will look at how to handle date variables in R.

- Introduce the lubridate package
- Understand the date format
- Learn the Date operations

The goal of this video is to see how to perform string operations in R.

- Introduce the paste function for concatenating
- Introduce the stringr package

Let’s learn how to avoid code replication.

- Introduce functions
- Understand the best practices by writing functions in R

The aim of this video is to understand how to debug and handle errors.

- Learn the 3 ways used to debug in R
- Grasp the 2 ways to handle errors in R

We’ll see in this video how to write fast loops with apply().

- Introduce the syntax
- Explain the difficult part of using apply functions

Sometimes we’d want to iterate through lists. What do we do then? Let’s learn to use the fast loops sapply, vapply, and lapply to achieve this goal.

- Explain sapply
- Explain lapply and how it is different
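The difference in a nutshell, on a toy list of vectors:

```r
nums <- list(a = 1:5, b = 10:12, c = c(2, 4))

lapply(nums, mean)              # always returns a list
sapply(nums, mean)              # simplifies to a named vector when possible
vapply(nums, mean, numeric(1))  # like sapply, but the return type is declared
```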

How to make plots and customize them.

- Show how to use the plot function
- Make a scatterplot
- Explain the arguments and features of the plot function

Sometimes, just a single Y axis is not enough. It becomes difficult to depict the variations for two variables on different scales in the same chart. To solve this, we’ll look at how to make a plot with two Y axes.

- Show the method
- Show the syntax to make two Y axes

In this video, we will learn how to make multiple plots and custom layouts to sharpen our analysis skills.

- Show how to make multiple plots
- Show how to customize plot layouts

The aim of this video is to create different types of plots.

- Show how to make a histogram, a bar chart, and a density plot
- Show how to make dot plots and box plots

What are the steps and actions one needs to do as part of data analysis before jumping to predictive modeling? Let’s understand this better.

- Explain the different steps
- Show how to do univariate analysis of numeric variables
- Show how to do univariate analysis of categorical variables

The aim of this video is to teach you what normal distribution, central limit theorem, and confidence intervals are.

- Explain the concept behind normal distribution and CLT
- Show the R code implementation
- Explain confidence intervals and implement them in R

In this video, we will understand correlation and Covariance, the concept behind them, and their implementation in R.

- Explain correlation and covariance
- The concept and the difference between them
- Implement them in R
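Both measures come built into base R (the paired values below are invented for the example):

```r
x <- c(2, 4, 6, 8, 10)   # toy paired measurements
y <- c(1, 3, 7, 9, 12)

cov(x, y)                       # covariance: direction of joint variability
cor(x, y)                       # Pearson correlation, on a -1..1 scale
cor(x, y, method = "spearman")  # rank-based alternative, robust to outliers
```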

What is the chi-square statistic, when is it used, and how to do the chi-sq test?

- Explain the chi-sq statistic and its purpose
- Explain the meaning behind it
- Show the R implementation
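A minimal `chisq.test()` run on a 2x2 contingency table (the counts are made up to show an association):

```r
# Toy contingency table: group vs. outcome
tbl <- matrix(c(30, 10, 15, 25), nrow = 2,
              dimnames = list(group = c("A", "B"),
                              outcome = c("yes", "no")))

res <- chisq.test(tbl)
res$statistic  # the chi-squared statistic
res$p.value    # a small p-value suggests the variables are associated
```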

What is ANOVA, its purpose, when to use it, and how to implement it in R?

- Explain the concept and purpose
- When to use it
- How to implement it in R

What are the other commonly used statistical tests in R and how to implement them?

- Explain the one and two sample t-test, parametric versus non-parametric
- Explain the Wilcoxon signed rank test
- Show the R implementation
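The corresponding base R calls, on two invented samples:

```r
x <- c(5.1, 4.9, 6.2, 5.8, 5.5)   # toy sample 1
y <- c(6.4, 6.1, 7.0, 6.6, 6.9)   # toy sample 2

t.test(x, y)        # two-sample t-test (parametric)
t.test(x, mu = 5)   # one-sample t-test against a fixed mean
wilcox.test(x, y)   # Wilcoxon rank test (non-parametric alternative)
```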

All knowledge is incomplete without being put to practice. We’ve got a good taste of the core concepts that govern statistical analysis with R. Let’s solve the challenges pertaining to data manipulation in this video.

- Run through the basics of data handling
- Understand the unique key
- Solve the proposed challenges

What is data if not represented visually! We have solved challenges related to data manipulation. Now it’s time to tackle visualization in this video.

- Create a histogram on the given data
- Create a line chart with multiple lines
- Create a box plot

Practice solving exercises that involve making statistical inferences.

- Test the statistical significance between two continuous variables
- Solve a problem that tests statistical significance between a continuous and a categorical variable

The aim of this video is to introduce the magrittr package, its significance, and features such as pipe operators.

- Why use magrittr and pipes
- Explain various pipe operators in magrittr
- Show the R code implementation and suitable examples
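The essence of the pipe, assuming magrittr is installed:

```r
library(magrittr)

# %>% passes the left-hand value as the first argument of the next call
res <- c(1, 4, 9, 16) %>% sqrt() %>% sum()
res  # identical to sum(sqrt(c(1, 4, 9, 16))), i.e. 1 + 2 + 3 + 4 = 10
```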

Understand and use the 7 data manipulation verbs.

- What are the 7 data manipulation verbs?
- Why are they simple and widely adopted?
- Implementation in R for all the verbs
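A sketch of several dplyr verbs chained together on the built-in mtcars dataset (the derived column is invented for illustration):

```r
library(dplyr)

six_cyl <- mtcars %>%
  select(mpg, cyl, hp) %>%           # pick columns
  filter(cyl == 6) %>%               # pick rows by condition
  mutate(hp_per_cyl = hp / cyl) %>%  # add a derived column
  arrange(desc(mpg))                 # sort by mpg, descending

head(six_cyl)
```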

How to group datasets by one or more variables using dplyr.

- Explain the process of grouping
- Explain the group_by and summarize functions
- Show the implementation in R
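The grouping pattern in miniature, again on the built-in mtcars dataset:

```r
library(dplyr)

by_cyl <- mtcars %>%
  group_by(cyl) %>%                 # one group per cylinder count
  summarize(avg_mpg = mean(mpg),    # collapse each group to one row
            n_cars  = n())          # n() counts rows per group
by_cyl
```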

How to join two tables using the two table verbs of dplyr.

- Explain the two table verbs
- Explain the different types of joins
- Show the implementation in R

How to work with databases with dplyr.

- Create a SQLite DB and upload data
- Pull partial data and do manipulation
- Download full data and see the SQL

Understand the basics of data.table; do filter and select operations.

- Explain the purpose and significance of data.table
- How the syntax differs generally and how to filter and select
- Show the R code implementation

Understand the syntax; create and update columns in a data.table.

- Understand the general syntax in sync with SQL syntax
- Create new columns
- Update the columns
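A minimal sketch of column creation and update in data.table, using the built-in mtcars data (the derived column and the cap at 30 mpg are invented for the example):

```r
library(data.table)

dt <- as.data.table(mtcars)

# Create a new column by reference with := (no copy of the table is made)
dt[, hp_per_cyl := hp / cyl]

# Update an existing column, optionally only on rows matching a filter
dt[mpg > 30, mpg := 30]
```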

Learn how to aggregate data.tables. Also learn the .N and .I operators.

- The syntax to group data
- The .N and .I operators – using them effectively
- The R code implementation

Understand and implement chaining, keys, functions, and .SD.

- How to set keys and why
- Write apply family functions within data.table
- Show the usage of .SD

How to write for-loops with set, set keys, and join data.tables?

- Explain how to use set() and why it’s great
- How to set keys and filter data.tables
- Show how to do joins using square brackets and merge function

This video provides an overview of the entire course.

The aim of this video is to introduce the R programming language.

- Know about the R programming language and its usage
- Review the supported programming paradigms
- Explore the use of R in statistical analysis

The aim of this video is to start with some important basics such as function and variable declaration.

- Discover how to declare a variable and assign a value
- Get to know how to declare a function and use it
- Learn how to run commands over multiple lines

The aim of this video is to introduce the different types of data structures available.

- Introduce the data types that are available within R
- Coerce one data type to another
- Discover the functions available to test and examine data types

The aim of this video is to show that R scripts can be run from outside of the R IDE.

- Run a simple script from the command line
- Pass parameters into the R script from the command line

The aim of this video is to introduce the concept of a data frame. How it can be initialized as well as how data can be added to it.

- Create a data frame and add data
- Access columns, rows and individual items
- Delete data from a data frame and remove all the data

The aim of this video is to introduce the concept of creating a data frame from a CSV file.

- Opening a CSV file for reading data
- The two most commonly used functions for creating a data frame from a CSV file
- The final step is to write data back to the CSV file

The aim of this video is to introduce the concept of ingesting data from a compressed file into R.

- Opening the zip file for reading
- Processing multiple files within the zip
- Write data back to the zip file

The aim of this video is to introduce the concept of ingesting data from a database into an R data frame. Introducing the tools required and the best practices in employing the tools.

- First create a connection to the database from R
- Learn how the data can be used within R
- The final step is to write data back to the database

The aim of this video is to allow the user to start cleaning datasets.

- Learn how to normalize strings and change case
- Learn how to handle outliers
- Learn how to format dates

The aim of this video is to understand the process available to deal with missing values from a dataset.

- Discover how to handle missing fields
- Perform a test for missing values
- Learn how to handle missing values

The aim of this video is to look at the date format and process time.

- Look at the available date formats
- Access the system date
- Learn how to process timestamps

The aim of this video is to introduce the concept of a data frame, how it can be initialized, as well as how data can be added to it.

- Create a data frame and add data
- Access columns, rows, and individual items
- Delete data from a data frame and remove all of the data

The aim of this video is to allow the user to understand the importance of a codebook.

- Understand what a codebook is
- Know what goes into a codebook
- Learn the use of variables and metadata

This video looks at creating the codebook using standard R functionality.

- Access and use datasets within RStudio
- Look at using the memisc codebook function
- Use the summary function

The aim of this video is to create a codebook using standard R functionality.

- Use the class function to determine data types
- Learn how to use sapply

The aim of this video is to allow the user to understand what data mining is and the steps involved.

- Define what data mining is
- Look at the steps needed to mine data

The aim of this video is to begin the process of creating the data story.

- Look at identifying outliers
- Introduce the tsoutliers package
- Look at the use of anomaly detection

This video looks at creating a linear regression model.

- The first step is to understand what a regression model is
- Then we look at how to create one
- Using the summary function to display the model

The aim of this video is to look at clustering of data.

- Know what clustering is
- Learn the functions available to cluster data
- Finally work through a simple example

The aim of this video is to introduce the concept of classifying data within R.

- Define what classifying data is
- The functions available within R to classify data
- A simple example of classifying data

The aim of this video is to introduce the concept of data visualization and some of the tools available.

- Introduction to visualization
- Look at graphing data
- Learn how to map data in R

The aim of this video is to allow the user to create a simple visualization within R.

- Get to grips with the use of qplot
- Learn more details of how to use dygraph
- Learn how to overlay data onto a graph

This video looks at some tools to create interactive visualizations.

- Introduce streaming in R
- Revisit the use of loops in R
- Create a dynamic plot using loops

The aim of this video is to publish the graphics created with the visualization tools in R.

- Save visualizations to file
- The functions available to cluster data
- Finally work through a simple example

The aim of this video is to introduce some of the concepts that have not been covered in this course.

- Extract twitter data
- Filter twitter data based on location
- Use parallel processing within R for large datasets

This video gives an overview of the entire course.

The aim of this video is to show how easy it is to use R for data mining. At the same time, it sets expectations, because R is sometimes a bit hard to learn, especially for programmers.

- Know about the history of R
- Learn a compelling and motivating data visualization example
- Learn data mining in the data science world

You have to accept that most of your work will involve data cleansing, which is one of the most important steps in data mining. Fortunately, R has all the tools in place to do this task as elegantly as possible.

- Know the difference between data mining, data science, and cognitive computing
- Learn about cognitive computing
- Explore data cleansing as it is one of the most important aspects in data mining

The aim of this video is to explain the basic concepts of R. This is exemplified by showing how easy it is to load data in R. Get an idea about how this is done in most of the cases as well as for some special cases such as databases and big data technologies.

- Learn the basic concepts of R
- Learn how to load a local CSV file and explore how to load a remote CSV file via HTTP
- Review how to load data from databases and big data (Hadoop HDFS, Spark) and use a DataFrame Proxy (dashDB, BigR)

This video gives an overview of the data frame object, which is an essential part of R and part of every analysis. You will learn what a data frame is and how to use it for data manipulation.

- Understand what a data frame is and how to create it
- Know what kind of data manipulations an R data frame supports
- Get convinced of the power of a data frame by seeing a data normalization example

We want to explain that data is nothing but points in a multidimensional vector space, illustrated with an example.

- Learn how to transform a table into a multi-dimensional vector space
- Learn about distance measures and multidimensional vector spaces

Points in a multidimensional vector space can be drawn and analyzed by introducing k-means—the simplest of the clustering algorithms.

- Learn how k-means works
- Know about other distance measures
- Know about other cluster algorithms

Starting from a hard-to-understand dataset, we process and visualize it to gain insights.

- Understand the dataset and the problems with it
- Understand how to process the dataset
- Learn how to visually find patterns in data through the example

The aim of this video is to show how powerful R is as a data language. We will query an internal example dataset and show how it can be filtered and aggregated on.

- Learn about the structure of the internal mtcars dataset
- Filter on the dataset
- Aggregate on the dataset

The aim of this video is to show how powerful R is as a data language. Now we concentrate on data types.

- Learn about the data types present in R and what they are good for

Next, we concentrate on functions and indexing.

- Learn how functions are defined and used in R
- Get the concept of a functional language
- Use indexing to access subsets of data

The aim of this video is to show how object-oriented programming is done in R since some of the algorithms covered rely on it.

- Get an OOP crash course
- Learn the S3 way of OOP in R
- Learn the S4 way of OOP in R

The aim of this video is to show a little example to motivate the attendee based on the standard market basket analysis.

- Load and parse transaction data
- Calculate measures on the data
- Generate and inspect association rules

The aim of this video is to explain the mathematical structure "graph".

- Know what a graph is
- Learn about some statistical measures on graphs
- Explore how the association rules can be transformed into a graph to gain additional insights

The aim of this video is to explain the different types of association rules.

- Know the simple rules such as Boolean and single-dimensional associations
- Learn more complex rules such as multi-dimensional associations
- Learn special cases of rules such as correlation-based or quantitative associations

The aim of this video is to explain the Apriori Algorithm.

- Learn what frequent item sets are (a prerequisite to understanding Apriori)
- Learn how Apriori works internally
- Look into an example of how to do it in R

The aim of this video is to explain the Eclat Algorithm.

- Transform the transaction matrix
- Derive rules from the transformed matrix
- Understand how to do it in R with an example

The aim of this video is to explain the FP-Growth Algorithm.

- Transform the transaction matrix
- Build a tree and extract rules
- An overview of the pros and cons of all three algorithms

This video introduces the discipline of classification, the mathematical foundation for understanding Bayes' theorem and the Naïve Bayes classifier.

- Get introduced to classification
- Get insights into the mathematical foundations for understanding Bayes' theorem
- Learn Bayes' theorem

Now that we've understood Bayes' theorem, we can derive the Bayes classifier and use naïve Bayes for spam classification in R.

- Derive the Bayes classifier
- Understand Naïve Bayes and how to apply it to spam classification
- See an example in R

This is a practical example of using naïve Bayes for spam classification in R.

- Prepare the data
- Run the algorithm
- Assess the classification performance

Introduction to support vector machines, understanding how to use them to separate points in multidimensional vector spaces, and finally using kernels in non-linearly separable data.

- Get introduced to support vector machines
- Understand how to use them to separate points in multidimensional vector spaces
- Use kernels in non-linearly separable data

Introduction to lazy learning using k-nearest neighbors. This video explains how KNNs work and how they are applied in R.

- Get introduced to lazy learning using k-nearest neighbors
- Explain how KNNs work
- Learn how KNNs are applied in R

This video introduces the discipline of hierarchical clustering.

- Recap of classification from the previous section
- Description of the algorithm
- Description of the result based on a cluster hierarchy

This video introduces the discipline of distribution based clustering.

- Introduction to statistical distributions
- Expectation maximization, the algorithm's objective
- Description of the algorithm

This video introduces the discipline of density based clustering.

- Introduction to the basic idea
- The concept of reachability
- Description of the algorithm

A practical example of using DBSCAN in R.

- Loading and preparing data
- 3D plotting to get a first notion
- Running the algorithm and measuring accuracy

This video introduces neural networks.

- Learn about the Perceptron
- NN training and non-linearity
- Delve into deep learning

This video shows an example in R—how to use the H2O deep learning framework for handwritten digit recognition (classification).

- Load and visualize the data
- NN configuration and training
- Performance assessment

This video shows an example in R—how to use the H2O deep learning framework for anomaly detection on real-time IoT sensor data.

- Deploying the test data generator to the cloud, and introduction to the Lorenz attractor model
- Visualizing the real-time data
- Using deep learning auto-encoder to detect anomalies

This video provides an overview of the entire course.

Creating professional looking plots, both static and interactive, may seem hard; however, with R we can create and fully customize plots with a few lines of code.

- Use ggplot2 that can produce beautiful plots with a few lines of code
- Fully customize plots to obtain the perfect look every time
- Create complex results using interactive plots

Often, beginners fail to properly understand their dataset before analyzing it. However, a good understanding of the origin and structure of the data is of primary importance.

- Introduce the data provider, that is, the EPA
- Understand the EPA network of stations
- A detailed description of the data structure

It is not always good to import data in R using the default settings. To do it successfully, several parameters need to be set.

- Set the working directory
- Understand the important settings of the function read.table
- Import the data and check the structure

Importing Excel tables in R may be tricky. However, with the right explanation the proper package can be installed and everything should work out fine.

- Install the package xlsx
- Understand the format of the code to import Excel files
- Import and check the data

Exporting data in R may seem difficult, since we have many options to choose from. However, R has powerful exporting functions that, with just a few options, can do the job successfully.

- Firstly, we need to subset our data to have something to export
- Then, we can learn how to export data in R
- The final step would be exporting data into multiple Excel sheets

Producing elegant plots in ggplot2 may seem difficult but it is actually quite easy to do. In fact, ggplot2 takes care, by default, of most of the graphical design of the plot, meaning that we can produce beautiful histograms with just a few lines of code.

- Load ggplot2 and then import the dataset
- Plot a simple histogram, using the default settings.
- Plot multiple distributions with faceting
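A minimal version of that workflow, using the built-in mtcars dataset as a stand-in for the course's EPA data:

```r
library(ggplot2)

# Histogram of fuel efficiency, faceted into one panel per cylinder count
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ cyl)
p
```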

Histograms are useful for certain tasks, but for comparing several variables at once they are not the best. Box plots can be used instead, since they allow the comparison of the distribution of multiple variables side by side.

- Explain what a box plot is and what it represents
- Create multiple box plots with just two lines of code
- Order the plot to achieve better results

Categorical variables are invariably difficult to visualize in meaningful ways. Bar charts are important for plotting categorical variables and defining their characteristics.

- Learn bar charts
- Create simple bar charts in ggplot2
- Learn how to automatically order a data.frame and plot ordered bar charts

In many cases, we are interested in comparing multiple variables at once and checking their correlation. Scatterplots allow us to do just that and are an important tool in a data analyst's toolbox.

- Describe the importance of scatterplots
- Create simple scatterplots in ggplot2
- Create more complex visualization by tweaking some basic options

In many cases, the variable time is underestimated. However, time-series are extremely useful to determine the temporal pattern of a variable.

- Understand the structure of time-series plots
- Plot a simple time-series plot in ggplot2
- Customize the plots with color and size

Many datasets are affected by uncertainty, and people do not always know how to show this in plots. This video will present ways to solve this and take uncertainty into account.

- Understand how to handle uncertainty
- Present simple ways to include uncertainty in bar-charts
- Present the scatterplots with double error bars

By default, ggplot2 creates plots with a grayish background, and without axes lines and white gridlines. This is not the standard look you normally find in scientific manuscripts.

- Explain the graphical elements of the standard theme
- Change the default theme
- Explore the differences between the default theme and the others

The default color scale is not always appropriate to spot all the differences in the data we are trying to plot. In many cases, we have to change it so that our plots can become more informative.

- Change the default two colors for plotting continuous variables
- Explore ways to include more colors in the color scale
- Present discrete color scale for categorical variables

ggplot2 uses the names of the columns as labels, meaning that if these are not self-explanatory, the plot will not provide a good framework to understand its meaning. By adding some lines of code, we can customize the plot in order to change the labels and make it clearer.

- Add a title for the plot
- Change the title of the legend
- Change the axes labels

The default plots created by ggplot2 lack several elements that in many cases are useful to provide additional information to viewers. However, there are simple functions that can be used to add supplementary elements to the plot.

- Add the trend lines to scatterplots
- Learn how to add vertical and horizontal lines to plots
- Customize the lines

In many cases, it is crucial to be able to include textual labels on plots to provide viewers with additional information. This can be done in ggplot2 in both static and dynamic ways.

- Add fixed text labels
- Add dynamic textual labels
- Add text outside the plot and change the axis labels

With the function facet_wrap, it is only possible to create a grid of plots of the same type. However, in some cases, it is necessary to create side-by-side graphs with diverse plots. This can be done in the package gridExtra.

- Review the facet_wrap function
- Install the gridExtra package
- Create the multi plots

We could easily save our plots as images directly from R Studio. This way of saving however, does not provide much flexibility. If we want to customize our images, we need to learn how to export plots from the R code.

- Create an object with the plot we want to save
- Learn the basics of the ggsave function
- Change the size of the image

The default size that ggplot2 uses to save plots is ideal for most of our needs, such as embedding plots in Word documents. However, in some cases, we may need to specify a particular page size for our plots, which can be easily done with the option paper.

- Specify the page size
- Rotate the page
- Specify other options

Static plots are the standard for publishing in traditional media, such as journal papers. However, the world is moving towards internet-based presentation of results, and even scientific journals are quickly adapting: many now offer the possibility of including interactive plots. In R, we can create plots for the Web with the rCharts package, which is a bit more difficult to install than ggplot2.

- Explain the rCharts package
- Install devtools
- Install rCharts from GitHub

rCharts features a syntax more similar to standard R plotting than the one we saw in ggplot2. It is easy to pick up through simple examples, to which we can then add further details.

- Explain the syntax of rCharts
- Include more details
- Add JavaScript functions for more flexibility

Even if we know nothing about HTML and CSS, we can still obtain beautiful bar charts using templates created by other users.

- Plot basic interactive bar charts
- Add axis labels
- Use a template for an elegant finish

If too many data points are present in our dataset, scatterplot visualization may become very confusing in static plots. In interactive plots, however, this limitation no longer applies, since we can choose to visualize only some of the data.

- Create basic interactive scatterplots
- Understand the interactivity
- Add elements and controls

Time-series plots are a great way to visualize the temporal pattern of a variable. However, sometimes we cannot fully understand the exact date of each point based only on the values on the x axis. Interactive visualization can solve this problem by adding tooltips in which we can take a look at the raw data.

- Set the data in the correct format
- Plot a basic time-series plot
- Add elements

Shiny is a package for building fully featured websites from scratch in R. The way it communicates between the user interface and the server may seem difficult to understand. However, with some explanation, Shiny becomes very easy and intuitive.

- Introduce the Shiny package
- Explain the tutorial and examples
- Understand the basic structure of a Shiny website

Understanding the structure of a Shiny website is very important. However, seeing only the finished website is not enough for viewers to replicate it. Therefore, in this video, we are going to create a simple website with data and plots we have already used, to further help viewers.

- Understand the basic structure of Shiny
- Add elements to UI and Server
- Test the website

If we plan to upload our Shiny website online, we need to implement a way for users to upload their own data. In this video, we are going to show how to do just that.

- Import files in Shiny
- Write simple code to do it
- Add a separator for more flexibility

One of the key components of a successful website is the ability to respond to users’ interactions. This can be achieved with conditional panels, which change the UI based on users’ interactions.

- Explain conditional panels
- Understand UI modifications
- Apply server modifications

Conditional panels require modifications on the server side as well as in the UI. In this video, we continue working with conditional panels and focus on the server-side changes.

- Modify server side modifications
- Keep track of the IDs
- Recognize variables automatically

So far, we have looked at ways to create and add elements to a Shiny website. However, sooner or later, this website needs to be deployed on the Internet so that everybody can use it. Here, you will learn how to do it using a free account on shinyapps.io.

- Separate ui.r and server.r
- Add plots to the script
- Deploy the site

R provides a great number of built-in functions, and a user can also define functions for specific purposes. Once a user creates functions, it becomes really important to learn about passing arguments. Let’s explore how to create an R function and pass arguments to it.

- Create a function to add two numbers
- Use a default argument and pass arguments into the function
- Use a named argument with an if-else condition
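A minimal sketch of these steps; the function names add_two and describe_sum are illustrative, not from the course.

```r
# Sketch of creating a function with a default and named arguments.
add_two <- function(a, b = 10) {
  a + b
}

add_two(3, 4)   # positional arguments: 7
add_two(3)      # uses the default b = 10: 13

# A named argument combined with an if-else condition:
describe_sum <- function(a, b = 0) {
  s <- a + b
  if (s > 0) "positive" else "non-positive"
}
describe_sum(b = -5, a = 2)   # named arguments can come in any order
```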

R stores and manages variables using environments. Each function creates its own environment whenever it is called. Let’s see how the environment of each function works.

- Determine the current environment and compare it with an identical function
- Create an environment and print it within a function
- Compare the environment inside and outside a function

Lexical scoping determines how a value binds to a free variable in a function. This is a key feature that originated from the Scheme functional programming language. This video will show us how lexical scoping works in R.

- Create a function with x+3 as the return for a variable x
- Create a nested function
- Create an x string and modify x within a function
- Create a *globalassign* function that reassigns x to 5
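A compact sketch of the scoping behaviour described above; the globalassign name mirrors the bullet, everything else is illustrative.

```r
# Lexical scoping: a free variable binds to the environment where the
# function was defined, not where it is called.
x <- 1
f <- function() x + 3    # x is free; found in the defining (global) environment
f()                      # 4

g <- function() {
  x <- 100               # a local x does not affect f
  f()                    # still uses the global x
}
g()                      # 4

# Global assignment with <<- reassigns x outside the function:
globalassign <- function() {
  x <<- 5
}
globalassign()
x                        # now 5
```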

In previous videos, we illustrated how to create a named function. But dealing with functions without a name, that is, anonymous functions and closures, can be a bit tricky. Let’s see how to use them in a standard function.

- Sum up two variables with a closure
- Invoke a closure function within a function
- Use vectorization calculations and apply the function to a vector
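A brief sketch of these ideas; make_adder is an illustrative name.

```r
# An anonymous function, invoked immediately, sums two variables:
(function(a, b) a + b)(2, 3)      # 5

# A closure returned from a function remembers its enclosing environment:
make_adder <- function(n) {
  function(x) x + n                # n is captured in the closure
}
add5 <- make_adder(5)
add5(10)                           # 15

# Apply an anonymous function over a vector:
sapply(1:4, function(x) x^2)      # 1 4 9 16
```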

R functions evaluate arguments lazily; the arguments are evaluated as they are needed. Thus, it reduces the time needed for computation. Let’s take a look at how lazy evaluation works.

- Create a *lazyfunc* function
- Specify a default value for an argument
- Use lazy evaluation to perform Fibonacci computation
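A small sketch of lazy evaluation; lazyfunc mirrors the bullet above, the rest is illustrative.

```r
# Lazy evaluation: an argument is evaluated only when it is actually used.
lazyfunc <- function(a, b) {
  a * 2                  # b is never touched, so it is never evaluated
}
lazyfunc(5)              # works even though b is missing: 10

# A default value can refer to another argument, again thanks to laziness:
lazyfunc2 <- function(a, b = a + 1) a + b
lazyfunc2(3)             # b defaults to 4, so the result is 7
```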

Normally, we operate on variables a and b by calling a function as func(a, b). Although this is standard function syntax, it can be hard to read. Let’s see how we can simplify it using infix operators.

- Transform infix to a prefix operation
- Find the intersection between vectors
- Extract the set difference between vectors
- Overwrite an existing operator with the infix operator
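The idea can be sketched as follows; the %+% operator here is a made-up example, not a standard one.

```r
# A user-defined infix operator: any function named `%...%` can be
# called between its two operands.
`%+%` <- function(a, b) paste(a, b)   # hypothetical string-concat operator
"hello" %+% "world"                   # "hello world"

# Built-in set operations, shown in their prefix form:
intersect(c(1, 2, 3), c(2, 3, 4))     # 2 3
setdiff(c(1, 2, 3), c(2, 3, 4))       # 1
```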

In R, there might be instances where we have to assign a value to a function call. The replacement function does exactly that, so it is really important to learn about it. Let’s explore how it works and how we can use it.

- Assign names to data with the *names* function
- Create a replacement function
- Remove multiple values at certain positions using the *erase* function
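A minimal sketch of a replacement function; the erase name mirrors the bullet above, and its behaviour here is an assumption.

```r
# A replacement function is any function named `name<-`; it lets us
# assign to a function call, like the built-in names()<-.
x <- c(a = 1, b = 2, c = 3)
names(x)                      # the built-in replacement pair in action

# A hypothetical erase() that removes elements at given positions:
`erase<-` <- function(x, value) {
  x[-value]                   # drop the positions given on the right-hand side
}
erase(x) <- c(1, 3)           # remove the 1st and 3rd elements
x                             # only b = 2 remains
```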

There are various errors we may encounter during development in R, as in any other programming language. We need to learn how to handle those errors. Not only will it help in rectification but also it will make the program more robust.

- Print an error message using the *stop* function
- Replace the *stop* function with a *warning* function and suppress warnings
- Use the *try* and *tryCatch* functions
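These mechanisms can be sketched briefly; safe_div is an illustrative function.

```r
# Basic error handling with stop(), try(), and tryCatch().
safe_div <- function(a, b) {
  if (b == 0) stop("division by zero")   # raise an error
  a / b
}

res <- try(safe_div(1, 0), silent = TRUE)  # try() captures the error
class(res)                                  # "try-error"

out <- tryCatch(
  safe_div(1, 0),
  error = function(e) NA                    # recover with a fallback value
)
out                                         # NA
```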

As it is inevitable for all code to include bugs, an R programmer has to be well prepared for them with a good debugging toolset. Let’s explore how to debug a function using various functions.

- Create a function called *debugfunc* and apply the debug function to it
- Debug using the *browser* function and recover the debug process
- Use the *trace* function and track the usage of certain functions

The primary step for any data analysis is to collect high-quality, meaningful data. One important data source is open data, which is published online either in text format or as APIs. Let’s see how we can download the text format of an open data file.

- Download Yahoo! finance data
- Get the working directory in RStudio and list files
- Download Wi-Fi hotspot location data and install the RCurl package

Now that we’ve learned how to download open data files, it becomes crucial to know how to read and write them for further processing. Let’s see how we can read a file with R.

- Determine the current directory
- Read using the *read.table* function and filter data
- Use *read.csv*
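A self-contained sketch of reading tabular data; textConnection stands in for a file on disk, and the CSV content is made up.

```r
# read.csv parses comma-separated text into a data frame.
csv_text <- "name,salary\nalice,100\nbob,200"
df <- read.csv(textConnection(csv_text), stringsAsFactors = FALSE)
df

# Filter rows, as one would after read.table():
high <- df[df$salary > 150, ]
```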

The functions we’ve learned, read.table and read.csv, are useful only when the data size is small. We need to know how to read large files for flexible data processing. Let’s explore how we can do that using the scan function.

- Use the scan function to read data
- Examine data with mode and str
- Use the read.fwf function

Excel is widely used for storing and analyzing data. One can convert Excel files to other formats, but that is a somewhat complex process. This video shows how to read and write an Excel file containing world development indicators with the xlsx package.

- Install and load the xlsx package
- Download an Excel file
- Examine the data and dimension of the file

As R reads data into memory, it is perfect for processing and analyzing small datasets. However, databases are becoming more common for storing and analyzing bigger data. In this video, we will demonstrate how to use RJDBC to connect to data stored in a database.

- Install RJDBC and download the JDBC driver for MySQL
- Connect to MySQL using a registered MySQL driver
- Retrieve the table list from the connection and obtain data

In most cases, the majority of data will not exist in the database, but will instead be published in different forms on the Internet. To dig up more valuable information from these data sources, we need to know how to access and scrape data from the Web.

- Browse the S&P 500 index and install rvest
- Use the HTML function to scrape and parse the S&P 500 index
- Use cell_label and cell_value and set the extracted label to value

Data analysis requires preprocessing of the data; various steps need to be performed to get data ready for analysis. The primary step is renaming data variables so that one can operate on them efficiently. Let’s see how we can use the names function to rename variables.

- Download employees.csv and salaries.csv from GitHub
- Use the names function to examine column names
- Rename columns and rows with colnames(), rownames() and dimnames()
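A short sketch of the renaming functions; the employees data frame here is a made-up stand-in for employees.csv.

```r
# Renaming with names(), colnames(), rownames(), and dimnames().
employees <- data.frame(emp_no = 1:2, birth = c("1990-01-01", "1985-06-15"))

names(employees)                          # inspect current column names
colnames(employees) <- c("id", "birth_date")
rownames(employees) <- c("r1", "r2")
dimnames(employees)                       # both row and column names at once
```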

There are many instances where one does not specify the data type while importing. This makes data manipulation difficult, as the assigned data type differs from the actual one. Let’s explore how we can fix this by converting data types.

- Use the class function to examine data type
- Convert date variables into date format and names to character type
- Use str to examine dataset and convert data type within a file

Some attributes in employees and salaries are in date format. So, we have to calculate the number of years between the employees' date of birth and the current year to estimate their age. This might be a tedious task. Let’s see how we can do it by manipulating date data.

- Obtain the difference between hire_date and birth_date in days and weeks
- Use the lubridate package to manipulate dates
- Convert the date to POSIX format and calculate the ages of employees

Similar to database operations, we can add a new record to the data frame by the schema of the dataset. But in R, we can also perform these operations much more easily. In this video, we’ll see how to use the rbind and cbind functions to add a new record or attribute.

- Use rbind to insert a new record into employees
- Reassign combined results to employees
- Add new position and age attributes
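A minimal sketch of rbind and cbind; the data is made up.

```r
# rbind adds records (rows); cbind adds attributes (columns).
employees <- data.frame(id = 1:2, name = c("alice", "bob"),
                        stringsAsFactors = FALSE)

# The new record must follow the schema of the data frame:
employees <- rbind(employees,
                   data.frame(id = 3, name = "carol", stringsAsFactors = FALSE))

# Add a new attribute for every record:
employees <- cbind(employees, age = c(34, 29, 41))
dim(employees)      # 3 rows, 3 columns
```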

Some analyses require only a subset of the data that is of particular interest; for that, data filtering is required. In database operations, a SQL command with a WHERE clause is used to subset data, but we need to know how it is done in R. Let’s see how we can do that.

- Subset specific rows and columns of dataset
- Exclude columns and certain attributes and use the comparison operator
- Use the substr function to extract partial records

There might be some unwanted records in the dataset even after filtering. This can generate inaccurate results. Now that we’ve learned how to filter the dataset, let’s see how we remove or drop bad data.

- Exclude last_name from the filtered subset
- Drop rows by assigning a negative index
- Use the within function to remove unwanted attributes

Similar to data tables in a database, we sometimes need to combine two datasets for correlating data. In R, we can do that using merge and plyr. Also, in order to analyze data more efficiently, R provides two methods, sort and order, which we must learn to sort data.

- Merge two datasets using common key and the plyr package
- Use the sort and order functions
- Sort data by column and use the arrange function in plyr
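A compact sketch using base R's merge, sort, and order (plyr's arrange offers similar row sorting); the data frames are made-up stand-ins for employees and salaries.

```r
# merge() joins two data frames on a common key; order() sorts whole rows.
employees <- data.frame(id = c(1, 2, 3), name = c("alice", "bob", "carol"))
salaries  <- data.frame(id = c(2, 3, 1), salary = c(200, 300, 100))

joined <- merge(employees, salaries, by = "id")   # inner join on id

sort(salaries$salary)                 # sorts the values themselves
joined[order(-joined$salary), ]       # sorts rows by salary, descending
```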

There are instances where data analysis is possible only when the data is in a specific format. We must know how to reshape data and remove data with missing values for efficient data processing.

- Use the dcast function and transform the data to wide
- Use the melt function to transform the data back to long from wide
- Use na.omit to remove missing value data

Missing data may occur from data-processing flaws or simply typos. But this small mistake can affect the whole analysis, as the results may be misleading. Thus, it becomes really important to learn how to detect missing values in R.

- Set the to_date attribute and change it to a missing value
- Use sum to count missing values, their ratio and percentage
- Install Amelia and plot missing values map using the missmap function

We’ve learned how to detect missing data. But, there might be some instances where analysis may go wrong due to those missing values. This video will introduce some techniques to impute missing values for efficient data processing.

- Subset data and use na.omit to remove records with missing values
- Calculate the mean salary and impute missing values with it
- Use the mice package to impute data

When you process a dataset that is a gigabyte or larger in size, you may find that data.frame is rather inefficient. To address this issue, you can use the enhanced extension of data.frame—data.table. In this video, we will see how to create a data.table in R.

- Download purchase_view.tab and purchase_order.tab from GitHub
- Create a data.table
- Use the readr package

Two major advantages of a data.table as compared to a data.frame are the speed and clearer syntax of the former. Similar to a data.frame, we can perform operations to slice and subset a data.table. This video shows some operations that you can perform on data.table.

- Slice the data with a given index sequence
- Set up a filtering condition and use the “:=” notation
- Use the "<-" notation and the copy function

Another advantage of a data.table is that we can easily aggregate data without the help of additional packages. This video illustrates how to perform data aggregation using data.table.

- Average the data and use the by argument
- Use the unique function and then arrange the data.table in order
- Use the “:=” notation and create a new column using aggregated results

In addition to performing data manipulation on a single table, we often need to import more features or correlate data from other data sources. Therefore, we can join two or more tables into one. In this video, we look at some methods to merge two data.tables.

- Create two data.table objects and use merge to join them
- Use setkey to sort two data.table and then merge them
- Extract and search the data of a value group

To perform more advanced descriptive analysis, we must know how to use the dplyr package to reshape data and obtain summary statistics. This video will guide us on how to use dplyr to manipulate data and how to use the filter and slice functions to subset and slice data.

- Use the filter function and %in% operator
- Use the slice function to slice data by row index
- Connect dplyr to SQLite and perform SQL operations

As a single machine cannot efficiently process big data problems, a practical approach is to take samples that we can effectively use to draw conclusions. Here, we will see how to use dplyr to sample from data.

- Sample six rows from the data
- Use the sample_frac function
- Specify the sample weighting in sample argument

Besides selecting individual rows from the dataset, we can use the select function in dplyr to select a single or multiple columns from the dataset. In this video, we will look at how to select particular columns using the select function.

- Use select and obtain subset with User, Product, and Quantity
- Select columns with the P character. Use the select and filter functions together
- Use num_range inside the select function

To perform multiple operations on data using dplyr, we can wrap up the function calls into a larger function call. Or, we can use the %>% chaining operator to chain operations instead. This video will introduce chaining of operations when using dplyr.

- Apply the sum function to a sequence from 1 to 10
- Use a chaining operator to chain multiple function calls
- Chain a data filtering and column selection operation

Arranging rows in order may help us rank data by value or gain a more structured view of data in the same category. In this video, we will take a look at how to arrange rows with dplyr.

- Sort data by price in ascending order
- Use the desc function to sort in descending order
- Sort data by multiple keys in an arrange function

To avoid counting duplicate rows, we can use the distinct operation in SQL. In dplyr, we can also eliminate duplicated rows from a given dataset. Let’s explore how to do that.

- Obtain unique products from the dataset
- Use the distinct function to remove duplicated rows across multiple columns
- Use nrow to compare the number of rows before and after deduplication

Besides performing data manipulation on existing columns, there are situations where a user may need to create a new column for more advanced analysis. Let’s see how to add a new column using dplyr.

- Create a new column named avg_price using the mutate function
- Drop existing variables using the transmute function
- Use the transform function to add a new column

Besides manipulating a dataset, the most important capability of dplyr is that one can easily obtain summary statistics from the data. In SQL, we use the GROUP BY clause for this purpose. This video will show us how to summarize data with dplyr.

- Use the summarize and group_by functions
- Use the summarize_each function
- Use the ungroup function and sort summarized data

In a SQL operation, we can perform a join operation to combine two different datasets. In dplyr, we have the same join operation that enables us to merge data easily. In this video, we’ll learn how join works in dplyr.

- Create two data.table objects
- Use the inner_join function to join two data.table objects
- Perform left join, right join and full join

In ggplot2, data is charted by mapping elements from mathematical space to physical space. We can use simple elements to build a figure. This video shows how to construct our very first ggplot2 plot using the superstore sales dataset.

- Install the ggplot2 package and import superstore_sales.csv
- Summarize sales amount and subset the sales data
- Create a canvas with point and line geometry, labels, and a title

Aesthetics mapping describes how data variables are mapped to the visual property of a plot. In this video, we discuss how to modify aesthetics mapping on geometric objects so that we can change the position, size and color of a given geometric object.

- Create a scatterplot and set aesthetics mapping on a geometric object
- Adjust the point size and aesthetics properties
- Override the position of the y-axis and remove the aesthetics property

Geometric objects are the elements we mark on the plot. One can use geometric objects in ggplot2 to create a line, bar, or box chart. Moreover, one can integrate them with aesthetic mapping to create a more professional plot. This video introduces how to use geometric objects to create various charts.

- Use geom_point to create a scatterplot and geom_line to plot a line chart
- Use geom_bar to make a stacked bar chart
- Create a histogram using geom_histogram and plot density using geom_density

Besides mapping particular variables to the x or y axis, one can first perform statistical transformations on variables, and then remap the transformed variable to a specific position. With the help of this video, we’ll be able to perform variable transformations with ggplot2.

- Create a dataset named sample_sum2
- Use geom_point and geom_smooth to create a line plot with regression line
- Use the stat and stat_summary functions

Besides setting aesthetic mapping for each plot or geometric object, one can use scale to control how variables are mapped to the visual property. Let’s explore how to adjust the scale of aesthetics in ggplot2.

- Make a scatterplot and resize points using the scale_size_continuous function
- Repaint the points and adjust their shape using province
- Refill the color of the bar using scale_fill_brewer and rescale Y axis

When performing data exploration, it is essential to compare data across different groups. Faceting is a technique used to create graphs for subsets of data. This video will help us use the facet function to create a chart for multiple subsets of data.

- Create multiple subplots using the facet_wrap function
- Change the layout of the plot in vertical direction
- Use the facet_grid function to facet by more variables

One can adjust the layout, color, font, and other attributes of a non-data object using the theme system in ggplot2. By default, ggplot2 provides many themes, and one can adjust the current theme. This video will show us how to use the theme_* function and customize a theme.

- Use a different theme function to adjust the theme of the plot
- Set the theme freely using the theme function

To create an overview of a dataset, we may need to combine individual plots into one. This video will guide us on how to combine individual subplots into one plot.

- Load the grid library and a create new page
- Create two ggplot2 plots: scatterplot and line chart
- Put the charts onto the visible area by row and column position

One can use a map to visualize the geographical relationship of spatial data. This video shows us how to create a map from a shapefile with ggplot2 and use ggmap to download data from a mapping service.

- Load ggmap and maptools and use the geom_polygon function to plot a map
- Read Wi-Fi hotspot data and create a scatterplot with the geom_point function
- Download a map using the get_map function and add Wi-Fi hotspot locations

Creating an R Markdown report with RStudio is a straightforward process. This video will teach us how to use the built-in GUI to create markdown reports in different formats.

- Open a new R Markdown file and select a document title
- Compile and render a report using Knit HTML
- Generate an HTML report and edit the output format and figures

In an R Markdown report, we can embed R code chunks with the knitr package. This video will guide us on how to create and control the output with different code chunk configurations.

- Use the *knitr* syntax to create a basic code chunk
- Hide the script and stop code evaluation
- Create inline code and render a figure in the markdown report

The ggvis package creates HTML output with CSS and JavaScript. Thus, one can embed ggvis graphics into web applications or HTML reports. Let’s explore how we can do that and make interactive plots.

- Install the *ggvis* package and import *realestate.csv* into the R session
- Create a scatterplot and use *filter* to subset data
- Create a plot that allows the user to choose the color

In ggvis, one can use a simple layer to create lines, points, and other geometry objects in the plot. This video guides us through using ggvis syntax and grammar to create different plots.

- Create a scatterplot and assign different colors and shapes to points
- Render smooth and linear regression lines
- Make a histogram, bar plot, box plot and line plot

In addition to making different plots in ggvis, we can control how axes and legends are displayed in a ggvis figure with the *_axis and *_legend functions. Let’s see how we can set their appearance properties and rescale the mapping of the data with the scale function.

- Use the add_axis function to control the axis orientation
- Control the scale of axis and legend
- Create a bar plot and change color using the *scale_nominal* function

ggvis can be used to create an interactive web form. It allows the user to subset data and change the visual properties of the plot by interacting with the web form. In this video, we learn how to add interactivity to a ggvis plot.

- Make a bar plot with color dropdown and size slider
- Create a radio button to choose the regression method
- Create dropdown to filter data and subset it
- Add a radio button to choose an attribute of the x-axis

An R Markdown report outputs code and static figures; one cannot perform exploratory data analysis through web interaction. To enable the user to explore data via a web form, we have to build an interactive web page. In this video, we see how to create an interactive web report with Shiny.

- Open a new R Markdown file and choose Shiny as the document type
- Create a ggvis plot
- Compile and render a report

In addition to hosting a Shiny app on a local machine, we can host our Shiny app online. RStudio provides a service, http://www.shinyapps.io/, that allows anyone to upload their Shiny app. Let’s see how to publish an R Shiny report using shinyapps.io.

- Click on ‘Publish Document’ and install the required packages
- Create an account with Shiny Apps and obtain your secret key
- Use the secret key to publish your R Shiny document

Generating samples is the first step for working with probability distributions. So, learning this basic concept is very important.

- Type commands and run them in RStudio

When the probabilities of many events are equal, we need a uniform distribution to model that.

- Install R
- Generate uniform distributions

You need to generate samples from a binomial distribution when you evaluate the success or failure of several independent trials. This video will enable you to do that.

- Generate binomial random variates using functions such as rbinom, dbinom, and pbinom
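A quick sketch of the binomial distribution functions; the sampler is rbinom.

```r
# rbinom draws samples; dbinom gives the density; pbinom the CDF.
set.seed(42)
samples <- rbinom(1000, size = 10, prob = 0.5)  # 1000 draws of 10 trials each

dbinom(5, size = 10, prob = 0.5)   # P(X = 5) = 252/1024 = 0.24609375
pbinom(5, size = 10, prob = 0.5)   # P(X <= 5), the cumulative probability
```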

For calculating the probability of events within a fixed time interval, the Poisson distribution is the best option.

- Generate samples
- Plot histograms

Much real-world data approximately follows a normal distribution curve, so sampling from a normal distribution should be learnt. This video will help you with that.

- Generate samples from normal distribution curves
- Plot the histograms and compare

Let’s use R to generate the chi-squared distribution.

- Generate samples
- Obtain distribution functions
- Plot the functions

To estimate the mean of a population from a normal distribution, the Student’s t-distribution is used.

- Generate samples
- Plot samples with different degrees of freedom

Along with generating samples, we can also sample subsets from datasets. This video will arm you to do that.

- Install and load the quantmod package
- Generate samples and make the histogram
- Separate data into groups and clusters.

When there are one or more random variables within the model, we need stochastic processes.

- Simulate random trading and Brownian motion processes

To estimate the interval range of unknown parameters in data, we use confidence intervals.

- Generate samples
- Obtain the Z score and compute the standard error
- Shade and plot the histogram or graph

To compare two mean values, we perform Z-tests on data.

- Collect samples
- Calculate standard deviation and Z score
- Compute p value. Perform a Z test

In cases where the standard deviation is unknown, we need to perform student’s T-tests.

- Perform one-sample and two-sample t-tests

When the data distribution is unknown, non-parametric testing comes into the picture. We do that by conducting exact binomial tests in R.

- Run the binom.test function with the necessary parameters
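A minimal sketch of binom.test; the counts are made up.

```r
# Exact binomial test: 60 successes in 100 trials against a null p of 0.5.
result <- binom.test(60, 100, p = 0.5)
result$p.value      # two-sided exact p-value
result$conf.int     # confidence interval for the true proportion
result$estimate     # observed proportion: 0.6
```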

When comparing two samples, or a sample against a reference probability distribution, we require the Kolmogorov-Smirnov test.

- Generate uniformly distributed sample data
- Plot ECDF. Apply the Kolmogorov-Smirnov test

To discover the relationship between two categorical variables, we need to conduct a Pearson’s chi-squared test.

- Build a matrix
- Plot a mosaic plot
- Perform Pearson’s chi-squared tests

To test whether two groups belong to the same population, we use the Wilcoxon rank-sum and signed-rank tests.

- Prepare the data. Plot histogram
- Run the Wilcoxon signed rank and rank sum tests

To investigate the effect of a single categorical variable, one-way ANOVA is used.

- Visualize data with a boxplot
- Conduct a one-way ANOVA
- Do an ANOVA analysis and post-hoc comparison

When two categorical variables are involved, two-way ANOVA is used.

- Load the data. Plot boxplots
- Make an interaction plot. Perform a two-way ANOVA
- Perform a post-hoc comparison

Before rule mining, it is important to transform the data into transactions.

- Install and load the arules package
- Use the load function
- Convert the data.table into transactions

You will learn to display transactions and associations in this video.

- Obtain a list representation and use the summary function
- Display the transaction using the inspect function

To find relations within a transaction dataset, we use the Apriori algorithm.

- Use Apriori to discover rules
- Display rules
- Sort rules

Sometimes, rules are repeated and are redundant. We need to know how to remove these rules to get significant information. This video will enable you to do that.

- Sort the rules by lift to find redundant rules
- Remove redundant rules

To explore the relation between items, we visualize association rules.

- Install and load the arulesViz package
- Make a scatterplot of pruned rules. Present the rules in a grouped item and graph

Eclat is faster than Apriori in mining itemsets. Hence it is essential to learn how it works.

- Generate a frequent itemset
- Obtain the summary information
- Examine the frequent itemset

You will learn to create transactions with temporal information in this video.

- Download the dataset
- Install and load the arulesSequences package
- Load web traffic data. Create transactions

A better algorithm for mining frequent sequential patterns is cSPADE. It is important to learn about it and understand it.

- Generate frequent sequential patterns
- Examine the summary
- Transform the sequences into data set format

Time-indexed variables should be represented in time series data. Hence it is important to know how to create one.

- Read a financial report into an R session
- Use the ts function to transform the finance data into a time series object
- Use the class function to determine the data type. Print the contents.

Plotting a time series object will make visualization easy and effective.

- Plot time series data.
- Plot different types of files using different plots

To get the components of a time series, we need to decompose it.

- Construct a time series object
- Decompose the time series object
- Plot the components of the decomposed time series

To measure the error rate of a regression model, we need to calculate RMSE and RSE.

- Retrieve predicted values. Calculate the mean square error
- Retrieve the RSE and RMSE of the model

We can forecast a time series from the smoothed model. Let’s learn how to do that.

- Load the forecast package
- Predict the income. Plot the prediction results

ARIMA takes auto-correlation into consideration. This helps in real-life examples.

- Simulate an ARIMA process and generate time series data
- Take the difference of the time series and plot it
- Plot a time series along with its ACF and/or its PACF

After understanding the ARIMA model, we can create an ARIMA model of our own. Let’s see how to do that.

- Create an ARIMA model
- Print the training set errors of the model

We can predict values with the ARIMA model.

- Generate the prediction of future values
- Obtain the summary of our prediction
- Plot the forecast. Evaluate the model with an autocorrelation plot

You will apply your knowledge of the ARIMA model to predict stock prices.

- Install and load the quantmod package
- Plot the historical prices obtained. Find the best fitted model
- Predict future stock prices

When there is a single predictor variable and the relationship between the response and the predictor is linear, a simple linear regression model is used.

- Download a dataset. Read it
- Fit the independent and dependent variables
- Plot and view
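
A hedged sketch using the built-in `cars` dataset in place of the course's downloaded file: speed predicts stopping distance.

```r
# Fit stopping distance as a linear function of speed
fit <- lm(dist ~ speed, data = cars)
plot(cars$speed, cars$dist)   # scatterplot of the raw data
abline(fit)                   # overlay the fitted regression line
```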

To obtain information on our model, we use the summary function.

- Use the summary function to get relevant information
- Use different functions to display properties
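
Continuing with the `cars` model as a stand-in for the course's dataset:

```r
fit <- lm(dist ~ speed, data = cars)
summary(fit)             # coefficients, R-squared, F-statistic
coef(fit)                # just the coefficient estimates
confint(fit)             # confidence intervals for the coefficients
summary(fit)$r.squared   # extract a single property
```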

You can predict unknown values using the fitted regression model.

- Assign values to be predicted
- Compute the prediction result
- Plot the prediction results into a scatterplot
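
These steps, sketched with the `cars` model and a few hypothetical speeds to predict:

```r
fit <- lm(dist ~ speed, data = cars)
new_speeds <- data.frame(speed = c(10, 15, 21))   # values to predict
pred <- predict(fit, newdata = new_speeds)
plot(cars$speed, cars$dist)
points(new_speeds$speed, pred, col = "red", pch = 19)   # predictions
```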

To measure the error rate of a regression model, we need to calculate RMSE and RSE.

- Retrieve predicted values. Calculate the mean square error
- Retrieve the RSE and RMSE of the model
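
A sketch of both error measures on the `cars` regression; note that RSE divides by the residual degrees of freedom rather than n:

```r
fit  <- lm(dist ~ speed, data = cars)
pred <- fitted(fit)                      # retrieve predicted values
mse  <- mean((cars$dist - pred)^2)       # mean squared error
rmse <- sqrt(mse)
# Residual standard error, as reported by summary(fit)
rse  <- sqrt(sum(residuals(fit)^2) / df.residual(fit))
c(RMSE = rmse, RSE = rse)
```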

You can predict the value of a dependent variable based on multiple independent variables using multiple regression analysis.

- Fit variables into a linear regression model
- Obtain the summary
- Predict and plot
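
A sketch with two predictors from the built-in `mtcars` dataset, standing in for the course's data:

```r
# Fuel economy modelled on weight and horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)
pred <- predict(fit)
plot(mtcars$mpg, pred)   # predicted vs actual
abline(0, 1)             # perfect-prediction reference line
```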

To find the best fitted regression model, we perform stepwise regression.

- Select the optimum model with backward selection
- Fit data with variables that give the least AIC
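
Backward selection can be sketched with base R's `step()`, again on `mtcars`:

```r
full <- lm(mpg ~ ., data = mtcars)   # start from all predictors
# Drop terms one at a time, keeping the model with the lowest AIC
best <- step(full, direction = "backward", trace = 0)
formula(best)   # the selected model
```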

A generalized linear model (GLM) extends the linear model to response variables with non-normal error distributions, and can still be used for linear prediction.

- Fit independent variables
- Use ANOVA
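
A brief sketch: with a Gaussian family, `glm()` reproduces `lm()`, and `anova()` gives a sequential analysis of deviance.

```r
fit <- glm(mpg ~ wt + hp, data = mtcars, family = gaussian)
anova(fit, test = "F")   # sequential test of each term
```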

When you need to predict a binary outcome, logistic regression analysis is really useful.

- Read a customer file. Fit data. Use summary
- Use the ‘predict’ function
- Retrieve data using the table function
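
A hedged sketch predicting the binary transmission variable `am` in `mtcars`, in place of the course's customer file:

```r
# Logistic regression: binomial family with a binary response
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
summary(fit)
prob <- predict(fit, type = "response")          # fitted probabilities
table(actual = mtcars$am, predicted = prob > 0.5)
```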

A classification tree predicts class labels from input features, so it is important to know how to build one.

- Build a classification model

Visualizing the classification model gives a better idea of how it splits the data. You will learn how to plot it in this video.

- Use plot and text functions to plot the classification tree

We need to measure the performance of our classification model. For that, we generate a classification table and then a confusion matrix.

- Predict labels
- Generate a classification table
- Generate a confusion matrix
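
The tree-building and evaluation steps above can be sketched with the rpart package (bundled with standard R installations) on the built-in `iris` data, standing in for the course's dataset:

```r
library(rpart)
# Build a classification tree for the iris species
fit  <- rpart(Species ~ ., data = iris, method = "class")
pred <- predict(fit, iris, type = "class")   # predicted labels
cm   <- table(actual = iris$Species, predicted = pred)   # confusion matrix
cm
sum(diag(cm)) / sum(cm)   # overall training accuracy
```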

The ROCR package makes it easy to visualize a classifier’s performance with ROC curves, so it is essential to learn it.

- Install and load the ROCR package
- Use the prediction function
- Use a performance function and visualize the ROCR curve

Hierarchical clustering is a fundamental clustering technique, and it is important to learn it in order to group similar objects.

- Load data
- Perform different types of clustering
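
A sketch on the built-in `iris` measurements, trying two linkage methods:

```r
d  <- dist(iris[, 1:4])                # pairwise Euclidean distances
hc <- hclust(d, method = "complete")   # complete-linkage clustering
plot(hc)                               # dendrogram
hc2 <- hclust(d, method = "average")   # a different linkage for comparison
```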

A dendrogram lets you view the cluster hierarchy, but to divide the data into a fixed number of clusters we use the cutree function.

- Categorize the data and examine cluster labels
- Make a scatter plot. Visualize clusters
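
Continuing the `iris` example:

```r
hc <- hclust(dist(iris[, 1:4]))
labels <- cutree(hc, k = 3)   # cut the tree into 3 clusters
table(labels)                 # examine the cluster labels
# Scatterplot coloured by cluster membership
plot(iris$Petal.Length, iris$Petal.Width, col = labels)
```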

K-means clustering is a faster way of clustering data. In this video, we will learn how to do it.

- Cluster the customer data
- Draw a scatterplot and color the points
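
A sketch on `iris` in place of the course's customer data:

```r
set.seed(42)
# k-means with 3 centres; nstart restarts avoid poor local optima
km <- kmeans(iris[, 1:4], centers = 3, nstart = 20)
plot(iris$Petal.Length, iris$Petal.Width, col = km$cluster)
# Mark the cluster centres
points(km$centers[, c("Petal.Length", "Petal.Width")], pch = 8, cex = 2)
```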

When clusters have arbitrary shapes that centroid-based methods cannot capture, density-based clustering is used. DBSCAN is the standard algorithm for that.

- Install and load the dbscan package
- Cluster data
- Plot the data in a scatterplot

Along with the cluster assignments, we need to know how well each point fits its own cluster compared with neighboring clusters. You get that from silhouette information.

- Install and load the cluster package
- Generate a k-means object
- Compute silhouette information and plot it
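
A sketch with the cluster package (bundled with standard R installations) on the `iris` k-means solution:

```r
library(cluster)
set.seed(42)
km  <- kmeans(iris[, 1:4], centers = 3, nstart = 20)
# Silhouette width of each point: how well it fits its assigned cluster
sil <- silhouette(km$cluster, dist(iris[, 1:4]))
summary(sil)   # average silhouette width per cluster
plot(sil)
```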

After fitting data into clusters using different clustering methods, you may wish to measure the accuracy of the clustering. You will learn to do that here.

- Install and load the fpc package
- Use hierarchical and k-means clustering
- Retrieve the cluster validation statistics and generate cluster statistics

The pattern to be recognized is not always shapeless; it can have a specific shape, such as handwritten digits. Here we will learn digit recognition.

- Install and load the png package
- Read images and transfer the data into a scatterplot
- Perform the k-means and DBSCAN clustering methods

You will learn to group similar text documents in this session.

- Install and load tm and SnowballC packages
- Read and convert the data into corpus
- Convert the corpus into a document term matrix and perform k-means clustering

Not all data is useful. When there are many redundant variables, we have to perform dimension reduction, and PCA is the standard technique for that. Let’s learn how to do that.

- Load a dataset
- Perform PCA
- Use the predict function to get the output.
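
A sketch on the `iris` measurements, standing in for the course's dataset:

```r
# PCA on standardized variables
pca <- prcomp(iris[, 1:4], scale. = TRUE)
summary(pca)                          # variance explained per component
scores <- predict(pca, iris[, 1:4])   # project the data onto the components
head(scores)
```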

Extracting the features that contribute the most variance is important for any application. You can use a scree plot for that.

- Generate a bar plot or line plot
- Use functions for plotting
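
Base R's `screeplot()` covers both plot styles mentioned above; a sketch on the `iris` PCA:

```r
pca <- prcomp(iris[, 1:4], scale. = TRUE)
screeplot(pca)                   # bar plot of component variances
screeplot(pca, type = "lines")   # the same as a line plot
```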

The Kaiser criterion is an alternative to the scree plot for selecting components.

- Obtain the standard deviation and variance
- Select components
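
The Kaiser rule keeps components whose variance (eigenvalue) exceeds 1; a sketch on the `iris` PCA:

```r
pca  <- prcomp(iris[, 1:4], scale. = TRUE)
vars <- pca$sdev^2        # variance of each component
keep <- which(vars > 1)   # Kaiser criterion: eigenvalue > 1
vars
keep
```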

A biplot maps both the observations and the variables onto the principal components.

- Create a scatterplot
- Create the biplot
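
Both plots above in one base-R sketch on the `iris` PCA:

```r
pca <- prcomp(iris[, 1:4], scale. = TRUE)
plot(pca$x[, 1], pca$x[, 2])   # scatterplot of the first two components
biplot(pca)                    # observations as points, variables as arrows
```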