Getting started with RStudio on Ubuntu Linux

This tutorial shows how to install RStudio on Ubuntu 20.04.

RStudio is an integrated development environment (IDE) for R, a free programming language released under the GNU General Public License. RStudio is well suited to producing detailed statistical visualizations and is used by statisticians around the world.

RStudio is available both as a desktop application and as a server application. The desktop version runs on a variety of Linux distributions as well as on Windows and macOS.

Download the R programming language (prerequisite)

The RStudio desktop application requires the R programming language to be installed. You need an R version that is compatible with your Linux distribution, and you can get it from the distribution's software repository.

1- Download R using the web browser

If R is not available from the software center, the package repository needs to be updated first (see the terminal method below). Alternatively, you can skip that and download R from the web by entering this link:

https://cran.rstudio.com

in the address bar of your web browser. The CRAN home page opens with download links for Linux, macOS and Windows.

2- Download R from the Linux terminal

Launch a terminal and refresh the package index with the following command:

$ sudo apt-get update

This command downloads the current package lists from the Ubuntu repositories, so that the latest available version of R can be found.

Then issue the following command to install R:

$ sudo apt-get install r-base

The above command reads the package lists, shows how much disk space the installation will use, and then prompts for confirmation. Press the Y key and then Enter to proceed with the installation.

The output confirms the installation once it completes.

You can also verify it by looking up R in the applications search box.


Install RStudio on Ubuntu 20.04 using the terminal

With R installed, we can now proceed to install RStudio. We’ll use the command line to demonstrate the installation.

Open the terminal and enter the following command:

$ sudo apt-get install gdebi-core

You will be asked for your password (sudo expects your own user password). As soon as you enter it, the package installation begins. Next, download the RStudio package:

$ wget https://download1.rstudio.org/desktop/bionic/amd64/rstudio-1.3.1093-amd64.deb

wget connects to the RStudio download server and saves the package to the current directory. Install the downloaded package with gdebi:

$ sudo gdebi rstudio-1.3.1093-amd64.deb

You will be asked for your password again. Enter it so that the package lists can be read and loaded.

The installer then asks for permission to proceed; press the y key on your keyboard.

The output confirms the installation.

First steps with RStudio:

To start RStudio, open the applications search box and search for RStudio. It appears in the list of results.

Click the RStudio icon to launch it.

Examining data sets with RStudio

With RStudio you can visualize data in the form of graphs, tables and charts.

To understand how data is visually represented in RStudio, let’s take the 2010 census population for each ZIP code as a sample data set.

The process of data analysis can be roughly reduced to the following four steps:

1- Import raw data

You can import the raw data into RStudio directly from the web by entering the following command in the console window:

$ cpd <- read.csv(url("https://data.lacity.org/api/views/nxs9-385f/rows.csv?accessType=DOWNLOAD"))

When the command runs, R fetches the data from the web as a CSV file and assigns its content to the cpd variable as a data frame.
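One detail of the import is worth seeing in isolation: by default, read.csv rewrites column headers into valid R names, so a header containing a space ends up with a dot instead. The sketch below runs without a network connection by reading a small inline CSV. The header names and the numbers are made up for illustration and are not taken from the real file.

```r
# Hypothetical two-row CSV; headers and values are illustrative only
csv_text <- "Zip Code,Total Population,Total Males,Total Females
10001,1000,520,480
10002,2000,950,1050"

# read.csv accepts the CSV content directly through the text argument
cpd_sample <- read.csv(text = csv_text)

# Spaces in the headers have been replaced with dots
names(cpd_sample)

# Compact overview of the column types and first values
str(cpd_sample)
```

After importing the real file, running names(cpd) in the same way shows the exact column names to use in the commands that follow.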

Another way to import data into RStudio is to manually download the data set to your hard drive and then open it using RStudio’s data import feature.

On the Environment tab, click Import Dataset and select the data set file to load. Click OK and the import dialog appears. Here you set the parameters, such as the variable name and the decimal separator. When you’re done, click Import; the data set is loaded into RStudio and assigned to a variable with the name you chose.

To inspect the contents of a data set, call View with the variable that holds it:

$ View(cpd)

2- Manipulating the data

Now that you’ve imported the data set, you can transform and query it with R’s data manipulation functions. Suppose you want to look at a specific column of the data set. Note that read.csv replaces spaces in column headers with dots, so to get the Total Population column of our data set we would enter the following command:

$ cpd$Total.Population

Individual values can also be retrieved by row and column index. The following returns the value in the first row and third column:

$ cpd[1, 3]

The subset function in R lets us query the data set. Let’s say we need the rows where males outnumber females. To select these rows, enter the following command:

$ a <- subset(cpd, Total.Males > Total.Females)

In the command above, the first argument is the variable holding the data set that the function is applied to. The second argument is a Boolean condition. The condition is evaluated for each row and decides whether or not that row is part of the output.
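To try these operations without downloading anything, here is a minimal sketch on a small made-up data frame. The column names follow the dotted form used above, and all values are hypothetical.

```r
# Hypothetical stand-in for the census data frame
cpd <- data.frame(
  Zip.Code         = c(10001, 10002, 10003, 10004),
  Total.Population = c(1000, 2000, 1500, 800),
  Total.Males      = c(520, 950, 760, 400),
  Total.Females    = c(480, 1050, 740, 400)
)

# One column, returned as a vector
cpd$Total.Population

# The value in row 1, column 3 (Total.Males of the first row)
cpd[1, 3]

# Keep only the rows where the condition is TRUE
a <- subset(cpd, Total.Males > Total.Females)
a
```

The fourth row has equal counts, so the strict comparison leaves it out and a holds only the first and third rows.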

3- Using summary statistics functions on the data set

R has built-in functions for calculating summary statistics of a data set:

$ mean(cpd$Total.Males)  # calculates the arithmetic mean of a column
$ median(cpd$Total.Females)  # gives the median of a column
$ quantile(cpd$Total.Population)  # gives the quantiles of a column
$ var(cpd$Total.Males)  # calculates the variance of a column
$ sd(cpd$Total.Females)  # gives the standard deviation of a column

You can also call summary on the entire data set to get a summary report covering every column.

$ summary(cpd)
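As a quick check of what these functions return, the following sketch applies them to a short made-up vector (the name total_males and its values are hypothetical), small enough to verify by hand.

```r
# Hypothetical column of eight values
total_males <- c(2, 4, 4, 4, 5, 5, 7, 9)

mean(total_males)      # arithmetic mean: 40 / 8 = 5
median(total_males)    # middle of the sorted values: (4 + 5) / 2 = 4.5
quantile(total_males)  # minimum, quartiles and maximum
var(total_males)       # sample variance (divides by n - 1)
sd(total_males)        # square root of the sample variance
```

Note that var and sd compute the sample statistics, dividing by n - 1 rather than n. If a column contains missing values, pass na.rm = TRUE to these functions so that the missing entries are dropped first.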

4- Create a chart for the data set

If you work with RStudio frequently, you will find its visualization tools very useful. With plot and the other graphics functions in R you can create a chart from any imported data set.

To generate a scatter plot for the data set, issue the following command:

$ plot(x = s$Total.Males, y = s$Total.Females, type = 'p')

Now let’s discuss the parameters involved here. In each parameter, s refers to a subset of the original data set (such as one created with the subset function), and type = 'p' indicates that the data should be drawn as points.

You can also display a column of your data set in the form of a histogram:

$ hist(cpd$Total.Households)

Similarly, you can get a bar graph of the imported data set:

$ counts <- table(cpd$Total.Population)
$ barplot(counts, main = "Total population distribution",
          xlab = "Number of the total population")
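Inside RStudio these charts appear in the Plots pane. When the same commands run in a script outside RStudio, one option is to send them to a file instead. The sketch below does that with a small made-up data frame; the values and the output file name are hypothetical.

```r
# Hypothetical stand-in for the census data frame
cpd <- data.frame(
  Total.Males      = c(120, 340, 560, 210, 980),
  Total.Females    = c(130, 320, 600, 190, 1010),
  Total.Households = c(90, 250, 400, 150, 700)
)

# Open a PDF graphics device in the session's temporary directory
out <- file.path(tempdir(), "cpd-charts.pdf")
pdf(out)

# Page 1: scatter plot drawn as points
plot(x = cpd$Total.Males, y = cpd$Total.Females, type = "p",
     xlab = "Total males", ylab = "Total females")

# Page 2: histogram; hist also returns the bin counts it computed
h <- hist(cpd$Total.Households, main = "Households per ZIP code",
          xlab = "Total households")

# Close the device so the file is written to disk
dev.off()
```

Leaving out the pdf and dev.off calls draws the same charts in the Plots pane.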

Manage data in unevenly spaced time series

To work with irregularly spaced time series, use the zoo package. To get it, go to the Packages tab in the lower-right pane of RStudio and install zoo from there. The zoo package stores irregular time series data as zoo objects. The arguments for creating a zoo object are the data, which comes first, followed by the index to order it by.

Zoo objects are easy to use. To chart one, call plot on it; the zoo package supplies its own plot methods for zoo objects.

If you are not sure what a particular R function has to offer, type “?” followed by the name of that function in the console (for example, ?plot) to open its documentation in the Help pane. Pressing Ctrl + Space after a function name also opens the auto-complete window.
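As a minimal sketch of the above, assuming the zoo package is already installed, the following builds a zoo object from a few made-up readings taken on irregularly spaced dates. The dates and values are hypothetical and are deliberately given out of order.

```r
library(zoo)  # assumes zoo has been installed, e.g. from the Packages tab

# Hypothetical readings on irregular dates, not in chronological order
dates    <- as.Date(c("2021-03-10", "2021-01-04", "2021-01-19"))
readings <- c(14.2, 10.5, 11.8)

# Data first, then the index to order by
z <- zoo(readings, order.by = dates)

index(z)     # the dates, sorted in ascending order
coredata(z)  # the readings, reordered to match the sorted dates

# Draw the series against its time index
plot(z)
```

The zoo object keeps each reading attached to its date, so the observations come out in chronological order regardless of the order in which they were supplied.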

Wrap up

This tutorial showed how to set up RStudio on Ubuntu 20.04 and covered the basics of statistical visualization and data manipulation with RStudio. If you want to make better use of RStudio, the first thing to do is to familiarize yourself with the basics of R programming. RStudio is a powerful tool with applications in many fields around the world, artificial intelligence and data mining to name a few.

Getting to know the core elements of R programming involves a bit of a learning curve, but it’s well worth the effort.
