Objectives

What is the purpose of these notes?

  1. Introduce you to running R code in RStudio from within a Markdown document;
  2. Give you the basic R syntax and structure;
  3. Provide a tiny Markdown example.

Context

Reminder about R Markdown:

Flexibility & reproducibility

  • R Markdown allows the user to integrate R code into a report

  • When data changes or code changes, so does the report

  • No more need to copy-and-paste graphics, tables, or numbers

  • Creates reproducible reports

    • Anyone who has your R Markdown (.Rmd) file and input data can re-run your analysis and get the exact same results (tables, figures, summaries)
  • Can output report in HTML (default), Microsoft Word, or PDF

An example

This example shows an R Markdown (.Rmd) file opened in the Source pane of RStudio:

  • To turn an Rmd file into a report, click the Knit HTML button in the Source pane menu bar
  • The results will appear in a Preview window, as shown on the right
  • You can knit into HTML (default), MS Word, or PDF format
  • These lecture notes are also created in RStudio (using html_document as the output format)

R code chunks

  • To integrate R output into your report, you need to use R code chunks

  • All of the code that appears in between the “triple back-ticks” gets executed when you Knit

In-class exercise: Hello world!

  1. Open RStudio on your machine

  2. File > New File > R Markdown …

  3. Change summary(cars) in the first code block to print("Hello world!")

  4. Click Knit HTML to produce an HTML file.

  5. Save your Rmd file as helloworld.Rmd

All of your Homework assignments and many of your Labs will take the form of a single Rmd file, which you will edit to include your solutions and then submit on Google Classroom!

R basics

Pro tip: The ideas here apply to Python just as well, but the syntax is slightly different. We will cover those differences in a later lecture.

  • Everything we’ll do comes down to applying functions to data

  • Data: things like 7, “seven”, \(7.000\), the matrix \(\left[ \begin{array}{ccc} 7 & 7 & 7 \\ 7 & 7 & 7\end{array}\right]\)

  • Functions: things like \(\log{}\), \(+\) (two arguments), \(<\) (two), \(\mod{}\) (two), mean (one)

A function is a machine which turns input objects (arguments) into an output object (return value), possibly with side effects, according to a definite rule
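
To make this concrete, here is a minimal sketch using only built-in R functions. The backtick form for calling an operator by name is standard R, shown here just to underline that operators are functions too.

```r
log(7)              # one argument in, one return value out
`+`(7, 5)           # the operator + called like any other function: same as 7 + 5
`%%`(7, 5)          # mod takes two arguments as well: same as 7 %% 5
mean(c(7, 7, 7))    # mean takes one argument, here a vector of three numbers
print("seven")      # side effect: text appears in the console
```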

Data building blocks

You’ll encounter different kinds of data types

  • Booleans: direct binary values, TRUE or FALSE in R

  • Integers: whole numbers (positive, negative or zero)

  • Characters: fixed-length blocks of bits, with special coding; strings = sequences of characters

  • Floating point numbers: a fraction (with a finite number of bits) times an exponent, like \(1.87 \times {10}^{6}\)

  • Missing or ill-defined values: NA, NaN, etc.
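
A quick way to see these building blocks at the console is sketched below. It relies on the built-in typeof() function and on R's convention that the suffix L marks an integer; note that R reports Booleans as "logical".

```r
typeof(TRUE)        # "logical": R's name for Booleans
typeof(7L)          # "integer": the L suffix requests an integer
typeof(7)           # "double": plain numbers are floating point by default
typeof("seven")     # "character": a string
typeof(1.87e6)      # "double": 1.87 times 10^6
is.na(NA)           # TRUE: NA marks a missing value
is.nan(0 / 0)       # TRUE: 0/0 is ill-defined, so R returns NaN
```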

Operators (functions)

You can use R as a very, very fancy calculator

Command               Description
+, -, *, /            add, subtract, multiply, divide
^                     raise to the power of
%%                    remainder after division (ex: 8 %% 3 = 2)
( )                   change the order of operations
log(), exp()          natural logarithm and exponential (ex: log(10) ≈ 2.303)
sqrt()                square root
round()               round to the nearest whole number (ex: round(2.3) = 2)
floor(), ceiling()    round down or round up
abs()                 absolute value

7 + 5 # Addition
[1] 12
7 - 5 # Subtraction
[1] 2
7 * 5 # Multiplication
[1] 35
7 ^ 5 # Exponentiation
[1] 16807

7 / 5 # Division
[1] 1.4
7 %% 5 # Modulus
[1] 2
7 %/% 5 # Integer division 
[1] 1
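
The remaining entries of the table can be tried in the same way. The inputs below are arbitrary values chosen for illustration:

```r
sqrt(49)        # 7
abs(-7)         # 7
round(2.3)      # 2
floor(2.7)      # 2
ceiling(2.3)    # 3
log(10)         # natural logarithm, about 2.302585
exp(log(10))    # exp() undoes log(), giving 10 up to rounding error
(7 + 5) * 2     # 24: parentheses force the addition first
7 + 5 * 2       # 17: multiplication is done before addition
```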

Comparisons are also binary operators; they take two objects, like numbers, and give a Boolean

7 > 5
[1] TRUE
7 < 5
[1] FALSE
7 >= 7
[1] TRUE
7 <= 5
[1] FALSE

7 == 5
[1] FALSE
7 != 5
[1] TRUE

Boolean operators

Basically “and” and “or”:

(5 > 7) & (6*7 == 42)
[1] FALSE
(5 > 7) | (6*7 == 42)
[1] TRUE

(We will see the special doubled forms, && and ||, later.)

More types

  • typeof() function returns the type

  • is.foo() functions return Booleans for whether the argument is of type foo

  • as.foo() (tries to) “cast” its argument to type foo — to translate it sensibly into a foo-type value

Special case: as.factor() will be important later for telling R when numbers are actually encodings and not numeric values (e.g., 1 = High school grad; 2 = College grad; 3 = Postgrad).

typeof(7)
[1] "double"
is.numeric(7)
[1] TRUE
is.na(7)
[1] FALSE

is.character(7)
[1] FALSE
is.character("7")
[1] TRUE
is.character("seven")
[1] TRUE
is.na("seven")
[1] FALSE
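
The as.foo() family can be explored the same way. This is a small sketch: the education codes are made up to match the example above, and factor() is used next to as.factor() only because it lets us attach readable labels.

```r
as.numeric("7") + 1                     # 8: the string "7" becomes the number 7
as.character(7)                         # "7"
as.integer(7.9)                         # 7: the fractional part is dropped, not rounded
suppressWarnings(as.numeric("seven"))   # NA: no sensible translation exists (R also warns)

# Hypothetical education codes: 1 = High school grad, 2 = College grad, 3 = Postgrad
as.factor(c(1, 3, 2, 1))                # the codes become categories with levels 1, 2, 3
factor(c(1, 3, 2, 1), levels = c(1, 2, 3),
       labels = c("High school grad", "College grad", "Postgrad"))
is.numeric(as.factor(c(1, 3, 2, 1)))    # FALSE: R no longer treats the codes as numbers
```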

Variables

We can give names to data objects; these give us variables

A few variables are built in:

pi
[1] 3.141593

Variables can be arguments to functions or operators, just like constants:

pi*10
[1] 31.41593
cos(pi)
[1] -1

Assignment operator

Most variables are created with the assignment operator, <- or =

time.factor <- 12
time.factor
[1] 12
time.in.years = 2.5
time.in.years * time.factor
[1] 30

The assignment operator also changes values:

time.in.months <- time.in.years * time.factor
time.in.months
[1] 30
time.in.months <- 45
time.in.months
[1] 45

  • Using names and variables makes code: easier to design, easier to debug, less prone to bugs, easier to improve, and easier for others to read

  • Avoid “magic constants”; use named variables

  • Use descriptive variable names

    • Good: num.students <- 35
    • Bad: ns <- 35

The workspace

What names have you defined values for?

ls()
[1] "time.factor"    "time.in.months" "time.in.years" 

Getting rid of variables:

rm("time.in.months")
ls()
[1] "time.factor"   "time.in.years"
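
Here is a small sketch of the same create-and-remove cycle. It uses a throwaway variable (scratch.value is a hypothetical name) so that the variables above are left alone, and exists() is a base R function that reports whether a name is currently defined.

```r
scratch.value <- 42          # create a variable
exists("scratch.value")      # TRUE: the name is now in the workspace
rm(scratch.value)            # rm() also accepts the name without quotes
exists("scratch.value")      # FALSE: the name is gone
```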

First data structure: vectors

  • Group related data values into one object, a data structure

  • A vector is a sequence of values, all of the same type

  • c() function returns a vector containing all its arguments in order

students <- c("Sean", "Louisa", "Frank", "Farhad", "Li")
midterm <- c(80, 90, 93, 82, 95)
  • Typing the variable name at the prompt causes it to display
students
[1] "Sean"   "Louisa" "Frank"  "Farhad" "Li"    
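
Because all values in a vector must share one type, c() converts mixed arguments to a common type. A short sketch, with a hypothetical variable name (mixed):

```r
mixed <- c(7, "seven", TRUE)   # a number, a string and a Boolean
mixed                          # "7" "seven" "TRUE": everything became a string
typeof(mixed)                  # "character"
typeof(c(7, TRUE))             # "double": TRUE is converted to 1
length(mixed)                  # 3
```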

Indexing

  • vec[1] is the first element, vec[4] is the 4th element of vec
students
[1] "Sean"   "Louisa" "Frank"  "Farhad" "Li"    
students[4]
[1] "Farhad"
  • vec[-4] is a vector containing all but the fourth element
students[-4]
[1] "Sean"   "Louisa" "Frank"  "Li"    

Vector arithmetic

Operators apply to vectors “pairwise” or “elementwise”:

final <- c(78, 84, 95, 82, 91) # Final exam scores
midterm # Midterm exam scores
[1] 80 90 93 82 95
midterm + final # Sum of midterm and final scores
[1] 158 174 188 164 186
(midterm + final)/2 # Average exam score
[1] 79 87 94 82 93
course.grades <- 0.4*midterm + 0.6*final # Final course grade
course.grades
[1] 78.8 86.4 94.2 82.0 92.6
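
In 0.4*midterm above, a single number is combined with a whole vector: R applies it to every element. A brief sketch of the same behavior, reusing the midterm scores from these notes:

```r
midterm <- c(80, 90, 93, 82, 95)   # midterm scores, copied from above
midterm + 5                        # 85 95 98 87 100: five points added to everyone
2 * midterm                        # 160 180 186 164 190
midterm - mean(midterm)            # -8 2 5 -6 7: distance from the class mean of 88
```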

Pairwise comparisons

Is the final score higher than the midterm score?

midterm 
[1] 80 90 93 82 95
final
[1] 78 84 95 82 91
final > midterm
[1] FALSE FALSE  TRUE FALSE FALSE

Boolean operators can be applied elementwise:

(final < midterm) & (midterm > 80)
[1] FALSE  TRUE FALSE FALSE  TRUE

Functions on vectors

Command                               Description
sum(vec)                              sums up all the elements of vec
mean(vec)                             mean of vec
median(vec)                           median of vec
min(vec), max(vec)                    the smallest or largest element of vec
sd(vec), var(vec)                     the standard deviation and variance of vec
length(vec)                           the number of elements in vec
pmax(vec1, vec2), pmin(vec1, vec2)    elementwise maximum or minimum (example: pmax(quiz1, quiz2) returns the higher of quiz 1 and quiz 2 for each student)
sort(vec)                             returns vec in sorted order
order(vec)                            returns the indices that sort the vector vec
unique(vec)                           lists the unique elements of vec
summary(vec)                          gives a five-number summary, plus the mean
any(vec), all(vec)                    useful on Boolean vectors

Functions on vectors

course.grades
[1] 78.8 86.4 94.2 82.0 92.6
mean(course.grades) # mean grade
[1] 86.8
median(course.grades)
[1] 86.4
sd(course.grades) # grade standard deviation
[1] 6.625708

More functions on vectors

sort(course.grades)
[1] 78.8 82.0 86.4 92.6 94.2
max(course.grades) # highest course grade
[1] 94.2
min(course.grades) # lowest course grade
[1] 78.8
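
The other functions from the table work the same way. In this sketch the quiz scores (quiz1, quiz2) are made-up numbers; course.grades is copied from above.

```r
course.grades <- c(78.8, 86.4, 94.2, 82.0, 92.6)   # copied from above
quiz1 <- c(8, 6, 9, 7, 10)                         # hypothetical quiz scores
quiz2 <- c(7, 10, 9, 5, 10)

length(course.grades)                  # 5
order(course.grades)                   # 1 4 2 5 3: positions from lowest to highest grade
course.grades[order(course.grades)]    # same result as sort(course.grades)
pmax(quiz1, quiz2)                     # 8 10 9 7 10: the better quiz for each student
unique(quiz2)                          # 7 10 9 5: the repeated 10 appears once
any(course.grades >= 90)               # TRUE: at least one grade is 90 or higher
all(course.grades >= 80)               # FALSE: 78.8 is below 80
summary(course.grades)                 # minimum, quartiles, median, mean and maximum
```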

Referencing elements of vectors

students
[1] "Sean"   "Louisa" "Frank"  "Farhad" "Li"    

Vector of indices:

students[c(2,4)]
[1] "Louisa" "Farhad"

Vector of negative indices

students[c(-1,-3)]
[1] "Louisa" "Farhad" "Li"    

More referencing

which() returns the TRUE indexes of a Boolean vector:

course.grades
[1] 78.8 86.4 94.2 82.0 92.6
a.threshold <- 90 # A grade = 90% or higher
course.grades >= a.threshold # vector of booleans
[1] FALSE FALSE  TRUE FALSE  TRUE
a.students <- which(course.grades >= a.threshold) # Applying which() 
a.students
[1] 3 5
students[a.students] # Names of A students
[1] "Frank" "Li"   
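
A Boolean vector can also be used as an index directly, without going through which(). A short sketch with the same data (which.max() is a base R function returning the position of the largest element):

```r
students <- c("Sean", "Louisa", "Frank", "Farhad", "Li")   # copied from above
course.grades <- c(78.8, 86.4, 94.2, 82.0, 92.6)

students[course.grades >= 90]         # "Frank" "Li": keeps the positions that are TRUE
course.grades[course.grades < 90]     # 78.8 86.4 82.0: grades below the A threshold
students[which.max(course.grades)]    # "Frank": the student with the highest grade
```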

Named components

You can give names to elements or components of vectors

students
[1] "Sean"   "Louisa" "Frank"  "Farhad" "Li"    
names(course.grades) <- students # Assign names to the grades
names(course.grades)
[1] "Sean"   "Louisa" "Frank"  "Farhad" "Li"    
course.grades[c("Sean", "Frank","Li")] # Get final grades for 3 students
 Sean Frank    Li 
 78.8  94.2  92.6 

Note the labels in what R prints; these are not actually part of the value
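
One way to check that the names are only labels is sketched below. It uses a separate vector (grades is a hypothetical name) and shows that names can also be given directly inside c().

```r
grades <- c(Sean = 78.8, Louisa = 86.4, Frank = 94.2, Farhad = 82.0, Li = 92.6)

grades["Li"]            # prints the label Li above the value 92.6
unname(grades["Li"])    # 92.6: the same value with the label removed
mean(grades)            # 86.8: arithmetic ignores the names
names(grades)[3]        # "Frank": the labels are stored alongside the values
```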

Useful RStudio tips

Keystroke       Description
<tab>           autocompletes commands and filenames, and lists arguments for functions. Highly useful!
<up>            cycles through previous commands at the console prompt
<ctrl-up>       lists the history of previous commands matching an unfinished one
<ctrl-enter>    sends the current line from the source window to the console. Good for trying out ideas from a source file.
<ESC>           aborts an unfinished command and gets you out of the + prompt


Homework & Lab

“Homework” 0:

See campuswire post #11 for link.

What

A hands-on worksheet where you practice basic R within Markdown, in RStudio. Files: 514-Lab1.Rmd, 514-Lab1.html

When

New due date: Fri 2/5 (was due Thu 2/4)

Where

Submit on Google Classroom ← make sure you’re logged into your IIT account to access it.

Instructions

[These are copied from Google classroom.]

  • You are to start working on this worksheet in groups in class.

  • Then you are to finish it up by yourself and submit.

  • As with every submission for this course, you need to submit at least two files, and preferably three:

    1. the source .Rmd file
    2. the knitted HTML
    3. the knitted PDF.
    • Files 1 and 2 are required.
    • File 3 is optional but desired, because it is much easier for me to grade. You can create the PDF by changing “html_document” to “pdf_document” in the output specification in your .Rmd file.

Appendix

Some resources and further reading.

What is the format of this document?

This document was created using R Markdown. You can read more about it here and check out a cheat sheet here. The cheat sheet will guide you through installing RStudio; from there, any new .Rmd document you create already contains a working template to start from. I hope you find this useful!

Installing and loading packages

Just like every other programming language you may be familiar with, R’s capabilities can be greatly extended by installing additional “packages” and “libraries”.

To install a package, use the install.packages() command. You’ll want to run the following commands to get the necessary packages for today’s lab:

install.packages("rmdformats")
install.packages("ggplot2")
install.packages("knitr")

You only need to install packages once. Once they’re installed, you may use them by loading the libraries using the library() command. For today’s lab, you’ll want to run the following code

library(ggplot2) # graphics library
library(knitr)   # contains kable() function

options(scipen = 4)  # Biases printing toward fixed notation instead of scientific notation
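
Since packages only need to be installed once, it can be handy to check what is already there before installing. The sketch below is one possible way to do that, not a required step. The variable names are hypothetical; requireNamespace() is a base R function that returns TRUE when a package can be loaded.

```r
needed.pkgs <- c("rmdformats", "ggplot2", "knitr")   # the packages named above

# TRUE for each package that is already installed, FALSE otherwise
is.installed <- vapply(needed.pkgs, requireNamespace, logical(1), quietly = TRUE)

missing.pkgs <- needed.pkgs[!is.installed]
missing.pkgs     # character(0) means nothing is missing

# Uncomment to install only what is missing:
# if (length(missing.pkgs) > 0) install.packages(missing.pkgs)
```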

Full info page on Markdown

Link to extensive resource with many examples you can reference as needed.

License & Acknowledgements

This document is created for Math 563, Spring 2021, at Illinois Tech. While the course materials are generally not to be distributed outside the course without permission of the instructor, this particular set of notes is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


Part of it is sourced from materials created by Prof. Alexandra Chouldechova from CMU, also distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


  1. Sonja Petrović, Associate Professor of Applied Mathematics, College of Computing, Illinois Tech. Homepage, Email.