Interlude 0.1: R, RStudio, Markdown
This is the technology you’ll need and learn in this course
Python will be covered in the next set of notes
Objectives
What is the purpose of these notes?
- Introduce you to how to run R code in RStudio from within a Markdown document;
- Give you the basic R syntax and structure;
- Provide a tiny Markdown example.
Context
Reminder about R Markdown:
Flexibility & reproducibility
R Markdown allows the user to integrate R code into a report
When data changes or code changes, so does the report
No more need to copy-and-paste graphics, tables, or numbers
Creates reproducible reports
- Anyone who has your R Markdown (.Rmd) file and input data can re-run your analysis and get the exact same results (tables, figures, summaries)
Can output report in HTML (default), Microsoft Word, or PDF
An example
This example shows an R Markdown (.Rmd) file opened in the Source pane of RStudio:
- To turn an Rmd file into a report, click the Knit HTML button in the Source pane menu bar
- The results will appear in a Preview window, as shown on the right
- You can knit into html (default), MS Word, and pdf format
- These lecture notes are also created in RStudio (using html_document as the output format)
R code chunks
To integrate R output into your report, you need to use R code chunks
All of the code that appears in between the “triple back-ticks” gets executed when you Knit
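To make the chunk syntax concrete, here is a small sketch that writes a tiny R Markdown file from the R console. The file name `demo.Rmd` and its contents are illustrative assumptions, not part of the course materials. The triple back-tick fence is built with `strrep()` only so that this example can itself be displayed inside a code block.

```r
# Build the triple back-tick fence programmatically.
fence <- strrep("`", 3)

# Lines of a tiny, hypothetical R Markdown document.
rmd_lines <- c(
  "---",
  "title: \"Chunk demo\"",
  "output: html_document",
  "---",
  "",
  "Ordinary Markdown text goes here.",
  "",
  paste0(fence, "{r}"),        # opening fence: start of an R chunk
  "print(\"Hello world!\")",   # this line is executed when you Knit
  fence                        # closing fence: end of the chunk
)

# Write the document to a temporary folder (hypothetical file name).
rmd_path <- file.path(tempdir(), "demo.Rmd")
writeLines(rmd_lines, rmd_path)

# Show the file as it would appear in the RStudio Source pane.
cat(readLines(rmd_path), sep = "\n")
```

Opening the resulting file in RStudio and knitting it should run the single line between the fences and place its output in the report.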
In-class exercise: Hello world!
Open RStudio on your machine
File > New File > R Markdown …
Change summary(cars) in the first code block to print("Hello world!")
Click Knit HTML to produce an HTML file.
Save your Rmd file as helloworld.Rmd
All of your Homework assignments and many of your Labs will take the form of a single Rmd file, which you will edit to include your solutions and then submit on Google Classroom!
R basics
Pro tip: The ideas here apply to Python just as well, but the syntax is slightly different. We will cover those differences in a later lecture.
Everything we’ll do comes down to applying functions to data
Data: things like 7, “seven”, \(7.000\), the matrix \(\left[ \begin{array}{ccc} 7 & 7 & 7 \\ 7 & 7 & 7\end{array}\right]\)
Functions: things like \(\log{}\), \(+\) (two arguments), \(<\) (two), \(\mod{}\) (two), mean (one)
A function is a machine which turns input objects (arguments) into an output object (return value), possibly with side effects, according to a definite rule
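As a rough illustration of "applying functions to data", the sketch below calls a few of the functions named above. The variable name `x` is chosen here for illustration only.

```r
log(7)               # one argument in, one number out
7 + 5                # an operator is a function of two arguments
`+`(7, 5)            # the same call written in ordinary function form
mean(c(7, 7, 7))     # mean() takes one object (a vector) and returns one number
x <- print("seven")  # print() displays its argument (a side effect) and also returns it
```

The last line shows both halves of the definition: the text appears on screen as a side effect, and the return value is stored in `x`.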
Data building blocks
You’ll encounter different kinds of data types
Booleans: direct binary values, TRUE or FALSE in R
Integers: whole numbers (positive, negative or zero)
Characters: fixed-length blocks of bits, with special coding; strings = sequences of characters
Floating point numbers: a fraction (with a finite number of bits) times an exponent, like \(1.87 \times {10}^{6}\)
Missing or ill-defined values: NA, NaN, etc.
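One way to see these building blocks in R is to ask for the type of a few literal values. This is a minimal sketch using base R only; the particular values are arbitrary.

```r
typeof(TRUE)      # "logical"   (R's name for Booleans)
typeof(7L)        # "integer"   (the L suffix marks an integer literal)
typeof(7)         # "double"    (a plain number is floating point by default)
typeof("seven")   # "character"
is.na(NA)         # TRUE: NA marks a missing value
0 / 0             # NaN: the result is not a well-defined number
is.nan(0 / 0)     # TRUE
```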
Operators (functions)
You can use R as a very, very fancy calculator
Command | Description |
---|---|
+, -, *, / | add, subtract, multiply, divide |
^ | raise to the power of |
%% | remainder after division (ex: 8 %% 3 = 2) |
( ) | change the order of operations |
log(), exp() | natural logarithm and exponential (ex: log(10) = 2.303) |
sqrt() | square root |
round() | round to the nearest whole number (ex: round(2.3) = 2) |
floor(), ceiling() | round down or round up |
abs() | absolute value |
[1] 12
[1] 2
[1] 35
[1] 16807
[1] 1.4
[1] 2
[1] 1
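The commands behind the seven values printed above are not shown in these notes. The following reconstruction is consistent with that output, though the original commands may have differed. The last line uses `%/%` (integer division), which is not listed in the table.

```r
7 + 5     # 12
7 - 5     # 2
7 * 5     # 35
7 ^ 5     # 16807
7 / 5     # 1.4
7 %% 5    # 2  (remainder after division)
7 %/% 5   # 1  (integer division)
```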
Comparisons are also binary operators; they take two objects, like numbers, and give a Boolean
[1] TRUE
[1] FALSE
[1] TRUE
[1] FALSE
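Four comparisons that would print the values above, offered as a plausible reconstruction since the notes do not show the commands:

```r
7 > 5    # TRUE
7 < 5    # FALSE
7 >= 7   # TRUE
7 <= 5   # FALSE
```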
Boolean operators
Basically “and” and “or”:
[1] FALSE
[1] TRUE
(will see special doubled forms, && and ||, later)
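A sketch of "and" (`&`) and "or" (`|`) that gives the FALSE and TRUE shown above. The specific comparisons are assumptions chosen to match that output.

```r
(5 > 7) & (6 * 7 == 42)   # FALSE: "and" needs both sides to be TRUE
(5 > 7) | (6 * 7 == 42)   # TRUE:  "or" needs at least one side to be TRUE
```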
More types
The typeof() function returns the type
is.foo() functions return Booleans for whether the argument is of type foo
as.foo() (tries to) “cast” its argument to type foo, that is, to translate it sensibly into a foo-type value
Special case: as.factor() will be important later for telling R when numbers are actually encodings and not numeric values. (E.g., 1 = High school grad; 2 = College grad; 3 = Postgrad)
[1] "double"
[1] TRUE
[1] FALSE
[1] FALSE
[1] TRUE
[1] TRUE
[1] FALSE
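The seven values above are consistent with the calls below. This is a reconstruction, as the notes omit the commands. The two `as.` lines at the end are an extra illustration of casting and are not part of the printed output.

```r
typeof(7)               # "double"
is.numeric(7)           # TRUE
is.na(7)                # FALSE
is.character(7)         # FALSE
is.character("7")       # TRUE
is.character("seven")   # TRUE
is.na("seven")          # FALSE

# Extra: casting between types.
as.character(7)         # "7"
as.numeric("7")         # 7
```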
Variables
We can give names to data objects; these give us variables
A few variables are built in:
[1] 3.141593
Variables can be arguments to functions or operators, just like constants:
[1] 31.41593
[1] -1
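The three values printed in this section match the built-in variable `pi` used on its own and then as an argument. The exact commands are assumed, since only the output survives in the notes.

```r
pi        # 3.141593 (a built-in variable)
pi * 10   # 31.41593
cos(pi)   # -1
```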
Assignment operator
Most variables are created with the assignment operator, <- or =
[1] 12
[1] 30
The assignment operator also changes values:
[1] 30
[1] 45
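A sketch of assignments that would print the two pairs of values in this section. The variable names are taken from the workspace listing later in these notes, while the values 12 and 2.5 are assumptions chosen to match the output.

```r
time.factor <- 12                # create a variable with <-
time.factor                      # 12
time.in.years = 2.5              # = also assigns
time.in.years * time.factor      # 30

time.in.months <- time.in.years * time.factor
time.in.months                   # 30
time.in.months <- 45             # assigning again changes the value
time.in.months                   # 45
```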
Using names and variables makes code: easier to design, easier to debug, less prone to bugs, easier to improve, and easier for others to read
Avoid “magic constants”; use named variables
Use descriptive variable names
- Good: num.students <- 35
- Bad: ns <- 35
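To illustrate the advice about magic constants, here is a small before-and-after sketch. The name `months.per.year` and the numbers are hypothetical.

```r
# With a magic constant: the reader has to guess what 12 means.
2.5 * 12

# With named variables: the intent is visible, and the constant
# only needs to be changed in one place.
months.per.year <- 12
time.in.years <- 2.5
time.in.months <- time.in.years * months.per.year
time.in.months   # 30
```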
The workspace
What names have you defined values for?
[1] "time.factor" "time.in.months" "time.in.years"
Getting rid of variables:
[1] "time.factor" "time.in.years"
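The two listings above look like the output of `ls()` before and after a call to `rm()`. A minimal sketch, assuming a fresh R session in which only these three variables have been defined:

```r
time.factor <- 12
time.in.years <- 2.5
time.in.months <- time.in.years * time.factor

ls()                   # names currently defined in the workspace
rm("time.in.months")   # remove one variable
ls()                   # time.in.months is no longer listed
```

In a session that already contains other objects, `ls()` will list those as well.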
First data structure: vectors
Group related data values into one object, a data structure
A vector is a sequence of values, all of the same type
The c() function returns a vector containing all its arguments in order
- Typing the variable name at the prompt displays its value
[1] "Sean" "Louisa" "Frank" "Farhad" "Li"
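A vector like the one printed above can be built with `c()`. The variable name `students` is an assumption, since the original command is not shown.

```r
students <- c("Sean", "Louisa", "Frank", "Farhad", "Li")
students   # typing the name displays the vector
```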
Indexing
vec[1] is the first element, vec[4] is the 4th element of vec
[1] "Sean" "Louisa" "Frank" "Farhad" "Li"
[1] "Farhad"
vec[-4] is a vector containing all but the fourth element
[1] "Sean" "Louisa" "Frank" "Li"
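The indexing output in this section can be reproduced as follows, again assuming the vector is named `students`:

```r
students <- c("Sean", "Louisa", "Frank", "Farhad", "Li")
students       # all five names
students[4]    # "Farhad"
students[-4]   # everything except the fourth element
```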
Vector arithmetic
Operators apply to vectors “pairwise” or “elementwise”:
[1] 80 90 93 82 95
[1] 158 174 188 164 186
[1] 79 87 94 82 93
[1] 78.8 86.4 94.2 82.0 92.6
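The four vectors above are consistent with the elementwise arithmetic below. The names `midterm`, `final` and `course.grades` and the weights 0.4 and 0.6 are inferred from the printed numbers, so treat them as assumptions.

```r
midterm <- c(80, 90, 93, 82, 95)
final   <- c(78, 84, 95, 82, 91)

midterm                  # 80 90 93 82 95
midterm + final          # 158 174 188 164 186
(midterm + final) / 2    # 79 87 94 82 93

# Weighted course grade: each element is combined with its partner.
course.grades <- 0.4 * midterm + 0.6 * final
course.grades            # 78.8 86.4 94.2 82.0 92.6
```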
Pairwise comparisons
Is the final score higher than the midterm score?
[1] 80 90 93 82 95
[1] 78 84 95 82 91
[1] FALSE FALSE TRUE FALSE FALSE
Boolean operators can be applied elementwise:
[1] FALSE TRUE FALSE FALSE TRUE
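A sketch of the comparisons behind the output above. The vectors are inferred from the printed scores, and the second expression is only one of several conditions that would give the same Boolean vector.

```r
midterm <- c(80, 90, 93, 82, 95)
final   <- c(78, 84, 95, 82, 91)

midterm
final
final > midterm                       # FALSE FALSE  TRUE FALSE FALSE

# Combine two elementwise comparisons with "and".
(final < midterm) & (midterm > 80)    # FALSE  TRUE FALSE FALSE  TRUE
```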
Functions on vectors
Command | Description |
---|---|
sum(vec) | sums up all the elements of vec |
mean(vec) | mean of vec |
median(vec) | median of vec |
min(vec), max(vec) | the smallest or largest element of vec |
sd(vec), var(vec) | the standard deviation and variance of vec |
length(vec) | the number of elements in vec |
pmax(vec1, vec2), pmin(vec1, vec2) | elementwise maximum or minimum (example: pmax(quiz1, quiz2) returns the higher of quiz 1 and quiz 2 for each student) |
sort(vec) | returns vec in sorted order |
order(vec) | returns the indices that sort the vector vec |
unique(vec) | lists the unique elements of vec |
summary(vec) | gives a five-number summary, plus the mean for numeric vectors |
any(vec), all(vec) | useful on Boolean vectors |
Functions on vectors
[1] 78.8 86.4 94.2 82.0 92.6
[1] 86.8
[1] 86.4
[1] 6.625708
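The values above match the mean, median and standard deviation of the course grades. The calls are reconstructed from the output, and the vector is entered directly here so the example stands alone.

```r
course.grades <- c(78.8, 86.4, 94.2, 82.0, 92.6)

course.grades
mean(course.grades)     # 86.8
median(course.grades)   # 86.4
sd(course.grades)       # 6.625708
```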
More functions on vectors
[1] 78.8 82.0 86.4 92.6 94.2
[1] 94.2
[1] 78.8
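Likewise, the three lines above are what `sort()`, `max()` and `min()` would print for the same grades (assumed calls):

```r
course.grades <- c(78.8, 86.4, 94.2, 82.0, 92.6)

sort(course.grades)   # 78.8 82.0 86.4 92.6 94.2
max(course.grades)    # 94.2
min(course.grades)    # 78.8
```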
Referencing elements of vectors
[1] "Sean" "Louisa" "Frank" "Farhad" "Li"
Vector of indices:
[1] "Louisa" "Farhad"
Vector of negative indices
[1] "Louisa" "Farhad" "Li"
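A sketch of indexing with a vector of positions, consistent with the output above. The index vectors are inferred from which names were printed.

```r
students <- c("Sean", "Louisa", "Frank", "Farhad", "Li")

students
students[c(2, 4)]     # keep elements 2 and 4: "Louisa" "Farhad"
students[c(-1, -3)]   # drop elements 1 and 3: "Louisa" "Farhad" "Li"
```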
More referencing
which() returns the TRUE indexes of a Boolean vector:
[1] 78.8 86.4 94.2 82.0 92.6
[1] FALSE FALSE TRUE FALSE TRUE
[1] 3 5
[1] "Frank" "Li"
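The output above can be reproduced with `which()` as sketched below. The cutoff of 90 and the names `a.threshold` and `a.students` are assumptions, since any cutoff above 86.4 and at most 92.6 would select the same two students.

```r
students <- c("Sean", "Louisa", "Frank", "Farhad", "Li")
course.grades <- c(78.8, 86.4, 94.2, 82.0, 92.6)
a.threshold <- 90   # hypothetical cutoff

course.grades
course.grades >= a.threshold                        # FALSE FALSE  TRUE FALSE  TRUE
a.students <- which(course.grades >= a.threshold)   # positions of the TRUE values
a.students                                          # 3 5
students[a.students]                                # "Frank" "Li"
```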
Named components
You can give names to elements or components of vectors
[1] "Sean" "Louisa" "Frank" "Farhad" "Li"
[1] "Sean" "Louisa" "Frank" "Farhad" "Li"
Sean Frank Li
78.8 94.2 92.6
Note the labels in what R prints; these are not actually part of the value
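A sketch of naming the elements of a vector, consistent with the output above. The exact commands are assumed.

```r
students <- c("Sean", "Louisa", "Frank", "Farhad", "Li")
course.grades <- c(78.8, 86.4, 94.2, 82.0, 92.6)

students
names(course.grades) <- students          # attach a name to each grade
names(course.grades)
course.grades[c("Sean", "Frank", "Li")]   # index by name instead of position
```

The names are stored as an attribute of the vector; the grades themselves remain plain numbers.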
Useful RStudio tips
Keystroke | Description |
---|---|
<tab> | autocompletes commands and filenames, and lists arguments for functions. Highly useful! |
<up> | cycle through previous commands in the console prompt |
<ctrl-up> | lists history of previous commands matching an unfinished one |
<ctrl-enter> | sends the current line from the source window to the console and runs it. Good for trying out ideas from a source file. |
<ESC> | aborts an unfinished command and gets you out of the + prompt |
Homework & Lab
“Homework” 0:
See campuswire post #11 for link.
What
A hands-on worksheet where you practice basic R within Markdown, in RStudio. Files: 514-Lab1.Rmd, 514-Lab1.html
When
New due date: Fri 2/5 (was due Thu 2/4)
Where
Submit on Google Classroom (make sure you’re logged into your IIT account to access it).
Instructions
[These are copied from Google classroom.]
You are to start working on this worksheet in groups in class, then finish it by yourself and submit.
As with each submission for this course, you need to submit at least two files, and preferably three:
1. the source .Rmd file
2. the knitted html
3. the knitted pdf

Files 1 and 2 are required. File 3 is optional but desired, because it is much easier for me to grade. You can create the pdf by changing “html_document” to “pdf_document” in the output specification in your .Rmd file.
Appendix
Some resources and further reading.
What is the format of this document?
This document was created using R Markdown. You can read more about it here and check out a cheat sheet here, which will guide you through installing RStudio. From there, any new .Rmd document you create is already a working template to start from. I hope you find this useful!
Installing and loading packages
As with many other programming languages, R’s capabilities can be greatly extended by installing additional “packages” and “libraries”.
To install a package, use the install.packages() command. You’ll want to run the following commands to get the necessary packages for today’s lab:
install.packages("rmdformats")
install.packages("ggplot2")
install.packages("knitr")
You only need to install packages once. Once they’re installed, you may use them by loading the libraries using the library() command. For today’s lab, you’ll want to run the following code:
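The loading code itself is missing from these notes. A minimal sketch, assuming the packages to load are the same three installed above, checks that each one is installed before calling `library()`:

```r
# Packages assumed to be needed, taken from the install.packages() calls above.
pkgs <- c("rmdformats", "ggplot2", "knitr")

for (pkg in pkgs) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    # character.only = TRUE lets library() take the name from a variable.
    library(pkg, character.only = TRUE)
  } else {
    message("Package not installed: ", pkg)
  }
}
```

In an interactive session you would more commonly write one call per package, for example `library(ggplot2)`.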
Full info page on Markdown
Link to extensive resource with many examples you can reference as needed.
License & Acknowledgements
This document is created for Math 563, Spring 2021, at Illinois Tech. While the course materials are generally not to be distributed outside the course without permission of the instructor, this particular set of notes is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Part of it is sourced from materials created by Prof. Alexandra Chouldechova from CMU, also distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.