
Dr Ajay Kumar Koli
Head & Educator
School of Information & Data Science
Nalanda Academy - Wardha
@ajay_kolii
koliajaykumar@gmail.com
https://koliajay.netlify.app/





Hello! 😊


😍 R is FREE

  • R is a language and environment for statistical computing and graphics. (R project)

  • R first appeared in August 1993, designed by:

    • Ross Ihaka (New Zealand statistician)

    • Robert Gentleman (Canadian statistician)

Download R from CRAN

R Console

- R version

- R name

- R licence

- prompt >

Never Save R "Workspace Image":

  • It helps in "freshly minted R sessions".

  • "put more trust in your script than in your memory"

R as a BIG calc

What you code

1
1 + 1
34 / 40
5 < 4
16 == 16

What you see

## [1] 1
## [1] 2
## [1] 0.85
## [1] FALSE
## [1] TRUE
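
The same calculator idea extends to longer expressions. A minimal sketch using only base R; the numbers are made up for illustration:

```r
# R follows the usual order of operations:
# exponents first, then * and /, then + and -
2 + 3 * 4      # multiplication happens before addition
(2 + 3) * 4    # brackets change the order
2^3            # exponent

# comparisons always answer with TRUE or FALSE
5 < 4
16 == 16
(34 / 40) >= 0.5
```

Typing each line at the prompt > and pressing Enter prints one result per line, just like the slides above.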

Plot using R

plot(1:100)

R Function

  • "A function, in a programming environment, is a set of instructions. A programmer builds a function to avoid repeating the same task, or reduce complexity."


round(9.1565, 2)
## [1] 9.16

Structure of R function

Round Function

round(x = 564.56743, digits = 2)
## [1] 564.57

Round Function

round(x = 564.56743, digits = 1)
## [1] 564.6

Square Root Function

sqrt(x = 9)
## [1] 3

Sequence Function

seq.int(from = 10, to = 30, by = 5)
## [1] 10 15 20 25 30
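
The function calls above all share one shape: a function name, then arguments inside brackets. A small sketch of how arguments are matched, using base R only (the values are illustrative):

```r
# arguments can be matched by name ...
round(x = 564.56743, digits = 2)

# ... or by position (x comes first, digits second)
round(564.56743, 2)

# if digits is left out, its default value 0 is used
round(564.56743)

# named arguments may be written in any order
seq(to = 30, from = 10, by = 5)

# args() lists the arguments a function accepts
args(seq.default)
```

Naming the arguments takes a little more typing, but makes the code easier to read later.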

R Comment:

  • "Humans will be able to read the comments, but your computer will pass over them."

  • In R, # is used as the commenting symbol

How to add comment?

# secret code of happiness
(1 + 4) - (3 * 5) / 10
## r does not evaluate this
# all these are comments
# comments are very important
## [1] 3.5

😞 That's okay but you promise to...

  • combine plot, text, tables and images in a single file.

  • publish my work online or convert into a word, pdf or html file.

  • work efficiently with my different projects and save, share and track them.

WE NEED A SUPERHERO ...

R ⇌ RStudio

Imagine RStudio as a stylish car ...

🚗

and R as its powerful engine.

⚙️

RStudio IDE

RStudio → Tools → Global Options

R Program

Data Wrangling

Exploratory Data Analysis

Modeling

Data Visualisation

R Packages:

  • On 12 Jan 2022, 18698 R packages were available at CRAN

  • "An R package is a collection of functions, data, and documentation that extends the capabilities of base R. Using packages is key to the successful use of R."

  • Top downloaded packages source

To Download pkgs

Name of the R package(s)

Installed R package(s)

R Function to Download Package

install.packages("tidyverse")

R Function to use Package

library(tidyverse)

About R Packages:

  • You need to install a package only once, just like:

    • 📚 We buy books once and use them again and again

    • 💡 We fix the bulb once and use it again and again

  • In every R document (or new R session), you need to load the package once with the function library(), for example library(ggplot2).

  • Once in a while, you need to update the installed packages as well.

  • Packages are stored in R's library folder, not in RStudio: un-installing RStudio does not remove them, but un-installing R or moving to a new R version may mean you have to install them again.
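
The install-once, load-every-time routine can be sketched as below. This is only an illustration: it uses the stats package, which ships with R, so that it runs without an internet connection; the object name pkg is a made-up helper, and any CRAN package name such as "tidyverse" could take its place.

```r
# name of the package we want to use (an assumption for this sketch)
pkg <- "stats"

# install only if the package is missing: needed once per computer
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# load the package: needed once in every new R session or document
library(pkg, character.only = TRUE)

# check which version is installed
packageVersion(pkg)

# update.packages(ask = FALSE)  # uncomment to update all installed packages
```

In everyday use, library(tidyverse) with the bare package name does the same job as the character.only form used here.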

Tools → Check Package Updates

Select Package(s) to Update

Click Install Updates

To Remove Package(s)

R Object

- "Just a name that you can use to call up stored data"

Source: RStudio

Create Object

salary <- c(20, 30, 40, 50, -10)
salary
## [1] 20 30 40 50 -10

Create Object

name <- c("Ram", "Rani", "Ali", "Preeti", "John")
name
## [1] "Ram" "Rani" "Ali" "Preeti"
## [5] "John"

Create Object

age <- c(34, 54, 23, 65, 2 )
age
## [1] 34 54 23 65 2

Create Object

place <- c("ny", "ber", "dhl", "tko", "lon")
place
## [1] "ny" "ber" "dhl" "tko" "lon"

Create Object

books <- c(4, 0, 3, 24, 5)
books
## [1] 4 0 3 24 5
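
Once an object exists, its name can be used wherever the stored values are needed. A short sketch with two of the objects created above, using base R only:

```r
salary <- c(20, 30, 40, 50, -10)
name <- c("Ram", "Rani", "Ali", "Preeti", "John")

class(salary)    # numbers are stored as "numeric"
class(name)      # text is stored as "character"
length(salary)   # how many values the object holds

salary[2]        # pick the second value
mean(salary)     # functions work on the whole object at once
salary * 2       # and so does arithmetic
```

The arrow <- is read as "gets": salary gets the five values inside c().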

Guidelines to name objects in R:

  • a name cannot start with a number

  • a name cannot use some special symbols, like ^, !, $, @, +, -, /, or *

  • avoid caps

  • avoid spaces

  • use an underscore (like na_me) to separate words; a dash (like na-me) is read by R as subtraction, so keep dashes for file names

  • if chronology matters, add the date to the file name (2020-09-05-file-name)
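
One way to check a name before using it is base R's make.names(), which rewrites any text into a syntactically valid name. A name that comes back unchanged is safe to use. The candidate names below are invented for this sketch:

```r
# hypothetical names we would like to use for objects
candidates <- c("salary_2020", "2020_salary", "na-me", "na_me", "my salary")

# make.names() repairs invalid names, e.g. "na-me" becomes "na.me"
repaired <- make.names(candidates)
repaired

# a candidate is valid if make.names() did not have to change it
is_valid <- candidates == repaired
data.frame(candidates, repaired, is_valid)
```

Note that make.names() also alters reserved words such as if, so it is a slightly stricter check than the guidelines above.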

RStudio Environment Window

🤔 How to combine these objects/variables into a data set, or say tidy data?

Tidy data 👇 😻😻😻

##   age books   name place salary
## 1  34     4    Ram    ny     20
## 2  54     0   Rani   ber     30
## 3  23     3    Ali   dhl     40
## 4  65    24 Preeti   tko     50
## 5   2     5   John   lon    -10

How to create a data object?

social <- data.frame(age, books, name, place, salary)
social
##   age books   name place salary
## 1  34     4    Ram    ny     20
## 2  54     0   Rani   ber     30
## 3  23     3    Ali   dhl     40
## 4  65    24 Preeti   tko     50
## 5   2     5   John   lon    -10
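
After a data frame is built, a few base R functions help to inspect it. A minimal sketch, repeating three of the vectors from above so that it runs on its own:

```r
age <- c(34, 54, 23, 65, 2)
books <- c(4, 0, 3, 24, 5)
name <- c("Ram", "Rani", "Ali", "Preeti", "John")

# each vector becomes one column; each position becomes one row
social <- data.frame(age, books, name)

str(social)    # structure: column names, types and first values
nrow(social)   # number of rows (observations)
ncol(social)   # number of columns (variables)

social$name                 # one column, picked with $
social[social$books > 3, ]  # only the rows where books is more than 3
```

All vectors passed to data.frame() here have the same length, which is what lets each row describe one person.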

How to export data as a csv file?

library(readr)
# to save this data set as a csv file
write_csv(social, "data/social.csv")
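
The slide uses write_csv() from readr and assumes a folder called data already exists in the project. As a hedged alternative that needs no extra package, the sketch below uses base R's write.csv() and read.csv() with a temporary folder; the two-row data frame and the name csv_file are made up for illustration.

```r
# a small stand-in for the social data
social <- data.frame(age = c(34, 54), name = c("Ram", "Rani"))

# build a file path inside R's temporary folder
csv_file <- file.path(tempdir(), "social.csv")

# save the data; row.names = FALSE leaves out the 1, 2, ... row labels
write.csv(social, csv_file, row.names = FALSE)

# confirm the file is there and read it back in
file.exists(csv_file)
social_again <- read.csv(csv_file)
social_again
```

Reading the file back is a quick way to confirm that the export worked as intended.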

Get a List of all Objects

# names of created objects
objects()
## [1] "age" "books" "capital"
## [4] "foundation" "name" "place"
## [7] "pop" "praw" "salary"
## [10] "soc" "social" "state"
## [13] "world"

Using Console >

In the console, type ? followed by the name of the function or topic,

for example ?ggplot

RStudio: pkg Help Docs

Twitter #rstats

🙋🏽‍♀️🙋‍♂️
Q&A

Dynamic Documents
Using R Markdown

Next Module - 2

Open RStudio

About RStudio Projects

  • "to divide your work into multiple contexts, each with their own:

    • working directory,

    • workspace,

    • history, and

    • source documents."

Source & Artwork Source

🔥 Create RStudio Project in 4 Steps 🔥

Open RStudio Project

About R Markdown:

  • "You bring your data, code, and ideas, and R Markdown renders your content into a polished document that can be used to:

    • Do data science interactively within the RStudio IDE,

    • Reproduce your analyses,

    • Collaborate and share code with others, and

    • Communicate your results with others."

File → New File → R Markdown

R Markdown

Save your .Rmd file

Name your .Rmd file

Saved .Rmd file → in RStudio Project

R Markdown has 3 important parts:

- YAML

- Code chunk

- Text
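
To see the three parts side by side, the sketch below writes a tiny .Rmd file from R. Everything in it is an assumption made for illustration: the title, the text, the chunk content and the object names. The three backticks that open and close a code chunk are built with strrep() only so that they can be shown here.

```r
# three backticks mark the start and end of a code chunk
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",                          # YAML starts
  'title: "My first report"',
  "output: html_document",
  "---",                          # YAML ends
  "",
  "## A heading",                 # text, written in Markdown
  "",
  "Plain text goes here.",
  "",
  paste0(fence, "{r}"),           # code chunk starts
  "summary(cars)",
  fence                           # code chunk ends
)

# save the lines as an .Rmd file in R's temporary folder
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# rmarkdown::render(rmd_file)  # knits the file, if rmarkdown is installed
```

In practice the file is typed directly in RStudio; the Knit button then does what the commented render() line would do.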

🧶 Knit an R Markdown File

  • YAML options

  • Headings, subheadings, text & maths equations

  • Code Chunk

    • Include images

    • Include tables

    • Include plot

  • Themes

  • Multiple Output Formats

  • eBook

🙋🏽‍♀️🙋‍♂️
Q&A

Dynamic Visualisation
Using ggplot2

Next Module - 3

Course Progress


Variable types in R:

  • int stands for integers, like 4, 55, 300.

  • dbl stands for doubles, or real numbers like 3, 7.45, 1.565, 12.

  • chr stands for character vectors, or strings like names.

  • dttm stands for date-times (a date + a time).

  • lgl stands for logical, vectors that contain only TRUE or FALSE.

  • fct stands for factors, which R uses to represent categorical variables with fixed possible values like occupation: student, professional, government, business.

  • date stands for dates.
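
The type of any value can be checked with class(). A small sketch with one made-up example per type, using base R only; note that a whole number typed without the L suffix is stored as a double:

```r
# one illustrative value for each variable type
values <- list(
  int  = 4L,                        # the L suffix makes an integer
  dbl  = 7.45,
  chr  = "Ram",
  lgl  = TRUE,
  fct  = factor("student"),
  date = as.Date("2020-09-05"),
  dttm = as.POSIXct("2020-09-05 10:30:00", tz = "UTC")
)

# class() reports the type; date-times carry two classes, so keep the first
types <- vapply(values, function(v) class(v)[1], character(1))
types

typeof(4)    # "double": plain numbers are doubles by default
typeof(4L)   # "integer"
```

The short labels int, dbl, chr and so on are the abbreviations that tibbles print under each column name.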

An Overview of Data

glimpse(penguins)
## Rows: 344
## Columns: 8
## $ species           <fct> Adelie, Adelie, …
## $ island            <fct> Torgersen, Torge…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3…
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0…
## $ flipper_length_mm <int> 181, 186, 195, N…
## $ body_mass_g       <int> 3750, 3800, 3250…
## $ sex               <fct> male, female, fe…
## $ year              <int> 2007, 2007, 2007…

An Overview of Data

summary(penguins)
##       species          island
##  Adelie   :152   Biscoe   :168
##  Chinstrap: 68   Dream    :124
##  Gentoo   :124   Torgersen: 52
##
##  bill_length_mm  bill_depth_mm
##  Min.   :32.10   Min.   :13.10
##  1st Qu.:39.23   1st Qu.:15.60
##  Median :44.45   Median :17.30
##  Mean   :43.92   Mean   :17.15
##  3rd Qu.:48.50   3rd Qu.:18.70
##  Max.   :59.60   Max.   :21.50
##  NA's   :2       NA's   :2
##
##  flipper_length_mm  body_mass_g
##  Min.   :172.0      Min.   :2700
##  1st Qu.:190.0      1st Qu.:3550
##  Median :197.0      Median :4050
##  Mean   :200.9      Mean   :4202
##  3rd Qu.:213.0      3rd Qu.:4750
##  Max.   :231.0      Max.   :6300
##  NA's   :2          NA's   :2
##
##      sex           year
##  female:165   Min.   :2007
##  male  :168   1st Qu.:2007
##  NA's  : 11   Median :2008
##               Mean   :2008
##               3rd Qu.:2009
##               Max.   :2009

Packages required:

library(palmerpenguins) # to access penguin data
library(tidyverse) # to use ggplot2 pkg

Packages recommended:

install.packages(c(
  "directlabels", "dplyr", "gameofthrones", "ggforce", "gghighlight",
  "ggnewscale", "ggplot2", "ggraph", "ggrepel", "ggtext", "ggthemes",
  "hexbin", "mapproj", "maps", "munsell", "ozmaps", "paletteer",
  "patchwork", "rmapshaper", "scico", "seriation", "sf", "stars",
  "tidygraph", "tidyr", "wesanderson"
))

ggplot2 by Hadley Wickham

  • "is a system for declaratively creating graphics, based on The Grammar of Graphics" (book by the late Leland Wilkinson)

Key Components for ggplot2 Plot

  1. data,

  2. aesthetic mapping

  3. at least one layer of geom function

ggplot(data = penguins)

ggplot(data = penguins, mapping = aes(x = species))

ggplot(data = penguins, mapping = aes(x = species)) +
  geom_bar()

# the same plot, written without the argument names
ggplot(penguins, aes(x = species)) +
  geom_bar()

How to export plot to your computer?

ggplot(data = penguins, mapping = aes(x = species)) +
  geom_bar()
ggsave("peng-species.pdf") # also try jpg/jpeg/png

## Saving 7 x 7 in image

How to add color to bars?

ggplot(data = penguins, mapping = aes(x = species)) +
  geom_bar(fill = "blue")

ggplot(data = penguins, mapping = aes(x = species)) +
  geom_bar(fill = c("orange", "white", "green"))
# the number of color names should equal the number of factor levels
# the factor species has three levels:
# Adelie, Chinstrap & Gentoo

How to add color using palette? 🎨

🎨 Color Palette

  • R packages RColorBrewer & wesanderson

library(RColorBrewer)
ggplot(data = penguins,
       mapping = aes(x = species,
                     fill = species)) +
  geom_bar() +
  scale_fill_brewer(palette = "Dark2")

How to remove legend or change its position?

ggplot(data = penguins,
       mapping = aes(x = species,
                     fill = species)) +
  geom_bar() +
  scale_fill_brewer(palette = "Dark2") +
  theme(legend.position = "none") # or "top", "bottom", "left", "right"

How to plot title and axis titles?

ggplot(data = penguins,
       mapping = aes(x = species,
                     fill = species)) +
  geom_bar() +
  scale_fill_brewer(palette = "Dark2") +
  theme(legend.position = "none") +
  labs(
    title = "Species of palmer penguins",
    subtitle = "This data is about penguins",
    x = "Species",
    y = "Frequency"
  )

How to control size of text?

ggplot(data = penguins,
       mapping = aes(x = species,
                     fill = species)) +
  geom_bar() +
  scale_fill_brewer(palette = "Dark2") +
  theme(legend.position = "none",
        text = element_text(size = 20)) +
  labs(
    title = "Species of palmer penguins",
    subtitle = "This data is about penguins",
    x = "Species",
    y = "Frequency"
  )

How to plot two numeric variables?

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
                     y = bill_depth_mm,
                     color = species)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2") + # color scale, to match color = species
  theme(legend.position = "none",
        text = element_text(size = 20)) +
  labs(
    title = "Relationship between bill length \n& depth of palmer penguins",
    subtitle = "This data is about penguins",
    x = "Bill length (mm)",
    y = "Bill depth (mm)"
  )

How to add themes to ggplot?

ggplot2 themes

https://ggplot2.tidyverse.org/reference/ggtheme.html

  • theme_gray()

  • theme_bw()

  • theme_linedraw()

  • theme_light()

  • theme_dark()

  • theme_minimal()

  • theme_classic()

  • theme_void()

  • theme_test()

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
                     y = bill_depth_mm,
                     color = species)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2") +
  theme_bw() + # add the complete theme first, so it does not undo theme() below
  theme(legend.position = "none",
        text = element_text(size = 20)) +
  labs(
    title = "Relationship between bill length \n& depth of palmer penguins",
    subtitle = "This data is about penguins",
    x = "Bill length (mm)",
    y = "Bill depth (mm)"
  )

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
                     y = bill_depth_mm,
                     color = species)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2") +
  theme_classic() + # add the complete theme first, so it does not undo theme() below
  theme(legend.position = "none",
        text = element_text(size = 20)) +
  labs(
    title = "Relationship between bill length \n& depth of palmer penguins",
    subtitle = "This data is about penguins",
    x = "Bill length (mm)",
    y = "Bill depth (mm)"
  )

How to add regression line to ggplot?

ggplot(data = penguins,
       mapping = aes(x = flipper_length_mm,
                     y = body_mass_g)) +
  geom_point() +
  geom_smooth(method = "lm") + # "lm" fits a straight regression line
  theme_classic() +
  theme(legend.position = "none",
        text = element_text(size = 24)) +
  labs(
    title = "Relationship between flipper length \n& body mass of palmer penguins",
    subtitle = "This data is about penguins",
    x = "Flipper length (mm)",
    y = "Body mass (g)"
  )

More resources

🙋🏽‍♀️🙋‍♂️
Q&A

Dynamic Wrangling
Using dplyr

Next Module - 4

Course Progress

What is Data wrangling?

"Transforming" data means:

  • "narrowing in on observations of interest ...

  • creating new variables that are functions of existing variables ... and

  • calculating a set of summary statistics."

Source

dplyr package

  • "dplyr is a grammar of data manipulation"

  • "providing a consistent set of verbs that help you solve the most common data manipulation challenges:"

  • A few important functions:

    • filter()
    • select()
    • mutate()
    • arrange()
    • summarise()

filter() function:

  • Picks cases based on their values.

How to keep only the Gentoo penguins in the data?

# there are three species: Chinstrap, Gentoo, Adelie
penguins %>%
  filter(species == "Gentoo")
## # A tibble: 124 × 8
## species island bill_length_mm
## <fct> <fct> <dbl>
## 1 Gentoo Biscoe 46.1
## 2 Gentoo Biscoe 50
## 3 Gentoo Biscoe 48.7
## 4 Gentoo Biscoe 50
## 5 Gentoo Biscoe 47.6
## 6 Gentoo Biscoe 46.5
## 7 Gentoo Biscoe 45.4
## 8 Gentoo Biscoe 46.7
## 9 Gentoo Biscoe 43.3
## 10 Gentoo Biscoe 46.8
## # … with 114 more rows, and 5 more
## # variables: bill_depth_mm <dbl>,
## # flipper_length_mm <int>,
## # body_mass_g <int>, sex <fct>,
## # year <int>

# there are three species: Chinstrap, Gentoo, Adelie
praw <- read_csv("data/gentoo-penguins1.csv")
praw %>%
  filter(species == "Gentoo") %>%
  summary() %>%
  kableExtra::kable()
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Length:124 Length:124 Min. :40.90 Min. :13.10 Min. :203.0 Min. :3950 Length:124 Min. :2007
Class :character Class :character 1st Qu.:45.30 1st Qu.:14.20 1st Qu.:212.0 1st Qu.:4500 Class :character 1st Qu.:2007
Mode :character Mode :character Median :47.30 Median :15.00 Median :216.0 Median :4925 Mode :character Median :2008
NA NA Mean :47.50 Mean :14.98 Mean :217.2 Mean :4985 NA Mean :2008
NA NA 3rd Qu.:49.55 3rd Qu.:15.70 3rd Qu.:221.0 3rd Qu.:5400 NA 3rd Qu.:2009
NA NA Max. :59.60 Max. :17.30 Max. :231.0 Max. :6050 NA Max. :2009
NA NA NA's :1 NA's :1 NA's :1 NA's :1 NA NA
How to export data file to your computer?

# three species are Chinstrap, Gentoo, Adelie
penguins %>%
  filter(species == "Gentoo") %>%
  write_csv("data/gentoo-penguins.csv")

✋ WAIT! What is %>%

  • this is called the pipe (%>%); in RStudio, type it with Ctrl + Shift + M (Cmd + Shift + M on a Mac)

  • "a powerful tool for clearly expressing a sequence of multiple operations"

  • read it as "then".

penguins %>%
  filter(species == "Gentoo") %>%
  summary() %>%
  kableExtra::kable()
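
To see what the pipe saves, compare a nested call with a piped one. The sketch below uses base R's native pipe |>, which assumes R version 4.1 or newer; for simple chains like this one, %>% from the tidyverse reads the same way. The object name numbers is made up.

```r
numbers <- c(4, 9, 16)

# nested: has to be read from the inside out
nested <- round(mean(sqrt(numbers)), 1)

# piped: take numbers, then sqrt, then mean, then round
piped <- numbers |>
  sqrt() |>
  mean() |>
  round(1)

nested
piped
identical(nested, piped)
```

Both versions do exactly the same work; the piped one simply lists the steps in the order they happen.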

Comparison: Relational Operators

x < y

x > y

x <= y

x >= y

x == y (equal)

x != y (not equal)
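
Each operator compares values one by one and answers with TRUE or FALSE. A quick sketch with made-up numbers, base R only:

```r
x <- c(2, 5, 8)
y <- 5

x < y     # TRUE FALSE FALSE
x > y     # FALSE FALSE TRUE
x <= y    # TRUE TRUE FALSE
x >= y    # FALSE TRUE TRUE
x == y    # FALSE TRUE FALSE
x != y    # TRUE FALSE TRUE

# the TRUE/FALSE answers can pick values or be counted
x[x > y]      # 8
sum(x >= y)   # 2
```

filter() relies on the same idea: it keeps the rows for which the comparison is TRUE.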

How to keep only penguins with a bill length of more than 43 mm?

penguins %>%
  filter(bill_length_mm > 43)
## # A tibble: 188 × 8
## species island bill_length_mm
## <fct> <fct> <dbl>
## 1 Adelie Torgersen 46
## 2 Adelie Dream 44.1
## 3 Adelie Torgersen 45.8
## 4 Adelie Dream 43.2
## 5 Adelie Biscoe 43.2
## 6 Adelie Biscoe 45.6
## 7 Adelie Torgersen 44.1
## 8 Adelie Torgersen 43.1
## 9 Gentoo Biscoe 46.1
## 10 Gentoo Biscoe 50
## # … with 178 more rows, and 5 more
## # variables: bill_depth_mm <dbl>,
## # flipper_length_mm <int>,
## # body_mass_g <int>, sex <fct>,
## # year <int>

How to keep only Gentoo penguins with a bill length of more than 55 mm?

penguins %>%
  filter(species == "Gentoo",
         bill_length_mm > 55)
## # A tibble: 3 × 8
## species island bill_length_mm
## <fct> <fct> <dbl>
## 1 Gentoo Biscoe 59.6
## 2 Gentoo Biscoe 55.9
## 3 Gentoo Biscoe 55.1
## # … with 5 more variables:
## # bill_depth_mm <dbl>,
## # flipper_length_mm <int>,
## # body_mass_g <int>, sex <fct>,
## # year <int>

How to keep only non-Gentoo penguins with a bill length of more than 45 mm and a weight of more than 4 kg?

penguins %>%
  filter(species != "Gentoo",
         bill_length_mm > 45,
         body_mass_g > 4000)
## # A tibble: 18 × 8
## species island bill_length_mm
## <fct> <fct> <dbl>
## 1 Adelie Torgersen 46
## 2 Adelie Torgersen 45.8
## 3 Adelie Biscoe 45.6
## 4 Chinstrap Dream 46
## 5 Chinstrap Dream 52
## 6 Chinstrap Dream 50.5
## 7 Chinstrap Dream 49.2
## 8 Chinstrap Dream 52
## 9 Chinstrap Dream 52.8
## 10 Chinstrap Dream 54.2
## 11 Chinstrap Dream 51
## 12 Chinstrap Dream 52
## 13 Chinstrap Dream 53.5
## 14 Chinstrap Dream 50.8
## 15 Chinstrap Dream 49
## 16 Chinstrap Dream 50.7
## 17 Chinstrap Dream 49.3
## 18 Chinstrap Dream 50.8
## # … with 5 more variables:
## # bill_depth_mm <dbl>,
## # flipper_length_mm <int>,
## # body_mass_g <int>, sex <fct>,
## # year <int>

How to keep only the top or bottom rows of the data?

penguins %>%
  filter(species != "Gentoo",
         bill_length_mm > 45,
         body_mass_g > 4000) %>%
  head() # the first 6 rows by default
## # A tibble: 6 × 8
## species island bill_length_mm
## <fct> <fct> <dbl>
## 1 Adelie Torgersen 46
## 2 Adelie Torgersen 45.8
## 3 Adelie Biscoe 45.6
## 4 Chinstrap Dream 46
## 5 Chinstrap Dream 52
## 6 Chinstrap Dream 50.5
## # … with 5 more variables:
## # bill_depth_mm <dbl>,
## # flipper_length_mm <int>,
## # body_mass_g <int>, sex <fct>,
## # year <int>

penguins %>%
  filter(species != "Gentoo",
         bill_length_mm > 45,
         body_mass_g > 4000) %>%
  tail(3) # the last 3 rows
## # A tibble: 3 × 8
## species island bill_length_mm
## <fct> <fct> <dbl>
## 1 Chinstrap Dream 50.7
## 2 Chinstrap Dream 49.3
## 3 Chinstrap Dream 50.8
## # … with 5 more variables:
## # bill_depth_mm <dbl>,
## # flipper_length_mm <int>,
## # body_mass_g <int>, sex <fct>,
## # year <int>

select() function: Picks variables (columns) based on their names.
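
The difference between select() and filter() is easiest to see on a tiny data set. The sketch below assumes the dplyr package is installed; the data frame toy and its values are invented for illustration.

```r
library(dplyr)

# a made-up data set with three rows and three columns
toy <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  bill_length_mm = c(39.1, 46.1, 46.0),
  year = c(2007, 2008, 2009)
)

# select() keeps columns: all 3 rows stay, only 2 variables remain
picked_columns <- toy %>%
  select(species, year)

# filter() keeps rows: all 3 variables stay, only 2 rows remain
picked_rows <- toy %>%
  filter(year > 2007)

dim(picked_columns)
dim(picked_rows)
```

A simple way to remember it: select() works on variables, filter() works on observations.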

How to keep only the species variable in the data?

penguins %>%
  select(species)
## # A tibble: 344 × 1
## species
## <fct>
## 1 Adelie
## 2 Adelie
## 3 Adelie
## 4 Adelie
## 5 Adelie
## 6 Adelie
## 7 Adelie
## 8 Adelie
## 9 Adelie
## 10 Adelie
## # … with 334 more rows

How to keep a specific range of variables in the data?

penguins %>%
  select(species:bill_depth_mm)
## # A tibble: 344 × 4
## species island bill_length_mm
## <fct> <fct> <dbl>
## 1 Adelie Torgersen 39.1
## 2 Adelie Torgersen 39.5
## 3 Adelie Torgersen 40.3
## 4 Adelie Torgersen NA
## 5 Adelie Torgersen 36.7
## 6 Adelie Torgersen 39.3
## 7 Adelie Torgersen 38.9
## 8 Adelie Torgersen 39.2
## 9 Adelie Torgersen 34.1
## 10 Adelie Torgersen 42
## # … with 334 more rows, and 1 more
## # variable: bill_depth_mm <dbl>

How to pick variables by their position in the data?

penguins %>%
  select(4:8)
## # A tibble: 344 × 5
## bill_depth_mm flipper_length_mm
## <dbl> <int>
## 1 18.7 181
## 2 17.4 186
## 3 18 195
## 4 NA NA
## 5 19.3 193
## 6 20.6 190
## 7 17.8 181
## 8 19.6 195
## 9 18.1 193
## 10 20.2 190
## # … with 334 more rows, and 3 more
## # variables: body_mass_g <int>,
## # sex <fct>, year <int>

How to keep specific variables in the data?

penguins %>%
  select(species, body_mass_g, year)
## # A tibble: 344 × 3
## species body_mass_g year
## <fct> <int> <int>
## 1 Adelie 3750 2007
## 2 Adelie 3800 2007
## 3 Adelie 3250 2007
## 4 Adelie NA 2007
## 5 Adelie 3450 2007
## 6 Adelie 3650 2007
## 7 Adelie 3625 2007
## 8 Adelie 4675 2007
## 9 Adelie 3475 2007
## 10 Adelie 4250 2007
## # … with 334 more rows

# a minus sign drops the listed variables and keeps the rest
penguins %>%
  select(-c(species, body_mass_g, year))
## # A tibble: 344 × 5
## island bill_length_mm bill_depth_mm
## <fct> <dbl> <dbl>
## 1 Torgersen 39.1 18.7
## 2 Torgersen 39.5 17.4
## 3 Torgersen 40.3 18
## 4 Torgersen NA NA
## 5 Torgersen 36.7 19.3
## 6 Torgersen 39.3 20.6
## 7 Torgersen 38.9 17.8
## 8 Torgersen 39.2 19.6
## 9 Torgersen 34.1 18.1
## 10 Torgersen 42 20.2
## # … with 334 more rows, and 2 more
## # variables: flipper_length_mm <int>,
## # sex <fct>

mutate() function: Adds new variables that are functions of existing variables

How to convert penguin body mass from grams to kilograms?

penguins %>%
  mutate(body_mass_kg = body_mass_g / 1000)
## # A tibble: 344 × 9
## species island bill_length_mm
## <fct> <fct> <dbl>
## 1 Adelie Torgersen 39.1
## 2 Adelie Torgersen 39.5
## 3 Adelie Torgersen 40.3
## 4 Adelie Torgersen NA
## 5 Adelie Torgersen 36.7
## 6 Adelie Torgersen 39.3
## 7 Adelie Torgersen 38.9
## 8 Adelie Torgersen 39.2
## 9 Adelie Torgersen 34.1
## 10 Adelie Torgersen 42
## # … with 334 more rows, and 6 more
## # variables: bill_depth_mm <dbl>,
## # flipper_length_mm <int>,
## # body_mass_g <int>, sex <fct>,
## # year <int>, body_mass_kg <dbl>

penguins %>%
  select(body_mass_g) %>%
  mutate(body_mass_kg = body_mass_g / 1000)
## # A tibble: 344 × 2
## body_mass_g body_mass_kg
## <int> <dbl>
## 1 3750 3.75
## 2 3800 3.8
## 3 3250 3.25
## 4 NA NA
## 5 3450 3.45
## 6 3650 3.65
## 7 3625 3.62
## 8 4675 4.68
## 9 3475 3.48
## 10 4250 4.25
## # … with 334 more rows

penguins %>%
  mutate(body_mass_kg = body_mass_g / 1000,
         bill = bill_length_mm * bill_depth_mm)
## # A tibble: 344 × 10
## species island bill_length_mm
## <fct> <fct> <dbl>
## 1 Adelie Torgersen 39.1
## 2 Adelie Torgersen 39.5
## 3 Adelie Torgersen 40.3
## 4 Adelie Torgersen NA
## 5 Adelie Torgersen 36.7
## 6 Adelie Torgersen 39.3
## 7 Adelie Torgersen 38.9
## 8 Adelie Torgersen 39.2
## 9 Adelie Torgersen 34.1
## 10 Adelie Torgersen 42
## # … with 334 more rows, and 7 more
## # variables: bill_depth_mm <dbl>,
## # flipper_length_mm <int>,
## # body_mass_g <int>, sex <fct>,
## # year <int>, body_mass_kg <dbl>,
## # bill <dbl>

penguins %>%
  mutate(body_mass_kg = body_mass_g / 1000,
         bill = bill_length_mm * bill_depth_mm) %>%
  select(body_mass_kg,
         bill)
## # A tibble: 344 × 2
## body_mass_kg bill
## <dbl> <dbl>
## 1 3.75 731.
## 2 3.8 687.
## 3 3.25 725.
## 4 NA NA
## 5 3.45 708.
## 6 3.65 810.
## 7 3.62 692.
## 8 4.68 768.
## 9 3.48 617.
## 10 4.25 848.
## # … with 334 more rows

arrange() function: Changes the order of the rows.

How to arrange the data in ascending order of the penguins' bill length?

penguins %>%
  arrange(bill_length_mm)
## # A tibble: 344 × 8
## species island bill_length_mm
## <fct> <fct> <dbl>
## 1 Adelie Dream 32.1
## 2 Adelie Dream 33.1
## 3 Adelie Torgersen 33.5
## 4 Adelie Dream 34
## 5 Adelie Torgersen 34.1
## 6 Adelie Torgersen 34.4
## 7 Adelie Biscoe 34.5
## 8 Adelie Torgersen 34.6
## 9 Adelie Torgersen 34.6
## 10 Adelie Biscoe 35
## # … with 334 more rows, and 5 more
## # variables: bill_depth_mm <dbl>,
## # flipper_length_mm <int>,
## # body_mass_g <int>, sex <fct>,
## # year <int>
204/240
penguins %>%
  arrange(desc(bill_length_mm))
## # A tibble: 344 × 8
## species island bill_length_mm
## <fct> <fct> <dbl>
## 1 Gentoo Biscoe 59.6
## 2 Chinstrap Dream 58
## 3 Gentoo Biscoe 55.9
## 4 Chinstrap Dream 55.8
## 5 Gentoo Biscoe 55.1
## 6 Gentoo Biscoe 54.3
## 7 Chinstrap Dream 54.2
## 8 Chinstrap Dream 53.5
## 9 Gentoo Biscoe 53.4
## 10 Chinstrap Dream 52.8
## # … with 334 more rows, and 5 more
## # variables: bill_depth_mm <dbl>,
## # flipper_length_mm <int>,
## # body_mass_g <int>, sex <fct>,
## # year <int>
205/240
penguins %>%
  arrange(species)
## # A tibble: 344 × 8
## species island bill_length_mm
## <fct> <fct> <dbl>
## 1 Adelie Torgersen 39.1
## 2 Adelie Torgersen 39.5
## 3 Adelie Torgersen 40.3
## 4 Adelie Torgersen NA
## 5 Adelie Torgersen 36.7
## 6 Adelie Torgersen 39.3
## 7 Adelie Torgersen 38.9
## 8 Adelie Torgersen 39.2
## 9 Adelie Torgersen 34.1
## 10 Adelie Torgersen 42
## # … with 334 more rows, and 5 more
## # variables: bill_depth_mm <dbl>,
## # flipper_length_mm <int>,
## # body_mass_g <int>, sex <fct>,
## # year <int>
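
arrange() also accepts several columns at once, with later columns breaking ties in earlier ones. A minimal sketch, assuming a small toy tibble with made-up values in place of the real penguins data:

```r
library(dplyr)

# Toy data (hypothetical values) using the same column names as penguins
toy <- tibble(
  species = c("Gentoo", "Adelie", "Gentoo", "Adelie"),
  bill_length_mm = c(47.5, 39.1, 50.2, 36.7)
)

# Sort by species first, then by descending bill length within each species
sorted <- toy %>%
  arrange(species, desc(bill_length_mm))

sorted
```

The same call should work on penguins itself by replacing toy with penguins.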
206/240

summarise() function: Reduces multiple values down to a single summary.

208/240

How to find the mean bill length of all penguins?

209/240
penguins %>%
  drop_na() %>%
  summarise(mean_bill_length_mm = mean(bill_length_mm))
## # A tibble: 1 × 1
## mean_bill_length_mm
## <dbl>
## 1 44.0
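
Note that drop_na() with no arguments removes every row that has a missing value in any column, not only in bill_length_mm. A minimal sketch of the difference, assuming a toy tibble with made-up values; mean(..., na.rm = TRUE) is shown as an alternative that skips only the missing bill lengths:

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical values): row 2 has a missing sex, row 3 a missing bill length
toy <- tibble(
  bill_length_mm = c(40, 50, NA, 30),
  sex = c("male", NA, "female", "female")
)

# drop_na() drops rows 2 and 3, so only 40 and 30 are averaged
complete_cases <- toy %>%
  drop_na() %>%
  summarise(mean_bill_length_mm = mean(bill_length_mm))

# na.rm = TRUE drops only the NA bill length, so 40, 50 and 30 are averaged
all_measured <- toy %>%
  summarise(mean_bill_length_mm = mean(bill_length_mm, na.rm = TRUE))

complete_cases
all_measured
```

Which version is appropriate depends on whether the analysis should be restricted to fully complete rows.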
210/240

How to find the species-wise mean bill length of penguins?

211/240
penguins %>%
  drop_na() %>%
  group_by(species) %>%
  summarise(mean_bill_length_mm = mean(bill_length_mm))
## # A tibble: 3 × 2
## species mean_bill_length_mm
## <fct> <dbl>
## 1 Adelie 38.8
## 2 Chinstrap 48.8
## 3 Gentoo 47.6
212/240

How to find the species-wise mean bill length of penguins and the total number of penguins in each species?

213/240
penguins %>%
  drop_na() %>%
  group_by(species) %>%
  summarise(mean_bill_length_mm = mean(bill_length_mm), n = n())
## # A tibble: 3 × 3
## species mean_bill_length_mm n
## <fct> <dbl> <int>
## 1 Adelie 38.8 146
## 2 Chinstrap 48.8 68
## 3 Gentoo 47.6 119
214/240

🙋🏽‍♀️🙋‍♂️
Q&A

215/240

Slide Crafting
using xaringan

Next Module - 5

216/240

xaringan

  • xaringan package to be a Presentation Ninja 🤺

  • "for creating slideshows with remark.js through R Markdown"

  • Author: Yihui Xie

221/240

Packages required:

library(palmerpenguins) # to access penguin data
library(xaringan)
library(xaringanthemer)
library(xaringanExtra)
222/240

File → New File → R Markdown

223/240

Template → Ninja Presentation

224/240

Save this Rmd file

225/240

Addins → Infinite Moon Reader

xaringan output

xaringan slide → browser

  • Click Infinite Moon Reader only once, to start the slideshow. To see changes made to the slides, just save the document (Ctrl + S).
227/240

Using xaringan how to:

  1. create a new slide

  2. hide an existing slide

  3. heading, subheadings, points and normal text

  4. include images

    • as background
    • as part of slide
  5. make plots

  6. include tables

  7. in-text R output

  8. create columns

228/240
  1. Use --- to create a new slide

  2. Use exclude: true to hide an existing slide

  3. Slide text sizes:

    • # for main heading

    • ## for sub-heading

    • #### for sub-sub-heading

    • indented * for sub-point1
    • indented * for sub-point2
    • indented * for sub-point3
  • - for normal text size
229/240

To include images using:

CSS background option:

  • background-image: url("path of the image") = path of the image

  • background-size: contain, cover, 50%, 70% = size of the image

  • background-position: left top = position of the image

230/240

To include images using:

knitr chunk option:

knitr::include_graphics("path of the image")
231/240

To include plots

library(palmerpenguins)
library(ggplot2) # needed for ggplot() unless tidyverse is already loaded
ggplot(penguins, aes(x = species)) +
  geom_bar()
232/240

To include tables

library(kableExtra)
library(tidyverse)
penguins %>%
  drop_na() %>%
  head() %>%
  kable()
species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex    year
Adelie  Torgersen           39.1          18.7               181        3750 male   2007
Adelie  Torgersen           39.5          17.4               186        3800 female 2007
Adelie  Torgersen           40.3          18.0               195        3250 female 2007
Adelie  Torgersen           36.7          19.3               193        3450 female 2007
Adelie  Torgersen           39.3          20.6               190        3650 male   2007
Adelie  Torgersen           38.9          17.8               181        3625 female 2007
233/240

In-text R output

  • The penguins data has a sample of n = 344 on a total of 8 variables.

  • math expressions

$$a + b = \sigma - \frac{\sum x^2}{2}$$
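
The numbers in the first bullet are typically computed by R rather than typed by hand. A minimal sketch of the expressions that could sit behind them, assuming the palmerpenguins package is installed; the variable names n_obs, n_vars and sentence are only illustrative:

```r
library(palmerpenguins)

# In an R Markdown slide these would be written inline in the text,
# e.g. `r nrow(penguins)` and `r ncol(penguins)`
n_obs <- nrow(penguins)   # number of rows (observations)
n_vars <- ncol(penguins)  # number of columns (variables)

# Build the same sentence as on the slide from the computed values
sentence <- sprintf(
  "The penguins data has a sample of n = %d on a total of %d variables.",
  n_obs, n_vars
)
sentence
```

Because the values are computed when the slides are rendered, the sentence stays correct if the data changes.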

234/240

Column division of slide

- left column

  • a

    • b
    • c

- right column

  • apple
  • boy
  • cat
235/240

Slide class

  • a class can be assigned to each slide

  • it controls how all elements of that particular slide look

  • class: center, middle, inverse, right

238/240

Extend the power of xaringan:

239/240
