# Exercises {.unnumbered}
## Questions
::: callout
## Exercise: 11-A
Create a dataframe named "df" which is equal to the first three columns and the first five rows of the "mtcars" dataset. Next, rename the "mpg" column to "miles_per_gallon".
After printing the resulting dataframe to the console, you should see the following results:
``` r
#                   miles_per_gallon cyl disp
# Mazda RX4                     21.0   6  160
# Mazda RX4 Wag                 21.0   6  160
# Datsun 710                    22.8   4  108
# Hornet 4 Drive                21.4   6  258
# Hornet Sportabout             18.7   8  360
```
:::
::: callout
## Exercise: 12-A
You are given the following dataframe:
``` r
var_1 <- c(3, 4, 2, 9, NA, 2, 7)
var_2 <- c(8, NA, 6, 4, 8, 5, 5)
df <- data.frame(var_1 = var_1, var_2 = var_2)
print(df)
#   var_1 var_2
# 1     3     8
# 2     4    NA
# 3     2     6
# 4     9     4
# 5    NA     8
# 6     2     5
# 7     7     5
```
Create a new dataframe called "cleaned_df" which is equal to "df" but with the two rows that contain "NA" values removed.
The final output of "cleaned_df" should look like this:
``` r
#   var_1 var_2
# 1     3     8
# 3     2     6
# 4     9     4
# 6     2     5
# 7     7     5
```
:::
::: callout
## Exercise: 12-B
Take the original "df" dataframe from exercise 12-A and replace each "NA" value with the constant value "5". Store this new dataframe in a variable named "constant_value".
Your final output after printing "constant_value" to the console should look like this:
``` r
print(constant_value)
#   var_1 var_2
# 1     3     8
# 2     4     5
# 3     2     6
# 4     9     4
# 5     5     8
# 6     2     5
# 7     7     5
```
:::
::: callout
## Exercise: 12-C
Take the same "df" dataframe from exercises 12-A and 12-B and replace the "NA" values in each column with the mean of that column's non-missing values. Store this new dataframe in a variable named "mean_value".
Your final output after printing "mean_value" to the console should look like this:
``` r
print(mean_value)
#   var_1 var_2
# 1   3.0     8
# 2   4.0     6
# 3   2.0     6
# 4   9.0     4
# 5   4.5     8
# 6   2.0     5
# 7   7.0     5
```
:::
::: callout
## Exercise: 13-A
Use the "Nile" dataset to create a histogram that shows the distribution of its data.
:::
::: callout
## Exercise: 14-A
Take the dataframe created in exercise 11-A and drop any row where the "disp" column is equal to "160".
You should see the following results when you print the resulting dataframe to the console:
``` r
#                   miles_per_gallon cyl disp
# Datsun 710                    22.8   4  108
# Hornet 4 Drive                21.4   6  258
# Hornet Sportabout             18.7   8  360
```
:::
## Answers
::: callout
## Answer: 11-A
This task could be accomplished in the following way:
```{r}
library(dplyr)
df <- mtcars[1:5, 1:3]
df <- rename(df, "miles_per_gallon" = "mpg")
print(df)
```
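
The same rename can also be done without dplyr. The following is a minimal base R sketch; the name `df_base` is chosen here only for illustration, so that the `df` created above is not overwritten.

```{r}
# Base R alternative: select rows and columns by position.
df_base <- mtcars[1:5, 1:3]
# Find the "mpg" entry among the column names and overwrite only that entry.
names(df_base)[names(df_base) == "mpg"] <- "miles_per_gallon"
print(df_base)
```

Matching on the column name rather than on position keeps the rename correct even if the column order changes.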
:::
::: callout
## Answer: 12-A
This task could be accomplished in the following way:
```{r}
var_1 <- c(3, 4, 2, 9, NA, 2, 7)
var_2 <- c(8, NA, 6, 4, 8, 5, 5)
df <- data.frame(var_1 = var_1, var_2 = var_2)
cleaned_df <- na.omit(df)
print(cleaned_df)
```
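
As an alternative sketch, `complete.cases()` makes the row selection explicit by returning one logical value per row. The names `df_na`, `keep`, and `cleaned_alt` are illustrative and not part of the exercise.

```{r}
df_na <- data.frame(
  var_1 = c(3, 4, 2, 9, NA, 2, 7),
  var_2 = c(8, NA, 6, 4, 8, 5, 5)
)
# TRUE for rows with no missing values, FALSE otherwise.
keep <- complete.cases(df_na)
# Keep only the complete rows; the original row names are preserved.
cleaned_alt <- df_na[keep, ]
print(cleaned_alt)
```

This form is convenient when the logical vector is also needed elsewhere, for example to count how many rows were dropped with `sum(!keep)`.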
:::
::: callout
## Answer: 12-B
This task can be accomplished in several ways; the following example demonstrates one of them.
```{r}
var_1 <- c(3, 4, 2, 9, NA, 2, 7)
var_2 <- c(8, NA, 6, 4, 8, 5, 5)
df <- data.frame(var_1 = var_1, var_2 = var_2)
constant_value <- df
constant_value[is.na(constant_value)] <- 5
print(constant_value)
```
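
The answer above works because `is.na()` applied to a dataframe returns a logical matrix of the same shape, which can then be used as an index. The sketch below makes that intermediate step visible and uses `replace()` to return a modified copy; the names `df_na`, `na_mask`, and `constant_alt` are illustrative.

```{r}
df_na <- data.frame(
  var_1 = c(3, 4, 2, 9, NA, 2, 7),
  var_2 = c(8, NA, 6, 4, 8, 5, 5)
)
# Logical matrix: TRUE wherever a value is missing.
na_mask <- is.na(df_na)
# Count the missing values in each column before filling them.
print(colSums(na_mask))
# replace() returns a filled copy and leaves df_na unchanged.
constant_alt <- replace(df_na, na_mask, 5)
print(constant_alt)
```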
:::
::: callout
## Answer: 12-C
This task can be accomplished in several ways; the following example demonstrates one of them.
```{r}
var_1 <- c(3, 4, 2, 9, NA, 2, 7)
var_2 <- c(8, NA, 6, 4, 8, 5, 5)
df <- data.frame(var_1 = var_1, var_2 = var_2)
# na.rm = TRUE computes each mean from the non-missing values only
mean_1 <- mean(df$var_1, na.rm = TRUE)
mean_2 <- mean(df$var_2, na.rm = TRUE)
mean_value <- df
mean_value$var_1[is.na(mean_value$var_1)] <- mean_1
mean_value$var_2[is.na(mean_value$var_2)] <- mean_2
print(mean_value)
```
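
The answer above handles each column by hand, which becomes tedious as the number of columns grows. One possible generalization, sketched here under the assumption that every column is numeric, applies the same fill to all columns with `lapply()`. The names `df_na` and `mean_alt` are illustrative.

```{r}
df_na <- data.frame(
  var_1 = c(3, 4, 2, 9, NA, 2, 7),
  var_2 = c(8, NA, 6, 4, 8, 5, 5)
)
mean_alt <- df_na
# Assigning into mean_alt[] replaces every column but keeps the dataframe structure.
mean_alt[] <- lapply(df_na, function(column) {
  # Fill the missing entries with the mean of the non-missing values.
  column[is.na(column)] <- mean(column, na.rm = TRUE)
  column
})
print(mean_alt)
```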
:::
::: callout
## Answer: 13-A
```{r}
hist(Nile)
```
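
A bare `hist()` call is enough for the exercise. As an optional sketch, the call below adds a title and axis label and keeps the returned histogram object for inspection; the title, label text, bin suggestion, and the name `flow_hist` are illustrative choices.

```{r}
# Nile is a built-in time series of annual river flow measurements.
flow_hist <- hist(
  Nile,
  breaks = 20, # a suggestion to hist(), not an exact bin count
  main = "Distribution of annual Nile flow",
  xlab = "Annual flow",
  col = "lightblue"
)
# Every observation falls into exactly one bin.
sum(flow_hist$counts) == length(Nile)
```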
:::
::: callout
## Answer: 14-A
This task could be accomplished in the following way:
```{r}
library(dplyr)
df <- mtcars[1:5, 1:3]
df <- rename(df, "miles_per_gallon" = "mpg")
df <- filter(df, disp != 160)
print(df)
```
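
The same rows can be dropped without dplyr by indexing with a logical condition. This is a minimal base R sketch; the names `df_base` and `df_filtered` are illustrative.

```{r}
df_base <- mtcars[1:5, 1:3]
names(df_base)[names(df_base) == "mpg"] <- "miles_per_gallon"
# Keep only the rows whose disp value differs from 160.
df_filtered <- df_base[df_base$disp != 160, ]
print(df_filtered)
```

The "disp" column has no missing values in these rows, so the comparison never yields "NA". If it did, this form of indexing would produce rows filled with "NA", and the condition would need an additional `!is.na()` check.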
:::