Introduction to Statistics with R

9.1 Running Code and Outliers

Learning Objectives

Review simulation

Useful Functions

Use abs() to calculate absolute values.
Use mean() to calculate sample means.
Use round() to round your answers.
Use sd() to calculate standard deviations.
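
As a quick warm-up, here is a minimal sketch of the four functions applied to a small made-up vector. The vector x below is an assumption for illustration only and is not part of the lab data.

```r
# Toy data, made up for illustration (not part of the lab)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

abs(-3.5)          # absolute value: 3.5
mean(x)            # sample mean: 5
sd(x)              # sample standard deviation (divides by n - 1)
round(sd(x), 2)    # the same value rounded to 2 decimal places: 2.14
```

Note that round() is applied last, so the rounding happens only once, on the final answer.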

Introduction

The goal of this lab is to perform some simple calculations in R to get a sense of how outliers influence sample statistics. Work through this code, make sure you understand it, and use it to answer the questions for Lab Assignment 0.

First, because R uses a pseudorandom number generator, we call set.seed() to fix the seed so that everyone gets the same results (this is essential for matching the solutions). Next, we generate random samples of size 10, 100, and 1000 from a bell-shaped distribution called the Normal distribution. By default, rnorm() draws from the standard Normal distribution, which has mean 0 and standard deviation 1.

set.seed(5)
x10 <- rnorm(10)
x100 <- rnorm(100)
x1000 <- rnorm(1000)
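
To see what setting a seed does, the sketch below resets the same seed twice and compares the draws. The names a and b are throwaway variables chosen for this illustration, so running it does not overwrite x10, x100, or x1000.

```r
# Resetting the seed restarts the pseudorandom sequence from the same point
set.seed(5)
a <- rnorm(3)

set.seed(5)
b <- rnorm(3)

identical(a, b)   # TRUE: same seed, same draws
```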

Important: if you alter any of these variables, re-run this section of code in its entirety so you continue to get results that match the solutions.

Next, we create new vectors that contain each of the random samples above plus a single outlier (with value 10) appended at the end.

x11out <- c(x10, 10)
x101out <- c(x100, 10)
x1001out <- c(x1000, 10)
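
The c() function combines its arguments into one vector, so each new vector is one element longer than the original. A minimal sketch with a made-up vector (toy and toy_out are hypothetical names, not part of the lab):

```r
# Toy vector, made up for illustration
toy <- c(0.5, -1.2, 0.3)
toy_out <- c(toy, 10)     # append the outlier at the end

length(toy)               # 3
length(toy_out)           # 4
toy_out[length(toy_out)]  # 10: the last element is the outlier
```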

For this assignment, we supply you with code to answer the questions. You will use R to execute the code and provide the answers.

9.1.1 Sample means

Round all answers to 2 decimal places.

Calculate the mean of the sample of size 10.

round(mean(x10), 2)

Calculate the mean of the sample of size 100.

round(mean(x100), 2)

Calculate the mean of the sample of size 1000.

round(mean(x1000), 2)

Calculate the mean of the sample of size 11 (including 1 outlier).

round(mean(x11out), 2)

Calculate the mean of the sample of size 101 (including 1 outlier).

round(mean(x101out), 2)

Calculate the mean of the sample of size 1001 (including 1 outlier).

round(mean(x1001out), 2)

9.1.2 Absolute differences in sample means

Round all answers to 2 decimal places.

Calculate the absolute difference in the sample means between the sample of size 10 and the sample of size 11 (including 1 outlier).

round(abs(mean(x10) - mean(x11out)), 2)

Calculate the absolute difference in the sample means between the sample of size 100 and the sample of size 101 (including 1 outlier).

round(abs(mean(x100) - mean(x101out)), 2)

Calculate the absolute difference in the sample means between the sample of size 1000 and the sample of size 1001 (including 1 outlier).

round(abs(mean(x1000) - mean(x1001out)), 2)
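
It can help to see where these differences come from. If a sample x of size n has one extra value v appended, the new mean is (n * mean(x) + v) / (n + 1), so the mean changes by (v - mean(x)) / (n + 1). The sketch below checks this identity on a small made-up vector; x, v, and n here are assumptions for illustration and are not the lab variables.

```r
# Toy sample and outlier, made up for illustration
x <- c(1, 2, 3, 4)
v <- 10
n <- length(x)

shift_direct  <- mean(c(x, v)) - mean(x)   # 4 - 2.5 = 1.5
shift_formula <- (v - mean(x)) / (n + 1)   # 7.5 / 5 = 1.5

all.equal(shift_direct, shift_formula)     # TRUE
```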

9.1.3 Standard deviations

Round all answers to 2 decimal places.

Calculate the standard deviation of the sample of size 10.

round(sd(x10), 2)

Calculate the standard deviation of the sample of size 100.

round(sd(x100), 2)

Calculate the standard deviation of the sample of size 1000.

round(sd(x1000), 2)

Calculate the standard deviation of the sample of size 11 (including 1 outlier).

round(sd(x11out), 2)

Calculate the standard deviation of the sample of size 101 (including 1 outlier).

round(sd(x101out), 2)

Calculate the standard deviation of the sample of size 1001 (including 1 outlier).

round(sd(x1001out), 2)
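
For reference, sd() computes the sample standard deviation: the square root of the sum of squared deviations from the mean, divided by n - 1. A minimal sketch that reproduces it by hand on a made-up vector (the vector and the name manual_sd are assumptions for illustration):

```r
# Toy sample that includes one large value, made up for illustration
x <- c(1, 2, 3, 4, 10)
n <- length(x)

# Squared deviations from the mean (4) are 9, 4, 1, 0, 36, which sum to 50
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))   # sqrt(50 / 4) = sqrt(12.5)

all.equal(manual_sd, sd(x))   # TRUE
```

Because the deviations are squared, a value far from the mean contributes heavily to the sum.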

9.1.4 Absolute differences in standard deviations

Round all answers to 2 decimal places.

Calculate the absolute difference in the standard deviations between the sample of size 10 and the sample of size 11 (including 1 outlier).

round(abs(sd(x10) - sd(x11out)), 2)

Calculate the absolute difference in the standard deviations between the sample of size 100 and the sample of size 101 (including 1 outlier).

round(abs(sd(x100) - sd(x101out)), 2)

Calculate the absolute difference in the standard deviations between the sample of size 1000 and the sample of size 1001 (including 1 outlier).

round(abs(sd(x1000) - sd(x1001out)), 2)

9.1.5 Effect of the outliers

What impact does an outlier have?
- There is no impact on the means, only on the standard deviations
- There is no impact on the standard deviations, only on the means
- There is an impact on both means and standard deviations that increases with sample size
- There is an impact on both means and standard deviations that decreases with sample size
- None of these