Introduction to Statistics with R

9.6 Confidence Intervals and Hypothesis Tests

Learning Objectives

Practice calculating confidence intervals.
Practice performing hypothesis tests.
Practice calculating the power of a hypothesis test.

Useful Functions

Use BSDA::z.test()$conf.int to calculate confidence intervals using the Z-statistic.
Use t.test()$conf.int to calculate confidence intervals using the t-statistic.
Use asbio::power.z.test() to calculate the power of a Z-test versus a specific alternative.
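As a rough illustration of how these calls fit together, the sketch below runs them on simulated data rather than on the weather dataset. The sample `x` and its parameters are made up for the example, and the BSDA call is guarded so the code still runs if that package is not installed.

```r
# Hypothetical sample; not the weather data used in this assignment
set.seed(42)
x <- rnorm(40, mean = 20, sd = 3)

# 95% t-interval from base R
t_ci <- t.test(x, conf.level = 0.95)$conf.int
print(t_ci)

# 95% Z-interval from BSDA, treating sd(x) as the known population sd
# (assumption for illustration only)
if (requireNamespace("BSDA", quietly = TRUE)) {
  z_ci <- BSDA::z.test(x, sigma.x = sd(x), conf.level = 0.95)$conf.int
  print(z_ci)
}
```

In both cases `$conf.int` is a length-2 numeric vector: the first element is the lower bound and the second is the upper bound.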

Dataset: Weather Data

In this assignment we will look at 4 years of monthly weather data. Download the dataset or copy the dataset’s URL here: weatherData_lab6.csv.

Note: This code as written will only work if you download the dataset to your working directory. You can also replace weatherData_lab6.csv with the URL above.

weatherData <- readr::read_csv('weatherData_lab6.csv')
weatherData

## # A tibble: 54 × 6
##        DATE  Year Month RainDays MaxTemp MinTemp
##       <int> <int> <int>    <int>   <int>   <int>
## 1  19940301  1994     3        0     133      91
## 2  19940401  1994     4        0     159     118
## 3  19940501  1994     5        0     198     150
## 4  19940601  1994     6        0     210     159
## 5  19940701  1994     7        0     260     205
## 6  19940801  1994     8        0     255     200
## 7  19940901  1994     9        0     221     180
## 8  19941001  1994    10        0     163     124
## 9  19941101  1994    11        1      93      61
## 10 19941201  1994    12        4      87      58
## # ... with 44 more rows

For most of this assignment, we will focus on the MaxTemp variable. This represents the maximum temperature observed each month. Note that the MaxTemp values are recorded in tenths of a degree Celsius (degrees Celsius × 10), so we divide by 10 below to convert them to degrees Celsius.

MaxTemp <- weatherData$MaxTemp / 10
Month <- weatherData$Month

Let us look at the data first.

par(mfrow = c(1, 3), omi = c(0, 0, .3, 0))
boxplot(MaxTemp, ylab = "Maximum Temperature", main = "Box plot")
hist(MaxTemp, freq = FALSE, col = "steelblue",
     xlab = "Maximum Temperature", main = "Histogram")
mtext("Maximum Temperature Distribution", outer = TRUE, side = 3, line = -1)
plot(Month, MaxTemp, col = rgb(0, 0.4, 0, 0.5), pch = 16,
     xlab = "Month Number in Year", ylab = "Maximum Temperature",
     main = "Time Trend")

summary(MaxTemp)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.30   11.92   15.90   16.90   21.95   26.90

9.6.1 Confidence intervals and hypothesis tests for means

How many observations are in your sample?

Calculate a 95% confidence interval for the population mean of the MaxTemp variable $\mu$ using a t-interval.

What is the lower bound of this t-interval?
What is the upper bound of this t-interval?
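For reference, a t-interval has the form $\bar{x} \pm t^* \, s/\sqrt{n}$, where $t^*$ is the 0.975 quantile of the t-distribution with $n - 1$ degrees of freedom. The sketch below, on a made-up sample `x`, checks that computing this by hand agrees with t.test().

```r
# Hypothetical sample; not the weather data
set.seed(1)
x <- rnorm(25, mean = 12, sd = 4)

n      <- length(x)
se     <- sd(x) / sqrt(n)          # estimated standard error of the mean
t_star <- qt(0.975, df = n - 1)    # critical value for a 95% interval

by_hand  <- mean(x) + c(-1, 1) * t_star * se
built_in <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

all.equal(by_hand, built_in)
```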

Calculate a 95% confidence interval for the population mean of the MaxTemp variable $\mu$ using a Z-interval, assuming that the population standard deviation is the same as the sample standard deviation.

What is the lower bound of this Z-interval?
What is the upper bound of this Z-interval?
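A Z-interval replaces the t critical value with a standard Normal quantile: $\bar{x} \pm z^* \, \sigma/\sqrt{n}$. A minimal sketch on a made-up sample, assuming as above that the population standard deviation equals the sample standard deviation:

```r
# Hypothetical sample; not the weather data
set.seed(1)
x <- rnorm(25, mean = 12, sd = 4)

n      <- length(x)
sigma  <- sd(x)           # assumption: sample sd treated as the known sigma
z_star <- qnorm(0.975)    # critical value for a 95% interval

z_ci <- mean(x) + c(-1, 1) * z_star * sigma / sqrt(n)
print(z_ci)
```

Because `qnorm(0.975)` is smaller than `qt(0.975, df)` for any finite `df`, this interval is always a little narrower than the t-interval built from the same data.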

Calculate an approximate 95% confidence interval for the population mean of the MaxTemp variable $\mu$ using the empirical rule (recall that, by the empirical rule, about 95% of the probability of a Normal distribution lies within 2 standard deviations of its mean).

What is the lower bound of this empirical interval?
What is the upper bound of this empirical interval?

Which one of the following statements is correct about the t-interval and Z-interval you have calculated?
- The t-interval should not be used since the observations are not normally or symmetrically distributed.
- The Z-interval should not be used since the observations are not normally distributed.
- The Z-interval and t-interval are very close; the two confidence intervals are nearly identical.
- Neither should be used since the sample distribution of max temp is bimodal.
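The t-interval, the Z-interval, and the empirical-rule interval share the same center and standard error; they differ only in the multiplier applied to the standard error. The sketch below tabulates the three multipliers for a few illustrative sample sizes (the sizes are arbitrary choices, not taken from the dataset).

```r
# Illustrative sample sizes (arbitrary)
n <- c(5, 15, 30, 50, 200)

multipliers <- data.frame(
  n         = n,
  t_star    = qt(0.975, df = n - 1),  # t-interval multiplier
  z_star    = qnorm(0.975),           # Z-interval multiplier
  empirical = 2                       # empirical-rule multiplier
)
print(round(multipliers, 3))
```

The t multiplier shrinks toward the Z multiplier as the sample size grows, which is why the two intervals can be close for moderately large samples.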
Find and report the test statistic you would use to conduct a Z-test at level 0.05 of $H_0: \mu = 15$ versus $H_A: \mu > 15$. (Here, assume that the population standard deviation is equal to the sample standard deviation.)
What is the one-tailed p-value for this test statistic under the N(0,1) distribution? (Round to three decimal places.)
What decision would you make based on the Z-test if you were willing to accept a Type I error rate of 5%?
- Accept the null hypothesis.
- Reject the null hypothesis.
- The test is wrong.
- The test is indecisive.
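The mechanics of a one-sided Z-test can be sketched as follows, using a made-up sample and a made-up null value `mu0` rather than the weather data. The statistic is $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$, and the upper-tail p-value is the area to the right of $z$ under N(0,1).

```r
# Hypothetical sample and null value; not the weather data
set.seed(7)
x   <- rnorm(36, mean = 10.5, sd = 2)
mu0 <- 10

sigma  <- sd(x)   # assumption: sample sd treated as the known sigma
z_stat <- (mean(x) - mu0) / (sigma / sqrt(length(x)))

# Upper-tail p-value for H_A: mu > mu0
p_value <- pnorm(z_stat, lower.tail = FALSE)

# Reject H_0 at level 0.05 when the p-value falls below 0.05
reject <- p_value < 0.05
print(c(z = z_stat, p = p_value))
```

Rejecting when `p_value < 0.05` is equivalent to rejecting when `z_stat` exceeds `qnorm(0.95)`.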
Find and report the test statistic you would use to conduct a t-test at level 0.05 of $H_0: \mu = 15$ versus $H_A: \mu > 15$.
What is the one-tailed p-value for this test statistic under the t-distribution? (Round to three decimal places.)
What decision would you make based on the t-test if you were willing to accept a Type I error rate of 5%?
- Accept the null hypothesis.
- Reject the null hypothesis.
- The test is wrong.
- The test is indecisive.
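For the one-sided t-test, t.test() accepts the null value through `mu` and the direction through `alternative`. The sketch below, on a made-up sample and null value, checks the built-in result against a by-hand calculation with pt().

```r
# Hypothetical sample and null value; not the weather data
set.seed(7)
x   <- rnorm(36, mean = 10.5, sd = 2)
mu0 <- 10

# Built-in one-sided test of H_0: mu = mu0 versus H_A: mu > mu0
res <- t.test(x, mu = mu0, alternative = "greater")

# The same quantities by hand
n         <- length(x)
t_stat    <- (mean(x) - mu0) / (sd(x) / sqrt(n))
p_by_hand <- pt(t_stat, df = n - 1, lower.tail = FALSE)

print(c(t = t_stat, p = p_by_hand))
```

The statistic has the same formula as the Z-statistic when the sample standard deviation stands in for $\sigma$; only the reference distribution used for the p-value changes.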
Which test should be used?
- The t-test should not be used since the sample size is not large enough.
- The Z-test should not be used since the observations are not normal.
- The p-values are very close, so the results of both tests will be almost identical.
- Neither should be used since the observations are not normal.

What is the critical Z-value that defines the upper bound of the 95% Z-interval you calculated above?
What is the critical Z-value for the one-tailed, level 0.05 Z-test of $H_0: \mu = 15$ versus $H_A: \mu > 15$ above?

Referring back to your Z-test statistic, what can you conclude by comparing the results of the 95% Z-interval and the level 0.05 Z-test about whether the true mean is equal to 15?
- The Z-interval is incorrect since it contains the mean but the test rejects the null hypothesis.
- The Z-test is incorrect because it rejects the null hypothesis even though the Z-interval contains the mean.
- Both are correct, but they give different results because Z-intervals are two-sided and symmetric, while the Z-test here is one-sided.
- None of the above.
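A two-sided 95% interval and a one-sided level 0.05 test use different critical values (`qnorm(0.975)` versus `qnorm(0.95)`), so they need not agree. The sketch below uses made-up summary values, chosen so that the Z-statistic falls between the two critical values, to show how this can happen. It is an illustration only and says nothing about what happens with the weather data.

```r
# Hypothetical summary values, chosen for illustration
xbar <- 10.9
mu0  <- 10
se   <- 0.5

z_stat <- (xbar - mu0) / se

# Two-sided 95% Z-interval
two_sided_ci <- xbar + c(-1, 1) * qnorm(0.975) * se

# One-sided level 0.05 test of H_A: mu > mu0
rejects_one_sided <- z_stat > qnorm(0.95)

# The matching one-sided 95% lower confidence bound
lower_bound <- xbar - qnorm(0.95) * se

print(two_sided_ci)
print(rejects_one_sided)
print(lower_bound)
```

The one-sided test agrees with the one-sided lower bound: it rejects exactly when `mu0` lies below `lower_bound`.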

9.6.2 Power Calculations

Now suppose we want to find the power of a hypothesis test against a specific alternative hypothesis. For this problem, assume the population standard deviation $\sigma$ is the same as the sample standard deviation sd(MaxTemp), so we can use a Z-test.

Consider testing $H_0:\mu=15$ versus $H_A:\mu>15$ at level 0.05.

Calculate the power of the Z-test against $H_A:\mu=16$.
Now suppose you want to test against a different alternative: $H_A:\mu=18$. Calculate the power of the Z-test against this alternative.
Finally, suppose your sample size is reduced to 15 observations, but the sample standard deviation, sd(MaxTemp), is unchanged. Calculate the power of the test against $H_A:\mu=16$ for this smaller sample.

Which one of the following is correct?
- Power increases with sample size.
- Power increases as the difference between the truth and the null increases.
- Both larger sample size and larger differences between the truth and the null increase the power of a test.
- None of the above.
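For an upper-tailed Z-test at level $\alpha$, the power against a specific alternative $\mu_A$ is $1 - \Phi\left(z_{1-\alpha} - \frac{\mu_A - \mu_0}{\sigma/\sqrt{n}}\right)$. The sketch below implements this formula directly in base R; the helper name `z_power` and the input values are hypothetical and chosen only for illustration. asbio::power.z.test() can be used to check results like these.

```r
# Hypothetical helper: power of an upper-tailed Z-test against mu_alt
z_power <- function(mu0, mu_alt, sigma, n, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha)                      # rejection cutoff on the Z scale
  shift  <- (mu_alt - mu0) / (sigma / sqrt(n))    # standardized effect
  pnorm(z_crit - shift, lower.tail = FALSE)       # P(reject H_0 | mu = mu_alt)
}

# Illustrative inputs (not the weather data)
z_power(mu0 = 0, mu_alt = 0.5, sigma = 1, n = 25)  # baseline
z_power(mu0 = 0, mu_alt = 1.0, sigma = 1, n = 25)  # larger effect
z_power(mu0 = 0, mu_alt = 0.5, sigma = 1, n = 10)  # smaller sample
```

When `mu_alt` equals `mu0`, the function returns `alpha`, which is the probability of a Type I error.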