Review for Test 3

By Math with Mona

Summary

## Key takeaways

- **Method of Moments: Match Sample Mean**: Set the first sample moment equal to the first population moment; for E[X] = θ/2 and x̄ = 8, solving gives the estimate θ̂ = 16. [02:51], [05:53]
- **MLE for Exponential: Lambda = n / Sum**: For data 2, 1, 3, 4 from an exponential distribution, maximize the log-likelihood: λ̂ = 4 / 10 = 0.4. [09:30], [14:32]
- **95% CI Mean Known Sigma: 20.4 ± 1.86**: With x̄ = 20.4, n = 40, σ = 6, z_{0.025} = 1.96, the interval is 18.54 to 22.26. [17:17], [20:36]
- **T-Interval Small Sample: 50 ± 3.29**: For n = 6, x̄ = 50, s = 4, at 90% confidence with t_{0.05, 5} = 2.015, the interval is 46.71 to 53.29. [28:15], [30:59]
- **Z-Test Processing Time: Reject H0**: Sample mean 104 vs μ0 = 100, σ = 12, n = 50 gives z = 2.36 > 1.64 (α = 0.05), p = 0.009 < 0.05, so the mean increased. [45:02], [53:19]
- **Proportion Two-Sided: Fail to Reject**: 70/240 = 0.292 vs p0 = 0.25, z = 1.66 < 1.96 (α/2 = 0.025), p = 0.097 > 0.05, so there is no evidence of a difference. [54:06], [01:01:10]

Topics Covered

  • Match Sample Moments to Estimate Parameters
  • Maximize Likelihood by Log-Derivative Trick
  • T-Distribution Handles Small Sample Uncertainty
  • P-Value Beats Critical Value for Decisions

Full Transcript

Everyone, welcome to our chapter 9 review session. In today's video, we are going to summarize everything you need to know about parameter estimation, confidence intervals, proportions, t intervals and hypothesis testing.

This chapter is very important because it connects the ideas from earlier chapters, like sampling distributions, and prepares you for real statistical inference. So my goal in this session is to explain the concepts clearly and give you practice questions.

I want to start with parameter estimation. The very first concept is point estimation.

A point estimate is just a single number calculated from your sample that approximates an unknown population parameter. For example, the sample mean estimates the population mean, and the sample proportion estimates the population proportion.

But sometimes the parameter is not a mean or a proportion, so we need general methods to estimate it. The two main methods we use in this chapter are the method of moments and the method of maximum likelihood, or MLE.

So let's start with the method of moments. The idea of the method of moments is very intuitive. If the sample comes from a known distribution, the moments of the sample should match the moments of the population.

So we set the first sample moment equal to the first population moment, the second sample moment equal to the second population moment, and so on. Then we solve these equations for the unknown parameters.

Let me give you one example. Suppose the random variable X has a distribution where we know that the expected value of X is equal to theta over two, and theta here is the parameter. Also, from a sample we know that the mean of that sample is equal to 8. You remember that we denote the sample mean by x̄, so x̄ = 8.

Now I'm looking for this parameter. How can we find it? By using the method of moments.

We know that the first sample moment equals the first population moment, and the tenth sample moment equals the tenth population moment. So what are the sample moment and the population moment in general?

If I remove "first" here, I can rewrite it as the kth sample moment: it is the sum of x_i to the power k, for i from 1 to n, divided by n, where n is the sample size. I denote it by m_k. The population moment I denote by μ_k, and it is the expected value of X to the power k. If k is one, then μ_1 is just the expected value of X, and m_1 is just the sample mean, because it is the sum of the x_i divided by n. So the first sample moment is x̄ and the first population moment is just the expected value.

In this example I have the expected value and I have the sample mean. By the method of moments, we set m_1 equal to μ_1; m_1 is x̄ and μ_1 is the expected value of X. Here x̄ is given as 8 and the expected value is given as theta over two. So by solving, you find that theta hat equals 16. Why do I write it with a hat instead of just theta? Because we are estimating the parameter; 16 is not the exact value of this parameter. We just estimate it by the method of moments.
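Written out, the moment-matching step from this example might look as follows. This is a sketch that assumes the population mean is θ/2, which is the form that makes x̄ = 8 lead to θ̂ = 16; m_k and μ_k are the kth sample and population moments described in the session.

```latex
\begin{align*}
m_k &= \frac{1}{n}\sum_{i=1}^{n} x_i^{k}, \qquad \mu_k = E\!\left[X^{k}\right] \\
m_1 = \mu_1 \;&\Longrightarrow\; \bar{x} = \frac{\theta}{2}
\;\Longrightarrow\; \hat{\theta} = 2\bar{x} = 2 \cdot 8 = 16
\end{align*}
```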

Another method is maximum likelihood. This time we pick the parameter value that makes the observed sample the most likely, and because of that we call it maximum likelihood.

How do we approach finding the parameter with the maximum likelihood method? Depending on whether the random variable is continuous or discrete, we have a PDF or a PMF. So I have the values f(x1), f(x2), and so on, or maybe p(x1), p(x2), and so on. In both cases, whether I have a PMF or a PDF, it doesn't matter: I take the product of f(x1) through f(xn), and this is the likelihood function L. If instead of f I have p, then L is the product of p(x1) through p(xn). So whether the distribution is continuous or discrete, I have a new function which is the product of the density or mass function evaluated at each observation.

To find the parameter by maximum likelihood, after finding this function I need to take the derivative and set it to zero to find the maximum. But here I would need the derivative of a product of n different functions, which is complicated. So instead I take the natural log of both sides. Whether it is continuous or discrete, we take the natural log. After that, the next step is to take the derivative of both sides. The derivative of ln L is L′/L, and for the right-hand side we calculate its derivative. We set it to zero to find the maximum, and then we estimate the parameter.

For example, suppose that we have an exponential distribution, and the data 2, 1, 3 and 4 are observed x values from it. We know that the exponential has the parameter lambda, so we are looking for lambda. We have x1, x2, x3 and x4 equal to 2, 1, 3 and 4, and we want to estimate lambda by the maximum likelihood method.

For each of x1, x2, x3 and x4 there is a density value, and for the exponential the density is f(x) = λe^(−λx), where x is positive. So I have f(x1), f(x2), f(x3) and f(x4). The function L is the product of these, f(x1) f(x2) f(x3) f(x4), and each factor has the form λe^(−λx).

The next step is to take the natural log, but before that I just want to substitute the density. So L is λe^(−λx1) times λe^(−λx2) times λe^(−λx3) times λe^(−λx4); there is no natural log yet, because I am only substituting. So I have this product.

Next I take the natural log: ln L equals the natural log of the product of these four functions. By the properties of the natural log, the log of the first factor is ln λ plus ln e^(−λx1), which simplifies to ln λ − λx1. From the second factor I again have ln λ − λx2, and likewise ln λ − λx3 and ln λ − λx4 from the third and fourth.

Here you see that I have ln λ four times, from the first, second, third and fourth terms, so I have 4 ln λ. Then there are −λx1, −λx2, −λx3 and −λx4, and you can factor out −λ; the remaining part is x1 + x2 + x3 + x4. So ln L = 4 ln λ − λ(x1 + x2 + x3 + x4).

The next step after taking the natural log of both sides is to take the derivative of both sides with respect to lambda. On the left the derivative is L′/L. On the right, the first term gives 4 times the derivative of ln λ, which is 1/λ, and the derivative of λ(x1 + x2 + x3 + x4) is just x1 + x2 + x3 + x4, because we differentiate with respect to lambda. So we set 4/λ − (x1 + x2 + x3 + x4) equal to zero and solve for lambda, and you see that lambda equals four divided by this sum.

We have x1, x2, x3 and x4 as 2, 1, 3 and 4, so the sum is 10, and lambda hat equals 4 divided by 10, which is 0.4. In this way we estimate the parameter lambda.
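The exponential example can be checked numerically with a short Python sketch. The function name `log_likelihood` and the step size 0.01 are illustrative choices, not something from the session; the closed form λ̂ = n / Σx is the one derived in the example.

```python
import math

data = [2, 1, 3, 4]  # observations from the example


def log_likelihood(lam, xs):
    """ln L(lambda) = n*ln(lambda) - lambda*sum(xs) for an exponential sample."""
    return len(xs) * math.log(lam) - lam * sum(xs)


# Closed-form maximizer from setting n/lambda - sum(xs) = 0
lam_hat = len(data) / sum(data)

# Sanity check: the log-likelihood is lower slightly to either side of lam_hat
peak = log_likelihood(lam_hat, data)
left = log_likelihood(lam_hat - 0.01, data)
right = log_likelihood(lam_hat + 0.01, data)

print(lam_hat)                        # 0.4
print(peak > left and peak > right)   # True
```

The neighbor comparison is only a quick check that the stationary point is a maximum, not a proof.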

Now let's move to confidence intervals. A point estimate gives one number; we just learned how to find it. A confidence interval gives a range of plausible values for a parameter.

The general form of a confidence interval is the estimate plus and minus the margin of error, and the margin of error is a quantile times the standard error. We can call the estimate the center, and the quantile times the standard error the margin of error.

When sigma, the population standard deviation, is known, then for the mean the center is x̄ and the margin of error is z of alpha half times sigma over the square root of n. We use x̄ as the center because, remember, the sample mean x̄ is the estimate of the population mean.

In this case we use the Z, or standard normal, distribution, either because the population is normal or because the sample size n is greater than 30, in which case the central limit theorem lets us use the z transformation.

For example, say the sample mean is 20.4, the sample size is 40, the standard deviation is 6, and I want to construct a 95% confidence interval.

How do I get the confidence interval? I need the center, which is the sample mean, and then I need z of alpha half times the standard deviation over the square root of n. I know the standard deviation, and I know n equals 40. So what is alpha in this case?

Alpha is the significance level. A 95% confidence interval means 1 minus alpha equals 95%, so alpha is 5%. I need z of alpha half, so half of 5%, which is 0.025. To find z of alpha half, I can look at the table of the normal distribution, or I can use R: qnorm of 1 minus alpha half gives z of alpha half.

So I write the center, x̄, and then the margin of error, which is z of alpha half times sigma over root n. To find z of alpha half I use qnorm(0.975), then multiply by sigma over the square root of 40. qnorm(0.975) is 1.96, and if I multiply by 6 and divide by the square root of 40, it is almost 1.86. So the 95% confidence interval is the center, 20.4, plus and minus 1.86, which is the interval from 18.54 to 22.26.
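The same interval can be reproduced with Python's standard library. This is a minimal sketch: `NormalDist().inv_cdf` plays the role that `qnorm` plays in R, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 20.4, 6, 40   # values from the example
confidence = 0.95
alpha = 1 - confidence

# Upper alpha/2 quantile of the standard normal, like qnorm(0.975) in R
z = NormalDist().inv_cdf(1 - alpha / 2)
margin = z * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(round(z, 2), round(margin, 2))      # 1.96 1.86
print(round(lower, 2), round(upper, 2))   # 18.54 22.26
```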

Now if I want the confidence interval for the difference between two means, the center becomes x̄ minus ȳ instead of x̄, where x̄ is the sample mean for one population and ȳ is the sample mean for the other. For the margin of error we again have z of alpha half, for normal populations or when n is greater than 30. Instead of sigma over root n, which was for a single sample, we now have standard deviations for two populations, one for X and one for Y. So the standard error is the square root of σ_X² over n_X plus σ_Y² over n_Y. In this way we can find the confidence interval for a difference between two means.
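As a formula, the interval described here can be written as below. The subscripted symbols follow the spoken description; the layout is a reconstruction rather than a copy of what was on screen.

```latex
\[
(\bar{x} - \bar{y}) \;\pm\; z_{\alpha/2}
\sqrt{\frac{\sigma_X^{2}}{n_X} + \frac{\sigma_Y^{2}}{n_Y}}
\]
```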

Sometimes, instead of building a confidence interval, we want to know how large the sample should be to achieve a certain margin of error. If you want the margin of error to be at most delta for a given confidence level 1 minus alpha, then the sample size n should be greater than or equal to z of alpha half times the standard deviation over delta, all squared. Here again delta is the maximum margin of error.

For example, say the desired margin of error delta is 2, the standard deviation is 8, and the confidence level is 95%. Then the smallest sample size satisfies n greater than or equal to z of alpha half times sigma over delta, all squared. Alpha is 5%, so alpha half is 0.025, sigma is 8 and delta is 2. To find z of 0.025 I can use R and write qnorm(0.975), then multiply by 8, divide by 2, and square the result. If you simplify you get 61.5, so n is greater than or equal to 61.5.

n is the sample size and it is an integer, so it cannot be 61.5. The smallest integer greater than or equal to 61.5 is 62. So we need a sample of at least 62 to get a 95% confidence interval with this margin of error.
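A short Python sketch of the sample-size calculation, using only the standard library. Rounding up with `ceil` reflects the requirement that n be an integer; the variable names are illustrative.

```python
from math import ceil
from statistics import NormalDist

sigma, delta, confidence = 8, 2, 0.95   # values from the example

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # like qnorm(0.975) in R
n_min = (z * sigma / delta) ** 2                     # lower bound on n
n = ceil(n_min)                                      # smallest integer sample size

print(round(n_min, 1), n)   # 61.5 62
```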

What should we do when we want a confidence interval and the standard deviation is unknown? We treat the cases differently depending on the sample size and the shape of the population.

There are two cases. The first is a large sample, and by large I mean n is at least 30. In this case we replace sigma, the population standard deviation, with s, the sample standard deviation, in the previous formula.

The second is a small sample, where n is less than 30, from a normal population. Now we use the t distribution instead of the normal distribution, because replacing sigma with s adds extra uncertainty. So in the formula for the confidence interval for the mean with unknown standard deviation, the center is still x̄, and instead of z of alpha half we have t of alpha half with n minus one degrees of freedom. Because sigma is unknown, we use s over the square root of n.

And you remember that the difference between the t distribution and the z distribution is that the t distribution is wider and has heavier tails.

For example, suppose that we have a sample of six values, and I know that the sample mean x̄ equals 50 and the sample standard deviation s is 4. I'm looking for a 90% confidence interval.

The center is given. Then I need the margin of error, and I want to use the t distribution. The reason is the sample size: it is six, which is less than 30. To use the t distribution I need alpha, the significance level; here alpha is 1 minus 90%, so it is 10%. I need alpha half, which is 5%, and n minus one, the degrees of freedom, which depends on the sample size; here it is five.

So the margin of error equals t of 5% with five degrees of freedom, times the sample standard deviation, divided by the square root of six. How can we find t of 5% with five degrees of freedom? You can use a table or use R. In R it is qt(1 − 0.05, 5), that is qt(0.95, 5). Then multiply by 4 over the square root of 6 to find the margin of error; if you simplify, it is 3.29.

So the 90% confidence interval is the center, 50, plus and minus 3.29, which is the interval from 46.71 to 53.29.
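The arithmetic for this t interval can be sketched in Python. The standard library has no t quantile function, so the critical value 2.015 is hard-coded here as an assumption taken from a t table (it corresponds to qt(0.95, 5) in R).

```python
from math import sqrt

x_bar, s, n = 50, 4, 6   # values from the example

# Assumed table value: upper 5% point of the t distribution with n - 1 = 5
# degrees of freedom (R: qt(0.95, 5)).
t_crit = 2.015

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(round(margin, 2))                   # 3.29
print(round(lower, 2), round(upper, 2))   # 46.71 53.29
```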

Now let's work on another example. Let's assume a sample of 200 items contains 40 defective items, and I want to find the 95% confidence interval.

The first thing here is the proportion. What is the proportion of defective items? We have 200 items overall and 40 of them are defective. I write p hat instead of p because I have a sample of 200 items; I don't know about the whole population, so this proportion is just an estimate of the population proportion. So p hat is 40 out of 200, which is 0.2.

To construct a 95% confidence interval for the proportion of defective items, again I need a center and a margin of error. For a proportion, the center is p hat. For the margin of error, because the sample size is 200, you are allowed to use the normal distribution, so you have z of alpha half. Then we would need sigma over root n, but we don't know the population standard deviation; if we already knew it, we would not need to estimate anything through p hat. So instead I use the sample version. When I have a proportion rather than a mean, the standard error is the square root of p hat times q hat over n, where q hat is 1 minus p hat.

We already know the center. For the rest I need z of alpha half; alpha is 5%, so it is z of 0.025, which is qnorm(0.975). We know p hat is 0.2, so q hat, or 1 minus p hat, is 0.8. Then we just substitute. qnorm(0.975) is 1.96, so we have 0.2 plus and minus 1.96 times the square root of 0.2 times 0.8 over 200. The 95% confidence interval is from 0.145 to 0.255.

In this example we found the 95% confidence interval for the proportion of defective items from one sample.
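The proportion interval can be reproduced with a few lines of Python. This is a sketch with illustrative variable names; `NormalDist().inv_cdf(0.975)` stands in for qnorm(0.975).

```python
from math import sqrt
from statistics import NormalDist

defective, n = 40, 200          # values from the example
p_hat = defective / n           # estimated proportion, 0.2
q_hat = 1 - p_hat               # 0.8

z = NormalDist().inv_cdf(0.975)             # like qnorm(0.975) in R
margin = z * sqrt(p_hat * q_hat / n)        # z times the standard error
lower, upper = p_hat - margin, p_hat + margin

print(round(lower, 3), round(upper, 3))     # 0.145 0.255
```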

What if I want a confidence interval for a difference of proportions, for two different populations? If you remember, for the difference between means we used x̄ and ȳ. Now, instead of p hat, the center is the difference of the two estimated proportions, p1 hat minus p2 hat. For the margin of error we again have z of alpha half if the sample sizes are greater than 30. For the standard error we combine the two samples: the square root of p1 hat q1 hat over n1 plus p2 hat q2 hat over n2. So it is pretty similar; the only difference is that now we have two estimated proportions and two standard errors.
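In symbols, the interval for a difference of proportions described here would read as follows, with q̂ = 1 − p̂ for each sample. The notation is a reconstruction from the spoken description.

```latex
\[
(\hat{p}_1 - \hat{p}_2) \;\pm\; z_{\alpha/2}
\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}},
\qquad \hat{q}_i = 1 - \hat{p}_i
\]
```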

Previously we learned how to find the sample size when sigma is known. How can we find the sample size when sigma is unknown and we are working with a population proportion?

When we know the standard deviation, the margin of error is z of alpha half times sigma over root n. For a proportion, the margin of error is instead z of alpha half times the square root of p q over n, and I want it to be less than or equal to some specific value delta, the maximum margin of error. When we know p we also know q, and if you solve for n you get n greater than or equal to p q times the square of z of alpha half over delta.

This is the inequality for finding the sample size when you know the population proportion. But suppose you don't know the proportion and still want to find the sample size. If you don't know p, you don't know q or the product p times q. But the maximum of p times q is 0.25, so you replace p q by 0.25 and multiply by the square of z of alpha half over delta to find the sample size.
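The algebra behind this sample-size rule can be laid out step by step. The bound pq ≤ 1/4 holds because p(1 − p) is largest at p = 1/2.

```latex
\begin{align*}
z_{\alpha/2}\sqrt{\frac{pq}{n}} \le \delta
&\;\Longrightarrow\; n \ge pq\left(\frac{z_{\alpha/2}}{\delta}\right)^{2} \\
pq = p(1-p) \le \tfrac{1}{4}
&\;\Longrightarrow\; n \ge 0.25\left(\frac{z_{\alpha/2}}{\delta}\right)^{2}
\quad \text{(when } p \text{ is unknown)}
\end{align*}
```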

The next thing is hypothesis testing. This is the last major part of chapter nine.

A hypothesis test has several steps. The first step is to state the null and alternative hypotheses, H0 and HA. The null hypothesis H0 is always an equality statement, like mu equals mu0. The alternative HA can be mu not equal to mu0, mu less than mu0, or mu greater than mu0. So the first step is to recognize which one fits the question.

The second step is to compute the test statistic. How we compute it depends on what is given in the question. We use the z test for means when sigma is known or we have a large sample size. The z statistic is x̄ minus mu0, divided by sigma over root n.

Then we have the t test, which we use for the mean when sigma is unknown and the sample size is small, n less than 30. The t statistic is again x̄ minus mu0, and in the denominator, because we don't have the population standard deviation, we use the sample standard deviation, so it is s over root n.

We have these two test statistics for the mean and one more for the proportion. For a proportion we again use z, and the statistic is p hat minus p0, over the square root of p0 times q0 over n. So in the second step we compute the test statistic that fits the problem.
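For reference, the three test statistics named in this step can be written side by side. The subscript on t marking its degrees of freedom, and q0 = 1 − p0, are standard conventions added here for clarity.

```latex
\[
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}, \qquad
t_{\,n-1} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad
z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}, \quad q_0 = 1 - p_0
\]
```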

The third step is to find the p-value. This is one approach; the other is to find the rejection region and the acceptance region. So step three is either computing the p-value or finding the rejection region. In step four we compare: we compare the p-value with alpha, or we compare the test statistic with z of alpha to see whether it falls in the acceptance region or the rejection region. Then step five is to make the decision. So hypothesis testing has these five steps, and the last one is to make the decision.

Let's work on this problem. A computer system historically has an average processing time of 100 milliseconds per task, with a standard deviation of 12 milliseconds. After a software update, a random sample of 50 task times produced a sample mean of 104 milliseconds. Using a right-tailed test at the 5% significance level, test whether the mean processing time has increased.

When you read the question, you see that it clearly says to use a right-tailed test. The first step is to state the null and alternative hypotheses. The null hypothesis here is mu equals 100. Then we have the alternative hypothesis: because we are using the right tail, it is mu greater than 100.

Even if the question did not include the phrase "using a right-tailed test", we would still know it is right-tailed, with mu greater than 100, from the sentence "test whether the mean processing time has increased".

By right tail I mean this: we have a normal or a t distribution depending on the sample size, and here it is normal because n is 50. The curve is symmetric and centered at zero, and when the alternative is mu greater than 100, the rejection area is the part in the right tail.

That was the first step: we stated H0 and HA. The second step is to compute the test statistic. Here n is greater than 30, so we use the normal distribution and the z test. The formula is x̄ minus mu0, divided by sigma over root n; sigma is given here. x̄ is 104, because the sample mean is given as 104, mu0 is 100, the standard deviation is 12 and the sample size is 50. We compute this and get 2.36.

The next step is to compute the p-value, or, as I mentioned, you can find z of alpha and compare it with the test statistic to see whether the test statistic is greater or less than z of alpha. I will do it both ways.

So let's find z of alpha, which is this point on the graph. Alpha is 5%, so I need z of 5%, and we can compute it with qnorm(0.95). This value is almost 1.64 if I keep two decimal places.

Now our job is to compare the z statistic with z of alpha. If the z statistic is greater than z of alpha, we are in the rejection area and we need to reject the null hypothesis. Here z of alpha is 1.64 and the z statistic is 2.36. So the test statistic is here on the graph, and as you see it is greater than z of alpha, so we are in the rejection area. So we reject the null hypothesis.

Another approach, after finding the z statistic, is to find the p-value. Now suppose I don't know z of alpha. I have the test statistic, and I compute the probability that Z is greater than the test statistic, that is, the probability that Z is greater than 2.36.

This is the normal distribution, and we know how to find the probability of being greater than a specific value. It equals 1 minus pnorm(2.36), because, you remember, pnorm gives the probability that the random variable is less than some constant. Greater than 2.36 is the complement, so it is 1 minus pnorm. We compute this and the p-value is 0.009.

After we find the p-value, we compare it with alpha, which is 5%. As you see, the p-value, 0.009, is less than 5%, the significance level. If the p-value is less than alpha, we reject the null hypothesis. So there is evidence that the mean processing time increased. Again, we reject the null hypothesis.
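Both approaches for the processing-time problem can be sketched together in Python. `NormalDist().inv_cdf` and `.cdf` stand in for R's qnorm and pnorm; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 104, 100, 12, 50, 0.05   # values from the problem
std_normal = NormalDist()

# Step 2: test statistic for a mean with known sigma
z = (x_bar - mu0) / (sigma / sqrt(n))

# Critical-value approach (right-tailed): like qnorm(0.95) in R
z_crit = std_normal.inv_cdf(1 - alpha)

# P-value approach (right-tailed): like 1 - pnorm(z) in R
p_value = 1 - std_normal.cdf(z)

reject = p_value < alpha
print(round(z, 2), round(z_crit, 2), round(p_value, 3), reject)
# 2.36 1.64 0.009 True
```

The two approaches always agree: the statistic exceeds the critical value exactly when the p-value is below alpha.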

The next question: a company believes that 25% of customers experience a minor complaint with their service. In a recent sample of 240 customers, 70 reported a complaint. At the 5% level, determine whether the true complaint rate differs from 25%.

The first step is to state the null and alternative hypotheses. The null hypothesis is that the proportion equals 25%, or 0.25. The question asks whether the true complaint rate differs from 0.25, so it can be less than or greater than 0.25. So the alternative is p not equal to 0.25. This is a two-sided test.

With a two-sided test, instead of considering just z of alpha and one rejection area in one direction, we have two rejection areas whose total area is alpha: alpha half on the right and alpha half on the left. So remember that we have two rejection areas, one beyond z of alpha half and one beyond negative z of alpha half.

Now the second step: we want to compute the test statistic.

How can you compute the test statistic?

Is it a z-test, a t-test, or a proportion test? As you see, right now we are speaking about a proportion, and the sample size is 240.

So we are using the normal distribution and the z-test for the proportion.

So what is p-hat here? We know that among 240 customers there are 70 complaints. So 70 / 240 gives us p-hat equal to almost 0.29, if I just keep two decimal places.

Okay.

Now we want to compute the z-test for the proportion.

So it's p-hat minus, let's call it p0, and in the denominator we have p0 times q0 over n, under the square root, where q0 is 1 minus p0. Now if we substitute, p-hat is 0.29 and p0 is 0.25.

In the denominator we have 0.25, and q0 is 0.75, over n, which is 240, all under the square root.

If I simplify, the answer is 1.66, and this is the z-test. Now we have two approaches.
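The substitution can be checked with a few lines of Python. This is a minimal sketch, assuming the standard one-sample z statistic for a proportion with the standard error computed under the null hypothesis. Evaluated with the stated inputs (70 out of 240, p0 = 0.25), it gives roughly 1.49 rather than the 1.66 quoted in the session. Both values are below the 1.96 cutoff used later, so the conclusion is unaffected.

```python
from math import sqrt

# Values from the complaint-rate problem
complaints, n = 70, 240
p0 = 0.25          # hypothesized proportion under H0
q0 = 1 - p0

p_hat = complaints / n

# One-sample z statistic for a proportion; the standard error uses p0, not p_hat
standard_error = sqrt(p0 * q0 / n)
z = (p_hat - p0) / standard_error

print(f"p_hat = {p_hat:.4f}, SE = {standard_error:.4f}, z = {z:.2f}")
```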

We can find z of alpha over 2 and negative z of alpha over 2 and then check where the z-test falls. The z-test is positive, so I just care about this part; it's definitely not negative. So I want to check whether the z-test is here or here.

If it's greater than z of alpha over 2, then it's in the rejection area, so we need to reject the null hypothesis. And if it's less than z of alpha over 2, then we cannot reject the null hypothesis.

Or I need to compute the p-value. To compute the p-value, we already have the z-test. Let's suppose that I don't know where z of alpha over 2 is, but here is 1.66. Then I'm going to check the probability of all random variables greater than 1.66, counting both tails, and compare it with the 5% level. If the p-value is less than alpha, then we reject the null hypothesis. If it's greater than alpha, then we cannot reject.

Okay. Here I'm going to do both and compare the two methods with each other.

If you compute the p-value, we have the z-test, and let's say z is here, 1.66. Because we have a two-sided test on the normal distribution, one thing is to check that if the z-test is here and the negative of the z-test is here, what is the probability of all z greater than the z-test, and then what is the probability of all z less than the negative of the z-test? So I'm going to find the probability of all random variables greater than 1.66 or less than -1.66.

So these two parts are equal to each other because the graph is symmetric. So if you find the probability of one part and multiply it by two, then you have the probability of both. So it is 2 times the probability of z greater than 1.66, or 2 times the probability of z less than -1.66, and this is equal to 0.097.

You see that alpha here is 5%, and alpha is less than the p-value.

So the p-value is greater than alpha, which is 5%.

And we know that when the p-value is greater than alpha, we cannot reject the null hypothesis. So we fail to reject the null hypothesis. Okay. Now

the second approach, where we compute z of alpha over 2 and negative z of alpha over 2.

For the second approach we compute z of alpha over 2. Alpha is 5%, so alpha over 2 is 0.025.

I use qnorm of 0.975, which is 1.96.

It means that this point is 1.96 and this point here is -1.96.

Now where is 1.66 with respect to z of alpha over 2 and negative z of alpha over 2? Here it's 1.96, and this point is the z-test, and it's 1.66.

These two regions, here and here, are the rejection regions, and this is the acceptance region. Right now the z-test is less than z of alpha over 2 and also greater than negative z of alpha over 2.

It is in the acceptance region and we cannot reject the null hypothesis. So we fail to reject the null hypothesis.
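The two approaches can be put side by side in code. This is a minimal sketch in Python that takes the test statistic 1.66 as quoted in the session. `NormalDist().cdf` and `NormalDist().inv_cdf` from the standard library stand in for the `pnorm` and `qnorm` calls the speaker uses.

```python
from statistics import NormalDist

z_test = 1.66    # test statistic as quoted in the session
alpha = 0.05
std_normal = NormalDist()

# Approach 1: two-sided p-value, 2 * P(Z > |z|), using symmetry of the normal curve
p_value = 2 * (1 - std_normal.cdf(abs(z_test)))

# Approach 2: critical value z_{alpha/2}, the analogue of qnorm(0.975)
z_crit = std_normal.inv_cdf(1 - alpha / 2)

reject_by_p_value = p_value < alpha
reject_by_critical_value = abs(z_test) > z_crit

print(f"p-value = {p_value:.3f}, z_crit = {z_crit:.2f}")
print(f"reject H0 (p-value approach): {reject_by_p_value}")
print(f"reject H0 (critical-value approach): {reject_by_critical_value}")
```

The two decisions always agree, because the p-value is below alpha exactly when the absolute value of the test statistic exceeds z of alpha over 2.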

Next question.

A juice bottle is supposed to contain 80 ml of juice on average. A sample of 10 bottles is measured. The sample mean is 84 ml and the sample standard deviation is 12 ml. At the 5% significance level, determine whether the machine is overfilling.

Okay. To solve the problem, the first step is to see what the null and alternative hypotheses are.

So the null hypothesis here is given, and it says that the average is 80 ml. It means mu0 is 80. And then we want to see, at the 5% significance level, whether the machine is overfilling.

Overfilling means mu is greater than 80 ml.

The second step is to compute the test statistic. In this case we have a sample of 10 bottles. So the sample size is 10, which is small, and we only have the sample standard deviation. When we don't know the population standard deviation and the sample size is small, we definitely use the t-test instead of the z-test.

So I need to use the t-test, and what is the formula? It's x-bar minus mu0, divided by s over the square root of n.
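Written out with the numbers from this problem, the spoken formula reads as follows. This is a sketch in standard notation, with the degrees of freedom n - 1 added for reference:

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
  = \frac{84 - 80}{12 / \sqrt{10}}
  \approx 1.054,
\qquad \text{df} = n - 1 = 9.
```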

We have the t distribution. That means that if this is my distribution and I'm looking for mu greater than 80, it's a right-tailed test.

So this is the rejection area. Here we reject the null hypothesis, and here we fail to reject the null hypothesis.

Okay, it means that this point is t of alpha. So one approach is to compute the t-test and compare it with t of alpha, and t of alpha here is t of 5%.

And to calculate t of 5%, remember that when we have the t distribution, it's important to know the degrees of freedom, and the degrees of freedom are n minus 1. In this case n is 10 and n minus 1 is 9.

Okay.

So the syntax here is qt, where 1 minus alpha is 0.95 and the degrees of freedom are 9. Now I need to compute this. It almost equals 1.83, if I keep two decimal places.

Now if I compute the t-test and I see that it is greater than t of alpha, it means it is in the rejection area, if it's here. And if it's less than t of alpha, then we fail to reject the null hypothesis. So let's see: x-bar is 84 and mu0 is 80, s is 12, and n is 10. So I have 84 minus 80, divided by 12 over the square root of 10, which is 1.054.

1.05 compared with 1.83 is smaller. So the t-test is here, and you see that it is less than t of alpha. So it means that it's in the area where we fail to reject the null hypothesis. So in the first approach, because the t-test is less than t of alpha, we fail to reject the null hypothesis.

Another approach, instead of comparing the t-test with t of alpha, is to compute the p-value. And the p-value is the probability of all random variables greater than this t-test. The t-test is here, and I want to see what is the probability of all t greater than 1.054.

This is the t distribution. To compute the probability of t greater than some constant, it's 1 minus pt of 1.054.

And do not forget that we need the degrees of freedom, which are 9. Now if I compute this, it almost equals 0.16. This is the p-value. And now I want to compare the p-value with alpha.

So let's suppose that alpha, which is 5%, is here. This is alpha. We know that if the p-value is less than alpha, we can reject the null hypothesis, and if the p-value is greater than alpha, then we fail to reject the null hypothesis.

This is 0.16 and this is 0.05. You see that the p-value is greater than alpha. So when the p-value is greater than alpha, we fail to reject the null hypothesis. It means there is no significant evidence that the machine is overfilling.
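Both t-test approaches for the juice-bottle problem can be reproduced in code. This is a minimal sketch in Python. The standard library has no t distribution, so the helpers `t_pdf`, `t_cdf` and `t_quantile` are hypothetical stand-ins for the `pt` and `qt` calls the speaker uses: they integrate the t density numerically with Simpson's rule and invert the result by bisection. With SciPy installed, `scipy.stats.t.cdf` and `scipy.stats.t.ppf` would be the more usual choice.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0 (steps must be even)."""
    if x == 0:
        return 0.5
    upper = abs(x)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(prob, df, lo=-50.0, hi=50.0):
    """Invert t_cdf by bisection (hypothetical helper standing in for qt)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Values from the juice-bottle problem
x_bar, mu0, s, n = 84, 80, 12, 10
alpha = 0.05
df = n - 1

t_test = (x_bar - mu0) / (s / sqrt(n))

# Approach 1: compare with the right-tail critical value, the analogue of qt(0.95, 9)
t_crit = t_quantile(1 - alpha, df)

# Approach 2: right-tail p-value, the analogue of 1 - pt(t_test, 9)
p_value = 1 - t_cdf(t_test, df)

print(f"t = {t_test:.3f}, t_crit = {t_crit:.2f}, p-value = {p_value:.2f}")
print("reject H0:", p_value < alpha)
```

Numerical integration is used here only to keep the sketch dependency-free; it is not the method used in the session, which relies on the built-in functions.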

All right everyone, that concludes our chapter nine review. We covered method of moments, maximum likelihood, confidence intervals with known and unknown standard deviation, proportions and difference of proportions, difference of means, hypothesis tests, p-values, and how we can interpret the result and make a decision.

Make sure to practice the problems. Thank you for watching, and good luck on your test.
