Lesson 8 - Sampling Distributions and the Central Limit Theorem

Overview

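Recall what we set out to find: where a particular sample falls in the distribution of sample means, not just for this simple population but for very large populations as well.

We can now find it, because we know that for a distribution of means, where each mean comes from a sample of size n, the standard deviation of that distribution equals the population standard deviation divided by the square root of n. This is part of what the Central Limit Theorem tells us.

It applies not only to these simple populations but to any population. Thanks to the Central Limit Theorem, our population can have any shape.

Suppose we draw a sample from it and compute the mean, then draw another sample and compute its mean, and keep repeating this.

If we plot the distribution of those means, its shape will be approximately normal, with a standard deviation equal to the population standard deviation divided by the square root of the sample size. We have been calling this the SE (standard error).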
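The relationship between the SE and the population standard deviation can be checked with a small simulation. The sketch below is only an illustration: the exponential population, the sample size of 25, and the number of repetitions are assumptions chosen here, not values from the lesson.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable

# Hypothetical, deliberately non-normal (right-skewed) population.
population = rng.exponential(scale=2.0, size=100_000)
sigma = population.std()  # population standard deviation

n = 25              # sample size (assumed for illustration)
n_samples = 10_000  # how many sample means to collect

# Draw many samples of size n and record the mean of each one.
samples = rng.choice(population, size=(n_samples, n), replace=True)
sample_means = samples.mean(axis=1)

se_theory = sigma / np.sqrt(n)    # SE = sigma / sqrt(n)
se_observed = sample_means.std()  # spread of the simulated means

print(f"sigma / sqrt(n):     {se_theory:.4f}")
print(f"std of sample means: {se_observed:.4f}")
```

The two printed values should be close to each other, even though the population itself is far from normal.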

02-Video: Descriptive vs. Inferential Statistics

In this section, we learned about how Inferential Statistics differs from Descriptive Statistics.


Descriptive Statistics

Descriptive statistics is about describing our collected data.


Inferential Statistics

Inferential Statistics is about using our collected data to draw conclusions about a larger population.


We looked at specific examples that allowed us to identify the following:

  1. Population - our entire group of interest.
  2. Parameter - numeric summary about a population
  3. Sample - subset of the population
  4. Statistic - numeric summary about a sample

05-Text: Descriptive vs. Inferential Statistics

Descriptive vs. Inferential Statistics

In this section, we learned about how Inferential Statistics differs from Descriptive Statistics.

Descriptive Statistics

Descriptive statistics is about describing our collected data using the measures discussed throughout this lesson: measures of center, measures of spread, shape of our distribution, and outliers. We can also use plots of our data to gain a better understanding.

Inferential Statistics

Inferential Statistics is about using our collected data to draw conclusions about a larger population. Performing inferential statistics well requires that we take a sample that accurately represents our population of interest.

A common way to collect data is via a survey. However, surveys may be extremely biased depending on the types of questions that are asked, and the way the questions are asked. This is a topic you should think about when tackling the first project.

We looked at specific examples that allowed us to identify the following:

  1. Population - our entire group of interest.
  2. Parameter - numeric summary about a population
  3. Sample - subset of the population
  4. Statistic - numeric summary about a sample
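The four terms above can be mapped onto a few lines of code. This is a minimal sketch with made-up data: the coffee-drinking population, its size, and the 0.7 proportion are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: our entire group of interest (hypothetical: 1 = drinks coffee, 0 = does not).
population = rng.binomial(1, 0.7, size=10_000)

# Parameter: a numeric summary of the population.
parameter = population.mean()

# Sample: a subset of the population.
sample = rng.choice(population, size=50, replace=False)

# Statistic: a numeric summary of the sample.
statistic = sample.mean()

print(f"parameter (population proportion): {parameter:.3f}")
print(f"statistic (sample proportion):     {statistic:.3f}")
```

In practice we usually cannot compute the parameter directly, which is why the statistic is used to estimate it.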

10-Text: Sampling Distribution Notes

Sampling Distribution Notes (a sampling distribution is the distribution of a statistic)

We have already learned some really valuable ideas about sampling distributions:


First, we have defined sampling distributions as the distribution of a statistic.

This is fundamental - I cannot stress the importance of this idea enough. We simulated the creation of sampling distributions in the previous IPython notebook for samples of size 5 and size 20, which is something you will do more than once in the upcoming concepts and lessons.

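Choosing a different combination of individuals gives a different value of the statistic.

image.png

If we take every possible combination, we get the result below.

image.png

Plotting the statistics produced by the different combinations gives the following.

image.png

The distribution above is the sampling distribution.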
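The idea of listing every possible combination and tabulating the statistic can be sketched on a tiny example. The five-person population and the sample size of 3 below are hypothetical, and the sketch draws combinations without replacement, which may differ from how the figures above were produced.

```python
from collections import Counter
from itertools import combinations

# Hypothetical population of 5 people: 1 = drinks coffee, 0 = does not.
population = [1, 0, 1, 1, 0]
sample_size = 3

# Every possible sample of size 3 (chosen by position, without replacement).
all_samples = list(combinations(population, sample_size))

# The statistic (sample proportion) for each possible sample.
proportions = [sum(s) / sample_size for s in all_samples]

# The sampling distribution: how often each value of the statistic occurs.
sampling_distribution = Counter(round(p, 3) for p in proportions)

for value, count in sorted(sampling_distribution.items()):
    print(f"proportion {value}: {count} of {len(all_samples)} samples")
```

The average of all the sample proportions equals the population proportion (0.6 here), which matches the claim that the sampling distribution is centered on the parameter.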


Second, we found out some interesting ideas about sampling distributions that will be reiterated later in this lesson as well. We found that for proportions (and also means, as proportions are just the mean of 1 and 0 values), the following characteristics hold.

  • The sampling distribution is centered on the original parameter value.

  • The variance of the sampling distribution decreases as the sample size increases. Specifically, the variance of the sampling distribution is equal to the variance of the original data divided by the sample size used. This is always true for the variance of a sample mean!

image.png

Sampling distribution of the sample mean: its variance is σ squared (the variance of the original data) divided by the sample size.
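Both characteristics can be checked numerically for a proportion. The sketch below assumes a population proportion of 0.7 (a value picked for illustration) and reuses the sample sizes 5 and 20 mentioned earlier; for 0/1 data the variance of the original data is p(1 - p).

```python
import numpy as np

rng = np.random.default_rng(1)

p = 0.7                            # hypothetical population proportion
population_variance = p * (1 - p)  # variance of 0/1 data

results = {}
for n in (5, 20):
    # 10,000 simulated samples of size n; each row is one sample of 0/1 values.
    draws = rng.binomial(1, p, size=(10_000, n))
    sample_props = draws.mean(axis=1)
    results[n] = {
        "center": sample_props.mean(),                # should be near p
        "variance": sample_props.var(),               # observed variance
        "variance_theory": population_variance / n,   # sigma^2 / n
    }
    print(n, results[n])
```

For each sample size the observed variance should sit close to the variance of the original data divided by n, and the center should stay near 0.7.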

Exercise

image.png

Looking Ahead

The rest of this lesson will reinforce some of these ideas that you saw at work in this notebook, but you are already being introduced to some big ideas that will continue to show up again and again.

12-Video: Notation for Parameters vs. Statistics


As you saw in this video, we commonly use Greek symbols as parameters and lowercase letters as the corresponding statistics. Sometimes in the literature, you might also see the same Greek symbols with a "hat" to represent that this is an estimate of the corresponding parameter.

Below is a table that provides some of the most common parameters and corresponding statistics, as shown in the video.

Remember that all parameters pertain to a population, while all statistics pertain to a sample.

image.png

Note

Population parameters do not change from sample to sample; only statistics vary from one sample to another.

15-Video: Two Useful Theorems - Law of Large Numbers

Two important mathematical theorems for working with sampling distributions include:

  1. Law of Large Numbers
  2. Central Limit Theorem

The Law of Large Numbers says that as our sample size increases, the sample mean gets closer to the population mean, but how did we determine that the sample mean would estimate a population mean in the first place? How would we identify another relationship between parameter and statistic like this in the future?


Three of the most common ways are with the following estimation techniques:

Though these are beyond the scope of what is covered in this course, these are techniques that should be well understood by data scientists who may need to estimate some value that isn't as common as a mean or variance. Using one of these methods to determine a "best estimate" would be a necessity.
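The Law of Large Numbers described in this section can be illustrated with a short simulation. The gamma population and the sample sizes below are assumptions made for this sketch; the error is averaged over 200 repeated samples so that the trend does not depend on one lucky or unlucky draw.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical right-skewed population.
population = rng.gamma(shape=1.0, scale=100.0, size=200_000)
mu = population.mean()  # population mean (the parameter)

errors = {}
for n in (5, 100, 10_000):
    # 200 independent samples of size n, each giving one sample mean.
    samples = rng.choice(population, size=(200, n))
    # Average distance between the sample mean and the population mean.
    errors[n] = np.abs(samples.mean(axis=1) - mu).mean()
    print(f"n = {n:>6}: average |sample mean - population mean| = {errors[n]:.2f}")
```

The average distance shrinks as the sample size grows, which is the behavior the theorem describes.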

17-Video: Two Useful Theorems - Central Limit Theorem

The Central Limit Theorem states that with a large enough sample size the sampling distribution of the mean will be normally distributed.

The Central Limit Theorem actually applies for these well known statistics:

image.png

And it applies for additional statistics, but it doesn't apply for all statistics! You will see more on this towards the end of this lesson.

20-Video: When Does the Central Limit Theorem Not Work?

In the previous example, you saw how the Central Limit Theorem applies to the sample mean of 100 draws from a right-skewed distribution. However, it did not apply to a sample size of 3 draws from this same distribution. (The theorem does not hold for every sampling distribution.)

Applies to:

image.png

Does not apply to:


image.png
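The contrast between 3 draws and 100 draws can be reproduced approximately in code. This is a sketch under assumptions: the gamma population stands in for the lesson's right-skewed data, and skewness (about 0 for a normal distribution) is used here as a rough, hand-rolled check of how normal the sampling distribution looks.

```python
import numpy as np

rng = np.random.default_rng(3)

def skewness(x):
    """Sample skewness: about 0 for a symmetric (e.g. normal) distribution."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

# Hypothetical right-skewed population.
population = rng.gamma(shape=1.0, scale=100.0, size=100_000)

skew_by_n = {}
for n in (3, 100):
    # Sampling distribution of the mean, built from 10,000 samples of size n.
    means = rng.choice(population, size=(10_000, n)).mean(axis=1)
    skew_by_n[n] = skewness(means)
    print(f"n = {n:>3}: skewness of the sample means = {skew_by_n[n]:.2f}")
```

With 3 draws the sample means stay clearly right-skewed, while with 100 draws the skewness is much closer to 0.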

In the next concepts, you will see that even with large sample sizes the sampling distribution of certain statistics will never become normally distributed. So how do we know which statistics will follow normal distributions, and which will not?

So, you might be wondering already why is the Central Limit Theorem such a big deal? In our new age of computers, it probably isn't as big of a deal, but more on this coming up soon!

22-Video: Bootstrapping

Bootstrapping is sampling with replacement. (An individual that has been drawn can be drawn again on the next draw, and could even be drawn every time, although that is very unlikely.) Using random.choice in Python (for example NumPy's np.random.choice, whose replace argument defaults to True) actually samples in this way: the probability of any number in our set stays the same regardless of how many times it has been chosen. Flipping a coin and rolling a die are kind of like bootstrap sampling as well, as rolling a 6 in one scenario doesn't mean that 6 is less likely later.
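A minimal sketch of sampling with and without replacement, using a die as in the paragraph above. It uses NumPy's Generator.choice; the legacy np.random.choice takes the same replace argument. The seed and the number of draws are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

die = np.array([1, 2, 3, 4, 5, 6])

# replace=True is the default; it is written out here to make the point.
# Each face keeps probability 1/6 on every draw, so repeats are expected.
with_replacement = rng.choice(die, size=20, replace=True)

# Without replacement each face can appear at most once,
# so we cannot draw more than len(die) values.
without_replacement = rng.choice(die, size=6, replace=False)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Twenty draws from six faces must contain repeats, which is only possible because each drawn value is put back before the next draw.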

23-Video: Bootstrapping & The Central Limit Theorem

image.png

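In inferential statistics, we use statistics to infer population parameters. Suppose we treat the sample as if it were the population. The 21 cups in the figure above are only one sample from the population, but if we pretend they are the population, we can draw bootstrap samples from them and see how the proportion of coffee drinkers changes from one sample to the next.

image.png

As the figure above shows, the two means differ: although the second sample still contains 21 individuals, each one is drawn afresh from the original 21.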

You actually have been bootstrapping to create sampling distributions in earlier parts of this lesson, but this can be extended to a bigger idea.

It turns out, we can do a pretty good job of finding out where a parameter is by using a sampling distribution created from bootstrapping from only a sample. This will be covered in depth in the next lessons.
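Building a sampling distribution from a single sample can be sketched as follows. The data are hypothetical: 21 individuals as in the cups example, with an assumed 15 coffee drinkers, and 10,000 bootstrap samples is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical single sample of 21 people: 1 = drinks coffee, 0 = does not.
sample = np.array([1] * 15 + [0] * 6)
sample_prop = sample.mean()

# 10,000 bootstrap samples, each of size 21, drawn with replacement
# from the one sample we actually have.
boot = rng.choice(sample, size=(10_000, sample.size), replace=True)
boot_props = boot.mean(axis=1)  # one proportion per bootstrap sample

print(f"sample proportion:             {sample_prop:.3f}")
print(f"mean of bootstrap proportions: {boot_props.mean():.3f}")
print(f"std of bootstrap proportions:  {boot_props.std():.3f}")
```

The bootstrap proportions are centered on the sample proportion, and their spread gives a sense of how much the statistic would vary from sample to sample.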

Three of the most common estimation techniques for finding "good statistics" were mentioned previously.

Though these are beyond the scope of what is covered in this course, these are techniques that should be well understood for data scientists who may need to understand how to estimate some value that isn't as common as a mean or variance. Using one of these methods to determine a "best estimate" would be a necessity.

25-Video: The Background of Bootstrapping

Two helpful links:

  • You can learn more about Bradley Efron here.

  • Additional notes on why bootstrapping works as a technique for inference can be found here.

26-Video: Why are Sampling Distributions Important

27-Quiz + Text: Recap & Next Steps

Recap

In this lesson, you have learned a ton! You learned:

image.png

Sampling Distributions

  • Sampling Distributions are the distribution of a statistic (any statistic).

  • There are two very important mathematical theorems that are related to sampling distributions: The Law of Large Numbers and The Central Limit Theorem.

  • The Law of Large Numbers states that as a sample size increases, the sample mean will get closer to the population mean. In general, if our statistic is a "good" estimate of a parameter, it will approach our parameter with larger sample sizes.

  • The Central Limit Theorem states that with large enough sample sizes our sample mean will follow a normal distribution, but it turns out this is true for more than just the sample mean.


Bootstrapping

  • Bootstrapping is a technique where we sample from a group with replacement.

  • We can use bootstrapping to simulate the creation of a sampling distribution, which you did many times in this lesson.

  • By bootstrapping and then calculating repeated values of our statistics, we can gain an understanding of the sampling distribution of our statistics.


Looking Ahead

In this lesson you gained the fundamental ideas that will help you with the next two lessons by learning about sampling distributions and bootstrapping. These are going to provide the basis for confidence intervals and hypothesis testing in the next two lessons.
