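3.3 Probability distributions
3.3.1 Discrete random variables
If a random variable takes finitely many or countably many values, it is called a discrete random variable, or discrete variable for short.
For example, if you toss a coin twice, there are only 4 possible outcomes:
HH, HT, TH and TT (H: heads; T: tails)
If a random variable X denotes the number of H results in this experiment, then X can only take the three values 0, 1 and 2. X is therefore a discrete random variable. Specifically: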
P(X=0)=0.25
P(X=1)=0.5
P(X=2)=0.25
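As a quick check of the three probabilities above, the four outcomes can be enumerated directly. This is a minimal sketch that assumes a fair coin, so that all four outcomes are equally likely; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all equally likely outcomes of two fair coin tosses.
outcomes = ["".join(t) for t in product("HT", repeat=2)]  # HH, HT, TH, TT

# X = number of heads in each outcome.
counts = Counter(o.count("H") for o in outcomes)

# P(X = k) = (number of outcomes with k heads) / (total number of outcomes).
pmf = {k: Fraction(c, len(outcomes)) for k, c in sorted(counts.items())}

for k, prob in pmf.items():
    print(f"P(X={k}) = {float(prob)}")
```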
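P(X), the probability distribution function (PDF) of the variable X, is the probability distribution law of X. It satisfies the following properties: every probability is non-negative, P(X=x) ≥ 0, and the probabilities over all possible values of X sum to 1.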
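3.3.2 Continuous random variables
For a random variable X, if there exists a non-negative real function f(x) such that the probability of X falling in any region D is
$$P(X \in D) = \int_D f(x)\,dx$$
then X is called a continuous random variable, or continuous variable for short, and f(x) is called the probability density function of X, or density for short.
From the definition, the density function has the following properties: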
(1)f(x)≥0
(2) $\int_{-\infty}^{+\infty} f(x)\,dx = 1$
(3)
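The definition can be illustrated numerically. The sketch below assumes a uniform density on [0, 2] (an illustrative choice, not taken from the text) and uses a simple midpoint rule, with the hypothetical helper `integrate`, to check that the density integrates to 1 and to compute the probability of an interval.

```python
def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h


def uniform_density(x):
    """Density of a uniform variable on [0, 2] (illustrative assumption)."""
    return 0.5 if 0.0 <= x <= 2.0 else 0.0


# The density is zero outside [0, 2], so integrating over [0, 2]
# covers the whole real line.
total = integrate(uniform_density, 0.0, 2.0)

# P(0.5 <= X <= 1.5) is the integral of the density over D = [0.5, 1.5].
prob_mid = integrate(uniform_density, 0.5, 1.5)

print(total, prob_mid)
```

Both values are approximations; for this density the exact answers are 1 and 0.5.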
Summary of discrete and continuous variables:
Mean and variance for a discrete variable with a given PDF
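For a discrete variable, the usual definitions are E(X) = Σ x·P(X=x) and Var(X) = Σ (x − E(X))²·P(X=x), summed over all possible values x. The sketch below applies these standard formulas to the two-coin-toss distribution from section 3.3.1; the dictionary `pmf` is simply a convenient way to hold those three probabilities.

```python
# PMF of X = number of heads in two fair coin tosses (values from section 3.3.1).
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

# E(X) = sum of x * P(X = x)
mean = sum(x * p for x, p in pmf.items())

# Var(X) = sum of (x - E(X))^2 * P(X = x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(mean, variance)
```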
3.3.3 0-1(p) distribution
X takes the value 1 with probability p and the value 0 with probability 1-p.
E(X)=1×p+0×(1-p)=p
Var(X)=E(X^2)-(E(X))^2=(1^2×p+0^2×(1-p))-p^2=p-p^2=p(1-p)
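The same arithmetic can be checked for a concrete p. This is a small sketch; the function name `bernoulli_moments` and the value p = 0.3 are illustrative choices, not part of the original notes.

```python
def bernoulli_moments(p):
    """Return (mean, variance) of a 0-1(p) variable from its two outcomes."""
    pmf = {1: p, 0: 1 - p}
    mean = sum(x * q for x, q in pmf.items())              # E(X)
    second_moment = sum(x ** 2 * q for x, q in pmf.items())  # E(X^2)
    return mean, second_moment - mean ** 2                 # Var(X) = E(X^2) - E(X)^2


mean, var = bernoulli_moments(0.3)
print(mean, var)  # should agree with p and p * (1 - p) up to rounding
```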
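3.3.4 Binomial distribution (n-fold Bernoulli trials)
Definition: in n independent repeated trials, each trial has only two outcomes, A and A', and the probability of A is the same in every trial, written P(A)=p, 0<p<1. Such a series of trials is called an n-fold Bernoulli trial.
In an n-fold Bernoulli trial with P(A)=p, 0<p<1, let X be the number of times A occurs in the n trials. Then:
$$P(X=k) = \binom{n}{k} p^{k} (1-p)^{n-k}, \quad k = 0, 1, \dots, n$$
Writing X=X1+X2+...+Xn, where Xi is the 0-1(p) variable indicating whether A occurs in the i-th trial and the Xi are independent:
E(X)=E(X1+X2+...+Xn)=E(X1)+E(X2)+...+E(Xn)=p+p+...+p=np
Var(X)=Var(X1+X2+...+Xn)=Var(X1)+Var(X2)+...+Var(Xn)=p(1-p)+p(1-p)+...+p(1-p)=np(1-p)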
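The results np and np(1-p) can be cross-checked numerically. The sketch below assumes the standard binomial formula P(X=k) = C(n,k) p^k (1-p)^(n-k); the helper name `binom_pmf` and the values n = 10, p = 0.3 are illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), using the standard formula."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.3
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the probabilities.
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(mean, n * p)                # both close to 3
print(variance, n * p * (1 - p))  # both close to 2.1
```

The same helper answers small questions directly; for instance, `binom_pmf(5, 5, 0.5)` gives the probability of five heads in five fair tosses.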
Example of a Binomial distribution
When a fair coin is flipped, the probability of heads and the probability of tails are the same, i.e., p=0.5.
If we flip the coin 5 times, what is the probability of getting 5 heads?
Answer: P(X=5)=0.5^5=0.03125
Example of a Binomial distribution
After a genome-wide ChIP-seq experiment, a transcription factor (TF) was found to bind the promoter regions of 100 genes (out of 26,000). If we run another experiment with a second TF and also identify 100 genes, what is the probability that at least 5 of them are also bound by the first TF?
Suppose the first TF binds genes without any preference. Then the probability that a gene selected at random from the genome is bound by the first TF is 100/26000≈0.00385, rounded to 0.0039 below.
For a given gene, it is either bound by the first TF ('success') or not ('failure'), i.e., a Bernoulli trial.
If the second TF is independent of the first TF, then the number of genes bound by the second TF that are also bound by the first TF follows a binomial distribution.
Binomial distribution: n=100, p=0.0039
P(k=0)=0.6765408
P(k=1)=0.2648840
P(k=2)=0.05133606
P(k=3)=0.006565821
P(k=4)=0.0006233937
P(k>=5)=1-P(k=0)-P(k=1)-P(k=2)-P(k=3)-P(k=4)=4.992756e-05
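The probabilities listed above can be reproduced with a few lines of code. This is a minimal sketch assuming the binomial model with n = 100 and the rounded p = 0.0039 used in the example; `binom_pmf` is a hypothetical helper name.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 100, 0.0039  # 100 genes from the second TF, rounded p from the example

# P(k = 0) ... P(k = 4)
probs = [binom_pmf(k, n, p) for k in range(5)]
for k, q in enumerate(probs):
    print(f"P(k={k}) = {q:.7g}")

# P(k >= 5) is the complement of observing 0 to 4 shared genes.
p_at_least_5 = 1 - sum(probs)
print(f"P(k>=5) = {p_at_least_5:.7g}")
```

An overlap of 5 or more genes would therefore be very unlikely under independent, preference-free binding.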
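3.3.5 Negative binomial distribution
Definition: an experiment consists of a series of independent trials, each with two possible outcomes, success or failure. The probability of success p is constant, and the experiment continues until r successes have occurred, where r is a positive integer. The number of trials needed under these conditions follows a negative binomial distribution.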
Mean and Variance of Negative Binomial Distribution
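A hedged reconstruction, assuming X counts the total number of trials up to and including the r-th success. This convention reproduces the values used in the predator example of this section (mean 100 and variance 900 for r = 10, p = 0.1); under the other common convention, where X counts only the failures, the mean would instead be r(1-p)/p.

```latex
P(X = n) = \binom{n-1}{r-1} p^{r} (1-p)^{n-r}, \qquad n = r, r+1, \dots

E(X) = \frac{r}{p}, \qquad \operatorname{Var}(X) = \frac{r(1-p)}{p^{2}}
```

With r = 10 and p = 0.1 this gives E(X) = 10/0.1 = 100 and Var(X) = 10 × 0.9 / 0.01 = 900.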
Alternative formulation of Negative Binomial distribution
Example of negative binomial distribution
If a predator must capture 10 prey before it can grow large enough to reproduce, what would the mean age of onset of reproduction be if the probability of capturing a prey on any given day is 0.1?
The expected time is 100 days. However, the variance is quite high (900) and the distribution is quite skewed. Some predators will reach reproductive age much sooner, and some much later, than the average.
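The figures 100 and 900 can be checked numerically. The sketch below assumes at most one capture per day and that X counts the days up to and including the 10th capture; `negbinom_pmf` is a hypothetical helper, and the sum is truncated at 2000 days, beyond which the remaining probability is negligible.

```python
from math import comb


def negbinom_pmf(n, r, p):
    """P(the r-th success occurs on trial n), for n >= r."""
    return comb(n - 1, r - 1) * p ** r * (1 - p) ** (n - r)


r, p = 10, 0.1  # 10 prey needed, capture probability 0.1 per day

# Closed-form moments for the "number of trials" convention.
mean = r / p
variance = r * (1 - p) / p ** 2

# Numerical cross-check over a long, truncated range of days.
days = range(r, 2000)
pmf = [negbinom_pmf(n, r, p) for n in days]
mean_numeric = sum(n * q for n, q in zip(days, pmf))
var_numeric = sum((n - mean_numeric) ** 2 * q for n, q in zip(days, pmf))

print(mean, variance)
print(mean_numeric, var_numeric)
```

A standard deviation of 30 days around a mean of 100 days shows how widely the age at first reproduction can vary between individuals.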
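3.3.6 Geometric distribution
Definition: in a sequence of Bernoulli trials, the probability that the first success occurs on the k-th trial, i.e., the probability that the first k-1 trials all fail and the k-th trial succeeds.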
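Written out, the definition gives the following formula, since k-1 independent failures (each with probability 1-p) are followed by one success (probability p). The mean and variance shown are the standard results for this "number of trials" convention and are stated here as known facts rather than derived.

```latex
P(X = k) = (1-p)^{k-1}\, p, \qquad k = 1, 2, 3, \dots

E(X) = \frac{1}{p}, \qquad \operatorname{Var}(X) = \frac{1-p}{p^{2}}
```

This is the special case r = 1 of the negative binomial distribution.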
Example of geometric distribution
If the probability of extinction of an endangered population is estimated to be 0.1 every year, what is the expected time until extinction?
The expected time is 10 years. However, because of the large variance, it will be difficult to predict accurately the actual year in which the population goes extinct.
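To make the "large variance" concrete, the sketch below computes the mean, variance and a few cumulative probabilities. It assumes a geometric waiting time with a constant yearly extinction probability of 0.1, independent between years; `prob_extinct_by` is a hypothetical helper name.

```python
p = 0.1  # yearly extinction probability from the example

mean = 1 / p                 # expected number of years until extinction
variance = (1 - p) / p ** 2  # variance of the waiting time
std_dev = variance ** 0.5


def prob_extinct_by(year, p):
    """P(X <= year) = 1 - (1 - p)^year for a geometric waiting time."""
    return 1 - (1 - p) ** year


print(mean, variance, std_dev)
for year in (1, 5, 10, 20, 30):
    print(year, round(prob_extinct_by(year, p), 4))
```

The standard deviation is nearly as large as the mean, and roughly a third of such populations would still survive past year 10, which is why the year of extinction cannot be pinned down.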