December 16, 2019:
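Until now I only knew that there was a calculator for computing sample size (https://www.surveysystem.com/sscalc.htm#one).
Later, when I came across the article <Determining sample size for research activities>, I learned that the way this calculator computes its result comes from that paper. So it does have a basis after all. The paper directly provides a table for reference, which is no small effort. However large the population size gets, the required sample size basically converges at 380+. That is a pleasant discovery. What I would still like to know is how the formula in the article was derived (I could not download the paper it cites).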
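To see the 380+ convergence for myself, here is a minimal sketch. It assumes the formula from that paper has the form s = X²NP(1−P) / (d²(N−1) + X²P(1−P)), with X² = 3.841 (chi-square for 1 degree of freedom at 95% confidence), P = 0.5 and d = 0.05. That is my reading of the paper, and the function name is my own.

```python
def krejcie_morgan(population, chi2=3.841, p=0.5, d=0.05):
    """Sample size for a finite population (assumed form of the paper's formula).

    chi2: chi-square value for 1 degree of freedom (3.841 ~ 95% confidence)
    p:    assumed population proportion (0.5 gives the largest sample)
    d:    desired accuracy, as a proportion (0.05 = +/- 5%)
    """
    numerator = chi2 * population * p * (1 - p)
    denominator = d ** 2 * (population - 1) + chi2 * p * (1 - p)
    return numerator / denominator


# The sample size grows quickly at first, then flattens out.
for n in (1_000, 10_000, 100_000, 1_000_000, 100_000_000):
    print(f"N = {n:>11,}  ->  s = {round(krejcie_morgan(n))}")
```

As N grows, the result approaches X²P(1−P)/d² = 0.96025 / 0.0025 ≈ 384.1, which would explain why the table stops growing around 380+.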
ICSE 2019: Software Documentation Issues Unveiled
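The sample counts in this paper match what this calculator gives: https://www.surveysystem.com/sscalc.htm
Also, this is the first time I realized that the confidence level and the confidence interval do not have to add up to 100?
So you can just pick whatever combination you like when sampling?
Need to check this.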
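A quick sketch to check this, assuming the usual large-population formula n = Z²p(1−p)/c², where Z comes from the confidence level and c is the margin of error (what the calculator calls the confidence interval). The two are separate inputs, so any pairing is allowed. The function names are my own.

```python
from statistics import NormalDist


def z_score(confidence_level):
    """Two-sided z-value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf((1 + confidence_level) / 2)


def required_sample(confidence_level, margin, p=0.5):
    """Sample size for a very large population (no finite-population correction)."""
    z = z_score(confidence_level)
    return z ** 2 * p * (1 - p) / margin ** 2


# Every combination is valid; tighter margins or higher confidence cost more samples.
for level in (0.90, 0.95, 0.99):
    for margin in (0.10, 0.05, 0.01):
        n = round(required_sample(level, margin))
        print(f"{level:.0%} confidence, +/- {margin:.0%}: n = {n}")
```

So 95% ± 10% and 99% ± 5% are both legitimate choices. Nothing forces the two numbers to sum to 100, and the only trade-off is how many samples are needed.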
The original text is as follows:
2) Manual Classification of Documentation Issues: Once we collected the candidate artifacts, we manually analyzed a statistically significant sample ensuring a 99% confidence level ± 5%. This resulted in the selection of 665 artifacts for our manual analysis, out of the 805,939 artifacts collected from the four sources.
Since the number of collected artifacts is substantially different between the four sources (Table II), we decided to randomly select the 665 artifacts by considering these proportions. A simple proportional selection would basically discard SO and mailing lists from our study, since issues and pull requests account for over 90% of our dataset. Indeed, this would result in the selection of 311 pull requests, 326 issues, 24 SO discussions and 6 mailing list threads.
For this reason, we adopted the following sampling procedure: for SO and mailing lists, we targeted the analysis of 96 artifacts each, ensuring a 95% confidence level ± 10% within those two sources. For issues and pull requests, we adopted the proportional selection as explained above. This resulted in 829 artifacts to be manually analyzed (99% confidence ± 4.5%).
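The numbers in the quoted passage can be roughly reproduced with a short sketch. Assumptions on my side: p = 0.5, the rounded z-values 2.58 (99%) and 1.96 (95%), a finite-population correction of the form n0 / (1 + (n0 − 1) / N), and rounding to the nearest integer. The per-source populations from Table II are not in my notes, so the 96 is computed without the correction.

```python
import math


def sample_size(z, margin, population=None, p=0.5):
    """Sample size for a given z-value and margin of error.

    If population is given, apply a finite-population correction (assumed form).
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is None:
        return n0
    return n0 / (1 + (n0 - 1) / population)


def margin_of_error(z, n, p=0.5):
    """Margin of error achieved by a sample of size n (large population)."""
    return z * math.sqrt(p * (1 - p) / n)


total_artifacts = 805_939  # from the quoted text

print(round(sample_size(2.58, 0.05, total_artifacts)))  # 99% +/- 5% over all artifacts
print(round(sample_size(1.96, 0.10)))                   # 95% +/- 10% per small source

# Final sample: proportional counts for pull requests and issues, 96 each for the rest.
final_sample = 311 + 326 + 96 + 96
print(final_sample, f"{margin_of_error(2.58, final_sample):.1%}")
```

Under these assumptions the sketch gives 665, 96, and 829 artifacts at roughly ± 4.5%, matching the quoted text. With the more precise z = 2.576 the first number comes out at 663 instead, so the exact figure depends on how z is rounded.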