Digit Recognizer
Data Introduction
The data files train.csv and test.csv contain gray-scale images of hand-drawn digits, from zero through nine.
Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.
The training data set (train.csv) has 785 columns. The first column, called "label", is the digit that was drawn by the user. The remaining 784 columns contain the pixel-values of the associated image.
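As a rough illustration of that column layout, the sketch below splits each row into its label and its pixel-values using only Python's standard csv module. The function name read_labeled_rows and the four-pixel sample are hypothetical stand-ins chosen to keep the example short; the real file has 784 pixel columns per row.

```python
import csv
import io

# Tiny stand-in for train.csv: a header plus two rows. Only 4 pixel columns
# are used here to keep the sample readable; the real file has 784.
SAMPLE = "label,pixel0,pixel1,pixel2,pixel3\n3,0,12,255,0\n7,0,0,128,64\n"


def read_labeled_rows(handle):
    """Return (labels, images) from a CSV whose first column is 'label'."""
    reader = csv.reader(handle)
    header = next(reader)
    if header[0] != "label":
        raise ValueError("expected the first column to be 'label'")
    labels, images = [], []
    for row in reader:
        labels.append(int(row[0]))                         # the drawn digit
        images.append([int(value) for value in row[1:]])   # its pixel-values
    return labels, images


labels, images = read_labeled_rows(io.StringIO(SAMPLE))
print(labels)     # [3, 7]
print(images[0])  # [0, 12, 255, 0]
```

To read the actual file, the same function could be called with open("train.csv", newline="") in place of the in-memory sample.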
Each pixel column in the training set has a name like pixelx, where x is an integer between 0 and 783, inclusive. To locate this pixel in the image, decompose x as x = i * 28 + j, where i and j are integers between 0 and 27, inclusive. Then pixelx is located at row i and column j of a 28 x 28 matrix (indexing from zero).
For example, pixel31 is the pixel in the fourth column from the left and the second row from the top (31 = 1 * 28 + 3, so i = 1 and j = 3), as in the ASCII diagram below.
Visually, if we omit the "pixel" prefix, the pixels make up the image like this:
000 001 002 003 ... 026 027
028 029 030 031 ... 054 055
056 057 058 059 ... 082 083
………
728 729 730 731 ... 754 755
756 757 758 759 ... 782 783
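The decomposition x = i * 28 + j is integer division with remainder, so it can be sketched in Python with the built-in divmod. The helper names pixel_position and to_grid are hypothetical, introduced only for this illustration; to_grid rebuilds the 28 x 28 layout shown in the diagram from a flat list of 784 values.

```python
WIDTH = 28  # images are 28 x 28, so each row holds 28 pixels


def pixel_position(x):
    """Return (row, column), both zero-indexed, for the column named pixelx."""
    if not 0 <= x < WIDTH * WIDTH:
        raise ValueError(f"pixel index out of range: {x}")
    return divmod(x, WIDTH)  # x = row * 28 + column


def to_grid(pixels):
    """Split a flat list of 784 pixel-values into 28 rows of 28 values."""
    if len(pixels) != WIDTH * WIDTH:
        raise ValueError("expected exactly 784 pixel-values")
    return [pixels[row * WIDTH:(row + 1) * WIDTH] for row in range(WIDTH)]


print(pixel_position(31))  # (1, 3): second row, fourth column

# Using the pixel indices themselves as values reproduces the diagram.
grid = to_grid(list(range(WIDTH * WIDTH)))
print(grid[1][3])  # 31
```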
The test data set (test.csv) has the same format as the training set, except that it does not contain the "label" column.
Your submission file should be in the following format: for each of the 28000 images in the test set, output a single line containing the ImageId and the digit you predict. For example, if you predict that the first image is of a 3, the second image is of a 7, and the third image is of an 8, then your submission file would look like:
ImageId,Label
1,3
2,7
3,8
(27997 more lines)
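A file in this format can be produced with the standard csv module. The sketch below assumes the predictions are held in a list ordered like the rows of test.csv, so that ImageId is simply the 1-based position; write_submission is a hypothetical helper name.

```python
import csv
import io


def write_submission(predictions, handle):
    """Write the ImageId,Label header followed by one line per prediction."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["ImageId", "Label"])
    # ImageId starts at 1, matching the example submission above.
    for image_id, digit in enumerate(predictions, start=1):
        writer.writerow([image_id, digit])


buffer = io.StringIO()
write_submission([3, 7, 8], buffer)
print(buffer.getvalue())
```

For a real submission, the in-memory buffer could be replaced by a file opened with open("submission.csv", "w", newline=""); the file name here is only an example.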
The evaluation metric for this contest is the categorization accuracy, that is, the proportion of test images that are correctly classified. For example, a categorization accuracy of 0.97 indicates that you have correctly classified all but 3% of the images.
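The metric itself is a simple ratio and can be sketched as follows. Since test.csv has no labels, a participant would presumably compute it on a held-out portion of train.csv; that usage is an assumption, and categorization_accuracy is a hypothetical helper name.

```python
def categorization_accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    correct = sum(1 for truth, guess in zip(true_labels, predicted_labels)
                  if truth == guess)
    return correct / len(true_labels)


# Three of the four predictions below are correct.
print(categorization_accuracy([3, 7, 8, 1], [3, 7, 8, 2]))  # 0.75
```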