#2.1.1 Getting Started With NumPy

1. Introducing NumPy

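In the previous two courses, we used nested lists in Python to represent datasets. Python lists offer a few advantages when representing data:

Lists can contain mixed types
Lists can shrink and grow dynamically

Using Python lists to represent and work with data also has a few key disadvantages:

To support their flexibility, lists tend to consume a lot of memory
They struggle to work with medium and larger sized datasets

While there are many different ways to classify programming languages, an important one when considering performance is the distinction between low-level and high-level languages. Python is a high-level programming language that lets us quickly write, prototype, and test our logic. The C programming language, on the other hand, is a low-level language that is highly performant but has a much slower workflow.

NumPy is a library that combines the flexibility and ease of use of Python with the speed of C. In this mission, we'll start by getting familiar with NumPy's core data structure, then build up to using NumPy to work with the dataset world_alcohol.csv, which contains data on per capita alcohol consumption for each country.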
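
To make the memory point concrete, here is a rough sketch that compares the footprint of a Python list with that of a NumPy array holding the same 1,000 integers. The variable names are illustrative, and the list figure is only an estimate, since sys.getsizeof() reports different numbers across Python versions.

```python
import sys
import numpy as np

values = list(range(1000))
array = np.array(values, dtype=np.int64)

# A list stores references to separate Python int objects, so its footprint
# is roughly the list itself plus every int object it points to.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# A NumPy array stores the raw values side by side: 8 bytes per int64.
array_bytes = array.nbytes

print(list_bytes, array_bytes)
```

The exact list size varies by interpreter, but the array always needs 8,000 bytes for its data here, which is several times less than the list estimate.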

2. Creating arrays

Learn

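The core data structure in NumPy is the ndarray object, which stands for N-dimensional array. An array is a collection of values, similar to a list. N-dimensional refers to the number of indices needed to select individual values from the object.

A 1-dimensional array is often called a vector, while a 2-dimensional array is often called a matrix. Both terms are borrowed from the branch of mathematics called linear algebra. They're also common in data science literature, so we'll use them throughout this course.
To use NumPy, we first need to import it into our environment. NumPy is commonly imported under the alias np: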

import numpy as np

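We can construct arrays directly from lists with the numpy.array() function. To build a vector, we pass in a single list (with no nesting):

vector = np.array([5, 10, 15, 20])

The numpy.array() function also accepts a list of lists, which we use to create a matrix: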

matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])
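
As a small illustration of what "number of indices" means, the sketch below checks the ndim attribute of both arrays and then selects one value from each. It reuses the vector and matrix from the examples above.

```python
import numpy as np

vector = np.array([5, 10, 15, 20])
matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

# ndim reports how many indices are needed to reach a single value.
print(vector.ndim)   # 1
print(matrix.ndim)   # 2

# One index selects a value from the vector; the matrix needs two.
print(vector[2])     # 15
print(matrix[2, 0])  # 35
```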

Instructions

Create a vector from the list [10, 20, 30].
- Assign the result to the variable vector.
Create a matrix from the list of lists [[5, 10, 15], [20, 25, 30], [35, 40, 45]].
- Assign the result to the variable matrix.

import numpy as np
vector = np.array([10, 20, 30])
matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])
print(matrix)
[[ 5 10 15]
 [20 25 30]
 [35 40 45]]

3. Array shape

Learn

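An array has a certain number of elements. The array below has 5 elements:

[Image: a vector with 5 elements]

A matrix uses rows and columns instead, which matches how we thought about datasets in the previous courses. The matrix below has 3 rows and 5 columns, and is commonly called a 3×5 matrix:

[Image: a matrix with 3 rows and 5 columns]

It's often useful to know how many elements an array contains. We can use the ndarray.shape attribute to find out how many elements an array has along each dimension.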

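vector = np.array([1, 2, 3, 4])
print(vector.shape)

The code above prints the tuple (4,). This tuple tells us that the array vector has one dimension with length 4, which matches our intuition that vector has 4 elements.
For matrices, the shape attribute contains a tuple with 2 elements.

matrix = np.array([[5, 10, 15], [20, 25, 30]])
print(matrix.shape)

The code above prints the tuple (2, 3), indicating that matrix has 2 rows and 3 columns.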
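
The following sketch builds a 3×5 matrix like the one described earlier (its values are made up for illustration) and shows how shape relates to the total element count and to len().

```python
import numpy as np

# A 3x5 matrix built from three rows of five values each.
matrix = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
])

rows, columns = matrix.shape   # shape is a tuple, so it can be unpacked
print(rows, columns)           # 3 5
print(matrix.size)             # 15, the total number of elements
print(len(matrix))             # 3, len() counts only the first dimension
```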

Instructions

Import numpy and assign it the alias np.
Assign the shape of vector to vector_shape.
Assign the shape of matrix to matrix_shape.
Display vector_shape and matrix_shape using the print() function.

import numpy as np
vector = np.array([10, 20, 30])
matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])
vector_shape = vector.shape
matrix_shape = matrix.shape
print(vector_shape, matrix_shape)
(3,) (3, 3)

4. Using NumPy

Learn

We can read in datasets with the numpy.genfromtxt() function. Our dataset, world_alcohol.csv, is a comma-separated values file. We can specify the delimiter with the delimiter parameter:

import numpy
nfl = numpy.genfromtxt("data.csv", delimiter=",")

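The code above reads the file named data.csv into a NumPy array. NumPy arrays are represented by the numpy.ndarray class. Throughout our material, we'll refer to ndarray objects as NumPy arrays.
Here are the first few rows of the dataset we'll be working with:

[Image: the first few rows of world_alcohol.csv]

Each row specifies how many liters of a type of alcohol the average citizen of a country drank in a given year. The first row shows how many liters of wine the average person in Viet Nam drank in 1986.
Here's what each column represents:

Year - the year the data in the row is for.
WHO Region - the region in which the country is located.
Country - the country the data is for.
Beverage Types - the type of beverage the data is for.
Display Value - the average number of liters of the beverage type a citizen of the country drank in the given year.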
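
To experiment with numpy.genfromtxt() without the real file, one option is to pass it an in-memory text buffer. The sketch below assumes a three-line sample that mirrors the column layout described above; sample_csv and sample are illustrative names, not part of the dataset.

```python
import io
import numpy as np

# Hypothetical sample: the header line plus two rows in the same layout.
sample_csv = io.StringIO(
    "Year,WHO Region,Country,Beverage Types,Display Value\n"
    "1986,Western Pacific,Viet Nam,Wine,0\n"
    "1986,Americas,Uruguay,Other,0.5\n"
)

# genfromtxt accepts a file name or an open file-like object.
sample = np.genfromtxt(sample_csv, delimiter=",")
print(type(sample))
print(sample.shape)   # one row per line, one column per field
```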

Instructions

Use the numpy.genfromtxt() function to read "world_alcohol.csv" into a NumPy array named world_alcohol.
Use the type() and print() functions to display the type of world_alcohol.

world_alcohol = np.genfromtxt('world_alcohol.csv', delimiter=',')
print(type(world_alcohol))
print(world_alcohol)
<class 'numpy.ndarray'>
[[             nan              nan              nan              nan
               nan]
 [  1.98600000e+03              nan              nan              nan
    0.00000000e+00]
 [  1.98600000e+03              nan              nan              nan
    5.00000000e-01]
 ..., 
 [  1.98600000e+03              nan              nan              nan
    2.54000000e+00]
 [  1.98700000e+03              nan              nan              nan
    0.00000000e+00]
 [  1.98600000e+03              nan              nan              nan
    5.15000000e+00]]

5. Data types

Learn

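Every value in a NumPy array must have the same data type. NumPy data types are similar to Python data types, with slight differences. The NumPy documentation has a full list of data types. Here are some of the common ones: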

bool: Boolean.
- Can be True or False.
int: Integer values.
- Can be int16, int32, or int64. The suffix 16, 32, or 64 indicates the number of bits.
float: Floating point values.
- Can be float16, float32, or float64. The suffix 16, 32, or 64 indicates the number of bits used to store the value; more bits allow greater precision and range.
string: String values.
- Can be string or unicode, which are two different ways a computer can store text.

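NumPy automatically figures out an appropriate data type when reading in data or converting lists to arrays. You can check the data type of a NumPy array using the dtype attribute.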

numbers = np.array([1, 2, 3, 4])
numbers.dtype

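Because numbers contains only integers, its data type is int64. This is the default on most platforms; some setups, such as older NumPy versions on Windows, default to int32.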
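
Here is a short sketch of how NumPy chooses a data type for a few different inputs. The array names and values are made up for illustration.

```python
import numpy as np

integers = np.array([1, 2, 3, 4])
floats = np.array([1.0, 2.5, 3.0])
mixed = np.array([1, 2.5, 3])        # the ints are upcast to float
strings = np.array(["Wine", "Beer"])

print(integers.dtype)   # an integer type, typically int64
print(floats.dtype)     # float64
print(mixed.dtype)      # float64
print(strings.dtype)    # a unicode type sized to the longest string (4 characters)
```

Note how mixing integers and floats gives a single float type: every value in the array has to share one data type.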

Instructions

Assign the data type of world_alcohol to the variable world_alcohol_dtype.
Display world_alcohol_dtype using the print() function.

world_alcohol_dtype = world_alcohol.dtype
print(world_alcohol_dtype)
float64

6. Inspecting the data

Here's how NumPy represents the first few rows of the dataset:

array([[             nan,              nan,              nan,              nan,              nan],
       [  1.98600000e+03,              nan,              nan,              nan,   0.00000000e+00],
       [  1.98600000e+03,              nan,              nan,              nan,   5.00000000e-01]])

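There are a few concepts we haven't covered yet that we'll dig into:

Many items in world_alcohol are nan, including the entire first row. nan stands for "not a number" and is a special floating-point value used to represent missing or invalid data.
Some of the numbers are written like 1.98600000e+03.

The data type of world_alcohol is float. Because all of the values in a NumPy array must have the same data type, NumPy attempted to convert all of the columns to floats when they were read in. By default, the numpy.genfromtxt() function tries to read every value as a float.

In this case, the WHO Region, Country, and Beverage Types columns actually contain strings, which can't be converted to floats. When NumPy can't convert a value to a numeric data type like float or integer, it uses the special value nan, which stands for "not a number". A value that is absent from the file is also read as nan in a float array, so nan acts as the marker for missing data. We'll learn more about how to deal with missing data in later missions.

The entire first row of world_alcohol.csv is a header row that contains the name of each column. It isn't actually part of the data and consists only of strings. Since those strings can't be converted to floats, NumPy uses nan values to represent them.

If you haven't seen scientific notation before, you might not recognize numbers like 1.98600000e+03. Scientific notation is a way to display very large or very precise numbers compactly. For example, we can write 100 in scientific notation as 1e+02.

In this case, 1.98600000e+03 is actually longer than 1986, but NumPy displays the values in scientific notation here to accommodate larger or more precise numbers in the same array.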
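
The short sketch below illustrates both ideas with made-up values: scientific notation is just another way of writing an ordinary float, and nan is a float value that needs np.isnan() to detect.

```python
import numpy as np

# float() understands scientific notation: 1.986e+03 means 1.986 * 10**3.
year = float("1.98600000e+03")
print(year)                 # 1986.0
print(float("1e+02"))       # 100.0

# nan is a float value that is not equal to anything, including itself,
# so np.isnan() is the reliable way to detect it.
row = np.array([1986.0, np.nan, np.nan, np.nan, 0.5])
print(np.nan == np.nan)           # False
print(np.isnan(row).tolist())     # [False, True, True, True, False]
print(row.dtype)                  # float64
```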

7. Reading in the data correctly

Learn

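When reading in data with the numpy.genfromtxt() function, we can use parameters to customize how the data is read. While we're at it, we can also specify that we want to skip the header row of world_alcohol.csv.

To specify the data type for the entire NumPy array, we use the keyword argument dtype and set it to "U75". This specifies that we want to read in each value as a unicode string of up to 75 characters. We'll learn more about unicode and bytes later on, but for now it's enough to know that this will read in our data properly.
To skip the header when reading in the data, we use the skip_header parameter. It accepts an integer value specifying the number of lines at the top of the file that we want NumPy to ignore.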
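
As a minimal sketch of these two parameters, the example below reads a small in-memory sample instead of the real file. The sample text and the names sample_csv and sample are assumptions for illustration; the two data rows follow the layout of world_alcohol.csv.

```python
import io
import numpy as np

# Hypothetical sample: a header line followed by two data rows.
sample_csv = io.StringIO(
    "Year,WHO Region,Country,Beverage Types,Display Value\n"
    "1986,Western Pacific,Viet Nam,Wine,0\n"
    "1986,Americas,Uruguay,Other,0.5\n"
)

# dtype="U75" keeps every field as text; skip_header=1 drops the first line.
sample = np.genfromtxt(sample_csv, delimiter=",", dtype="U75", skip_header=1)
print(sample.shape)   # (2, 5): the header line is gone
print(sample[0])      # the first data row, with every value as a string
```

Because every value is now a string, numeric columns such as Display Value would have to be converted before doing arithmetic on them.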

Instructions

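When using numpy.genfromtxt() to read in world_alcohol.csv:
- Use the "U75" data type.
- Skip the first line in the dataset.
- Use the comma delimiter.
Assign the result to world_alcohol.
Display world_alcohol using the print() function.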

world_alcohol = np.genfromtxt('world_alcohol.csv', delimiter=',', dtype='U75', skip_header=1)
print(type(world_alcohol))
print(world_alcohol)
<class 'numpy.ndarray'>
[['1986' 'Western Pacific' 'Viet Nam' 'Wine' '0']
 ['1986' 'Americas' 'Uruguay' 'Other' '0.5']
 ['1985' 'Africa' "Cte d'Ivoire" 'Wine' '1.62']
 ..., 
 ['1986' 'Europe' 'Switzerland' 'Spirits' '2.54']
 ['1987' 'Western Pacific' 'Papua New Guinea' 'Other' '0']
 ['1986' 'Africa' 'Swaziland' 'Other' '5.15']]

8. Indexing arrays

Learn

Now that the data is in the right format, let's learn how to explore it. We can index NumPy arrays much like we index regular Python lists. Here's how we would index a NumPy vector:

vector = np.array([5, 10, 15, 20])
print(vector[0])

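The code above prints 5, the first element of vector.
Indexing matrices is similar to indexing lists of lists. Here's a refresher on indexing lists of lists: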

first_item = list_of_lists[0]
first_item[2]

We can also shorten this notation:

list_of_lists[0][2]

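We can index matrices in a similar way, but we place both indices inside a single pair of square brackets. The first index specifies which row the data comes from, and the second index specifies which column: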

>> matrix = np.array([
                        [5, 10, 15], 
                        [20, 25, 30]
                     ])
>> matrix[1,2]
30

In the code above, we pass two indices into the square brackets when we index matrix: row index 1 and column index 2, which selects 30.
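
To compare the two indexing styles side by side, the sketch below builds the same data as a list of lists and as a matrix. The negative-index line is an extra illustration not covered above.

```python
import numpy as np

list_of_lists = [[5, 10, 15], [20, 25, 30]]
matrix = np.array(list_of_lists)

print(list_of_lists[1][2])   # 30
print(matrix[1, 2])          # 30
print(matrix[1][2])          # 30, chained indexing also works on arrays
print(matrix[-1, -1])        # 30, negative indices count from the end

# list_of_lists[1, 2] would raise a TypeError: lists don't accept index tuples.
```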

Instructions

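Assign the number of liters of "Other" beverages that Uruguayans drank per capita in 1986 to uruguay_other_1986. This is the second row and fifth column.
Assign the country in the third row to third_country. Country is the third column.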

uruguay_other_1986 = world_alcohol[1,4]
third_country = world_alcohol[2,2]
print(uruguay_other_1986)
print(third_country)
0.5
Cte d'Ivoire

9. Slicing arrays

Learn

We can use slices to select subsets of arrays, just like we can with lists:

>> vector = np.array([5, 10, 15, 20])
>> vector[0:3]
array([ 5, 10, 15])

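Like lists, vector slicing goes from the first index up to, but not including, the second index. Matrix slicing is a bit more complex, and comes in four forms (rows and columns are counted from 1 in the descriptions, while indices start at 0):

matrix[:,1] (all of the elements in column 2)
matrix[0:3,1] (the elements in rows 1-3 of column 2)
matrix[0:4,0:3] (the elements in rows 1-4 and columns 1-3)
matrix[2, 3] (the element in row 3, column 4)

We'll dive into the first form in this screen. When we want to select one whole dimension, and an element from the other, we can do this: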

>> matrix = np.array([
                    [5, 10, 15], 
                    [20, 25, 30],
                    [35, 40, 45]
                 ])
>> matrix[:,1]
array([10, 25, 40])

This will select all of the rows, but only the column with index 1. The colon by itself (:) specifies that the entirety of a single dimension should be selected. Think of the colon as selecting from the first element in a dimension up to and including the last element.
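
For comparison, the sketch below extracts the same column from a plain list of lists and from a matrix. The variable names are illustrative.

```python
import numpy as np

rows = [[5, 10, 15], [20, 25, 30], [35, 40, 45]]
matrix = np.array(rows)

# With nested lists, extracting a column takes a loop or comprehension.
column_from_lists = [row[1] for row in rows]

# With a NumPy matrix, the colon selects every row in one expression.
column_from_array = matrix[:, 1]

print(column_from_lists)           # [10, 25, 40]
print(column_from_array.tolist())  # [10, 25, 40]
print(matrix[1, :].tolist())       # [20, 25, 30], a whole row
```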

Instructions

Assign the entire third column of world_alcohol to the variable countries.
Assign the fifth column of world_alcohol to the variable alcohol_consumption.

countries = world_alcohol[:, 2]
alcohol_consumption = world_alcohol[:, 4]
print(countries)
print(alcohol_consumption)
['Viet Nam' 'Uruguay' "Cte d'Ivoire" ..., 'Switzerland' 'Papua New Guinea'
 'Swaziland']
['0' '0.5' '1.62' ..., '2.54' '0' '5.15']

10. Slicing one dimension

Learn

When we want to select one whole dimension, and a slice of the other, we need to use special notation:

>> matrix = np.array([
                    [5, 10, 15], 
                    [20, 25, 30],
                    [35, 40, 45]
                 ])
>> matrix[:,0:2]
array([[ 5, 10],
       [20, 25],
       [35, 40]])

We can select whole rows by placing a colon in the column position. The code below selects the rows with index 1 and 2, and all of the columns.

>> matrix[1:3,:]
array([[20, 25, 30],
       [35, 40, 45]])

We can also combine a slice along one dimension with a single index along the other. The code below selects the rows with index 1 and 2 from the column with index 1:

>> matrix[1:3,1]
array([25, 40])
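
One detail worth checking is the shape of each result. The sketch below reuses the matrix above and prints the shape for each form; the last line is an extra case added for contrast.

```python
import numpy as np

matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

print(matrix[:, 0:2].shape)    # (3, 2): slices in both positions, stays 2-D
print(matrix[1:3, :].shape)    # (2, 3)
print(matrix[1:3, 1].shape)    # (2,): the integer index drops a dimension
print(matrix[1:3, 1:2].shape)  # (2, 1): a one-column slice keeps both dimensions
```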

Instructions

Assign all the rows and the first 2 columns of world_alcohol to first_two_columns
Assign the first 10 rows and the first column of world_alcohol to first_ten_years.
Assign the first 10 rows and all of the columns of world_alcohol to first_ten_rows.

first_two_columns = world_alcohol[:, 0:2]
first_ten_years   = world_alcohol[0:10, 0]
first_ten_rows    = world_alcohol[0:10, :] 
print(first_two_columns)
print(first_ten_years)
print(first_ten_rows)
[['1986' 'Western Pacific']
 ['1986' 'Americas']
 ['1985' 'Africa']
 ..., 
 ['1986' 'Europe']
 ['1987' 'Western Pacific']
 ['1986' 'Africa']]
['1986' '1986' '1985' '1986' '1987' '1987' '1987' '1985' '1986' '1984']
[['1986' 'Western Pacific' 'Viet Nam' 'Wine' '0']
 ['1986' 'Americas' 'Uruguay' 'Other' '0.5']
 ['1985' 'Africa' "Cte d'Ivoire" 'Wine' '1.62']
 ['1986' 'Americas' 'Colombia' 'Beer' '4.27']
 ['1987' 'Americas' 'Saint Kitts and Nevis' 'Beer' '1.98']
 ['1987' 'Americas' 'Guatemala' 'Other' '0']
 ['1987' 'Africa' 'Mauritius' 'Wine' '0.13']
 ['1985' 'Africa' 'Angola' 'Spirits' '0.39']
 ['1986' 'Americas' 'Antigua and Barbuda' 'Spirits' '1.55']
 ['1984' 'Africa' 'Nigeria' 'Other' '6.1']]

11. Slicing arrays

Learn

We can also slice along both dimensions simultaneously. The following code selects rows with index 1 and 2, and columns with index 0 and 1:

>> matrix = np.array([
                    [5, 10, 15], 
                    [20, 25, 30],
                    [35, 40, 45]
                 ])
>> matrix[1:3,0:2]
array([[20, 25],
       [35, 40]])
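
The same idea can be tried on a slightly larger matrix. This sketch assumes a 4×5 matrix generated with np.arange() and reshape(), which are not used elsewhere in this mission, so that each value reveals its position.

```python
import numpy as np

# A 4x5 matrix holding 0..19, so each value reveals its position:
# value = 5 * row_index + column_index.
matrix = np.arange(20).reshape(4, 5)

block = matrix[1:3, 0:2]
print(block.tolist())   # [[5, 6], [10, 11]]
print(block.shape)      # (2, 2)

# Omitting a bound means "from the start" or "to the end".
print(matrix[:2, 3:].tolist())   # [[3, 4], [8, 9]]
```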

Instructions

Assign the first 20 rows of the columns at index 1 and 2 of world_alcohol to first_twenty_regions.

first_twenty_regions = world_alcohol[0:20, 1:3]
print(first_twenty_regions)
[['Western Pacific' 'Viet Nam']
 ['Americas' 'Uruguay']
 ['Africa' "Cte d'Ivoire"]
 ['Americas' 'Colombia']
 ['Americas' 'Saint Kitts and Nevis']
 ['Americas' 'Guatemala']
 ['Africa' 'Mauritius']
 ['Africa' 'Angola']
 ['Americas' 'Antigua and Barbuda']
 ['Africa' 'Nigeria']
 ['Africa' 'Botswana']
 ['Americas' 'Guatemala']
 ['Western Pacific' "Lao People's Democratic Republic"]
 ['Eastern Mediterranean' 'Afghanistan']
 ['Western Pacific' 'Viet Nam']
 ['Africa' 'Guinea-Bissau']
 ['Americas' 'Costa Rica']
 ['Africa' 'Seychelles']
 ['Europe' 'Norway']
 ['Africa' 'Kenya']]

12. Next steps

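We've learned some of the basics of the NumPy library and how to work with NumPy arrays. In the next mission, we'll build on this to figure out which country consumes the most alcohol.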

Last edited: 2017.12.10 11:51:58
