Python DataScience Handbook 學習筆記1
第一部分 numpy
相比較于python內(nèi)置的數(shù)據(jù)類型,numpy提供了更為高效的數(shù)據(jù)操作.
首先我們要了解一下python內(nèi)置的數(shù)據(jù)類型.以Integer為例纳决,C代碼的實現(xiàn)如下
# This code illustrates why python allows dynamic typing
struct _longobject {
long ob_refcnt;
PyTypeObject *ob_type;
size_t ob_size;
long ob_digit[1];
};
int 類型在實現(xiàn)中是一個指向上述結構體的指針塞蹭;
numpy中的核心:array
numpy array 與 list的對比可以通過下圖來體會:
創(chuàng)建
接下來我們通過實例來看一下在numpy中如何簡單優(yōu)雅地創(chuàng)建數(shù)組
In [1]: import numpy as np
In [2]: np.__version__
Out[2]: '1.13.3'
In [3]: np?
In [4]: np.array([3.14, 3, 1, 2])
Out[4]: array([ 3.14, 3. , 1. , 2. ])
In [5]: np.zeros((3, 5), dtype=int)
Out[5]:
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
In [6]: np.arange(0,20,2)
Out[6]: array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
In [7]: np.linspace(0,1,5)
Out[7]: array([ 0. , 0.25, 0.5 , 0.75, 1. ])
In [8]: np.random.random((3,3))
Out[8]:
array([[ 0.43170959, 0.10099413, 0.45859315],
[ 0.62548971, 0.57233299, 0.6632921 ],
[ 0.74947709, 0.31867245, 0.05988924]])
In [9]: np.random.normal(0, 1, (3,3))
Out[9]:
array([[-1.45242445, -1.27771487, 1.39220407],
[-0.66294773, -1.56926783, -0.02177722],
[ 1.0318081 , -0.87103441, 0.78930381]])
In [10]: np.random.randint(0, 10, (3, 3))
Out[10]:
array([[0, 5, 8],
[2, 7, 7],
[5, 0, 5]])
In [11]: np.zeros(10, dtype=np.complex128)
Out[11]:
array([ 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j,
0.+0.j, 0.+0.j, 0.+0.j])
基本操作
對于一維的數(shù)組,與python原生的操作非常相似姻几,在此不在贅述。我在這里列出了一些較為fancy的部分.
In [5]: x2 = np.random.randint(15, size = (3,5), dtype='int')
In [6]: x2
Out[6]:
array([[ 8, 8, 5, 11, 13],
[ 2, 14, 2, 9, 6],
[ 8, 14, 6, 4, 9]])
In [7]: x2[::-1, ::-1]
Out[7]:
array([[ 9, 4, 6, 14, 8],
[ 6, 9, 2, 14, 2],
[13, 11, 5, 8, 8]])
與matlab類似,numpy可以通過:符號來實現(xiàn)整行整列的訪問
x2[:, 0] # first column of x2
x2[0, :] # first row of x2
接下來我們要強調非常重要的一點:在對numpy中的array作slice等操作時臭埋,與原生列表有很大的不同,主要表現(xiàn)為它會產(chǎn)生一個"view"而非一個"copy"臀玄。通俗的說就是它不重新分配內(nèi)存瓢阴,創(chuàng)建列表,而是直接在原始數(shù)據(jù)上操作健无。
In [8]: x = [1,2,3,4,5]
In [9]: y = np.array([1,2,3,4,5])
In [10]: copy = x[1:3]
In [12]: copy[1] = 1
In [13]: copy
Out[13]: [2, 1]
In [14]: not_copy = y[1:3]
In [16]: not_copy[1] = 1
In [17]: not_copy
Out[17]: array([2, 1])
In [18]: x
Out[18]: [1, 2, 3, 4, 5]
In [19]: y
Out[19]: array([1, 2, 1, 4, 5])
當然荣恐,只要顯式地調用copy()就能創(chuàng)建一個copy而非view.
x2_sub_copy = x2[:2, :2].copy()
reshape
x = np.array([1, 2, 3])
# row vector via reshape
x.reshape((1, 3))
Out[39]:
array([[1, 2, 3]])
In [40]:
# row vector via newaxis
x[np.newaxis, :]
Out[40]:
array([[1, 2, 3]])
In [41]:
# column vector via reshape
x.reshape((3, 1))
Out[41]:
array([[1],
[2],
[3]])
In [42]:
# column vector via newaxis
x[:, np.newaxis]
Out[42]:
array([[1],
[2],
[3]])
Concatenation
grid = np.array([[1, 2, 3],
[4, 5, 6]])
In [46]:
# concatenate along the first axis
np.concatenate([grid, grid])
Out[46]:
array([[1, 2, 3],
[4, 5, 6],
[1, 2, 3],
[4, 5, 6]])
In [47]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)
Out[47]:
array([[1, 2, 3, 1, 2, 3],
[4, 5, 6, 4, 5, 6]])
Splitting
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)
[1 2 3] [99 99] [3 2 1]
In [23]: grid = np.arange(16).reshape((4,4))
In [24]: grid
Out[24]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [25]: a, b = np.vsplit(grid, [3])
In [26]: a
Out[26]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [27]: b
Out[27]: array([[12, 13, 14, 15]])