CS190 Scalable Machine Learning Spark - 1: Python Basics


Part 1: NumPy

NumPy is a Python library for working with arrays.

     # It is convention to import NumPy with the alias np
     import numpy as np

(1a) Scalar multiplication

Here $ a $ is the scalar (constant) and $ \mathbf{v} $ is the vector:
$$ a \mathbf{v} = \begin{bmatrix} a v_1 \\ a v_2 \\ \vdots \\ a v_n \end{bmatrix} $$

# Create a numpy array with the values 1, 2, 3
simpleArray = np.array([1,2,3])
# Perform the scalar product of 5 and the numpy array
timesFive = simpleArray * 5
print(simpleArray)
print(timesFive)
-----
#result
[1 2 3]
[ 5 10 15]
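
As a side note, the `*` operator behaves differently on a plain Python list, where it repeats the list instead of scaling it. The sketch below contrasts the two cases; the variable names are illustrative and do not come from the course notebook.

```python
import numpy as np

plain_list = [1, 2, 3]
simple_array = np.array(plain_list)

# On a Python list, * repeats the sequence
repeated = plain_list * 2
# On an ndarray, * multiplies every element by the scalar
scaled = simple_array * 2

print(repeated)  # [1, 2, 3, 1, 2, 3]
print(scaled)    # [2 4 6]
```

This is one reason to convert data to an ndarray before doing arithmetic on it.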

(1b) Element-wise multiplication and dot product

The element-wise calculation is as follows:

$$ \mathbf{x} \odot \mathbf{y} = \begin{bmatrix} x_1 y_1 \\ x_2 y_2 \\ \vdots \\ x_n y_n \end{bmatrix} $$

The dot product is equivalent to performing element-wise multiplication and then summing the result.

$ w \cdot x $ can also be written as $ w^\top x $:

$$ w \cdot x = \sum_{i=1}^n w_i x_i $$

For element-wise multiplication, use the `*` operator to multiply two ndarray objects of the same length.
For the dot product, you can use either np.dot() or np.ndarray.dot().


# Create a ndarray based on a range and step size.
u = np.arange(0, 5, .5)
v = np.arange(5, 10, .5)

elementWise = u * v 
dotProduct = np.dot(u,v)

print('u: {0}'.format(u))
print('v: {0}'.format(v))
print('\nelementWise\n{0}'.format(elementWise))
print('\ndotProduct\n{0}'.format(dotProduct))

----
#result
u: [ 0.   0.5  1.   1.5  2.   2.5  3.   3.5  4.   4.5]
v: [ 5.   5.5  6.   6.5  7.   7.5  8.   8.5  9.   9.5]

elementWise
[  0.     2.75   6.     9.75  14.    18.75  24.    29.75  36.    42.75]

dotProduct
183.75
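
The statement that the dot product equals element-wise multiplication followed by a sum can be checked directly. The following is a small sketch that reuses the same `u` and `v` ranges; the three result names are made up for illustration.

```python
import numpy as np

u = np.arange(0, 5, .5)
v = np.arange(5, 10, .5)

# Three ways to compute the same dot product
dot_via_function = np.dot(u, v)   # module-level function
dot_via_method = u.dot(v)         # ndarray method
dot_via_sum = np.sum(u * v)       # element-wise product, then sum

print(dot_via_function, dot_via_method, dot_via_sum)
```

All three values should agree with the 183.75 shown in the result.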

(1c) Matrix math

np.matrix() creates a matrix.

Matrix multiplication on NumPy matrices is performed with the `*` operator.

You can transpose a matrix by calling numpy.matrix.transpose() or by using `.T` on the matrix object (e.g. `myMatrix.T`).

Transposing a matrix produces a matrix where the new rows are the columns from the old matrix. For example: $$ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}^\mathbf{\top} = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} $$

Inverting a matrix can be done using numpy.linalg.inv().

Note that only square matrices can be inverted, and square matrices are not guaranteed to have an inverse. If the inverse exists, then multiplying a matrix by its inverse will produce the identity matrix $ \scriptsize ( \mathbf{A}^{-1} \mathbf{A} = \mathbf{I_n} ) $. The identity matrix $ \scriptsize \mathbf{I_n} $ has ones along its diagonal and zeros elsewhere: $$ \mathbf{I_n} = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{bmatrix} $$
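
To make the caveat about invertibility concrete, here is a minimal sketch of what happens when numpy.linalg.inv() is given a matrix without an inverse. The helper `try_invert` and the example matrices are hypothetical names introduced only for this illustration.

```python
import numpy as np

def try_invert(matrix):
    """Return the inverse, or None if NumPy reports that none can be computed."""
    try:
        return np.linalg.inv(matrix)
    except np.linalg.LinAlgError:
        return None

# The second row is twice the first, so this square matrix is singular
singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])
# A 2 x 3 matrix is not square and cannot be inverted either
non_square = np.ones((2, 3))
# np.eye(n) builds the n x n identity matrix, which is its own inverse
identity = np.eye(2)

print(try_invert(singular))    # None
print(try_invert(non_square))  # None
print(try_invert(identity))
```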

For this exercise, multiply $ \mathbf{A} $ times its transpose $ ( \mathbf{A}^\top ) $ and then calculate the inverse of the result $ ( [ \mathbf{A} \mathbf{A}^\top ]^{-1} ) $.

from numpy.linalg import inv

A = np.matrix([[1,2,3,4],[5,6,7,8]])
print('A:\n{0}'.format(A))
# Print A transpose
print('\nA transpose:\n{0}'.format(A.T))

# Multiply A by A transpose
AAt = A * A.T
print('\nAAt:\n{0}'.format(AAt))

# Invert AAt with inv() (numpy.linalg.inv)
AAtInv = inv(AAt)
print('\nAAtInv:\n{0}'.format(AAtInv))

# Show inverse times matrix equals identity
# We round due to numerical precision
print('\nAAtInv * AAt:\n{0}'.format((AAtInv * AAt).round(4)))

result

A:
[[1 2 3 4]
 [5 6 7 8]]

A transpose:
[[1 5]
 [2 6]
 [3 7]
 [4 8]]

AAt:
[[ 30  70]
 [ 70 174]]

AAtInv:
[[ 0.54375 -0.21875]
 [-0.21875  0.09375]]

AAtInv * AAt:
[[ 1.  0.]
 [-0.  1.]]
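
Current NumPy documentation no longer recommends np.matrix for new code. As an alternative sketch (not the course's own solution), the same calculation can be written with a plain two-dimensional ndarray and the `@` matrix-multiplication operator, which assumes Python 3.5 or later.

```python
import numpy as np

# Same values as the exercise, stored as a regular 2-D ndarray
A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

# For ndarrays, * is element-wise, so @ is used for the matrix product
AAt = A @ A.T
AAtInv = np.linalg.inv(AAt)

# Compare against the identity with a tolerance instead of rounding
product = AAtInv @ AAt
print(AAt)
print(np.allclose(product, np.eye(2)))  # True
```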

Part 2: Additional NumPy and Spark linear algebra

(2a) Slices

features = np.array([1, 2, 3, 4])
print('features:\n{0}'.format(features))

# The first three elements of features
firstThree = features[0:3]

# The last three elements of features
lastThree = features[-3:]
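
One detail worth knowing about slices is that a basic slice of an ndarray is a view onto the original data, not a copy. The sketch below prints the two slices and then shows the view behaviour; the snake_case names are illustrative only.

```python
import numpy as np

features = np.array([1, 2, 3, 4])

first_three = features[0:3]   # indices 0, 1, 2
last_three = features[-3:]    # the final three elements
print(first_three)  # [1 2 3]
print(last_three)   # [2 3 4]

# Writing through the slice also changes the original array
first_three[0] = 100
print(features[0])  # 100

# Use .copy() when an independent array is needed
independent = features[0:3].copy()
independent[0] = -1
print(features[0])  # 100
```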

(2b) Combining `ndarray` objects

NumPy provides np.hstack(), which combines arrays column-wise, and np.vstack(), which combines arrays row-wise.
Note that both np.hstack() and np.vstack() take a tuple of arrays as their first argument.
To horizontally combine three arrays a, b, and c, you would run np.hstack((a, b, c)).

If we had two arrays: `a = [1, 2, 3, 4]` and `b = [5, 6, 7, 8]`, we could use `np.vstack((a, b))` to produce the two-dimensional array: $$ \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \end{bmatrix} $$
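
The `a` and `b` example from the text can be run as follows. This is a small sketch; the names `side_by_side` and `stacked` are assumptions chosen for readability.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# hstack joins the two 1-D arrays end to end
side_by_side = np.hstack((a, b))
# vstack places them as the rows of a 2-D array
stacked = np.vstack((a, b))

print(side_by_side.shape)  # (8,)
print(stacked.shape)       # (2, 4)
print(stacked)
```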

zeros = np.zeros(8)
ones = np.ones(8)
print('zeros:\n{0}'.format(zeros))
print('\nones:\n{0}'.format(ones))

zerosThenOnes = np.hstack((zeros,ones))   # A 1-D array of length 16
zerosAboveOnes = np.vstack((zeros,ones)) # A 2 by 8 array

print('\nzerosThenOnes:\n{0}'.format(zerosThenOnes))
print('\nzerosAboveOnes:\n{0}'.format(zerosAboveOnes))

result:
zeros:
[ 0. 0. 0. 0. 0. 0. 0. 0.]
ones:
[ 1. 1. 1. 1. 1. 1. 1. 1.]
zerosThenOnes:
[ 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1.]

zerosAboveOnes:
[[ 0. 0. 0. 0. 0. 0. 0. 0.]
 [ 1. 1. 1. 1. 1. 1. 1. 1.]]

(2c) PySpark's DenseVector

PySpark provides a DenseVector class within the module pyspark.mllib.linalg.

DenseVector is used to store arrays of values for use in PySpark. DenseVector actually stores values in a NumPy array and delegates calculations to that object. You can create a new DenseVector by calling DenseVector() and passing in a NumPy array or a Python list.

Note that `DenseVector` stores all values as `np.float64`.

DenseVector objects exist locally and are not inherently distributed. DenseVector objects can be used in the distributed setting by either passing functions that contain them to resilient distributed dataset (RDD) transformations or by distributing them directly as RDDs.

from pyspark.mllib.linalg import DenseVector

numpyVector = np.array([-3, -4, 5])
print('\nnumpyVector:\n{0}'.format(numpyVector))

# Create a DenseVector consisting of the values [3.0, 4.0, 5.0]
myDenseVector = DenseVector([3,4,5])
# Calculate the dot product between the two vectors.
denseDotProduct = myDenseVector.dot(numpyVector)

print('myDenseVector:\n{0}'.format(myDenseVector))
print('\ndenseDotProduct:\n{0}'.format(denseDotProduct))

numpyVector:
[-3 -4 5]
myDenseVector:
[3.0,4.0,5.0]
denseDotProduct:
0.0

Part 3: Python lambda expressions

A lambda is an anonymous function.

Further reading: Lambda Functions, Lambda Tutorial, and Python Functions.

# Example function
def addS(x):
    return x + 's'
# Lambda form
addSLambda = lambda x: x + 's'

# Multiplication
multiplyByTen = lambda x: x * 10
print(multiplyByTen(5))

# A lambda takes fewer steps than def
# The first function should add two values, while the second function
# should subtract the second value from the first value.
def plus(x, y):
    return x + y

def minus(x, y):
    return x - y

functions = [plus, minus]
print(functions[0](4, 5))
print(functions[1](4, 5))

# The same two functions written as lambdas
lambdaFunctions = [lambda x, y: x + y, lambda x, y: x - y]
print(lambdaFunctions[0](4, 5))
print(lambdaFunctions[1](4, 5))

Lambda expressions consist of a single expression statement and cannot contain other simple statements. In short, this means that the lambda expression needs to evaluate to a value and exist on a single logical line. If more complex logic is necessary, use def in place of lambda.
Expression statements evaluate to a value (sometimes that value is None). Lambda expressions automatically return the value of their expression statement. In fact, a return statement in a lambda would raise a SyntaxError.
The following Python keywords refer to simple statements that cannot be used in a lambda expression: assert, pass, del, print, return, yield, raise, break, continue, import, global, and exec. (print and exec are statements only in Python 2; in Python 3 they are ordinary functions and may be called inside a lambda.) Also, note that assignment statements (=) and augmented assignment statements (e.g. +=) cannot be used either.
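
The restriction to a single expression can be demonstrated without crashing the script by asking Python to compile candidate lambdas as expressions. This is a sketch only; `compiles_as_expression` and `absolute` are hypothetical helpers introduced for the illustration.

```python
# A conditional expression is still an expression, so a lambda may use it
absolute = lambda x: x if x >= 0 else -x

def compiles_as_expression(source):
    """Return True if source parses as a single Python expression."""
    try:
        compile(source, '<example>', 'eval')
        return True
    except SyntaxError:
        return False

print(absolute(-7))                                  # 7
print(compiles_as_expression('lambda x: x + 1'))     # True
print(compiles_as_expression('lambda x: return x'))  # False
print(compiles_as_expression('lambda x: x += 1'))    # False
```

When the logic needs a statement such as return or an assignment, a def is the appropriate tool.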

Last edited: 2017.11.27 03:08:00

