CS190 Scalable Machine Learning Spark - 1: Python Basics


Part 1: NumPy

NumPy is a Python library for working with arrays.

     # It is convention to import NumPy with the alias np
     import numpy as np

(1a) Scalar multiplication

$ a $ is the scalar (constant) and $ \mathbf{v} $ is the vector
$$ a \mathbf{v} = \begin{bmatrix} a v_1 \\ a v_2 \\ \vdots \\ a v_n \end{bmatrix} $$

# Create a numpy array with the values 1, 2, 3
simpleArray = np.array([1,2,3])
# Perform the scalar product of 5 and the numpy array
timesFive = simpleArray * 5
print simpleArray
print timesFive
-----
#result
[1 2 3]
[ 5 10 15]
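
One pitfall worth noting: the * operator only performs scalar multiplication on NumPy arrays. On a plain Python list it repeats the list instead. A minimal sketch of the difference (the variable names are illustrative and not part of the lab):

```python
import numpy as np

plainList = [1, 2, 3]
asArray = np.array(plainList)

# On an ndarray, * multiplies every element by the scalar
scaled = asArray * 5

# On a plain list, * repeats the list instead of scaling it
repeated = plainList * 2

print(scaled.tolist())   # [5, 10, 15]
print(repeated)          # [1, 2, 3, 1, 2, 3]
```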

(1b) Element-wise multiplication and dot product

The element-wise calculation is as follows:

$$ \mathbf{x} \odot \mathbf{y} = \begin{bmatrix} x_1 y_1 \\ x_2 y_2 \\ \vdots \\ x_n y_n \end{bmatrix} $$

The dot product is equivalent to performing element-wise multiplication and then summing the result.

$ w \cdot x $ can also be written as $ w^\top x $.

$$ w \cdot x = \sum_{i=1}^n w_i x_i $$

For element-wise multiplication, use the * operator to multiply two ndarray objects of the same length.
For the dot product, you can use either np.dot() or np.ndarray.dot().


# Create a ndarray based on a range and step size.
u = np.arange(0, 5, .5)
v = np.arange(5, 10, .5)

elementWise = u * v 
dotProduct = np.dot(u,v)

print 'u: {0}'.format(u)
print 'v: {0}'.format(v)
print '\nelementWise\n{0}'.format(elementWise)
print '\ndotProduct\n{0}'.format(dotProduct)

----
#result
u: [ 0.   0.5  1.   1.5  2.   2.5  3.   3.5  4.   4.5]
v: [ 5.   5.5  6.   6.5  7.   7.5  8.   8.5  9.   9.5]

elementWise
[  0.     2.75   6.     9.75  14.    18.75  24.    29.75  36.    42.75]

dotProduct
183.75
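
As a quick check of the statement that the dot product is the element-wise product summed, the sketch below computes it three ways using the same u and v as above. np.allclose is used for the comparison because floating-point sums can differ in their last bits; the variable names are illustrative.

```python
import numpy as np

u = np.arange(0, 5, .5)
v = np.arange(5, 10, .5)

# Element-wise multiplication followed by a sum
sumOfProducts = np.sum(u * v)

# The two dot-product spellings: the function and the ndarray method
dotFunction = np.dot(u, v)
dotMethod = u.dot(v)

print('{0} {1} {2}'.format(sumOfProducts, dotFunction, dotMethod))
print(np.allclose([dotFunction, dotMethod], sumOfProducts))
```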

(1c) 矩陣計算 Matrix math

np.matrix() creates a matrix.

With NumPy matrices, the * operator performs matrix multiplication rather than element-wise multiplication.

To transpose a matrix, call numpy.matrix.transpose() or use .T on the matrix object (e.g. myMatrix.T).

Transposing a matrix produces a matrix where the new rows are the columns from the old matrix. For example: $$ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}^\mathbf{\top} = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} $$

A matrix can be inverted using numpy.linalg.inv().

Note that only square matrices can be inverted, and square matrices are not guaranteed to have an inverse. If the inverse exists, then multiplying a matrix by its inverse will produce the identity matrix. $ \scriptsize ( \mathbf{A}^{-1} \mathbf{A} = \mathbf{I_n} ) $ The identity matrix $ \scriptsize \mathbf{I_n} $ has ones along its diagonal and zero elsewhere. $$ \mathbf{I_n} = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{bmatrix} $$

For this exercise, multiply $ \mathbf{A} $ times its transpose $ ( \mathbf{A}^\top ) $ and then calculate the inverse of the result $ ( [ \mathbf{A} \mathbf{A}^\top ]^{-1} ) $.

from numpy.linalg import inv

A = np.matrix([[1,2,3,4],[5,6,7,8]])
print 'A:\n{0}'.format(A)
# Print A transpose
print '\nA transpose:\n{0}'.format(A.T)

# Multiply A by A transpose
AAt = A * A.T
print '\nAAt:\n{0}'.format(AAt)

# Invert AAt with np.linalg.inv()
AAtInv = np.linalg.inv(AAt)
print '\nAAtInv:\n{0}'.format(AAtInv)

# Show inverse times matrix equals identity
# We round due to numerical precision
print '\nAAtInv * AAt:\n{0}'.format((AAtInv * AAt).round(4))

result

A:
[[1 2 3 4]
 [5 6 7 8]]

A transpose:
[[1 5]
 [2 6]
 [3 7]
 [4 8]]

AAt:
[[ 30  70]
 [ 70 174]]

AAtInv:
[[ 0.54375 -0.21875]
 [-0.21875  0.09375]]

AAtInv * AAt:
[[ 1.  0.]
 [-0.  1.]]
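
The note above says that a square matrix is not guaranteed to have an inverse. A minimal sketch of what that looks like in practice: np.linalg.inv raises np.linalg.LinAlgError when the matrix is singular. The two matrices below are made-up examples, and plain 2-D arrays with np.dot are used here in place of np.matrix, which is a stylistic assumption rather than something the lab prescribes.

```python
import numpy as np

# The second row is twice the first, so the determinant is 0
singular = np.array([[1., 2.], [2., 4.]])

try:
    np.linalg.inv(singular)
    invertible = True
except np.linalg.LinAlgError:
    invertible = False
print('singular is invertible: {0}'.format(invertible))

# An invertible matrix times its inverse gives the identity (up to rounding)
B = np.array([[2., 1.], [1., 3.]])
product = np.dot(np.linalg.inv(B), B)
print(np.allclose(product, np.eye(2)))
```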


Part 2: Additional NumPy and Spark linear algebra

(2a) Slices

features = np.array([1, 2, 3, 4])
print 'features:\n{0}'.format(features)

# The first three elements of features
firstThree = features[0:3]

# The last three elements of features
lastThree = features[-3:]
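
The snippet above does not print its slices, so here is a small sketch of what they contain. A slice is written start:stop with stop excluded, and negative indices count from the end. The step example (everyOther) is an extra illustration that is not part of the lab.

```python
import numpy as np

features = np.array([1, 2, 3, 4])

firstThree = features[0:3]    # indices 0, 1 and 2
lastThree = features[-3:]     # from the third-to-last element to the end
everyOther = features[::2]    # extra illustration: a step of 2

print(firstThree.tolist())    # [1, 2, 3]
print(lastThree.tolist())     # [2, 3, 4]
print(everyOther.tolist())    # [1, 3]
```

Note that basic slices of an ndarray are views, so assigning into firstThree would also change features.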

(2b) Combining ndarray objects

np.hstack() combines arrays column-wise (side by side), and np.vstack() combines arrays row-wise (one on top of another).
Note that both np.hstack() and np.vstack() take a tuple of arrays as their first argument.
To horizontally combine three arrays a, b, and c, you would run np.hstack((a, b, c)).

If we had two arrays: a = [1, 2, 3, 4] and b = [5, 6, 7, 8], we could use np.vstack((a, b)) to produce the two-dimensional array: $$ \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \end{bmatrix} $$

zeros = np.zeros(8)
ones = np.ones(8)
print 'zeros:\n{0}'.format(zeros)
print '\nones:\n{0}'.format(ones)

zerosThenOnes = np.hstack((zeros, ones))   # A 1-D array of 16 elements
zerosAboveOnes = np.vstack((zeros, ones))  # A 2 by 8 array

print '\nzerosThenOnes:\n{0}'.format(zerosThenOnes)
print '\nzerosAboveOnes:\n{0}'.format(zerosAboveOnes)

result:
zeros:
[ 0. 0. 0. 0. 0. 0. 0. 0.]

ones:
[ 1. 1. 1. 1. 1. 1. 1. 1.]

zerosThenOnes:
[ 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1.]

zerosAboveOnes:
[[ 0. 0. 0. 0. 0. 0. 0. 0.]
 [ 1. 1. 1. 1. 1. 1. 1. 1.]]
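
To make the a, b, c example from the text concrete, here is a small sketch. The arrays a and b are the ones from the text, while c is made up for illustration. Checking .shape is a quick way to confirm in which direction the arrays were combined.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = np.array([9, 10, 11, 12])   # hypothetical third array

sideBySide = np.hstack((a, b, c))   # 1-D inputs are joined end to end
stacked = np.vstack((a, b))         # each 1-D input becomes one row

print(sideBySide.shape)   # (12,)
print(stacked.shape)      # (2, 4)
print(stacked)
```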


(2c) PySpark's DenseVector

PySpark provides a DenseVector class within the module pyspark.mllib.linalg.

DenseVector is used to store arrays of values for use in PySpark. It stores its values in a NumPy array and delegates calculations to that object. You can create a new DenseVector by calling DenseVector() and passing in a NumPy array or a Python list.

Note that DenseVector stores all values as np.float64.

DenseVector objects exist locally and are not inherently distributed. DenseVector objects can be used in the distributed setting by either passing functions that contain them to resilient distributed dataset (RDD) transformations or by distributing them directly as RDDs.

from pyspark.mllib.linalg import DenseVector

numpyVector = np.array([-3, -4, 5])
print '\nnumpyVector:\n{0}'.format(numpyVector)

# Create a DenseVector consisting of the values [3.0, 4.0, 5.0]
myDenseVector = DenseVector([3,4,5])
# Calculate the dot product between the two vectors.
denseDotProduct = DenseVector.dot(myDenseVector,numpyVector)

print 'myDenseVector:\n{0}'.format(myDenseVector)
print '\ndenseDotProduct:\n{0}'.format(denseDotProduct)

numpyVector:
[-3 -4 5]
myDenseVector:
[3.0,4.0,5.0]
denseDotProduct:
0.0


Part 3: Python lambda expressions

A lambda is an anonymous function.

Some links: Lambda Functions, Lambda Tutorial, and Python Functions.

# Example function
def addS(x):
    return x + 's'
# Lambda form
addSLambda = lambda x: x + 's'

# Multiplication
multiplyByTen = lambda x: x * 10
print multiplyByTen(5)

# A lambda needs fewer lines than the equivalent def
# The first function should add two values, while the second function should subtract the second value from the first value.
def plus(x, y):
    return x + y

def minus(x, y):
    return x - y

functions = [plus, minus]
print functions[0](4, 5)
print functions[1](4, 5)

# lambda
lambdaFunctions = [lambda x,y : x+y ,  lambda x,y: x-y]
print lambdaFunctions[0](4, 5)
print lambdaFunctions[1](4, 5)

Lambda expressions consist of a single expression statement and cannot contain other simple statements. In short, this means that the lambda expression needs to evaluate to a value and exist on a single logical line. If more complex logic is necessary, use def in place of lambda.
Expression statements evaluate to a value (sometimes that value is None). Lambda expressions automatically return the value of their expression statement. In fact, a return statement in a lambda would raise a SyntaxError.
The following Python keywords refer to simple statements that cannot be used in a lambda expression: assert, pass, del, print, return, yield, raise, break, continue, import, global, and exec. Also, note that assignment statements (=) and augmented assignment statements (e.g. +=) cannot be used either.
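
A small sketch of these rules. It uses the built-in compile() to show that a statement inside a lambda is rejected when the code is parsed, and a conditional expression as the usual way to get branching inside a lambda. All names here are illustrative only.

```python
# A conditional expression is an expression, so it is allowed in a lambda
absolute = lambda x: x if x >= 0 else -x
print(absolute(-3))   # 3

# Lambdas are handy as short key functions
words = ['spark', 'ml', 'numpy']
byLength = sorted(words, key=lambda w: len(w))
print(byLength)       # ['ml', 'spark', 'numpy']

# A return statement inside a lambda does not even parse
try:
    compile('f = lambda x: return x', '<example>', 'exec')
    parsed = True
except SyntaxError:
    parsed = False
print(parsed)         # False
```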
