Common SciPy data structures: coo_matrix, csc_matrix, and csr_matrix

scipy.sparse.coo_matrix

coo_matrix stands for "A sparse matrix in COOrdinate format": a sparse matrix in coordinate format, in which each entry is a triple (row, column, value).
The common ways to construct it are as follows:

coo_matrix(D)
For example:

import numpy as np
from scipy.sparse import coo_matrix
coo = coo_matrix(np.array([1, 2, 3, 4, 5, 6]).reshape((2,3)))
print(coo)

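Output:

[image.png: printed listing of the stored entries, one (row, column) value per line]

This form constructs the matrix from a dense two-dimensional array D.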

coo_matrix(S)
Construct from another sparse matrix S.
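A minimal sketch of this form, assuming a small CSR matrix as the source S. The values and the variable names are made up for illustration only.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

# S is an illustrative 2x2 sparse matrix in CSR format.
S = csr_matrix(np.array([[0, 1], [2, 0]]))

# Passing a sparse matrix converts it to COO format.
coo = coo_matrix(S)

print(coo.format)
print(coo.row, coo.col, coo.data)
```

The resulting object holds the same entries as S, only expressed as (row, column, value) triples.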
coo_matrix((M, N), [dtype])
Construct an empty matrix of shape (M, N); dtype is optional. For example:

import numpy as np
from scipy.sparse import coo_matrix
coo_matrix((3, 4), dtype=np.int8).toarray()

Output:
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)

coo_matrix((data, (i, j)), [shape=(M, N)])
data holds the values stored in the matrix, i holds the row indices, and j holds the column indices.
data, i and j are related by: A[i[k], j[k]] = data[k]
For example:

import numpy as np
from scipy.sparse import coo_matrix
row  = np.array([0, 3, 1, 0])
col  = np.array([0, 3, 1, 2])
data = np.array([4, 5, 7, 9])
coo_matrix((data, (row, col)), shape=(4, 4)).toarray()

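Output:
array([[4, 0, 9, 0],
       [0, 7, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 5]])
If row and column coordinates repeat, the corresponding values are summed when the matrix is converted (for example by toarray()); the coo_matrix itself still stores each duplicate entry separately in coo.data. For example: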

row  = np.array([0, 0, 1, 3, 1, 0, 0])
col  = np.array([0, 2, 1, 3, 1, 0, 0])
data = np.array([1, 1, 1, 1, 1, 1, 1])
coo = coo_matrix((data, (row, col)), shape=(4, 4))
np.max(coo.data)
coo.toarray()

Output:
array([[3, 0, 1, 0],
       [0, 2, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 1]])
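To make the handling of duplicates more concrete, here is a minimal sketch that reuses the arrays from the example above and counts the stored entries before and after merging. It assumes a SciPy version that provides coo_matrix.sum_duplicates(), which merges duplicates in place; the stored_* variable names are only illustrative.

```python
import numpy as np
from scipy.sparse import coo_matrix

row = np.array([0, 0, 1, 3, 1, 0, 0])
col = np.array([0, 2, 1, 3, 1, 0, 0])
data = np.array([1, 1, 1, 1, 1, 1, 1])
coo = coo_matrix((data, (row, col)), shape=(4, 4))

# The COO object still holds all seven triples, duplicates included.
stored_before = coo.nnz

# Converting to CSR sums the entries that share the same (row, col).
csr = coo.tocsr()
stored_csr = csr.nnz

# sum_duplicates() performs the same merge in place on the COO matrix.
coo.sum_duplicates()
stored_after = coo.nnz

print(stored_before, stored_csr, stored_after)
```

This is why np.max(coo.data) in the example above reports the largest stored value before merging, not the largest element of the dense result.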

scipy.sparse.csr_matrix

csr is short for Compressed Sparse Row matrix, that is, a compressed sparse matrix stored by rows (quite a mouthful). It can be constructed in the following ways:

csr_matrix(D)
D is a dense matrix or a 2-D ndarray.
For example:

import numpy as np
from scipy.sparse import csr_matrix
csr = csr_matrix(np.array([1, 2, 3, 4, 5, 6]).reshape((2,3)))
print(csr)

Output:

[image.png: printed listing of the stored entries, one (row, column) value per line]

csr_matrix(S)
Construct from another sparse matrix S.
csr_matrix((M, N), [dtype])
Construct an empty matrix of shape (M, N) with the given dtype.
For example:

import numpy as np
from scipy.sparse import csr_matrix
csr_matrix((3, 4), dtype=np.int8).toarray()

Output:
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)

csr_matrix((data, (row_ind, col_ind)), [shape=(M, N)])
data, row_ind and col_ind are related by: a[row_ind[k], col_ind[k]] = data[k]. For example:

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
csr_matrix((data, (row, col)), shape=(3, 3)).toarray()

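Output:
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])
The matrix is stored row by row: row 0 first, then row 1, and so on to the last row. To build it, the three arrays are read position by position. The first entry of row is 0 (row 0) and the first entry of col is 0 (column 0), so the element at row 0, column 0 is the first entry of data, which is 1. The second entry of row is again 0, the second entry of col is 2, and the second entry of data is 2, so the element at row 0, column 2 is 2. Continuing through row in this way, the matching col and data entries fill in the rest of the sparse matrix.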
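As a rough illustration of the row-by-row storage described above, the sketch below builds the same matrix and prints the three internal arrays that csr_matrix exposes as attributes (indptr, indices, data).

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
csr = csr_matrix((data, (row, col)), shape=(3, 3))

# Internal CSR arrays built from the triples.
print(csr.indptr)   # row i occupies positions indptr[i]:indptr[i+1]
print(csr.indices)  # column index of each stored value
print(csr.data)     # stored values, row by row
```

Row 0 owns the first two stored values, row 1 the next one, and row 2 the last three, which is exactly what the boundaries in indptr encode.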

csr_matrix((data, indices, indptr), [shape=(M, N)])
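This is the standard CSR representation. The column indices of row i are stored in indices[indptr[i]:indptr[i+1]], and the corresponding values are stored in data[indptr[i]:indptr[i+1]]. It follows that the number of rows equals the length of indptr minus 1.
For example: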

import numpy as np
from scipy.sparse import csr_matrix
indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()

Output:
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])
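The slicing rule can be checked by hand. The following is a minimal sketch, not how SciPy itself does the conversion: it decodes the three arrays into a dense array with a plain loop and compares the result with toarray(). The column count n_cols is an assumption taken from shape=(3, 3) in the example, since it cannot be derived from indptr alone.

```python
import numpy as np
from scipy.sparse import csr_matrix

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

n_rows = len(indptr) - 1   # one indptr entry per row, plus a final end marker
n_cols = 3                 # assumed; matches shape=(3, 3) in the example
dense = np.zeros((n_rows, n_cols), dtype=data.dtype)

for i in range(n_rows):
    start, end = indptr[i], indptr[i + 1]
    # Columns and values of row i live in the same slice of both arrays.
    dense[i, indices[start:end]] = data[start:end]

reference = csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
print(np.array_equal(dense, reference))
```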

scipy.sparse.csc_matrix

csc is short for Compressed Sparse Column matrix, that is, a compressed sparse matrix stored by columns. It can be constructed in the following ways:

csc_matrix(D)
Construct from a two-dimensional array. For example:

import numpy as np
from scipy.sparse import csc_matrix
csc = csc_matrix(np.array([1, 2, 3, 4, 5, 6]).reshape((2,3)))
print(csc)

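Output:

[image.png: printed listing of the stored entries, one (row, column) value per line]

Comparing this with the csr output above shows that this matrix stores its entries column by column.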
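Because the comparison above relies on printed output, a minimal sketch that inspects the underlying arrays directly may make the difference easier to see. It builds both formats from the same 2x3 array; the variable names are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

dense = np.array([1, 2, 3, 4, 5, 6]).reshape((2, 3))
csr = csr_matrix(dense)
csc = csc_matrix(dense)

# CSR walks the matrix row by row, so values appear in reading order.
print(csr.data, csr.indices, csr.indptr)

# CSC walks the matrix column by column; indices now hold row numbers.
print(csc.data, csc.indices, csc.indptr)
```

In the CSR arrays the values come out as 1 to 6 in order, while in the CSC arrays they are grouped per column (1 and 4, then 2 and 5, then 3 and 6).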

csc_matrix(S)
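Construct from another sparse matrix S.
csc_matrix((M, N), [dtype])
Construct an empty matrix of shape (M, N) with the given dtype. For example: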

import numpy as np
from scipy.sparse import csc_matrix
csc_matrix((3, 4), dtype=np.int8).toarray()

Output:
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)

csc_matrix((data, (row_ind, col_ind)), [shape=(M, N)])
For example:

import numpy as np
from scipy.sparse import csc_matrix
row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6])
csc_matrix((data, (row, col)), shape=(3, 3)).toarray()

Output:
array([[1, 0, 4],
       [0, 0, 5],
       [2, 3, 6]])

csc_matrix((data, indices, indptr), [shape=(M, N)])
This is the standard CSC representation. The row indices of column i are stored in indices[indptr[i]:indptr[i+1]], and the corresponding values are stored in data[indptr[i]:indptr[i+1]].
For example:

import numpy as np
from scipy.sparse import csc_matrix
indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
csc_matrix((data, indices, indptr), shape=(3, 3)).toarray()

Output:
array([[1, 0, 4],
       [0, 0, 5],
       [2, 3, 6]])
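The CSR and CSC examples use the same three arrays, which suggests a quick check: reading identical arrays by rows or by columns should yield matrices that are transposes of each other. A minimal sketch, with illustrative variable names:

```python
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

# Same arrays, two interpretations: indptr delimits rows (CSR) or columns (CSC).
as_csr = csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
as_csc = csc_matrix((data, indices, indptr), shape=(3, 3)).toarray()

print(np.array_equal(as_csc, as_csr.T))
```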

Relationship between coo_matrix, csc_matrix and csr_matrix, and how to use them

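Because coo_matrix is easy to construct and easy to understand, the usual approach is to build a coo_matrix first and then call tocsr() or tocsc() to obtain the other two storage formats.
csr_matrix supports fast row slicing, while csc_matrix supports fast column slicing.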
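A minimal sketch of that workflow, reusing the arrays from the earlier coo_matrix example: build in COO, convert, then slice each format along its natural axis. The variable names first_row and third_col are illustrative.

```python
import numpy as np
from scipy.sparse import coo_matrix

row = np.array([0, 3, 1, 0])
col = np.array([0, 3, 1, 2])
data = np.array([4, 5, 7, 9])
coo = coo_matrix((data, (row, col)), shape=(4, 4))

csr = coo.tocsr()  # suited to row slicing
csc = coo.tocsc()  # suited to column slicing

first_row = csr[0, :].toarray()   # shape (1, 4)
third_col = csc[:, 2].toarray()   # shape (4, 1)
print(first_row)
print(third_col)
```

Note that coo_matrix itself does not support slicing, which is one more reason to convert before indexing.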
