scikit-learn datasets

This article introduces the dataset utilities in sklearn. The sklearn.datasets module provides utilities for loading datasets, including methods to load and fetch popular reference datasets. It also includes some artificial data generators.

sklearn datasets

sklearn.datasets

(1) datasets.load_*()

Loads small datasets that ship with the scikit-learn package itself.

(2) datasets.fetch_*()

Fetches larger datasets that must be downloaded from the network. These functions accept a data_home parameter that specifies the directory the dataset is downloaded to; the default is ~/scikit_learn_data/. To change the default directory, set the SCIKIT_LEARN_DATA environment variable.

(3) datasets.make_*()

Generates datasets locally.

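The load_* and fetch_* functions return a datasets.base.Bunch (importable as sklearn.utils.Bunch in current releases), which is essentially a dict whose key-value pairs can also be accessed as object attributes. It mainly contains the following attributes:
- data: the feature array, a two-dimensional numpy.ndarray of shape n_samples * n_features
- target: the label array, a one-dimensional numpy.ndarray of length n_samples
- DESCR: a description of the dataset
- feature_names: the feature names
- target_names: the label names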
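The dict-like behaviour of a Bunch can be illustrated with a hand-built instance. This is a minimal sketch that assumes Bunch is importable from sklearn.utils; the field values are made up for illustration and do not come from any real dataset.

```python
from sklearn.utils import Bunch

# Build a small Bunch by hand; the field values are illustrative only.
bunch = Bunch(data=[[1.0, 2.0], [3.0, 4.0]], target=[0, 1])

# Attribute access and dict-style access return the same object.
same_object = bunch.data is bunch["data"]
print(same_object)

# A Bunch is a dict subclass, so the usual dict methods work.
print(sorted(bunch.keys()))

# Setting an attribute also creates the corresponding key.
bunch.DESCR = "toy example"
print(bunch["DESCR"])
```

The objects returned by the loaders behave the same way, so iris.data and iris["data"] are interchangeable.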
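The dataset directory can be obtained with datasets.get_data_home(), and clear_data_home(data_home=None) deletes all downloaded data.
- datasets.get_data_home(data_home=None)
Returns the path of the scikit-learn data directory. This folder is used by some of the large dataset loaders to avoid downloading the data more than once. By default the data directory is a folder named "scikit_learn_data" in the user's home folder. Alternatively, it can be set through the "SCIKIT_LEARN_DATA" environment variable or programmatically by passing an explicit folder path. The '~' symbol is expanded to the user's home folder. If the folder does not exist, it is created automatically.
- sklearn.datasets.clear_data_home(data_home=None)
Deletes the data stored in the data directory.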
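The two helpers can be tried safely against a temporary directory. This is a sketch, not a recommended workflow: the folder name my_sklearn_cache is hypothetical, and a temporary location is used so the real ~/scikit_learn_data cache is left alone.

```python
import os
import tempfile

from sklearn.datasets import clear_data_home, get_data_home

# Use a throwaway directory so the real ~/scikit_learn_data cache is untouched.
base_dir = tempfile.mkdtemp()
custom_home = os.path.join(base_dir, "my_sklearn_cache")  # hypothetical folder name

# get_data_home creates the folder if it does not exist and returns its path.
returned_path = get_data_home(data_home=custom_home)
created = os.path.isdir(returned_path)
print(returned_path, created)

# clear_data_home deletes the cached content under that data home.
clear_data_home(data_home=custom_home)
leftover = os.listdir(custom_home) if os.path.isdir(custom_home) else []
print(leftover)
```

Passing the same path as data_home to a fetch_* function would make it download into that folder instead of the default one.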

Loading small datasets

For classification

sklearn.datasets.load_iris

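The iris dataset records measurements of iris flowers together with the species each flower belongs to. The measurements are sepal length, sepal width, petal length, and petal width. There are three classes: Iris Setosa, Iris Versicolour, and Iris Virginica. The dataset is suitable for multi-class classification problems.
Parameters for loading the dataset:
- return_X_y:

If True, the data is returned as a (data, target) tuple; the default is False, which returns a dict-like Bunch holding all the dataset information (including data and target).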

from sklearn.datasets import  load_iris
data = load_iris(return_X_y=True)
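As a small sketch of the difference between the two return forms, the tuple can be unpacked directly and compared with the arrays held by the Bunch. The variable names X, y and iris are only illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

# Tuple form: unpack features and labels directly.
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)

# Bunch form: the same arrays are available as attributes.
iris = load_iris()
print(np.array_equal(X, iris.data), np.array_equal(y, iris.target))

# Number of samples per class.
print(np.bincount(y))
```

The tuple form is convenient when only the arrays are needed, for example to pass them straight to an estimator's fit method.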

from sklearn.datasets import load_iris
data = load_iris()
# List the attributes and methods of data
print(dir(data))
print('*'*80)
# Show the dataset description
print(data.DESCR)
print('*'*80)
# Show the feature names
print(data.feature_names)
#print(data.data)
print('*'*80)
# Show the class names
print(data.target_names)
print('*'*80)
print(data.target)
print('*'*80)
# Show the target values of the 2nd, 11th, and 101st samples
print(data.target[[1,10, 100]])

['DESCR', 'data', 'feature_names', 'filename', 'target', 'target_names']
********************************************************************************
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

    ============== ==== ==== ======= ===== ====================
                    Min  Max   Mean    SD   Class Correlation
    ============== ==== ==== ======= ===== ====================
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
    ============== ==== ==== ======= ===== ====================

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988
            
   '''       partially omitted      '''

********************************************************************************
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
********************************************************************************
['setosa' 'versicolor' 'virginica']
********************************************************************************
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
********************************************************************************
[0 0 2]

sklearn.datasets.load_digits

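The handwritten digits dataset contains 1797 samples of the handwritten digits 0-9. Each digit is an 8*8 matrix whose values range from 0 to 16 and represent the grayscale intensity of each pixel.
Parameters for loading the dataset:
- return_X_y: if True, the data is returned as a (data, target) tuple; the default is False, which returns a dict-like Bunch holding all the dataset information (including data and target);
- n_class: the number of classes to return, default = 10. For example, n_class=5 returns the samples for digits 0 through 4.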

from sklearn.datasets import load_digits
digits = load_digits(n_class=5,return_X_y=False)
# Show the target values of the first 10 samples
print(digits.target[0:10])

[0 1 2 3 4 0 1 2 3 4]
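The effect of n_class, and the relationship between the images and data attributes, can be checked with a short sketch. The names digits5 and flattened are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_digits

# n_class=5 keeps only the samples whose label is 0, 1, 2, 3 or 4.
digits5 = load_digits(n_class=5)
print(np.unique(digits5.target))

# In the full dataset, each row of data is the flattened 8x8 image.
digits = load_digits()
flattened = digits.images.reshape(len(digits.images), -1)
print(flattened.shape)
print(np.array_equal(flattened, digits.data))
```

In other words, data is the form estimators expect (one row of 64 features per sample), while images keeps the 8x8 layout that is convenient for plotting.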

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
digits = load_digits(n_class=10,return_X_y=False)
print(dir(digits))
print('*'*80)
print(digits.DESCR)
print('*'*80)
print(digits.data)
print('*'*80)
print(digits.target_names)
print('*'*80)
print(digits.target[[2,20,200]])
print('*'*80)
print(digits.images.shape)
plt.matshow(digits.images[1])
plt.savefig('handwritten_digit_1')
plt.show()

['DESCR', 'data', 'images', 'target', 'target_names']
********************************************************************************
.. _digits_dataset:

Optical recognition of handwritten digits dataset
--------------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 5620
    :Number of Attributes: 64
    :Attribute Information: 8x8 image of integer pixels in the range 0..16.
    :Missing Attribute Values: None
    :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
    :Date: July; 1998
'''       partially omitted      '''
********************************************************************************
[[ 0.  0.  5. ...  0.  0.  0.]
 [ 0.  0.  0. ... 10.  0.  0.]
 [ 0.  0.  0. ... 16.  9.  0.]
 ...
 [ 0.  0.  1. ...  6.  0.  0.]
 [ 0.  0.  2. ... 12.  0.  0.]
 [ 0.  0. 10. ... 12.  1.  0.]]
********************************************************************************
[0 1 2 3 4 5 6 7 8 9]
********************************************************************************
[2 0 1]
********************************************************************************
(1797, 8, 8)

[Figure: the 8x8 image of a handwritten digit 1 drawn by plt.matshow]

For regression

sklearn.datasets.load_boston

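The Boston house-price dataset contains 506 records, each describing a housing tract and its surroundings, including the per-capita crime rate, the nitric oxide concentration, the average number of rooms per dwelling, the weighted distance to employment centres, and the median value of owner-occupied homes. Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the code below only runs on older releases.
Attribute descriptions of the Boston house-price dataset
CRIM: per-capita crime rate by town.
ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS: proportion of non-retail business acres per town.
CHAS: Charles River dummy variable (1 if the tract bounds the river; 0 otherwise).
NOX: nitric oxide concentration.
RM: average number of rooms per dwelling.
AGE: proportion of owner-occupied units built before 1940.
DIS: weighted distances to five Boston employment centres.
RAD: index of accessibility to radial highways.
TAX: full-value property-tax rate per $10,000.
PTRATIO: pupil-teacher ratio by town.
B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town.
LSTAT: percentage of the population with lower status.
MEDV: median value of owner-occupied homes, in thousands of dollars.
Parameters for loading the dataset:
- return_X_y:

If True, the data is returned as a (data, target) tuple; the default is False, which returns a dict-like Bunch holding all the dataset information (including data and target).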

from sklearn.datasets import load_boston
boston = load_boston()
print(dir(boston))
print('*'*80)
print(boston.DESCR)
print('*'*80)
print(boston.feature_names)
print(boston.data)
print('*'*80)
print(boston.filename)
print('*'*80)
print(boston.target)

['DESCR', 'data', 'feature_names', 'filename', 'target']
********************************************************************************
.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's

    :Missing Attribute Values: None

    :Creator: Harrison, D. and Rubinfeld, D.L.
'''       partially omitted      '''
********************************************************************************
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
[[6.3200e-03 1.8000e+01 2.3100e+00 ... 1.5300e+01 3.9690e+02 4.9800e+00]
 [2.7310e-02 0.0000e+00 7.0700e+00 ... 1.7800e+01 3.9690e+02 9.1400e+00]
 [2.7290e-02 0.0000e+00 7.0700e+00 ... 1.7800e+01 3.9283e+02 4.0300e+00]
 ...
 [6.0760e-02 0.0000e+00 1.1930e+01 ... 2.1000e+01 3.9690e+02 5.6400e+00]
 [1.0959e-01 0.0000e+00 1.1930e+01 ... 2.1000e+01 3.9345e+02 6.4800e+00]
 [4.7410e-02 0.0000e+00 1.1930e+01 ... 2.1000e+01 3.9690e+02 7.8800e+00]]
********************************************************************************
D:\Anaconda3\lib\site-packages\sklearn\datasets\data\boston_house_prices.csv
********************************************************************************
[24.  21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 15.  18.9 21.7 20.4
 18.2 19.9 23.1 17.5 20.2 18.2 13.6 19.6 15.2 14.5 15.6 13.9 16.6 14.8
 '''       partially omitted      '''
 16.7 12.  14.6 21.4 23.  23.7 25.  21.8 20.6 21.2 19.1 20.6 15.2  7.
  8.1 13.6 20.1 21.8 24.5 23.1 19.7 18.3 21.2 17.5 16.8 22.4 20.6 23.9
 22.  11.9]

sklearn.datasets.load_diabetes

from sklearn.datasets import load_diabetes
diabetes = load_diabetes()
print(dir(diabetes))
print('*'*80)
print(diabetes.DESCR)
print('*'*80)
print(diabetes.data_filename)
print('*'*80)
print(diabetes.feature_names)
print(diabetes.data)
print('*'*80)
print(diabetes.target_filename)

['DESCR', 'data', 'data_filename', 'feature_names', 'target', 'target_filename']
********************************************************************************
.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

  :Number of Instances: 442

  :Number of Attributes: First 10 columns are numeric predictive values

  :Target: Column 11 is a quantitative measure of disease progression one year after baseline

  :Attribute Information:
      - Age
      - Sex
      - Body mass index
      - Average blood pressure
      - S1
      - S2
      - S3
      - S4
      - S5
      - S6

Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times `n_samples` (i.e. the sum of squares of each column totals 1).
'''       partially omitted      '''
********************************************************************************
D:\Anaconda3\lib\site-packages\sklearn\datasets\data\diabetes_data.csv.gz
********************************************************************************
['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
[[ 0.03807591  0.05068012  0.06169621 ... -0.00259226  0.01990842
  -0.01764613]
 [-0.00188202 -0.04464164 -0.05147406 ... -0.03949338 -0.06832974
  -0.09220405]
 [ 0.08529891  0.05068012  0.04445121 ... -0.00259226  0.00286377
  -0.02593034]
 ...
 [ 0.04170844  0.05068012 -0.01590626 ... -0.01107952 -0.04687948
   0.01549073]
 [-0.04547248 -0.04464164  0.03906215 ...  0.02655962  0.04452837
  -0.02593034]
 [-0.04547248 -0.04464164 -0.0730303  ... -0.03949338 -0.00421986
   0.00306441]]
********************************************************************************
D:\Anaconda3\lib\site-packages\sklearn\datasets\data\diabetes_target.csv.gz
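The note in the description above says each feature column is mean centered and scaled so that its sum of squares totals 1. A quick sketch to check this numerically, using default loader settings (tolerances are left to the reader because the stored values are rounded):

```python
import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
print(X.shape, y.shape)

# Each feature column is mean centered ...
column_means = X.mean(axis=0)
# ... and scaled so that its sum of squares is (approximately) 1.
column_sum_squares = (X ** 2).sum(axis=0)

print(np.round(column_means, 6))
print(np.round(column_sum_squares, 3))
```

Because of this preprocessing, the feature values shown in the output above are small numbers around zero rather than raw ages or blood pressure readings.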

Fetching large datasets

sklearn.datasets.fetch_20newsgroups
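Parameters for loading the dataset:

subset: 'train', 'test', or 'all', optional. Selects which part of the dataset to load: 'train' for the training set, 'test' for the test set, 'all' for both.

data_home: optional, default None. The download directory for the dataset. If None, all scikit-learn data is stored in the '~/scikit_learn_data' subfolder.

categories: a list of the category names to load; by default all 20 categories are loaded.

shuffle: whether to shuffle the data.

random_state: a numpy random number generator or an integer seed.

download_if_missing: optional, default True. If True, the data is downloaded when it is not already available locally.

remove: a tuple containing any subset of ('headers', 'footers', 'quotes'); the corresponding parts of each post are stripped from the text.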

from sklearn.datasets import fetch_20newsgroups
data_test = fetch_20newsgroups(subset='test', data_home=None, categories=None,
                               shuffle=True, random_state=42, remove=(),
                               download_if_missing=True)

from sklearn.datasets import fetch_20newsgroups
data_test = fetch_20newsgroups(subset='test',shuffle=True,random_state=42)
data_train = fetch_20newsgroups(subset='train',shuffle=True,random_state=42)
print(dir(data_train))
print('*'*80)
#print(data_train.DESCR)
print('*'*80)
print(data_test.data[0])  # the first document in the test set
print('-'*80)
print('Training set category names: {} '.format(data_train.target_names))
print(data_test.target[:10])
print('*'*80)
print('Training set size: {}'.format(data_train.target.shape))
print('Test set size: {}'.format(data_test.target.shape))

['DESCR', 'data', 'filenames', 'target', 'target_names']
********************************************************************************
********************************************************************************
From: v064mb9k@ubvmsd.cc.buffalo.edu (NEIL B. GANDLER)
Subject: Need info on 88-89 Bonneville
Organization: University at Buffalo
Lines: 10
News-Software: VAX/VMS VNEWS 1.41
Nntp-Posting-Host: ubvmsd.cc.buffalo.edu

 I am a little confused on all of the models of the 88-89 bonnevilles.
I have heard of the LE SE LSE SSE SSEI. Could someone tell me the
differences are far as features or performance. I am also curious to
know what the book value is for prefereably the 89 model. And how much
less than book value can you usually get them for. In other words how
much are they in demand this time of year. I have heard that the mid-spring
early summer is the best time to buy.

                        Neil Gandler

--------------------------------------------------------------------------------
Training set category names: ['alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc', 'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x', 'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball', 'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med', 'sci.space', 'soc.religion.christian', 'talk.politics.guns', 'talk.politics.mideast', 'talk.politics.misc', 'talk.religion.misc'] 
[ 7  5  0 17 19 13 15 15  5  1]
********************************************************************************
Training set size: (11314,)
Test set size: (7532,)

sklearn.datasets.fetch_20newsgroups_vectorized

- Loads the 20 newsgroups dataset and transforms it into tf-idf vectors. This is a convenience function; the tf-idf transformation is done using the default settings of sklearn.feature_extraction.text.Vectorizer.

from sklearn.datasets import fetch_20newsgroups_vectorized
from sklearn.utils import shuffle
bunch = fetch_20newsgroups_vectorized(subset='all')
X,y = shuffle(bunch.data,bunch.target)
print(X.shape)
# Split the dataset into 70% training and 30% test
offset = int(X.shape[0]*0.7)
X_train, y_train = X[0:offset], y[0:offset]
X_test, y_test = X[offset:], y[offset:]
print(X_train.shape)
print(X_test.shape)

(18846, 130107)
(13192, 130107)
(5654, 130107)
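The shuffle-then-slice split above can also be written with sklearn.model_selection.train_test_split, which accepts sparse matrices. The sketch below uses a small random sparse matrix as a stand-in for the tf-idf matrix so that it runs without downloading anything; the matrix size and the labels are made up.

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import train_test_split

# Stand-in for the tf-idf matrix: a small random sparse matrix (not real data).
X = sparse.random(1000, 50, density=0.05, format="csr", random_state=0)
y = np.arange(1000) % 20  # 20 made-up class labels

# Shuffle and split in one call: 70% training, 30% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)
```

With the real data, bunch.data and bunch.target would take the place of X and y.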

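Generating data locally

Generating classification data:
- sklearn.datasets.make_classification
- Its parameters include:
  
  n_samples: int, optional (default = 100), the number of samples
  
  n_features: int, optional (default = 20), the total number of features; these comprise the n_informative + n_redundant + n_repeated features, with any remaining features filled with random noise
  
  n_informative: the number of informative features
  
  n_redundant: the number of redundant features, generated as random linear combinations of the informative features
  
  n_repeated: the number of duplicated features, drawn randomly from the informative and redundant features
  
  n_classes: int, optional (default = 2), the number of classes
  
  n_clusters_per_class: the number of clusters each class is made of
  
  random_state: int, RandomState instance, or None, optional (default = None); if an int, random_state is the seed used by the random number generator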
```
from sklearn import datasets
import matplotlib.pyplot as plt 
 
data,target = datasets.make_classification(n_samples=100,n_features=2,
                                           n_informative=2,n_redundant=0,n_repeated=0,
                                           n_classes=2,n_clusters_per_class=1,
                                           random_state=0)
print(data.shape)
print(target.shape)
#print(data)
#print(target)
plt.scatter(data[:,0],data[:,1],c=target)
plt.show()
```
```
(100, 2)
(100,)
```
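To see how n_informative, n_redundant and n_repeated make up the feature matrix, the sketch below disables shuffling. It relies on the column order that the make_classification documentation describes for shuffle=False (informative, then redundant, then repeated, then noise); the parameter values are chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification

# shuffle=False keeps the columns in generation order:
# informative, then redundant, then repeated, then noise.
X, y = make_classification(
    n_samples=200, n_features=8,
    n_informative=3, n_redundant=2, n_repeated=1,
    n_classes=2, shuffle=False, random_state=0,
)
print(X.shape)

# Columns 0-4 (informative + redundant) span only 3 dimensions.
rank = np.linalg.matrix_rank(X[:, :5], tol=1e-8)
print(rank)

# Column 5 is a copy of one of columns 0-4.
is_copy = any(np.allclose(X[:, 5], X[:, j]) for j in range(5))
print(is_copy)
```

The remaining two columns (6 and 7) are the random noise features that fill up n_features.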
[Figure: scatter plot of the two generated features, colored by class]

Generating regression data:
- sklearn.datasets.make_regression
- Its parameters include:
  
  n_samples: int, optional (default = 100), the number of samples
  
  n_features: int, optional (default = 100), the number of features
  
  coef: boolean, optional (default = False); if True, the coefficients of the underlying linear model are also returned
  
  random_state: int, RandomState instance, or None, optional (default = None)
```
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=100, n_features=10, random_state=1)
print(X.shape)
print(y.shape)
```
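The coef parameter can be illustrated with a short sketch. It assumes the default noise=0.0 and bias=0.0, in which case the targets should be an exact linear combination of the features.

```python
import numpy as np
from sklearn.datasets import make_regression

# coef=True additionally returns the coefficients of the underlying linear model.
X, y, coef = make_regression(
    n_samples=100, n_features=10, coef=True, random_state=1
)
print(X.shape, y.shape, coef.shape)

# With the default noise=0.0 and bias=0.0, y equals the linear combination X @ coef.
print(np.allclose(X @ coef, y))
```

Having the true coefficients available makes it easy to check how well a fitted linear model recovers them.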
Image data

In an Anaconda installation, the sample images bundled with sklearn are located in the following directory:

D:\Anaconda3\Lib\site-packages\sklearn\datasets\images

It contains china.jpg and flower.jpg.

from sklearn.datasets import load_sample_image
import matplotlib.pyplot as plt
img = load_sample_image('china.jpg')
plt.imshow(img)
plt.show()

[Figure: the china.jpg sample image]


