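Because numpy requires every element of an array to share the same type, it falls short when you need to store several kinds of information together. pandas was created to fill that gap. Its storage format closely resembles a data.frame in R.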
pandas
Building a DataFrame
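1) Build a dictionary, example_dict, whose keys are the column names and whose values are lists holding the data for each column.
2) Convert the dictionary into a pandas DataFrame: example = pd.DataFrame(example_dict). Alternatively, load the data from an external file, e.g. example = pd.read_csv('example.csv').
3) If the external file contains no row labels, you can assign them yourself with example.index = row_labels. If the file already contains row_labels in its first column, pass index_col = 0 when importing; otherwise pandas adds a default integer index and treats the label column as ordinary data.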
import pandas as pd
# Build cars DataFrame
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
cars_dict = { 'country':names, 'drives_right':dr, 'cars_per_cap':cpc }
cars = pd.DataFrame(cars_dict)
print(cars)
# Definition of row_labels
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']
# Specify row labels of cars
cars.index = row_labels
# Print cars again
print(cars)
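The effect of index_col = 0 described in step 3 can be sketched as follows. The CSV text here is a made-up, in-memory stand-in for a real file (an assumption for illustration), so the snippet runs without any external data.

```python
import io
import pandas as pd

# Hypothetical CSV contents; the first (unnamed) column holds the row labels.
csv_text = """,country,cars_per_cap
US,United States,809
JPN,Japan,588
"""

# Without index_col, the label column is read as ordinary data
# and pandas adds a default integer index.
default_index = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column becomes the row labels.
labeled = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_index.index.tolist())
print(labeled.index.tolist())
```

Comparing the two printed indexes shows why the option matters when the file already carries its own labels.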
Selecting data from a DataFrame
data_frame[] with a single column name returns a pandas Series.
To get the result back as a DataFrame instead, use double square brackets: data_frame[[]].
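Square brackets on a data_frame also support slicing: columns are selected by name, while rows can be selected only with a slice, such as data_frame[1:4].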
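A minimal sketch of the three bracket forms, using a small hand-built frame (the data is a subset chosen for illustration, not the full cars data set):

```python
import pandas as pd

# Small illustrative frame with string row labels.
cars = pd.DataFrame(
    {"country": ["United States", "Japan", "India"],
     "cars_per_cap": [809, 588, 18]},
    index=["US", "JPN", "IN"],
)

as_series = cars["country"]    # single brackets: one column as a Series
as_frame = cars[["country"]]   # double brackets: one column as a DataFrame
first_two = cars[0:2]          # slice: rows selected by position

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(first_two.index.tolist())
```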
loc and iloc
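data_frame.loc[] takes two lists, [row_label_dict, col_label_dict], holding the row labels and the column labels, and returns the selected rows and columns.
iloc works the same way as loc, except that selection is by integer position rather than by name.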
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Print out drives_right column as Series
print(cars.iloc[:, 2])
# Print out drives_right column as DataFrame
print(cars.iloc[:, [2]])
# Print out cars_per_cap and drives_right as DataFrame
print(cars.loc[:, ['cars_per_cap', 'drives_right']])
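The snippet above depends on the column order inside cars.csv. As a self-contained sketch, the frame below is built by hand (the column order is an assumption chosen so that drives_right sits at position 2) to show that loc and iloc can address the same cells:

```python
import pandas as pd

# Illustrative frame; column order is assumed, not read from cars.csv.
cars = pd.DataFrame(
    {"cars_per_cap": [809, 588, 18],
     "country": ["United States", "Japan", "India"],
     "drives_right": [True, False, False]},
    index=["US", "JPN", "IN"],
)

# Select rows US and IN, columns country and drives_right, by label...
by_label = cars.loc[["US", "IN"], ["country", "drives_right"]]
# ...and the same cells by integer position.
by_position = cars.iloc[[0, 2], [1, 2]]

print(by_label.equals(by_position))
```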
Filtering with comparison operators
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Create car_maniac: observations that have a cars_per_cap over 500
cpc = cars['cars_per_cap']
many_cars = cpc > 500 # boolean Series, one True/False per row
car_maniac = cars[many_cars] # keeps only the rows where the mask is True
# Print car_maniac
print(car_maniac)
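- You can also combine several conditions with numpy's logical and, or, and not functions (np.logical_and, np.logical_or, np.logical_not) for more flexible filtering.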
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Import numpy, you'll need this
import numpy as np
# Create medium: observations with cars_per_cap between 100 and 500
medium = cars[np.logical_and(cars['cars_per_cap'] > 100, cars['cars_per_cap'] < 500)]
# Print medium
print(medium)
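As an alternative sketch (not part of the original exercise), pandas boolean Series can also be combined with the element-wise operators &, | and ~. Each comparison needs its own parentheses because these operators bind more tightly than > and <. The frame is a hand-built subset for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative subset of the cars data.
cars = pd.DataFrame(
    {"cars_per_cap": [809, 588, 18, 200, 70]},
    index=["US", "JPN", "IN", "RU", "MOR"],
)
cpc = cars["cars_per_cap"]

# numpy version, as in the example above
medium_np = cars[np.logical_and(cpc > 100, cpc < 500)]
# operator version: & is element-wise "and"
medium_op = cars[(cpc > 100) & (cpc < 500)]
# ~ negates the mask, selecting everything outside the range
not_medium = cars[~((cpc > 100) & (cpc < 500))]

print(medium_np.equals(medium_op))
print(not_medium.index.tolist())
```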