Exploratory Analysis of the Airplane Crash Dataset

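Preface:
This is the most solemn dataset I have ever worked with; behind almost every row is a price paid in lives and blood. This exploratory analysis does not try to prove anything. It is only an exercise in data handling, so it shows the data as it is and offers little commentary. There is a saying: "Because we cherish peace, we look back on war." The same holds here: because we cherish life, we look back on air disasters. Today's safe flying was paid for with the lives of more than 100,000 innocent people. A salute to these great pioneers.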

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Importing the dataset

crash = pd.read_csv("./Airplane_Crashes_and_Fatalities_Since_1908.csv")
crash.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5268 entries, 0 to 5267
Data columns (total 13 columns):
Date            5268 non-null object
Time            3049 non-null object
Location        5248 non-null object
Operator        5250 non-null object
Flight #        1069 non-null object
Route           3562 non-null object
Type            5241 non-null object
Registration    4933 non-null object
cn/In           4040 non-null object
Aboard          5246 non-null float64
Fatalities      5256 non-null float64
Ground          5246 non-null float64
Summary         4878 non-null object
dtypes: float64(3), object(10)
memory usage: 535.1+ KB
crash = crash.drop(["Summary","cn/In","Flight #","Route","Location"],axis=1)
crash.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5268 entries, 0 to 5267
Data columns (total 8 columns):
Date            5268 non-null object
Time            3049 non-null object
Operator        5250 non-null object
Type            5241 non-null object
Registration    4933 non-null object
Aboard          5246 non-null float64
Fatalities      5256 non-null float64
Ground          5246 non-null float64
dtypes: float64(3), object(5)
memory usage: 329.3+ KB
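The info() output above already shows that Time and Registration have many gaps. A minimal sketch of how the share of missing values per column could be checked directly; the toy frame and its values are made up for illustration and are not rows from the dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for `crash`; values are illustrative only.
toy = pd.DataFrame({
    "Time": ["8:00", None, "19:18", None],
    "Aboard": [49.0, 14.0, np.nan, 2.0],
    "Fatalities": [49.0, 14.0, 19.0, 2.0],
})

# isnull() gives a boolean frame; the column mean is the fraction of missing cells.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

On the real data the same one-liner would be applied to crash instead of toy.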
print(crash[2200:2205])
            Date   Time                      Operator              Type  \
2200  03/06/1968   8:00     Military - U.S. Air Force  Fairchild C-123K   
2201  03/08/1968  19:18                    Air Manila    Fairchild F-27   
2202  03/09/1968  23:20   Military - French Air Force      Douglas DC6B   
2203  03/19/1968  19:37     Viking Airways - Air Taxi        Cessna 182   
2204  03/23/1968  13:00  Fortaire Aviation - Air Taxi       Brantly 305   

     Registration  Aboard  Fatalities  Ground  
2200      54-0590    49.0        49.0     0.0  
2201      PI-C871    14.0        14.0     0.0  
2202        43748    20.0        19.0     0.0  
2203       N2623F     2.0         2.0     0.0  
2204       N2224U     5.0         3.0     0.0  

Casualty analysis

Ranking by fatalities

print(crash["Fatalities"].sum())
fatal_crash = crash[crash["Fatalities"].notnull()]
fatal_crash = fatal_crash.sort_values(by="Fatalities")
print(fatal_crash[-5:])
105479.0
            Date   Time                                    Operator  \
3562  06/23/1985   7:15                                   Air India   
2726  03/03/1974  11:41                      Turkish Airlines (THY)   
4455  11/12/1996  18:40  Saudi Arabian Airlines / Kazastan Airlines   
3568  08/12/1985  18:56                             Japan Air Lines   
2963  03/27/1977  17:07            Pan American World Airways / KLM   

                                      Type    Registration  Aboard  \
3562                     Boeing B-747-237B          VT-EFO   329.0   
2726            McDonnell Douglas DC-10-10          TC-JAV   346.0   
4455  Boeing B-747-168B / Ilyushin IL-76TD  HZAIH/UN-76435   349.0   
3568                     Boeing B-747-SR46          JA8119   524.0   
2963  Boeing B-747-121 / Boeing B-747-206B   N736PA/PH-BUF   644.0   

      Fatalities  Ground  
3562       329.0     0.0  
2726       346.0     0.0  
4455       349.0     0.0  
3568       520.0     0.0  
2963       583.0     0.0  
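  • Tenerife airport disaster: two Boeing 747s collided, killing 583; also called the air disaster of the century
  • Japan Air Lines Flight 123: a Boeing 747 crashed into the mountains of Gunma Prefecture; the highest death toll for a single-aircraft accident
  • Charkhi Dadri mid-air collision: the deadliest mid-air collision on record
  • Turkish Airlines Flight 981: an unlatched cargo door led to explosive decompression
  • Air India Flight 182: terrorist attack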
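The ranking above sorts the whole frame and then slices the tail. As a sketch of an alternative, DataFrame.nlargest returns the top rows directly, largest first. The toy frame reuses the five sample rows printed earlier (rows 2200 to 2204) and assumes missing Fatalities have already been filtered out, as in fatal_crash:

```python
import pandas as pd

# Sample rows 2200-2204 from the printout earlier in the post.
toy = pd.DataFrame({
    "Operator": [
        "Military - U.S. Air Force",
        "Air Manila",
        "Military - French Air Force",
        "Viking Airways - Air Taxi",
        "Fortaire Aviation - Air Taxi",
    ],
    "Aboard": [49.0, 14.0, 20.0, 2.0, 5.0],
    "Fatalities": [49.0, 14.0, 19.0, 2.0, 3.0],
})

# Top three crashes by death toll, in descending order.
top3 = toy.nlargest(3, "Fatalities")
print(top3)
```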

Fatality rate

print(fatal_crash["Fatalities"].sum() / fatal_crash["Aboard"].sum())
0.729700936002
deadly = fatal_crash[fatal_crash["Fatalities"] != 0]  # crashes with at least one death
print(deadly["Fatalities"].sum() / deadly["Aboard"].sum())
0.754191781605    
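Both figures above are pooled ratios: total deaths divided by total people aboard. Two things can shift such a number. Rows whose Aboard is missing still contribute their deaths to the numerator (the info() output later in the post shows 5256 Fatalities values but only 5246 Aboard values), and a pooled ratio weights large aircraft more heavily than the average of per-crash ratios does. A small sketch of the difference: the first five rows are the sample rows printed earlier, and the sixth row with unknown Aboard is hypothetical:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Aboard": [49.0, 14.0, 20.0, 2.0, 5.0, np.nan],   # last row is hypothetical
    "Fatalities": [49.0, 14.0, 19.0, 2.0, 3.0, 3.0],
})

# Naive pooled ratio: sum() skips the NaN in Aboard but keeps that row's deaths.
naive = toy["Fatalities"].sum() / toy["Aboard"].sum()

# Pooled ratio restricted to rows where both values are known.
known = toy.dropna(subset=["Aboard", "Fatalities"])
pooled = known["Fatalities"].sum() / known["Aboard"].sum()

# Average of per-crash ratios: every crash counts equally, whatever its size.
per_crash = (known["Fatalities"] / known["Aboard"]).mean()

print(naive, pooled, per_crash)
```

Which of the three is the "right" rate depends on the question being asked; the post uses the first.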

Aircraft type processing

Handler function

type_crash = fatal_crash["Type"]
def type_handle(x):
    x = str(x)
    if "McDonnell Douglas" in x:
        return "McDonnell Douglas"
    elif "Douglas" in x:
        return "Douglas"
    elif "Boeing" in x:
        return "Boeing"
    elif "Airbus" in x:
        return "Airbus"
    elif "Embraer" in x:
        return "Embraer"
    elif "Ilyushin" in x:
        return "Ilyushin"
    else:
        return "other"
company_crash = type_crash.map(type_handle)
print(company_crash.value_counts())
other                3581
Douglas               984
Boeing                376
McDonnell Douglas     123
Ilyushin               96
Embraer                61
Airbus                 35
Name: Type, dtype: int64
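The if/elif chain in type_handle depends on its order: "McDonnell Douglas" has to be tested before "Douglas", otherwise every McDonnell Douglas aircraft would be counted as Douglas. A sketch of the same idea driven by an ordered list, which makes that dependency explicit; the names MANUFACTURERS and manufacturer_of are my own and not from the original code:

```python
import pandas as pd

# Order matters: the longer name "McDonnell Douglas" must come before "Douglas".
MANUFACTURERS = [
    "McDonnell Douglas", "Douglas", "Boeing", "Airbus", "Embraer", "Ilyushin",
]

def manufacturer_of(type_name):
    """Return the first manufacturer whose name occurs in the type string."""
    text = str(type_name)  # missing values become a string with no match
    for name in MANUFACTURERS:
        if name in text:
            return name
    return "other"

types = pd.Series([
    "McDonnell Douglas DC-10-10",
    "Douglas DC6B",
    "Boeing B-747-237B",
    "Fairchild F-27",
    None,
])
labels = types.map(manufacturer_of)
print(labels.value_counts())
```

As with the original function, a collision row such as "Boeing B-747-168B / Ilyushin IL-76TD" is attributed only to whichever manufacturer is matched first.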
fatal_crash["company"] = company_crash
print(fatal_crash[:2])
            Date   Time          Operator               Type Registration  \
108   10/21/1926  13:15  Imperial Airways  Handley Page W-10       G-EBMS   
5178  11/08/2007   8:00    Juba Air Cargo         Antonov 12       ST-JUA   

      Aboard  Fatalities  Ground  year month company  
108     12.0         0.0     0.0  1926    10   other  
5178     4.0         0.0     2.0  2007    11   other  

Results

def airplane_counte(x):
    fatal_ratio = x["Fatalities"].sum() / x["Aboard"].sum()
    crash_time = x.shape[0]
    fatal_num = x["Fatalities"].sum()
    return pd.Series({"fatal_num":fatal_num,"crash_time":crash_time,"fatal_ratio":fatal_ratio})

company = fatal_crash.groupby(['company']).apply(airplane_counte)
print(company)
plt.close()
plt.figure(figsize=(16,4))
plt.subplot(1,3,1)
company['crash_time'].drop("other").plot(kind='bar',title="time")
plt.subplot(1,3,2)
company['fatal_num'].drop("other").plot(kind='bar',title="fatal_num")
plt.subplot(1,3,3)
company['fatal_ratio'].plot(kind='bar',title="fatal_ratio")
plt.show()
                   crash_time  fatal_num  fatal_ratio
company                                              
Airbus                   35.0     2980.0     0.510711
Boeing                  376.0    18705.0     0.649434
Douglas                 984.0    16899.0     0.794350
Embraer                  61.0      644.0     0.779661
Ilyushin                 96.0     4547.0     0.883084
McDonnell Douglas       123.0     6827.0     0.531946
other                  3581.0    54877.0     0.785854
Results by manufacturer
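The per-manufacturer table is built with groupby(...).apply and a function that returns a Series. A sketch of the same summary using named aggregation, which is usually faster and keeps the count column as an integer. The three toy rows are taken from printouts earlier in the post, with the company labels that type_handle would assign:

```python
import pandas as pd

toy = pd.DataFrame({
    "company": ["Boeing", "Boeing", "Douglas"],
    "Aboard": [329.0, 524.0, 20.0],
    "Fatalities": [329.0, 520.0, 19.0],
})

summary = toy.groupby("company").agg(
    # count() ignores missing values; fatal_crash has none in Fatalities,
    # so this equals the number of crashes per group.
    crash_time=("Fatalities", "count"),
    fatal_num=("Fatalities", "sum"),
    aboard=("Aboard", "sum"),
)
summary["fatal_ratio"] = summary["fatal_num"] / summary["aboard"]
print(summary)
```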

Time analysis

def get_year(x):
    return x.split("/")[-1]
fatal_crash['year'] = fatal_crash["Date"].map(get_year)
# `!= np.NaN` is always True (NaN never equals anything), so test for missing values explicitly
year_fatal = fatal_crash[fatal_crash["year"].notnull()][["year","Fatalities","Aboard"]]
year_fatal.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5256 entries, 108 to 2963
Data columns (total 3 columns):
year          5256 non-null object
Fatalities    5256 non-null float64
Aboard        5246 non-null float64
dtypes: float64(2), object(1)
memory usage: 164.2+ KB
def year_analysis(x):
    return pd.Series({"fatal_num":x["Fatalities"].sum(),"time":x.shape[0],"fatal_ratio":x["Fatalities"].sum() / x["Aboard"].sum()})
year = year_fatal.groupby(["year"]).apply(year_analysis)
year = year.sort_index()
year.info()
<class 'pandas.core.frame.DataFrame'>
Index: 98 entries, 1908 to 2009
Data columns (total 3 columns):
fatal_num      98 non-null float64
fatal_ratio    98 non-null float64
time           98 non-null float64
dtypes: float64(3)
memory usage: 3.1+ KB
plt.close()
plt.figure(figsize=(16,4))
plt.subplot(1,3,1)
year["fatal_num"].plot(title="fatal_num")
plt.subplot(1,3,2)
year["time"].plot(title="crash_time")
plt.subplot(1,3,3)
year["fatal_ratio"].plot(title="fatal_ratio")
plt.show()
Results by year

def get_month(x):
    return x.split("/")[0]
fatal_crash['month'] = fatal_crash["Date"].map(get_month)
# `!= np.NaN` is always True (NaN never equals anything), so test for missing values explicitly
month_fatal = fatal_crash[fatal_crash["month"].notnull()][["month","Fatalities","Aboard"]]
month_fatal.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5256 entries, 108 to 2963
Data columns (total 3 columns):
month         5256 non-null object
Fatalities    5256 non-null float64
Aboard        5246 non-null float64
dtypes: float64(2), object(1)
memory usage: 164.2+ KB
def month_analysis(x):
    return pd.Series({"fatal_num":x["Fatalities"].sum(),"time":x.shape[0],"fatal_ratio":x["Fatalities"].sum() / x["Aboard"].sum()})
month = month_fatal.groupby(["month"]).apply(month_analysis)
month = month.sort_index()
print(month)
       fatal_num  fatal_ratio   time
month                               
01        8425.0     0.768354  494.0
02        7966.0     0.693057  395.0
03        8708.0     0.787057  453.0
04        6769.0     0.711852  378.0
05        7130.0     0.731807  370.0
06        7909.0     0.681399  385.0
07        9232.0     0.700349  427.0
08       10174.0     0.729162  474.0
09       10286.0     0.760349  458.0
10        8388.0     0.778758  452.0
11       10033.0     0.766522  454.0
12       10459.0     0.668478  516.0
plt.close()
plt.figure(figsize=(16,4))
plt.subplot(1,3,1)
month["fatal_num"].plot(kind="bar",title="fatal_num")
plt.subplot(1,3,2)
month["time"].plot(kind="bar",title="crash_time")
plt.subplot(1,3,3)
month["fatal_ratio"].plot(kind="bar",title="fatal_ratio")
plt.show()
Results by month
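Year and month are extracted above by splitting the date string, which leaves both as text. A sketch of an alternative that parses the column once with pd.to_datetime and reads the parts as integers; the MM/DD/YYYY format is inferred from the sample rows, and the three dates are taken from the printouts earlier in the post:

```python
import pandas as pd

dates = pd.Series(["03/06/1968", "06/23/1985", "11/12/1996"])

# errors="coerce" turns anything that does not fit the format into NaT
# instead of raising, so malformed dates can be filtered afterwards.
parsed = pd.to_datetime(dates, format="%m/%d/%Y", errors="coerce")

years = parsed.dt.year
months = parsed.dt.month
print(years.tolist(), months.tolist())
```

Integer years also sort and plot on a numeric axis, so the line charts no longer depend on string ordering.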

Hour of day

def get_hour(x):
    # Keep only a whole-number hour below 24; anything unparseable becomes NaN.
    try:
        hour = float(x.split(":")[0])
    except (AttributeError, ValueError):
        return np.nan
    if hour.is_integer() and hour < 24:
        return hour
    return np.nan
    
# .copy() makes time_fatal an independent frame, so adding a column does not
# trigger pandas' SettingWithCopyWarning
time_fatal = fatal_crash[fatal_crash["Time"].notnull()].copy()
time_fatal["hour"] = time_fatal["Time"].map(get_hour)
time_fatal.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3049 entries, 108 to 2963
Data columns (total 13 columns):
Date            3049 non-null object
Time            3049 non-null object
Operator        3046 non-null object
Type            3048 non-null object
Registration    2952 non-null object
Aboard          3049 non-null float64
Fatalities      3049 non-null float64
Ground          3046 non-null float64
airplane        3049 non-null object
company         3049 non-null object
year            3049 non-null object
month           3049 non-null object
hour            3036 non-null float64
dtypes: float64(4), object(9)
memory usage: 333.5+ KB
def hour_analysis(x):
    return pd.Series({"fatal_num":x["Fatalities"].sum(),"time":x.shape[0],"fatal_ratio":x["Fatalities"].sum() / x["Aboard"].sum()})
hour = time_fatal.groupby(["hour"]).apply(hour_analysis)
plt.close()
plt.figure(figsize=(16,4))
plt.subplot(1,3,1)
hour["fatal_num"].plot(title="fatal_num")
plt.subplot(1,3,2)
hour["time"].plot(title="crash_time")
plt.subplot(1,3,3)
hour["fatal_ratio"].plot(title="fatal_ratio")
plt.show()
Results by hour of day
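The hour is pulled out row by row with a Python function. A vectorized sketch of roughly the same rule, using a regular expression; the sample strings are hypothetical examples of the kinds of malformed entries a Time column might contain, and edge cases such as leading whitespace may be treated differently from get_hour:

```python
import pandas as pd

# Hypothetical Time values: two clean ones and four that should be rejected.
times = pd.Series(["8:00", "19:18", "c 16:50", "0943", "25:10", None])

# Capture one or two leading digits followed by a colon; no match gives NaN.
raw = times.str.extract(r"^(\d{1,2}):", expand=False)
hour = pd.to_numeric(raw, errors="coerce")

# Hours of 24 or more are not valid clock hours, so blank them out.
hour = hour.where(hour < 24)
print(hour.tolist())
```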