Python practice: Case Study - Sunlight in Austin

Content from the DataCamp course pandas Foundations.
The data and code are on GitHub.

Data:

Dataset 1

  • weather_data_austin_2010:
    Austin weather data for 2010 (used below as the 30-year climate normals)


    [Figure: first rows of the data (head)]

    To make the later steps easier, use Date as the index.

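import pandas as pd

# Load the 2010 data (the .csv file name is assumed; point it at your copy)
df = pd.read_csv('weather_data_austin_2010.csv')

df.Date=pd.to_datetime(df.Date)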
df.index=df.Date
df=df.drop(['Date'],axis=1)
df.head()
[Figure: output of df.head()]
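The same index can also be built while the file is read. The sketch below is illustrative only. It uses a tiny in-memory CSV (through `io.StringIO`) with invented values in place of weather_data_austin_2010, and checks that `read_csv(parse_dates=..., index_col=...)` gives the same index and columns as the three-step version above.

```python
import io

import pandas as pd

# Invented sample standing in for weather_data_austin_2010 (not the real values)
csv_text = (
    "Temperature,Date\n"
    "46.2,2010-01-01 00:00:00\n"
    "44.6,2010-01-01 01:00:00\n"
    "44.1,2010-01-01 02:00:00\n"
)

# Three steps, as in the post: parse, copy into the index, drop the column
df = pd.read_csv(io.StringIO(csv_text))
df.Date = pd.to_datetime(df.Date)
df.index = df.Date
df = df.drop(['Date'], axis=1)

# One step: let read_csv parse Date and use it as the index
df_alt = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], index_col='Date')

print(df_alt.head())
```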

Dataset 2

NOAA_QCLCD_2011_hourly_13904.txt
Hourly Austin weather for 2011. The file has no header row and 44 columns; some of them are dropped below.


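[Figure: first rows of the raw file (head)]

# Read the headerless file: df2
df2 = pd.read_csv('NOAA_QCLCD_2011_hourly_13904.txt', header=None)
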
column_labels='Wban,date,Time,StationType,sky_condition,sky_conditionFlag,visibility,visibilityFlag,wx_and_obst_to_vision,wx_and_obst_to_visionFlag,dry_bulb_faren,dry_bulb_farenFlag,dry_bulb_cel,dry_bulb_celFlag,wet_bulb_faren,wet_bulb_farenFlag,wet_bulb_cel,wet_bulb_celFlag,dew_point_faren,dew_point_farenFlag,dew_point_cel,dew_point_celFlag,relative_humidity,relative_humidityFlag,wind_speed,wind_speedFlag,wind_direction,wind_directionFlag,value_for_wind_character,value_for_wind_characterFlag,station_pressure,station_pressureFlag,pressure_tendency,pressure_tendencyFlag,presschange,presschangeFlag,sea_level_pressure,sea_level_pressureFlag,record_type,hourly_precip,hourly_precipFlag,altimeter,altimeterFlag,junk'

column_labels_list = column_labels.split(',')
df2.columns = column_labels_list
list_to_drop=['sky_conditionFlag', 'visibilityFlag', 'wx_and_obst_to_vision', 'wx_and_obst_to_visionFlag', 'dry_bulb_farenFlag', 'dry_bulb_celFlag', 'wet_bulb_farenFlag', 'wet_bulb_celFlag', 'dew_point_farenFlag', 'dew_point_celFlag', 'relative_humidityFlag', 'wind_speedFlag', 'wind_directionFlag', 'value_for_wind_character', 'value_for_wind_characterFlag', 'station_pressureFlag', 'pressure_tendencyFlag', 'pressure_tendency', 'presschange', 'presschangeFlag', 'sea_level_pressureFlag', 'hourly_precip', 'hourly_precipFlag', 'altimeter', 'record_type', 'altimeterFlag', 'junk']
df2_dropped = df2.drop(list_to_drop,axis='columns')
print(df2_dropped.head())
Only these columns are kept.
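An alternative to reading all 44 columns and then dropping 27 of them is to let `read_csv` name the columns and keep only the wanted ones while parsing, through `names=` and `usecols=`. This is a minimal sketch on an invented five-column excerpt. The values and the shortened column list are assumptions made for illustration.

```python
import io

import pandas as pd

# Invented two-row excerpt with only the first five columns of the real layout
text = "13904,20110101,53,12,OVC045\n13904,20110101,153,12,OVC049\n"
labels = ["Wban", "date", "Time", "StationType", "sky_condition"]
keep = ["date", "Time", "sky_condition"]

# header=None: no header row; names= labels every column; usecols= keeps a subset
df_small = pd.read_csv(io.StringIO(text), header=None, names=labels, usecols=keep)
print(df_small)
```

With the real file you would pass all 44 names from column_labels_list to `names=`, and the columns you want to keep to `usecols=`.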

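Data cleaning: combine date and Time into one timestamp and use it as the index. The code comments keep DataCamp's variable names. Here df_dropped and df_clean correspond to df2_dropped and df2_clean, and df_climate corresponds to df.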

# Convert the date column to string: df_dropped['date']
df2_dropped['date'] = df2_dropped['date'].astype(str)

# Pad leading zeros to the Time column: df_dropped['Time']
df2_dropped['Time'] = df2_dropped['Time'].apply(lambda x:'{:0>4}'.format(x))

# Concatenate the new date and Time columns: date_string
date_string = df2_dropped['date'] + df2_dropped['Time']

# Convert the date_string Series to datetime: date_times
date_times = pd.to_datetime(date_string, format='%Y%m%d%H%M')

# Set the index to be the new date_times container: df_clean
df2_clean = df2_dropped.set_index(date_times)

# Print the output of df_clean.head()
print(df2_clean.head())
[Figure: the cleaned dataset 2, df2_clean.head()]
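Here is the same date/time combination on a few invented rows, as a self-contained sketch. It replaces the `apply` with `'{:0>4}'.format` by the vectorized `str.zfill(4)`, which pads the same way (53 becomes '0053').

```python
import pandas as pd

# Invented rows in the NOAA layout: date as yyyymmdd, Time as hhmm without leading zeros
obs = pd.DataFrame({
    "date": [20110101, 20110101, 20110620],
    "Time": [53, 153, 853],
    "dry_bulb_faren": ["51", "51", "M"],
})

# Zero-pad Time to four digits and append it to the date string
date_string = obs["date"].astype(str) + obs["Time"].astype(str).str.zfill(4)

# Parse with an explicit format and use the result as the index
obs_clean = obs.set_index(pd.to_datetime(date_string, format="%Y%m%d%H%M"))
print(obs_clean.index)
```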

Handling missing values: entries marked M in the table are converted to NaN (pd.to_numeric with errors='coerce' does this).

# Print the dry_bulb_faren temperature between 8 AM and 9 AM on June 20, 2011
print(df2_clean.loc['2011-6-20 8:00:00':'2011-6-20 9:00:00','dry_bulb_faren' ])

# Convert the dry_bulb_faren column to numeric values: df_clean['dry_bulb_faren']
df2_clean['dry_bulb_faren'] = pd.to_numeric(df2_clean['dry_bulb_faren'], errors='coerce')

# Print the transformed dry_bulb_faren temperature between 8 AM and 9 AM on June 20, 2011
print(df2_clean.loc['2011-6-20 8:00:00':'2011-6-20 9:00:00', 'dry_bulb_faren'])

# Convert the wind_speed and dew_point_faren columns to numeric values
df2_clean['wind_speed'] = pd.to_numeric(df2_clean['wind_speed'], errors='coerce')
df2_clean['dew_point_faren'] = pd.to_numeric(df2_clean['dew_point_faren'], errors='coerce')
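This is why `errors='coerce'` matters. With the default `errors='raise'`, a single 'M' marker makes `pd.to_numeric` fail. Coercion instead turns each 'M' into NaN, and pandas reductions such as `median()` skip NaN. A small sketch on an invented Series:

```python
import pandas as pd

# Invented readings with 'M' marking missing values, as in the NOAA file
raw = pd.Series(["51", "M", "52", "M", "55"])

try:
    pd.to_numeric(raw)  # default errors='raise'
    raised = False
except ValueError:
    raised = True

temps = pd.to_numeric(raw, errors="coerce")  # each 'M' becomes NaN
print(raised, temps.isna().sum(), temps.median())
```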

Exploring dataset 2

# Print the median of the dry_bulb_faren column
print(df2_clean.dry_bulb_faren.median())

# Print the median of the dry_bulb_faren column for the time range '2011-Apr':'2011-Jun'
print(df2_clean.loc['2011-Apr':'2011-Jun', 'dry_bulb_faren'].median())

# Print the median of the dry_bulb_faren column for the month of January
print(df2_clean.loc['2011-Jan', 'dry_bulb_faren'].median())

72.0
78.0
48.0

Only the median dry-bulb temperature (dry_bulb_faren) is examined here, both over the whole year and over different time ranges.
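These selections rely on partial string indexing. On a DatetimeIndex, a string such as '2011-Jan' or '2011-01' selects the whole month, and a slice ending in '2011-Jun' includes all of June. The sketch uses a synthetic hourly series (a linear ramp, not real temperatures) and ISO month strings.

```python
import numpy as np
import pandas as pd

# Synthetic hourly "temperatures" for the first half of 2011 (a ramp from 40 to 90)
idx = pd.date_range("2011-01-01", "2011-06-30 23:00", freq=pd.offsets.Hour())
temps = pd.Series(np.linspace(40, 90, len(idx)), index=idx)

jan = temps.loc["2011-01"]                 # every hour in January
apr_to_jun = temps.loc["2011-04":"2011-06"]  # April 1 through the end of June 30

print(jan.index.min(), jan.index.max(), apr_to_jun.index.max())
print(jan.median(), apr_to_jun.median())
```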

How much hotter was each day in 2011 than expected from the 30-year average? Compute the daily differences and their mean.

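# Downsample df_clean by day and aggregate by mean: daily_mean_2011
# (numeric_only=True skips text columns; pandas >= 2.0 raises an error without it)
daily_mean_2011 = df2_clean.resample('D').mean(numeric_only=True)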

# Extract the dry_bulb_faren column from daily_mean_2011 using .values: daily_temp_2011
daily_temp_2011 = daily_mean_2011['dry_bulb_faren'].values

# Downsample df_climate by day and aggregate by mean: daily_climate
daily_climate = df.resample('D').mean()

# Extract the Temperature column from daily_climate using .reset_index(): daily_temp_climate
daily_temp_climate = daily_climate.reset_index()['Temperature']

# Compute the difference between the two arrays and print the mean difference
difference = daily_temp_2011 - daily_temp_climate
print(difference.mean())

1.3301831870056477
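Two details make this subtraction work. First, `numeric_only=True` stops `mean()` from failing on text columns such as sky_condition. Second, `.values` and `reset_index()` remove the date labels. The 2011 and 2010 daily indexes never share a date, so subtracting the two Series directly would align them on labels and produce only NaN. A small sketch with invented values:

```python
import numpy as np
import pandas as pd

# Invented hourly data: three days of 2011 with one numeric and one text column
hours = pd.date_range("2011-01-01", periods=72, freq=pd.offsets.Hour())
obs = pd.DataFrame({
    "dry_bulb_faren": np.repeat([50.0, 52.0, 54.0], 24),
    "sky_condition": ["CLR"] * 72,
}, index=hours)

# Invented daily climate normals, indexed by 2010 dates
normals = pd.Series([49.0, 50.0, 51.0],
                    index=pd.date_range("2010-01-01", periods=3, freq="D"))

daily_2011 = obs.resample("D").mean(numeric_only=True)["dry_bulb_faren"]

misaligned = daily_2011 - normals                        # aligned on dates: all NaN
difference = daily_2011.to_numpy() - normals.to_numpy()  # paired by position
print(misaligned.isna().all(), difference.mean())
```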

Sunny or overcast?

On average, how much hotter is it when the sun is shining? In this exercise, you will compare temperatures on sunny days against temperatures on overcast days.
Your job is to use Boolean selection to pick out the sunny and overcast observations, and then compute the difference of the mean daily maximum temperatures between the two.
The column 'sky_condition' records whether the sky was clear ('CLR') or overcast (a code containing 'OVC') at each observation.

# Using df_clean, when is sky_condition 'CLR'?
is_sky_clear = df2_clean['sky_condition']=='CLR'

# Filter df_clean using is_sky_clear
sunny = df2_clean.loc[is_sky_clear]

# Resample sunny by day then calculate the max
sunny_daily_max = sunny.resample('D').max()

# Using df_clean, when does sky_condition contain 'OVC'? (na=False treats missing codes as not overcast)
is_sky_overcast = df2_clean['sky_condition'].str.contains('OVC', na=False)

# Filter df_clean using is_sky_overcast
overcast = df2_clean.loc[is_sky_overcast]

# Resample overcast by day then calculate the max
overcast_daily_max = overcast.resample('D').max()
# Calculate the mean of sunny_daily_max
sunny_daily_max_mean = sunny_daily_max.mean(numeric_only=True)  # skip text columns (required in pandas >= 2.0)

# Calculate the mean of overcast_daily_max
overcast_daily_max_mean = overcast_daily_max.mean(numeric_only=True)

# Print the difference (sunny minus overcast)
print(sunny_daily_max_mean-overcast_daily_max_mean)
Output:
Wban               0.000000
StationType        0.000000
dry_bulb_faren     6.504304
dew_point_faren   -4.339286
wind_speed        -3.246062
dtype: float64

The average daily maximum dry bulb temperature was 6.5 degrees Fahrenheit higher on sunny days compared to overcast days.
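Here is a compact version of the same comparison on invented hourly records. It goes slightly beyond the original exercise in two ways. It selects the dry_bulb_faren column before resampling, so `max()` and `mean()` never touch text columns. It also passes `na=False` to `str.contains`, so a missing sky code counts as not overcast.

```python
import pandas as pd

# Invented hourly records over two days
idx = pd.to_datetime(["2011-07-01 10:00", "2011-07-01 14:00", "2011-07-01 18:00",
                      "2011-07-02 10:00", "2011-07-02 14:00", "2011-07-02 18:00"])
obs = pd.DataFrame({
    "sky_condition": ["CLR", "CLR", "OVC010", "CLR", "BKN020 OVC040", "OVC008"],
    "dry_bulb_faren": [90.0, 96.0, 85.0, 92.0, 88.0, 84.0],
}, index=idx)

is_clear = obs["sky_condition"] == "CLR"
is_overcast = obs["sky_condition"].str.contains("OVC", na=False)

# Daily maximum temperature within each sky type
sunny_max = obs.loc[is_clear, "dry_bulb_faren"].resample("D").max()
overcast_max = obs.loc[is_overcast, "dry_bulb_faren"].resample("D").max()

print(sunny_max.mean() - overcast_max.mean())
```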

Visibility and temperature
Your job is to plot the weekly average temperature and visibility as subplots.

# Import matplotlib.pyplot as plt
import matplotlib.pyplot as plt

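# visibility is still text, so convert it like the other numeric columns
df2_clean['visibility'] = pd.to_numeric(df2_clean['visibility'], errors='coerce')

# Select the visibility and dry_bulb_faren columns and resample them: weekly_mean
weekly_mean = df2_clean[['visibility','dry_bulb_faren']].resample('W').mean()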

# Print the output of weekly_mean.corr()
print(weekly_mean.corr())

# Plot weekly_mean with subplots=True
weekly_mean.plot(subplots=True)
plt.show()
Is visibility higher when it is warmer?
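Visibility comes out of the raw file as text, so it needs a `pd.to_numeric(..., errors='coerce')` step before a weekly mean or a correlation makes sense. The sketch below uses invented daily values, including one week where every entry is 'M'.

```python
import numpy as np
import pandas as pd

# Invented daily values for four Monday-to-Sunday weeks; visibility is text with an 'M' week
days = pd.date_range("2011-01-03", periods=28, freq="D")
obs = pd.DataFrame({
    "visibility": np.repeat(["7.00", "8.00", "M", "10.00"], 7),
    "dry_bulb_faren": np.repeat([50.0, 60.0, 70.0, 80.0], 7),
}, index=days)

obs["visibility"] = pd.to_numeric(obs["visibility"], errors="coerce")

# 'W' bins end on Sundays; the all-'M' week becomes a NaN weekly mean
weekly_mean = obs[["visibility", "dry_bulb_faren"]].resample("W").mean()

# corr() uses pairwise-complete rows, so the NaN week is left out
print(weekly_mean.corr())
```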

Fraction of sunny (clear-sky) observations per day

# Using df_clean, when is sky_condition 'CLR'?
is_sky_clear = df2_clean['sky_condition']=='CLR'

# Resample is_sky_clear by day
resampled = is_sky_clear.resample('D')
# Calculate the number of sunny hours per day
sunny_hours = resampled.sum()

# Calculate the number of measured hours per day
total_hours = resampled.count()

# Calculate the fraction of hours per day that were sunny
sunny_fraction = sunny_hours/total_hours
sunny_fraction.plot(kind='box')
plt.show()
[Figure: box plot of the daily sunny fraction]
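True counts as 1, so `sum()` over a day gives the number of clear observations. `count()` gives the number of observations recorded that day, which is not necessarily 24. Their ratio therefore equals the daily mean of the boolean mask. A sketch on invented records:

```python
import pandas as pd

# Invented sky codes: four observations on March 1, two on March 2
idx = pd.to_datetime([
    "2011-03-01 06:00", "2011-03-01 12:00", "2011-03-01 18:00", "2011-03-01 23:00",
    "2011-03-02 06:00", "2011-03-02 12:00",
])
sky = pd.Series(["CLR", "CLR", "OVC010", "CLR", "SCT030", "CLR"], index=idx)

is_sky_clear = sky == "CLR"
resampled = is_sky_clear.resample("D")

sunny_fraction = resampled.sum() / resampled.count()
sunny_fraction_alt = resampled.mean()  # same numbers in one call
print(sunny_fraction.tolist(), sunny_fraction_alt.tolist())
```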

Dew point and temperature

Dew point is the temperature to which air must be cooled to become saturated, so it measures how much moisture the air holds. A dew point above 65 °F is considered uncomfortable, as is a temperature above 90 °F.

In this exercise, you will explore the maximum temperature and dew point of each month. The columns of interest are 'dew_point_faren' and 'dry_bulb_faren'. After resampling them appropriately to get the maximum temperature and dew point in each month, generate a histogram of these values as subplots.

# Resample dew_point_faren and dry_bulb_faren by Month, aggregating the maximum values: monthly_max
monthly_max = df2_clean[['dew_point_faren','dry_bulb_faren']].resample('M').max()

# Generate a histogram with bins=8, alpha=0.5, subplots=True
monthly_max.plot(kind='hist',bins=8,alpha=0.5,subplots=True)

# Show the plot
plt.show()
[Figure: histograms of the monthly maximum dew point and dry-bulb temperature]
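A year of data resampled by month leaves only 12 values per column, so the histogram is coarse. The sketch below uses three months of synthetic readings and checks the monthly maxima against the discomfort thresholds quoted above (65 °F dew point, 90 °F temperature). It uses the 'MS' (month start) alias as an assumption, which sidesteps the rename of 'M' to 'ME' in pandas 2.2. 'M' gives the same maxima but labels each month by its last day.

```python
import numpy as np
import pandas as pd

# Synthetic hourly readings for June-August 2011 (random, not NOAA data)
idx = pd.date_range("2011-06-01", "2011-08-31 23:00", freq=pd.offsets.Hour())
rng = np.random.default_rng(0)
obs = pd.DataFrame({
    "dew_point_faren": rng.uniform(55, 75, len(idx)),
    "dry_bulb_faren": rng.uniform(75, 105, len(idx)),
}, index=idx)

# One row per month holding that month's maximum of each column
monthly_max = obs[["dew_point_faren", "dry_bulb_faren"]].resample("MS").max()

# Months whose maximum crosses either discomfort threshold
uncomfortable = (monthly_max["dew_point_faren"] > 65) | (monthly_max["dry_bulb_faren"] > 90)

# Bin counts behind a bins=8 histogram of the monthly maxima
counts, edges = np.histogram(monthly_max["dry_bulb_faren"], bins=8)
print(monthly_max, uncomfortable.tolist(), counts.sum())
```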

Probability of high temperatures (CDF)

We already know that 2011 was hotter than the climate normals for the previous thirty years. In this final exercise, you will compare the maximum temperature in August 2011 against that of the August 2010 climate normals. More specifically, you will use a CDF plot to determine the probability of the 2011 daily maximum temperature in August being above the 2010 climate normal value. To do this, you will leverage the data manipulation, filtering, resampling, and visualization skills you have acquired throughout this course.

The two DataFrames df_clean and df_climate are available in the workspace. Your job is to select the maximum temperature in August in df_climate, and then the maximum daily temperatures in August 2011. You will then keep only the days in August 2011 that were above the August 2010 maximum, and use them to construct a CDF plot.

# Extract the maximum temperature in August 2010 from df_climate: august_max
august_max = df.loc['2010-Aug','Temperature'].max()
print(august_max)

# Resample August 2011 temps in df_clean by day & aggregate the max value: august_2011
august_2011 = df2_clean.loc['2011-Aug','dry_bulb_faren'].resample('D').max()

# Filter for days in august_2011 where the value exceeds august_max: august_2011_high

august_2011_high = august_2011.loc[august_2011 > august_max]

# Construct a CDF of august_2011_high
august_2011_high.plot(kind='hist', density=True, cumulative=True, bins=25)  # density=True replaces the removed normed=True

# Display the plot
plt.show()
[Figure: cumulative histogram (CDF) of the August 2011 daily maxima above the August 2010 maximum]
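The cumulative, normalized histogram above approximates the CDF with 25 bins. An alternative, sketched here as an assumption rather than the course's method, computes the empirical CDF exactly: sort the values and give the k-th smallest a cumulative probability of k/n. The share of August 2011 days above the 2010 maximum is the mean of the boolean mask. All numbers below are invented.

```python
import numpy as np
import pandas as pd

# Invented daily maxima (°F) for one week of August 2011 and an invented 2010 maximum
august_2011 = pd.Series([99.0, 101.0, 103.0, 100.0, 105.0, 98.0, 107.0],
                        index=pd.date_range("2011-08-01", periods=7, freq="D"))
august_max = 100.0

august_2011_high = august_2011.loc[august_2011 > august_max]

# Empirical CDF of the hot days: x sorted, y = k / n
x = np.sort(august_2011_high.to_numpy())
y = np.arange(1, len(x) + 1) / len(x)

# Fraction of all days that beat the 2010 maximum
share_above = (august_2011 > august_max).mean()
print(list(zip(x, y)), share_above)
```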