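Introduction: With the rise of bike sharing, this project explores bike-share system data from three major US cities: Chicago, New York City, and Washington, DC. The goal is to help bike-share companies obtain key information, such as which start station is the most popular and which trip is the most popular, to inform where bikes should be deployed.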
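1. Analysis steps
- Write code to import the data and answer interesting questions by computing descriptive statistics.
- Write a script that takes raw input and creates an interactive experience in the terminal to present these statistics.
- Pose the questions
- Terminal application script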
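2. Questions
- Which month is the most common in the start times (the Start Time column)?
- Which day of the week (for example Monday, Tuesday) is the most common in the start times?
- Which hour of the day is the most common in the start times?
- What is the total trip duration (Trip Duration), and what is the average trip duration?
- Which start station (Start Station) is the most popular, and which end station (End Station) is the most popular?
- Which trip is the most popular (that is, which combination of start station and end station is the most popular)?
- How many users are there of each user type?
- How many users are there of each gender?
- What is the earliest year of birth, the most recent, and the most common?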
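Most of the questions above reduce to a mode, a sum, or a mean over a single column. The following is a minimal sketch on a tiny made-up table; the station names and values are invented for illustration, and only the column names match the ones used by the script in this post. Counting start/end pairs with `groupby` is one possible way to find the most popular trip, not necessarily the only one.

```python
import pandas as pd

# Made-up sample rows; only the column names mirror the bikeshare CSV files.
trips = pd.DataFrame({
    "Start Time": [
        "2017-01-02 08:15:00",  # a Monday
        "2017-01-09 08:40:00",  # a Monday
        "2017-02-07 17:05:00",  # a Tuesday
        "2017-01-16 08:55:00",  # a Monday
    ],
    "Start Station": ["A St", "A St", "B Ave", "A St"],
    "End Station": ["B Ave", "B Ave", "A St", "C Rd"],
    "Trip Duration": [600, 420, 900, 300],  # seconds
})
trips["Start Time"] = pd.to_datetime(trips["Start Time"])

# The most common month, weekday, and hour are the modes of the datetime parts.
most_common_month = int(trips["Start Time"].dt.month.mode()[0])
most_common_day = trips["Start Time"].dt.day_name().mode()[0]
most_common_hour = int(trips["Start Time"].dt.hour.mode()[0])

# The most popular trip is the (start, end) pair with the largest group size.
top_trip = trips.groupby(["Start Station", "End Station"]).size().idxmax()

# Total and average trip duration.
total_duration = int(trips["Trip Duration"].sum())
mean_duration = float(trips["Trip Duration"].mean())

print(most_common_month, most_common_day, most_common_hour)
print(top_trip, total_duration, mean_duration)
```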
3. Code implementation
Tool: Python
Editor: PyCharm
import time
import pandas as pd
import numpy as np

# Map each supported city name to its CSV data file.
CITY_DATA = {'chicago': 'chicago.csv',
             'new york city': 'new_york_city.csv',
             'washington': 'washington.csv'}


def get_filters():
    """
    Asks user to specify a city, month, and day to analyze.

    Returns:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    """
    print('Hello! Let\'s explore some US bikeshare data!')
    # get user input for city (chicago, new york city, washington); keep asking until it is valid
    city = input("Which city do you want to analyze? input: chicago, new york city, washington\n").lower()
    while city not in CITY_DATA:
        city = input('Invalid input======\nWould you like to see data for chicago, '
                     'new york city, or washington?\n').lower()

    # get user input for month (all, january, february, ... , june)
    months = ['all', 'january', 'february', 'march', 'april', 'may', 'june']
    month = input("Which month do you want to analyze? input: all, january, february, "
                  "march, april, may, june\n").lower()
    while month not in months:
        month = input('Invalid input======\nWhich month do you want to analyze? input: all, january, february, '
                      'march, april, may, june\n').lower()

    # get user input for day of week (all, monday, tuesday, ... sunday)
    days = ['all', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']
    day = input("Which day of week do you want to analyze? input: "
                "all, monday, tuesday, wednesday, thursday, friday, saturday, sunday\n").lower()
    while day not in days:
        day = input("Invalid input======\nWhich day of week do you want to analyze? input: "
                    "all, monday, tuesday, wednesday, thursday, friday, saturday, sunday\n").lower()

    print('-'*40)
    return city, month, day


def load_data(city, month, day):
    """
    Loads data for the specified city and filters by month and day if applicable.

    Args:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    Returns:
        df - Pandas DataFrame containing city data filtered by month and day
    """
    # load data file into a dataframe
    df = pd.read_csv(CITY_DATA[city])
    # convert the Start Time column to datetime
    df['Start Time'] = pd.to_datetime(df['Start Time'])
    # extract month and day of week from Start Time to create new columns
    df['month'] = df['Start Time'].dt.month
    # dt.weekday_name was removed in pandas 1.0; dt.day_name() returns the same names
    df['day_of_week'] = df['Start Time'].dt.day_name()
    # filter by month if applicable
    if month != 'all':
        # use the index of the months list to get the corresponding int
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        month = months.index(month) + 1
        # filter by month to create the new dataframe
        df = df[df['month'] == month]
    # filter by day of week if applicable
    if day != 'all':
        # filter by day of week to create the new dataframe
        df = df[df['day_of_week'] == day.title()]
    return df


def time_stats(df):
    """Displays statistics on the most frequent times of travel."""
    print('\nCalculating The Most Frequent Times of Travel...\n')
    start_time = time.time()
    # display the most common month (as a number, 1 = January)
    common_month = df['month'].mode()[0]
    print('The most common month: ', common_month)
    # display the most common day of week
    common_day_of_week = df['day_of_week'].mode()[0]
    print('The most common day of week: ', common_day_of_week)
    # display the most common start hour (computed without adding a column to df)
    common_start_hour = df['Start Time'].dt.hour.mode()[0]
    print('The most common start hour: ', common_start_hour)
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def station_stats(df):
    """Displays statistics on the most popular stations and trip."""
    print('\nCalculating The Most Popular Stations and Trip...\n')
    start_time = time.time()
    # display most commonly used start station
    common_start_station = df['Start Station'].mode()[0]
    print('The most commonly used start station: ', common_start_station)
    # display most commonly used end station
    common_end_station = df['End Station'].mode()[0]
    print('The most commonly used end station: ', common_end_station)
    # display most frequent combination of start station and end station trip;
    # the separator keeps the two station names readable in the output
    trips = df['Start Station'] + ' -> ' + df['End Station']
    frequent_trip = trips.mode()[0]
    print('The most frequent trip: ', frequent_trip)
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def trip_duration_stats(df):
    """Displays statistics on the total and average trip duration."""
    print('\nCalculating Trip Duration...\n')
    start_time = time.time()
    # display total travel time
    total_travel_time = df['Trip Duration'].sum()
    print('The total travel time: ', total_travel_time)
    # display mean travel time
    mean_travel_time = df['Trip Duration'].mean()
    print('The mean travel time: ', mean_travel_time)
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def user_stats(df):
    """Displays statistics on bikeshare users."""
    print('\nCalculating User Stats...\n')
    start_time = time.time()
    # Display counts of user types
    count_user_types = df['User Type'].value_counts()
    print('Counts of user types:\n', count_user_types)
    # Display counts of gender (a KeyError means the city's file has no Gender column)
    try:
        count_gender = df['Gender'].value_counts()
        print('Counts of gender:\n', count_gender)
    except KeyError:
        print('Counts of gender: sorry, this city has no Gender data.')
    # Display earliest, most recent, and most common year of birth
    try:
        earliest_birth = df['Birth Year'].min()
        most_recent_birth = df['Birth Year'].max()
        most_common_birth = df['Birth Year'].mode()[0]
        print('Earliest year of birth:', earliest_birth)
        print('Most recent year of birth:', most_recent_birth)
        print('Most common year of birth:', most_common_birth)
    except KeyError:
        print('Sorry, this city has no Birth Year data.')
    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def main():
    while True:
        city, month, day = get_filters()
        df = load_data(city, month, day)
        time_stats(df)
        station_stats(df)
        trip_duration_stats(df)
        user_stats(df)
        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break


if __name__ == "__main__":
    main()
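The city, month, and day prompts in `get_filters` all follow the same "ask until the input is valid" pattern, so they could share one helper. The following is a minimal sketch of that idea; `ask_choice` and its `input_fn` parameter are hypothetical names introduced here and are not part of the script above. Passing the input function as an argument makes the helper easy to try without typing at a terminal.

```python
def ask_choice(prompt, options, input_fn=input):
    """Keep prompting until the reply (case-insensitive) is one of `options`."""
    while True:
        answer = input_fn(prompt).strip().lower()
        if answer in options:
            return answer
        print("Invalid input. Please choose one of: " + ", ".join(options))


# Simulate a user who first types an unsupported city, then a valid one.
replies = iter(["Boston", "  Chicago "])
city = ask_choice(
    "Which city do you want to analyze?\n",
    ["chicago", "new york city", "washington"],
    input_fn=lambda _prompt: next(replies),
)
print(city)
```

In the real script the default `input_fn=input` would be used, and the same helper could be called once each for the city, the month, and the day.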
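4. Interactive experience
The file is a script that takes raw input and creates an interactive experience in the terminal to answer questions about the dataset.
Enter the question you want to look at:
Get the answer:
P.S. The script can keep being improved; this is only a simple version. Visualization tools could also be added to the script so that, given the data you need, it generates the required charts automatically, which would be very convenient!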
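As one possible direction for the visualization idea, the sketch below counts trips per start hour and saves a bar chart. It assumes matplotlib is installed. The function names `trips_per_hour` and `plot_trips_per_hour`, the output file name, and the sample rows are all made up for illustration and are not part of the script above.

```python
import pandas as pd


def trips_per_hour(df):
    """Count trips for each start hour (0-23), including hours with no trips."""
    hours = pd.to_datetime(df["Start Time"]).dt.hour
    return hours.value_counts().reindex(range(24), fill_value=0)


def plot_trips_per_hour(df, path="trips_per_hour.png"):
    """Save a bar chart of trips per start hour to `path` (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file, no display required
    import matplotlib.pyplot as plt

    counts = trips_per_hour(df)
    fig, ax = plt.subplots()
    ax.bar(counts.index, counts.values)
    ax.set_xlabel("Start hour")
    ax.set_ylabel("Number of trips")
    fig.savefig(path)
    plt.close(fig)
    return path


# Made-up sample rows using the same Start Time column name as the CSV files.
sample = pd.DataFrame({
    "Start Time": ["2017-01-02 08:15:00", "2017-01-02 08:40:00", "2017-01-02 17:05:00"],
})
hourly = trips_per_hour(sample)
print(hourly[hourly > 0])
```

In the script, `plot_trips_per_hour(df)` could be called after `time_stats(df)` in `main()` so that a chart is written for whatever city, month, and day filter the user chose.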