Bike-Share Project Analysis

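Introduction: With the rise of bike sharing, this project explores data from the bike-share systems of three major US cities: Chicago, New York City, and Washington, DC. The goal is to give bike-share companies some key figures, such as which start station is the most popular and which trip is the most popular, to help them decide where to deploy bikes.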

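1. Analysis steps

  • Write code to import the data and answer interesting questions by computing descriptive statistics.
  • Write a script that takes raw input and creates an interactive experience in the terminal to present these statistics.
  • Pose the questions
  • Terminal application script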
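
The first step above (import the data, then compute descriptive statistics) can be sketched as follows. This is a minimal illustration, not the project's actual loader: the sample rows and the `load_sample` helper are hypothetical, and the column names are assumed to match the ones used later in this post.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real city files have many more rows and columns.
SAMPLE_CSV = """Start Time,End Time,Trip Duration,Start Station,End Station,User Type
2017-01-02 09:07:57,2017-01-02 09:20:53,776,A St,B St,Subscriber
2017-01-03 09:15:00,2017-01-03 09:25:00,600,A St,C St,Customer
2017-02-06 17:30:00,2017-02-06 17:50:00,1200,B St,A St,Subscriber
"""


def load_sample(csv_text, month=None, day=None):
    """Parse CSV text and optionally filter by month number and weekday name."""
    df = pd.read_csv(io.StringIO(csv_text))
    df["Start Time"] = pd.to_datetime(df["Start Time"])
    df["month"] = df["Start Time"].dt.month
    df["day_of_week"] = df["Start Time"].dt.day_name()
    if month is not None:
        df = df[df["month"] == month]
    if day is not None:
        df = df[df["day_of_week"] == day.title()]
    return df


january = load_sample(SAMPLE_CSV, month=1)
print(len(january), "trips in January")
# describe() reports count, mean, std, min, quartiles, and max in one call.
print(january["Trip Duration"].describe())
```

Reading from an in-memory string keeps the sketch runnable without the city CSV files; with the real data, `pd.read_csv` would take a file path instead.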

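2. Questions

  • Which month is the most common in the start times (the Start Time column)?
  • Which day of the week (for example Monday or Tuesday) is the most common in the start times?
  • Which hour of the day is the most common in the start times?
  • What is the total trip duration (Trip Duration), and what is the average trip duration?
  • Which start station (Start Station) is the most popular, and which end station (End Station) is the most popular?
  • Which trip is the most popular (that is, which combination of start station and end station)?
  • How many users are there of each user type?
  • How many users are there of each gender?
  • What are the earliest, the most recent, and the most common birth years?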
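
As a rough illustration of how a few of these questions map onto pandas operations, here is a small sketch on toy data. The rows are made up for the example, and grouping on both station columns is one possible way to find the most popular trip, not necessarily the approach taken in the script in this post.

```python
import pandas as pd

# Hypothetical toy data standing in for one city's trips.
trips = pd.DataFrame({
    "Start Time": pd.to_datetime([
        "2017-03-01 08:05:00", "2017-03-01 08:40:00",
        "2017-03-02 17:10:00", "2017-04-05 08:15:00",
    ]),
    "Start Station": ["A St", "A St", "B St", "A St"],
    "End Station": ["B St", "B St", "A St", "C St"],
    "User Type": ["Subscriber", "Subscriber", "Customer", "Subscriber"],
})

# Most common start hour: mode() returns a Series, so take its first entry.
most_common_hour = trips["Start Time"].dt.hour.mode()[0]

# Most popular trip: count rows per (start, end) pair and take the largest group.
trip_counts = trips.groupby(["Start Station", "End Station"]).size()
most_popular_trip = trip_counts.idxmax()

# Number of users of each type.
user_type_counts = trips["User Type"].value_counts()

print(most_common_hour, most_popular_trip)
print(user_type_counts)
```

Grouping on the two columns keeps the start and end station separate in the result, which avoids any ambiguity about where one station name ends and the next begins.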

3. Code implementation

Tool: Python
Text editor: PyCharm

import time
import pandas as pd
import numpy as np


CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }

def get_filters():
    """
    Asks user to specify a city, month, and day to analyze.

    Returns:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    """
    print('Hello! Let\'s explore some US bikeshare data!')
    # get user input for city (chicago, new york city, washington);
    # normalize every answer, including retries, and loop until it is valid
    city = input("Which city do you want to analyze? Enter chicago, new york city, or washington:\n").strip().lower()
    while city not in CITY_DATA:
        city = input('Invalid input.\nWould you like to see data for chicago, '
                     'new york city, or washington?\n').strip().lower()

    # get user input for month (all, january, february, ... , june)
    months = ['all', 'january', 'february', 'march', 'april', 'may', 'june']
    month = input("Which month do you want to analyze? Enter all, january, february, "
                  "march, april, may, or june:\n").strip().lower()
    while month not in months:
        month = input('Invalid input.\nWhich month do you want to analyze? Enter all, january, february, '
                      'march, april, may, or june:\n').strip().lower()

    # get user input for day of week (all, monday, tuesday, ... sunday)
    days = ['all', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']
    day = input("Which day of the week do you want to analyze? Enter all, monday, "
                "tuesday, wednesday, thursday, friday, saturday, or sunday:\n").strip().lower()
    while day not in days:
        day = input("Invalid input.\nWhich day of the week do you want to analyze? Enter all, monday, "
                    "tuesday, wednesday, thursday, friday, saturday, or sunday:\n").strip().lower()

    print('-'*40)
    return city, month, day


def load_data(city, month, day):
    """
    Loads data for the specified city and filters by month and day if applicable.

    Args:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    Returns:
        df - Pandas DataFrame containing city data filtered by month and day
    """
    # load data file into a dataframe
    df = pd.read_csv(CITY_DATA[city])

    # convert the Start Time column to datetime
    df['Start Time'] = pd.to_datetime(df['Start Time'])

    # extract month and day of week from Start Time to create new columns
    df['month'] = df['Start Time'].dt.month
    # dt.weekday_name was removed in pandas 1.0; dt.day_name() returns the same names
    df['day_of_week'] = df['Start Time'].dt.day_name()

    # filter by month if applicable
    if month != 'all':
        # use the index of the months list to get the corresponding int
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        month = months.index(month) + 1

        # filter by month to create the new dataframe
        df = df[df['month'] == month]

    # filter by day of week if applicable
    if day != 'all':
        # filter by day of week to create the new dataframe
        df = df[df['day_of_week'] == day.title()]
    return df


def time_stats(df):
    """Displays statistics on the most frequent times of travel."""

    print('\nCalculating The Most Frequent Times of Travel...\n')
    start_time = time.time()

    # display the most common month
    common_month = df['month'].mode()[0]
    print('The most common month: ', common_month)

    # display the most common day of week
    common_day_of_week = df['day_of_week'].mode()[0]
    print('The most common day of week: ', common_day_of_week)

    # display the most common start hour
    df['start_hour'] = df['Start Time'].dt.hour
    common_start_hour = df['start_hour'].mode()[0]
    print('The most common start hour: ', common_start_hour)


    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def station_stats(df):
    """Displays statistics on the most popular stations and trip."""

    print('\nCalculating The Most Popular Stations and Trip...\n')
    start_time = time.time()

    # display most commonly used start station
    common_start_station = df['Start Station'].mode()[0]
    print('The most commonly used start station: ', common_start_station)

    # display most commonly used end station
    common_end_station = df['End Station'].mode()[0]
    print('The most commonly used end station: ', common_end_station)

    # display most frequent combination of start station and end station trip
    # (join with a separator so the two station names stay readable)
    df['Trip'] = df['Start Station'] + ' -> ' + df['End Station']
    frequent_trip = df['Trip'].mode()[0]
    print('The most frequent trip: ', frequent_trip)

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def trip_duration_stats(df):
    """Displays statistics on the total and average trip duration."""

    print('\nCalculating Trip Duration...\n')
    start_time = time.time()

    # display total travel time
    total_travel_time = df['Trip Duration'].sum()
    print('The total travel time: ', total_travel_time)

    # display mean travel time
    mean_travel_time = df['Trip Duration'].mean()
    print('The mean travel time: ', mean_travel_time)

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def user_stats(df):
    """Displays statistics on bikeshare users."""

    print('\nCalculating User Stats...\n')
    start_time = time.time()

    # Display counts of user types
    count_user_types = df['User Type'].value_counts()
    print('Counts of user types:\n', count_user_types, sep='')

    # Display counts of gender (not every city file has a Gender column)
    try:
        count_gender = df['Gender'].value_counts()
        print('Counts of gender:\n', count_gender, sep='')
    except KeyError:
        print('Counts of gender: sorry, this city has no gender data.')

    # Display earliest, most recent, and most common year of birth
    # (not every city file has a Birth Year column)
    try:
        earliest_birth = df['Birth Year'].min()
        most_recent_birth = df['Birth Year'].max()
        most_common_birth = df['Birth Year'].mode()[0]
        print('Earliest year of birth:', earliest_birth)
        print('Most recent year of birth:', most_recent_birth)
        print('Most common year of birth:', most_common_birth)
    except KeyError:
        print('Sorry, this city has no Birth Year data.')

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def main():
    while True:
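        city, month, day = get_filters()
        df = load_data(city, month, day)

        # the filters can leave no rows, in which case mode()[0] would raise IndexError
        if df.empty:
            print('No trips match these filters. Please try again.')
            continue

        time_stats(df)
        station_stats(df)
        trip_duration_stats(df)
        user_stats(df)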

        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break


if __name__ == "__main__":
    main()
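
The three prompts in get_filters() follow the same pattern: ask, normalize, and repeat until the answer is valid. One possible way to factor that out is sketched below. The `prompt_choice` helper and its `read` parameter are hypothetical names introduced for this illustration and are not part of the script above.

```python
def prompt_choice(prompt, choices, read=input):
    """Keep asking until the reply, stripped and lower-cased, is in choices."""
    while True:
        answer = read(prompt).strip().lower()
        if answer in choices:
            return answer
        print("Invalid input. Please choose one of:", ", ".join(choices))


# Scripted replies stand in for a user typing at the terminal.
replies = iter(["Boston", "  Chicago "])
city = prompt_choice(
    "Which city do you want to analyze?\n",
    ["chicago", "new york city", "washington"],
    read=lambda _prompt: next(replies),
)
print(city)  # chicago
```

Passing the reader as a parameter is only there so the loop can be exercised without a live terminal; in the script itself the default `input` would be used.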

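4. Interactive experience

This file is a script that takes raw input and creates an interactive experience in the terminal to answer questions about the dataset.
Enter what you want to look at:

[Image: 輸入.png (input prompts)]

The answers are then printed:
[Image: 答案.png (answers)]

P.S. The script can still be improved; this is only a simple version. A visualization tool could also be added so that the script automatically generates the charts you need from the data you enter, which would be very convenient!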
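
As a small step toward the chart idea, the sketch below draws a text bar chart that fits in the same terminal session. It uses only the standard library and is an assumption about what such a feature could look like; `text_bar_chart` is a hypothetical helper, and a plotting library would be the natural choice for real graphics.

```python
from collections import Counter


def text_bar_chart(values, width=30):
    """Return one line per distinct value, with a bar scaled to its count."""
    counts = Counter(values)
    if not counts:
        return ""
    largest = max(counts.values())
    label_width = max(len(str(label)) for label in counts)
    lines = []
    for label, count in counts.most_common():
        # Scale each bar against the largest count; show at least one mark.
        bar = "#" * max(1, round(count / largest * width))
        lines.append(f"{str(label):<{label_width}} | {bar} {count}")
    return "\n".join(lines)


chart = text_bar_chart(["Subscriber", "Customer", "Subscriber", "Subscriber"], width=9)
print(chart)
```

Any column of categorical values, such as a list of user types or weekday names, could be passed in the same way.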
