共享單車項目分析

簡介：隨著共享單車的星期，這次探索三大美國城市的自行車共享系統(tǒng)相關(guān)的數(shù)據(jù)：芝加哥、紐約和華盛頓特區(qū)，幫助共享單車公司得到一些關(guān)鍵性的數(shù)據(jù)信息玛歌，例如哪個起始車站最熱門，哪一趟行程最熱門等等厦幅，來對共享單車的投放給予一定幫助沾鳄。

一、分析步驟

編寫代碼導(dǎo)入數(shù)據(jù)确憨，并通過計算描述性統(tǒng)計數(shù)據(jù)回答有趣的問題译荞。
編寫一個腳本，該腳本會接受原始輸入并在終端中創(chuàng)建交互式體驗休弃，以展現(xiàn)這些統(tǒng)計信息吞歼。
提出問題
終端應(yīng)用腳本

二、提出問題

起始時間（Start Time 列）中哪個月份最常見塔猾？
起始時間中篙骡，一周的哪一天（比如 Monday, Tuesday）最常見？
起始時間中丈甸，一天當中哪個小時最常見糯俗？
總騎行時長（Trip Duration）是多久，平均騎行時長是多久睦擂？
哪個起始車站（Start Station）最熱門得湘，哪個結(jié)束車站（End Station）最熱門？
哪一趟行程最熱門（即顿仇，哪一個起始站點與結(jié)束站點的組合最熱門）淘正？
每種用戶類型有多少人摆马？
每種性別有多少人？
出生年份最早的是哪一年鸿吆、最晚的是哪一年囤采，最常見的是哪一年？

三惩淳、代碼實現(xiàn)

工具：Python
文本編輯器：Pycharm

import time
import pandas as pd
import numpy as np


CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }

def get_filters():
    """
    Asks user to specify a city, month, and day to analyze.

    Returns:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    """
    print('Hello! Let\'s explore some US bikeshare data!')
    # get user input for city (chicago, new york city, washington). HINT: Use a while loop to handle invalid inputs
    city = input("Which city do you want to analyze? input ：chicago, new york city, washington\n").lower()
    while True:
        if city not in CITY_DATA.keys():
            city = input('Invalid input======\nwould you like to see data for chicago, '
                         'new youk city, or washington?')
        else:
            break

    # get user input for month (all, january, february, ... , june)
    months = ['all', 'january', 'february', 'march', 'april', 'may', 'june']
    month = input("Which month data do you want to analyze蕉毯？input ：all，january, february, "
                  "march, april, may, june\n").lower()
    while True:
        if month not in months:
            month = input('Invalid input======\nWhich month data do you want to analyze黎泣？input ：all恕刘，january, february,'
                  'march, april, may, june\n').lower()
        else:
            break

    # get user input for day of week (all, monday, tuesday, ... sunday)
    days = ['all', 'monday','tuesday','wednesday','thursday','friday','saturday','sunday']
    day = input("Which day of week do you want to analyze? input："
                "all缤谎，monday, tuesday, wednesday, thursday, friday, saturday, sunday").lower()
    while True:
        if day not in days:
            day = input("Invalid input======\nWhich day of week do you want to analyze? input："
                "all抒倚，monday, tuesday, wednesday, thursday, friday, saturday, sunday").lower()
        else:
            break

    print('-'*40)
    return city, month, day


def load_data(city, month, day):
    """
    Loads data for the specified city and filters by month and day if applicable.

    Args:
        (str) city - name of the city to analyze
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    Returns:
        df - Pandas DataFrame containing city data filtered by month and day
    """
    # load data file into a dataframe
    df = pd.read_csv(CITY_DATA[city])

    # convert the Start Time column to datetime
    df['Start Time'] = pd.to_datetime(df['Start Time'])

    # extract month and day of week from Start Time to create new columns
    df['month'] = df['Start Time'].dt.month
    df['day_of_week'] = df['Start Time'].dt.weekday_name

    # filter by month if applicable
    if month != 'all':
        # use the index of the months list to get the corresponding int
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        month = months.index(month) + 1

        # filter by month to create the new dataframe
        df = df[df['month'] == month]

    # filter by day of week if applicable
    if day != 'all':
        # filter by day of week to create the new dataframe
        df = df[df['day_of_week'] == day.title()]
    return df


def time_stats(df):
    """Displays statistics on the most frequent times of travel."""

    print('\nCalculating The Most Frequent Times of Travel...\n')
    start_time = time.time()

    # display the most common month
    common_month = df['month'].mode()[0]
    print('The most common month: ', common_month)

    # display the most common day of week
    common_day_of_week = df['day_of_week'].mode()[0]
    print('The most common day of week: ', common_day_of_week)

    # display the most common start hour
    df['start_hour'] = df['Start Time'].dt.hour
    common_start_hour = df['start_hour'].mode()[0]
    print('The most common start hour: ', common_start_hour)


    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def station_stats(df):
    """Displays statistics on the most popular stations and trip."""

    print('\nCalculating The Most Popular Stations and Trip...\n')
    start_time = time.time()

    # display most commonly used start station
    common_start_station = df['Start Station'].mode()[0]
    print('The most commonly used start station: ', common_start_station)

    # display most commonly used end station
    common_end_station = df['End Station'].mode()[0]
    print('The most commonly used end station: ', common_end_station)

    # display most frequent combination of start station and end station trip
    df['Station'] = df['Start Station'] + df['End Station']
    frequent_station = df['Station'].mode()[0]
    print('The most frequent station: ', frequent_station)

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def trip_duration_stats(df):
    """Displays statistics on the total and average trip duration."""

    print('\nCalculating Trip Duration...\n')
    start_time = time.time()

    # display total travel time
    total_travel_time = df['Trip Duration'].sum()
    print('The total trabel time: ', total_travel_time)

    # display mean travel time
    mean_trabel_time = df['Trip Duration'].mean()
    print('The mean travel time: ', mean_trabel_time)

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def user_stats(df):
    """Displays statistics on bikeshare users."""

    print('\nCalculating User Stats...\n')
    start_time = time.time()

    # Display counts of user types
    count_user_types = df['User Type'].value_counts()
    print('Counts of user types: ', count_user_types)

    # Display counts of gender
    try:
        count_gender = df['Gender'].value_counts()
        print('Counts of gender: ', count_gender)
    except KeyError:
        print('Counts of gender:oh sorry, this city have no this data.')

    # Display earliest, most recent, and most common year of birth
    try:
        earliest_birth = df['Birth Year'].min()
        most_recent_birth = df['Birth Year'].max()
        most_common_birth = df['Birth Year'].mode()[0]
        print('Earliest year of birth:',earliest_birth)
        print('Most recent year of birth',most_recent_birth)
        print('Most common year of birth',most_common_birth)
    except KeyError:
        print('oh sorry, this city have no Birth Year data.')

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)


def main():
    while True:
        city, month, day = get_filters()
        df = load_data(city, month, day)

        time_stats(df)
        station_stats(df)
        trip_duration_stats(df)
        user_stats(df)

        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break


if __name__ == "__main__":
    main()

四、互動式體驗

該文件是一個腳本坷澡，它接受原始輸入在終端中創(chuàng)建交互式體驗托呕，來回答有關(guān)數(shù)據(jù)集的問題。
輸入想要查看的問題：

輸入.png

得出答案：

答案.png

Ps：腳本還可以持續(xù)地優(yōu)化频敛，這次只是做了一個簡易的版本项郊，另外還可以在腳本加入可視化的工具，輸入需要的數(shù)據(jù)斟赚，自動生成需要的圖表着降，這就不要太方便了啊啊啊啊啊＾志Ｈ味础！７⑶帧＝惶汀！