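This post originally had a preface, "170528 Rant | A Not-So-Smooth Half Year of Self-Study",
but it was too long-winded, so I split it out into its own post.
A survey
Company stage and the standing/value of data analysis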
sample project
Projects I come across that have an interesting topic or a noteworthy analysis approach go here.
build a portfolio
Build a personal blog to hold your projects using the Python library Pelican; see here.
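For reference, a Pelican site is driven by a `pelicanconf.py` settings file. The sketch below shows what a minimal one might look like; the author name, site name, and time zone are placeholder assumptions, not values from any real blog.

```python
# pelicanconf.py: minimal settings sketch for a Pelican project blog.
# Every value below is a hypothetical placeholder.
AUTHOR = "Your Name"
SITENAME = "Data Projects"
SITEURL = ""  # left empty for local preview

PATH = "content"  # folder holding the Markdown/reST posts
TIMEZONE = "Asia/Shanghai"
DEFAULT_LANG = "en"
DEFAULT_PAGINATION = 10  # posts per index page
```

With a file like this in place, running `pelican content` should generate the static site into the default `output/` folder.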
build project motivator
- from dataquest
- from analytics vidhya
- from quora
- competition websites
Learning path
- What classes should I take if I want to become a data scientist?
- **The top-voted answer, by Rahul Agarwal, offers an excellent course list across related topics: mathematics, statistics, computer science, machine learning, and distributed and parallel computing.**
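- Answer by 羅文益, a voracious reader in the Chinese-speaking world.
I found this answer among a crowd of "business data operations" learning paths, and it suits my taste. Its drawbacks, though:
- First, it was written by a pure CS person and barely touches math, statistics, or algorithms.
- Second, I doubt I will want to track down these Chinese-language books.
The essentials never change. The content is roughly:
- Computer fundamentals: basic principles, database principles
- Computer applications: data acquisition and cleaning
- Algorithms: probability, statistics, data mining, algorithms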
- YouTube video: How I'm Learning AI and Machine Learning
The following were learned in sequence:
- Math: MIT OCW
  - Single Variable Calculus
  - Multivariable Calculus
  - Linear Algebra
- AI: MIT OCW Artificial Intelligence
- Advanced mathematics: Numerical Analysis with Justin Solomon
- What does it take to be a data scientist?
- Florian Goossens's answer gives a view organized by programming skill:
  - Junior Data Scientist: a high-level rapid-prototyping language such as Python or R. I recommend Python very strongly.
  - Data Scientist: a low-level deployment language such as Java, C++, C#, etc.
  - Senior Data Scientist: a scalable/big-data language such as Scala/Spark
- What's the best way to learn data science as a beginner?
- Karlijn Willems's answer mentions some resources from DataCamp, along with a nice picture illustrating each step, but it is not as well organized as **Rahul Agarwal**'s answer.
what matters in a working scenario
- What Kaggle has learned from almost a million data scientists - Anthony Goldbloom (Kaggle)
Kaggle steps:
- Understand the data (EDA)
- Come up with features (these often matter more than the choice of algorithm)
- Feed the data to a training algorithm
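The three steps above can be sketched on a toy problem. This is only an illustration under stated assumptions: the room data is made up, and a hand-written one-variable least-squares fit stands in for "a training algorithm". It is not taken from the talk. It does show how an engineered feature (area) can matter more than the fitting method.

```python
from statistics import mean

# Hypothetical toy data: (width_m, length_m, price) for a few rooms.
rows = [
    (3.0, 4.0, 120.0),
    (2.0, 5.0, 100.0),
    (4.0, 4.0, 160.0),
    (5.0, 2.0, 100.0),
    (3.0, 6.0, 180.0),
]

def summarize(values):
    """Step 1 (EDA): return (min, mean, max) of a column."""
    return min(values), mean(values), max(values)

def fit_line(xs, ys):
    """Step 3: closed-form least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

def mean_abs_error(xs, ys, slope, intercept):
    """Average absolute gap between predictions and actual values."""
    return mean(abs(slope * x + intercept - y) for x, y in zip(xs, ys))

widths = [w for w, _, _ in rows]
prices = [p for _, _, p in rows]

# Step 1: look at the data before modelling.
price_summary = summarize(prices)

# Step 2: engineer a feature (area) from the raw columns.
areas = [w * l for w, l, _ in rows]

# Step 3: feed the raw feature and the engineered one to the same algorithm.
width_error = mean_abs_error(widths, prices, *fit_line(widths, prices))
area_error = mean_abs_error(areas, prices, *fit_line(areas, prices))
```

Here the same fitting routine does far better on `areas` than on `widths`, because the made-up prices were constructed to be proportional to area.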
* Excel
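The column has three articles on Excel. They are fairly simple, give a broad introduction to using Excel, and can serve as a guide for newcomers to the workplace.
The first, Data Analysis: Functions, briefly covers commonly used functions and the corresponding SQL/Python functions.
The second, Data Analysis: Techniques, briefly covers features I consider excellent value for the effort, for improving work efficiency.
The third, Data Analysis: In Practice, applies the first two articles hands-on and walks through a simple data analysis. The data source is real scraped data: 5,000 rows of data analyst job postings.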
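To give a feel for the kind of Excel-to-Python mapping the first article describes, here is a small sketch. The function names mirror Excel's `COUNTIF`, `SUMIF`, and `VLOOKUP`, but the helpers and the job-posting rows are my own hypothetical illustration, not the articles' actual examples.

```python
# Hypothetical job-posting rows: (city, title, monthly_salary_k).
rows = [
    ("Shanghai", "Data Analyst", 15),
    ("Beijing", "Data Analyst", 18),
    ("Shanghai", "Senior Data Analyst", 25),
    ("Hangzhou", "Data Analyst", 14),
]

def countif(rows, city):
    """Like COUNTIF: count the rows whose city matches."""
    return sum(1 for r in rows if r[0] == city)

def sumif(rows, city):
    """Like SUMIF: total salary over the rows whose city matches."""
    return sum(r[2] for r in rows if r[0] == city)

def vlookup(rows, city):
    """Like an exact-match VLOOKUP: salary of the first matching row, else None."""
    return next((r[2] for r in rows if r[0] == city), None)

shanghai_count = countif(rows, "Shanghai")
shanghai_total = sumif(rows, "Shanghai")
beijing_salary = vlookup(rows, "Beijing")
```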
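Analytical thinking
- The Pyramid Principle (Douban)
- XMind Chinese website
- How to cultivate McKinsey-style analytical thinking.
- How to build a thinking framework for data analysis.
- Three maxims to take away:
A business without metrics cannot be grown or analyzed.
Good metrics should be rates or ratios.
Good analysis should compare or correlate.
An example: if I tell you a supermarket had 1,000 visitors today, how would you analyze that?
Is 1,000 more or fewer than other supermarkets nearby? (comparison)
Is 1,000 more or fewer than yesterday? (comparison)
How many of the 1,000 actually made a purchase? (conversion rate)
How much foot traffic passes by outside the supermarket? (conversion rate)
This is a quick way to set up an analysis framework. Looking at the 1,000 alone, you cannot see or derive any conclusion.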
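The supermarket example can be turned into a few lines of arithmetic. In this sketch only the 1,000 visitors comes from the text. Every other number is a made-up assumption, included just to show how comparisons and ratios give the raw count some meaning.

```python
def rate(part, whole):
    """Share of `whole` accounted for by `part` (a ratio metric)."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole

def relative_change(current, baseline):
    """Change of `current` relative to `baseline` (a comparison metric)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline

visitors_today = 1000         # from the example
visitors_yesterday = 800      # hypothetical
visitors_nearby_store = 1250  # hypothetical
buyers_today = 300            # hypothetical
passersby_today = 5000        # hypothetical

# Comparisons: today against yesterday and against a nearby store.
vs_yesterday = relative_change(visitors_today, visitors_yesterday)
vs_nearby = relative_change(visitors_today, visitors_nearby_store)

# Ratios: how many visitors bought, and how many passers-by walked in.
purchase_rate = rate(buyers_today, visitors_today)
walk_in_rate = rate(visitors_today, passersby_today)
```

Under these assumed numbers, traffic is up 25% on yesterday but 20% below the nearby store, 30% of visitors bought something, and 20% of passers-by came in.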
comparison of training alternatives
How do I become a data scientist? An evaluation of 3 alternatives
An article comparing Master's degree, bootcamp, and MOOC, well-illustrated with stories under each circumstance.
Leave these to another day
Look into these links when I have time.
- I completed CS109 course from Harvard, which other courses other than CS109 shall I take for a career as a data scientist?
- How can I become a data scientist?
- How do I get a job as a data scientist if I have no prior experience as a data scientist?
- Josh Devlin suggests the Dataquest style: do real projects.
- Are data science certificates worth it?
- What is your review of Coursera Data Science Specialization Track?
- Is there a data science bootcamp for someone with previous experience in the field?
- William Chen's profile and his data-science-related answers
- If I'm not a maths savant should I bother learning data science or will the hardcore maths people get all of the data science jobs?
- I am choosing between masters in data science on King's College London or Berkeley online masters. Which you would recommend?
- Which data science career track course should I take for career transition?
- Can you pay to get bad boot camp reviews removed from Course Report?
- Is a master in Data Science worth it?
- Which are some good websites to learn more about Data Science?
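- How do I learn data analysis in my spare time?
- How did you get started on the path to data analysis?
- I'm fairly interested in data analysis; what are the job prospects for the more popular specializations?
- [I graduated from a 211/985 university in China with a marketing major; given my current knowledge and skills, am I suited to be a data analyst?](https://www.zhihu.com/question/28021598/answer/40279672)
- What are the must-read books for doing data analysis?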
other findings
- try_jupyter
Official introduction from jupyter.org to using R, Haskell, Python, Spark with Python, and Spark with Scala in Jupyter Notebook.
- Which are some good websites to learn more about Data Science?
All the answers seem good to me, but they take time to explore and absorb.
- Comparison of schools' data science programs. Quora might be helpful when I'm ready to pursue a master's degree.
- Here's one concerning UC Berkeley
- Class Central
A counterpart of 果殼MOOC, and maybe more active. It also comes with a recommendation system, which might be a nice thing to have.
- [GrowingIO](https://www.growingio.com/)
張溪夢, former senior director of business analytics at LinkedIn, has turned data analysis into a back-end service and cornered the whole thing (facepalm).
20170818 discard info: certifications
--------
- If I want to get into Data Analytics, what type of certifications should I get?
- Options mentioned:
  - Python/R
  - Excel
  - SQL
  - Tableau
Tableau is a data visualization software that lets you import data from multiple data sources, like Excel and SQL, and visualize that data through the use of interactive dashboards and charts. Tableau offers an in-depth training for their products and has a certification exam. This would absolutely look good.
- Resources mentioned:
  - k2datascience, which mainly focuses on data analyst training.
20170817 discard info: about "best course on edx"
--------
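20170817 postscript
I removed this section because:
- The question itself is headed in the wrong direction.
- First, it limits the scope to edX.
- Second, the so-called "best course" cannot be defined. What counts as good?
Surely the right approach is to first look at the knowledge structure you need and then choose learning materials by topic?
- MITx 15.071x is not to my taste.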
---------
What are the best data science courses on edX?
- Anilkumar Panda mentions MITx: 15.071x The Analytics Edge. I have seen this course at least 5 times in answers to other questions.
- Anant Agarwal mentions Data Science for Executives, especially one of its component courses, Statistical Thinking for Data Science and Analytics.
- Arjun Narayanan mentions Microsoft DAT210x.
20170714 discard info: about harvard cs109
--------
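20170714 postscript
Because I inexplicably dislike the style of this course:
- Not fully public, and the videos stutter
- Live classroom recordings that routinely run for hours and waste time
- Too much filler talk
- Somehow overwhelming; three days in, I was already depressed (0529-0601)
So I dropped it right away.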
-------- 2015 video too slow --------
-------- 2016 CS109a --------
Available on canvas.harvard.edu. Many pages require a student ID, but I can somehow view some core pages without one.
- These three pages are organized in chronological order and act as a good reminder of what to do next:
- lecture and project material
Others, like homework and quizzes, are unavailable most of the time.