Chapter 6 Decision Trees

1. What is the approximate depth of a decision tree trained (without restrictions) on a training set with one million instances?

The depth of a well-balanced binary tree containing m leaves is equal to log₂(m), rounded up. log₂ is the binary log; log₂(m) = log(m)/log(2). A binary Decision Tree (one that makes only binary decisions, as is the case with all trees in Scikit-Learn) will end up more or less well balanced at the end of training, with one leaf per training instance if it is trained without restrictions. Thus, if the training set contains one million instances, the Decision Tree will have a depth of log₂(10⁶) ≈ 20 (actually a bit more, since the tree will generally not be perfectly well balanced).
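
As a quick sanity check, you can compute log₂(m) directly, or train an unrestricted Scikit-Learn tree on a synthetic dataset and inspect its depth. The sketch below uses an arbitrary 10,000-instance dataset to keep training fast; the exact depth is data-dependent.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Theoretical depth of a perfectly balanced binary tree with one leaf per instance
print(np.ceil(np.log2(1_000_000)))  # 20.0

# Empirical check on a smaller synthetic set
X, y = make_classification(n_samples=10_000, n_features=10, random_state=42)
tree = DecisionTreeClassifier(random_state=42)  # no restrictions
tree.fit(X, y)
# Actual depth is data-dependent, but typically of the same order as log2(m) ≈ 14 here
print(tree.get_depth(), np.ceil(np.log2(10_000)))
```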

2. Is a node’s Gini impurity generally lower or higher than its parent’s? Is it generally lower/higher, or always lower/higher?

A node's Gini impurity is generally lower than its parent's. This is due to the CART training algorithm's cost function, which splits each node in a way that minimizes the weighted sum of its children's Gini impurities. However, it is possible for a node to have a higher Gini impurity than its parent, as long as this increase is more than compensated for by a decrease in the other child's impurity. For example, consider a node containing four instances of class A and one of class B. Its Gini impurity is 1 − (1/5)² − (4/5)² = 0.32. Now suppose the dataset is one-dimensional and the instances are lined up in the following order: A, B, A, A, A. You can verify that the algorithm will split this node after the second instance, producing one child node with instances A, B and the other child node with instances A, A, A. The first child node's Gini impurity is 1 − (1/2)² − (1/2)² = 0.5, which is higher than its parent's. This is compensated for by the fact that the other node is pure, so the overall weighted Gini impurity is 2/5 × 0.5 + 3/5 × 0 = 0.2, which is lower than the parent's Gini impurity. The verification is sketched in code after this paragraph.
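
The arithmetic above can be verified with a few lines of Python. The gini helper below is just an illustrative function written for this example, not part of Scikit-Learn.

```python
def gini(counts):
    """Gini impurity of a node, given the instance counts per class."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

parent = gini([4, 1])   # 4 instances of A, 1 of B -> 0.32
left   = gini([1, 1])   # child {A, B}             -> 0.5 (higher than the parent)
right  = gini([3, 0])   # child {A, A, A}          -> 0.0 (pure)

weighted = 2/5 * left + 3/5 * right   # 0.2, lower than the parent's 0.32
print(parent, left, right, weighted)
```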

3. If a decision tree is overfitting the training set, is it a good idea to try decreasing max_depth?

If a Decision Tree is overfitting the training set, it may be a good idea to decrease max_depth, since this will constrain the model, regularizing it.
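
With Scikit-Learn this simply means passing a smaller max_depth when constructing the tree. A minimal sketch, assuming an arbitrary moons dataset and an arbitrary depth of 3:

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, noise=0.3, random_state=42)

unrestricted = DecisionTreeClassifier(random_state=42).fit(X, y)               # likely to overfit
regularized  = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)  # constrained

print(unrestricted.get_depth(), regularized.get_depth())
```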

4. If a decision tree is underfitting the training set, is it a good idea to try scaling the input features?

Decision Trees don't care whether or not the training data is scaled or centered; that's one of the nice things about them. So if a Decision Tree underfits the training set, scaling the input features will just be a waste of time.
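
You can check this empirically: training on raw features and on standardized features should yield equivalent trees, since per-feature scaling is a monotonic transformation of the split thresholds. A minimal sketch on an arbitrary synthetic dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=5, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

raw    = DecisionTreeClassifier(random_state=42).fit(X, y)
scaled = DecisionTreeClassifier(random_state=42).fit(X_scaled, y)

# The splits land at equivalent positions, so the predictions should match
print(np.array_equal(raw.predict(X), scaled.predict(X_scaled)))
```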

5. If it takes one hour to train a decision tree on a training set containing one million instances, roughly how much time will it take to train another decision tree on a training set containing ten million instances? Hint: consider the CART algorithm’s computational complexity.

The computational complexity of training a Decision Tree is O(n × m × log₂(m)). So if you multiply the training set size by 10, the training time will be multiplied by k = (n × 10m × log₂(10m)) / (n × m × log₂(m)) = 10 × log₂(10m) / log₂(m). If m = 10⁶, then k ≈ 11.7, so you can expect the training time to be roughly 11.7 hours.
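
The factor k can be checked directly; this is just the arithmetic from the answer above:

```python
import numpy as np

m = 1_000_000
k = 10 * np.log2(10 * m) / np.log2(m)
print(k)  # ≈ 11.67, so roughly 11.7 hours
```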

6. If it takes one hour to train a decision tree on a given training set, roughly how much time will it take if you double the number of features?
If the number of features doubles, then the training time will also roughly double, since the training complexity O(n × m × log₂(m)) is linear in the number of features n.
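
A rough empirical check is sketched below; note that timings are noisy and machine-dependent, and the random dataset is an arbitrary choice for illustration.

```python
import time
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(42)
m = 20_000
y = rng.randint(0, 2, m)

for n_features in (10, 20):
    X = rng.rand(m, n_features)
    start = time.time()
    DecisionTreeClassifier(random_state=42).fit(X, y)
    # The second run should take roughly twice as long as the first
    print(f"{n_features} features: {time.time() - start:.2f} s")
```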

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末挠轴,一起剝皮案震驚了整個濱河市妇押,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌考传,老刑警劉巖燎孟,帶你破解...
    沈念sama閱讀 218,682評論 6 507
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件臭脓,死亡現(xiàn)場離奇詭異,居然都是意外死亡吗浩,警方通過查閱死者的電腦和手機建芙,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,277評論 3 395
  • 文/潘曉璐 我一進(jìn)店門,熙熙樓的掌柜王于貴愁眉苦臉地迎上來懂扼,“玉大人禁荸,你說我怎么就攤上這事右蒲。” “怎么了赶熟?”我有些...
    開封第一講書人閱讀 165,083評論 0 355
  • 文/不壞的土叔 我叫張陵瑰妄,是天一觀的道長。 經(jīng)常有香客問我钧大,道長翰撑,這世上最難降的妖魔是什么? 我笑而不...
    開封第一講書人閱讀 58,763評論 1 295
  • 正文 為了忘掉前任啊央,我火速辦了婚禮眶诈,結(jié)果婚禮上,老公的妹妹穿的比我還像新娘瓜饥。我一直安慰自己逝撬,他們只是感情好,可當(dāng)我...
    茶點故事閱讀 67,785評論 6 392
  • 文/花漫 我一把揭開白布乓土。 她就那樣靜靜地躺著宪潮,像睡著了一般。 火紅的嫁衣襯著肌膚如雪趣苏。 梳的紋絲不亂的頭發(fā)上狡相,一...
    開封第一講書人閱讀 51,624評論 1 305
  • 那天,我揣著相機與錄音食磕,去河邊找鬼尽棕。 笑死,一個胖子當(dāng)著我的面吹牛彬伦,可吹牛的內(nèi)容都是我干的滔悉。 我是一名探鬼主播,決...
    沈念sama閱讀 40,358評論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼单绑,長吁一口氣:“原來是場噩夢啊……” “哼回官!你這毒婦竟也來了?” 一聲冷哼從身側(cè)響起搂橙,我...
    開封第一講書人閱讀 39,261評論 0 276
  • 序言:老撾萬榮一對情侶失蹤歉提,失蹤者是張志新(化名)和其女友劉穎,沒想到半個月后份氧,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體唯袄,經(jīng)...
    沈念sama閱讀 45,722評論 1 315
  • 正文 獨居荒郊野嶺守林人離奇死亡,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點故事閱讀 37,900評論 3 336
  • 正文 我和宋清朗相戀三年蜗帜,在試婚紗的時候發(fā)現(xiàn)自己被綠了恋拷。 大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
    茶點故事閱讀 40,030評論 1 350
  • 序言:一個原本活蹦亂跳的男人離奇死亡厅缺,死狀恐怖蔬顾,靈堂內(nèi)的尸體忽然破棺而出宴偿,到底是詐尸還是另有隱情,我是刑警寧澤诀豁,帶...
    沈念sama閱讀 35,737評論 5 346
  • 正文 年R本政府宣布窄刘,位于F島的核電站,受9級特大地震影響舷胜,放射性物質(zhì)發(fā)生泄漏娩践。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點故事閱讀 41,360評論 3 330
  • 文/蒙蒙 一烹骨、第九天 我趴在偏房一處隱蔽的房頂上張望翻伺。 院中可真熱鬧,春花似錦沮焕、人聲如沸吨岭。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,941評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽辣辫。三九已至,卻和暖如春魁巩,著一層夾襖步出監(jiān)牢的瞬間急灭,已是汗流浹背。 一陣腳步聲響...
    開封第一講書人閱讀 33,057評論 1 270
  • 我被黑心中介騙來泰國打工谷遂, 沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留化戳,地道東北人。 一個月前我還...
    沈念sama閱讀 48,237評論 3 371
  • 正文 我出身青樓埋凯,卻偏偏與公主長得像,于是被迫代替她去往敵國和親扫尖。 傳聞我的和親對象是個殘疾皇子白对,可洞房花燭夜當(dāng)晚...
    茶點故事閱讀 44,976評論 2 355

推薦閱讀更多精彩內(nèi)容