Tie-Yan Liu (劉鐵巖)
- Deputy Director of Microsoft Research Asia, Principal Researcher
Key Technical Area
- Computer Vision
ImageNet: the deep-learning breakthrough of 2011-2012
ResNet (Residual Network)
- Speech
Speech recognition
In late 2016, Microsoft brought the speech-recognition word error rate down to 5.1% (the "magic number": human-level error rate)
- Natural Language
Machine translation is still below human level, but not far off
How to quantify translation accuracy? N-gram overlap gives a rough measure
The industry view: within a year, it may surpass expert simultaneous interpreters
- Games
AlphaGo
Key Industries
Security
Public security and traffic domains
Techniques: person analysis, vehicle analysis, behavior analysis
Industry Trend
Capital is flowing toward the technology
Boao Forum for Asia: security provided by Face++
Autonomous Drive
Google, Baidu, Mobileye, Tesla, Benz, BMW
Main challenges: complex road conditions, ethics, legal provisions
Who is liable when a driverless car hits someone?
Industry Trend
Baidu: Project Apollo
Google: ~2 million miles of road-test data
Mobileye: ~30 million km of road-test data
Tesla: ended its partnership with Mobileye in 2016
Healthcare
The most digitized field (computer-assisted for years already: blood tests, CT, ...)
- Diagnosis-assistance systems based on big data (CT, MRI)
- Medical knowledge graphs
- Intelligent medical advisors
- Genetic engineering
- Drug development, immunology
Deep Learning
An end-to-end learning approach that uses a highly complex model (nonlinear, multi-layer) to fit the training data from scratch.
You can work on genomics without first studying biology for years.
LightGBM
Faster than XGBoost
Basic Machine Learning Concepts
- The goal: to learn a model from experiences/data
Training data → model
- Test/inference/prediction
- Validation sets for hyperparameter tuning
- Training: empirical loss minimization
Loss function L, for example:
1. Linear regression (squared loss)
2. SVM (hinge loss)
3. Maximum likelihood (negative log-likelihood)
Biological Motivation and Connections
Dendrite (樹突): receives input signals
Synapse (突觸): connection strength
Axon (軸突): carries the output signal
Perceptron
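The perceptron's learning rule can be sketched in a few lines of numpy (the AND-gate data and pass count are illustrative choices, not from the lecture):

```python
import numpy as np

# Classic perceptron learning rule on a linearly separable toy problem (AND gate).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])           # AND, with labels in {-1, +1}

w = np.zeros(2)
b = 0.0
for _ in range(20):                     # a few passes suffice on separable data
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:      # misclassified -> move boundary toward xi
            w += yi * xi
            b += yi

pred = np.sign(X @ w + b)               # all four points classified correctly
```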
Feedforward Neural Networks
Universal Approximation Theorem: a feedforward network with at least one hidden layer can approximate any bounded continuous function to arbitrary accuracy
Hidden units: sigmoid and tanh
Sigmoid: f(x)=1/(1+e^(-x))
Rectified Linear Units (ReLU): f(x) = max(0, x)
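The hidden-unit nonlinearities above have these standard definitions; a small numpy sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes inputs to (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes inputs to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # max(0, x): cheap, does not saturate for x > 0
```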
Loss Function
Cross-entropy
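Cross-entropy over a softmax output, the standard classification loss, can be sketched as follows (the logits and helper names are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(target_index, logits):
    p = softmax(logits)
    return -np.log(p[target_index])   # -log probability of the true class

logits = np.array([2.0, 1.0, 0.1])
loss = cross_entropy(0, logits)       # small: class 0 is already the most likely
```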
Gradient Descent
Full-batch GD reliably converges, but each step is computationally expensive
SGD (stochastic gradient descent): much faster per step, and the stochastic gradient is an unbiased estimate of the full gradient
SGD's problem: the variance can be very large; the noise masks the small fluctuations of the convergence process, and convergence is not guaranteed
Fix: define a learning-rate schedule whose squares have a convergent sum (while the rates themselves sum to infinity)
In practice a compromise is used: minibatch SGD
The above are the basic methods
Many tricks and refinements are used nowadays
e.g. momentum SGD, Nesterov momentum
AdaGrad
Adam
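The progression above, from minibatch SGD to momentum, can be sketched on a toy least-squares problem (data, learning rate, momentum coefficient, and batch size are illustrative choices):

```python
import numpy as np

# Minibatch SGD with (heavy-ball) momentum on linear regression.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -3.0])
X = rng.normal(size=(256, 2))
y = X @ w_true + 0.01 * rng.normal(size=256)   # targets with small noise

w = np.zeros(2)
v = np.zeros(2)                                # momentum buffer
lr, mu, batch = 0.1, 0.9, 32

for epoch in range(50):
    idx = rng.permutation(len(X))              # reshuffle each epoch
    for s in range(0, len(X), batch):
        sel = idx[s:s + batch]
        # minibatch gradient of mean squared error
        grad = 2 * X[sel].T @ (X[sel] @ w - y[sel]) / len(sel)
        v = mu * v - lr * grad                 # momentum accumulates past gradients
        w = w + v
```

Setting `mu = 0` recovers plain minibatch SGD; `batch = len(X)` recovers full-batch GD.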
Regularization for deep learning
Overfitting
Generalization gap
Dropout: prevents units from co-adapting too much
Batch Normalization: the distribution of each layer's inputs changes during training; normalize them, with learnable scale/shift parameters
Weight decay(or L^2 parameter norm penalty)
Early Stopping
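Dropout, as described above, can be sketched with an "inverted dropout" mask, the common practical variant (`keep_prob` is an illustrative hyperparameter):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, keep_prob=0.8, train=True):
    if not train:
        return h                       # no-op at test time
    mask = rng.random(h.shape) < keep_prob   # keep each unit with prob keep_prob
    return h * mask / keep_prob        # rescale so the expected activation is unchanged

h = np.ones(10000)
h_drop = dropout(h)                    # ~20% of units zeroed, rest scaled up
```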
Convolutional neural networks
Local connectivity
Mimics the human pattern-recognition process
Convolution kernels: learned by SGD
Pooling: reduces dimensionality
An example:VGG
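Local connectivity, learned kernels, and pooling can be illustrated with a hand-rolled "valid" convolution (cross-correlation, as in most DL libraries) plus 2x2 max pooling; the edge-detector kernel here is a fixed illustrative example, not a learned one:

```python
import numpy as np

def conv2d(img, kernel):
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # each output depends only on a local patch (local connectivity)
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(x):
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2 * h, :2 * w].reshape(h, 2, w, 2).max(axis=(1, 3))

img = np.arange(25, dtype=float).reshape(5, 5)
edge = np.array([[1.0, -1.0]])        # horizontal difference kernel
feat = conv2d(img, edge)              # shape (5, 4)
pooled = max_pool2x2(feat)            # shape (2, 2): dimension reduced
```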
- Gradient Vanishing
In a deep network the gradient becomes vanishingly small
The sigmoid's derivative is at most 0.25; multiplying many such factors across deep layers makes the gradient tiny
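A quick numerical check: chaining sigmoid derivatives shrinks the gradient geometrically (the layer count of 20 is an illustrative choice):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)              # maximized at x = 0, where it equals 0.25

# Even in the best case (every pre-activation at 0), 20 stacked sigmoid
# layers multiply the backpropagated gradient by 0.25 per layer.
grad = 1.0
for _ in range(20):
    grad *= sigmoid_grad(0.0)         # grad ends up at 0.25**20, effectively zero
```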
Fix: Residual Networks (ResNet)
- What's Missing?
Feedforward network and CNN
However, many applications involve sequences with variable lengths
Recurrent Neural Networks(RNN)
We can process a sequence of vectors x by applying a recurrence formula at every time step
Remembers the state from the previous time step
- Many to one: input a sequence, output a single scalar
- One to many: input a single vector, output a sequence (e.g. image captioning)
- Many to many: language modeling (predicting the next word); encoder-decoder for sequence generation
Same problem: the unrolled network is too long
Fix: Long Short-Term Memory (LSTM)
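The recurrence applied at every time step can be sketched directly, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b); all weights here are random placeholders, not trained values:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 3, 4
W_xh = rng.normal(scale=0.5, size=(d_hid, d_in))   # input-to-hidden weights
W_hh = rng.normal(scale=0.5, size=(d_hid, d_hid))  # hidden-to-hidden (recurrent) weights
b = np.zeros(d_hid)

def rnn_forward(xs):
    h = np.zeros(d_hid)               # initial hidden state
    for x in xs:                      # same recurrence at every time step
        h = np.tanh(W_xh @ x + W_hh @ h + b)   # h carries memory of earlier inputs
    return h

seq = rng.normal(size=(5, d_in))      # a length-5 input sequence
h_last = rnn_forward(seq)             # "many to one": a single vector out
```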
Deep learning toolkits
- TensorFlow (Google)
- Caffe (UC Berkeley)
- CNTK (Microsoft)
- MXNet (Amazon)
- Torch7 (NYU/Facebook)
- Theano (U Montreal)
Image classification: Caffe, Torch
Text: Theano
Large-scale training: CNTK
Breadth of features: TensorFlow
Advanced topics in deep learning
Challenges of deep learning
- Relying on Big Training Data
- Relying on Big Computation
- Relying on Manual Coefficient Tuning
- Lack of interpretability
Black box or white box?
- Lack of Diverse Tech Roadmaps
More and more NIPS and ICML papers are on deep learning
- Overlooking Differences between Animal and Human
Deep learning solves a function-fitting problem; that is still far from real intelligence
Dual learning
- A New View: The Beauty of Symmetry
Dual learning with only 10% bilingual data (NIPS 2016)
Lightweight deep learning
LightRNN
Distributed deep learning
Convex Problems
The Universal Approximation Theorem is only an existence statement; it does not say how to find or train such a network