Data source (file/data/mllib/input/ridge-data/defDemo1):
42,0.10
43.5,0.11
45,0.12
45.5,0.13
45,0.14
47.5,0.15
49,0.16
53,0.17
50,0.18
55,0.20
55,0.21
60,0.23
Code:
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}
import org.apache.spark.{SparkConf, SparkContext}

object LinearRegressionDemo { // enclosing object added so main compiles as a standalone app
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName(this.getClass.getSimpleName.filterNot(_ == '$'))
    val sc = new SparkContext(conf)
    val data = sc.textFile("file/data/mllib/input/ridge-data/defDemo1") // load the data set
    val parsedData = data.map { line =>                 // parse the data set
      val parts = line.split(',')                       // split label from features on the comma
      LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).trim.split(' ').map(_.toDouble)))
    }                                                   // convert each line to a LabeledPoint
    //val parsedData = data.map { line =>               // parse the data set
    //  val parts = line.split(',')                     // split label from features on the comma
    //  LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).trim.split(' ').map(d => {
    //    (d.toDouble - 0.10) / (0.23 - 0.10)           // min-max normalization: (x - minX) / (maxX - minX)
    //  })))
    //}                                                 // convert to LabeledPoint; this normalization made no difference
    parsedData.foreach(line => {
      println(line.label + " , " + line.features)
    })
    val model = LinearRegressionWithSGD.train(parsedData, 1000, 0.001) // train: 1000 iterations, step size 0.001
    val result = model.predict(Vectors.dense(0.19))                    // predict y for x = 0.19
    println("model weights:")
    println(model.weights)
    println("model intercept:")
    println(model.intercept)
    println("result:")
    println(result) // print the prediction
    sc.stop()
  }
}
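The commented-out block scales the single feature by hand with min-max normalization. If scaling is wanted at all, MLlib's StandardScaler does the same job without hard-coded min/max values. The following is only a sketch of that idea (the names scaler and scaledData are mine, not part of the original program), and, as the output below shows, scaling by itself does not rescue this particular fit:

import org.apache.spark.mllib.feature.StandardScaler

// Sketch: standardize the feature column (zero mean, unit variance) instead of
// hand-written min-max scaling. scaledData can then be passed to train() in place
// of parsedData; any vector given to predict() must be rescaled the same way.
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(parsedData.map(_.features))                                 // learn per-column mean / std dev
val scaledData = parsedData
  .map(p => LabeledPoint(p.label, scaler.transform(p.features)))   // rescale each feature vector
  .cache()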
Output of the original program:
model weights:
[0.11670307429843765]
model intercept:
0.0
result:
0.022173584116703154
The actual linear function (y = mx + n) should be close to y = 130.835x + 28.493,
so at x = 0.19 we expect y ≈ 53.35.
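Those reference values are simply the ordinary least-squares fit of the twelve points above, which can be checked with a closed-form calculation in plain Scala (a verification sketch added here, not part of the original post):

// Closed-form simple linear regression over the 12 (x, y) pairs from the data file.
val xs = Seq(0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.20, 0.21, 0.23)
val ys = Seq(42.0, 43.5, 45.0, 45.5, 45.0, 47.5, 49.0, 53.0, 50.0, 55.0, 55.0, 60.0)
val meanX = xs.sum / xs.size
val meanY = ys.sum / ys.size
val slope = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum /
            xs.map(x => (x - meanX) * (x - meanX)).sum
val intercept = meanY - slope * meanX
println(s"y = ${slope}x + $intercept")   // prints roughly y = 130.835x + 28.493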
The result from LinearRegressionWithSGD is nowhere near this function. The printed intercept of 0.0 is the main clue: the static train() helpers never fit an intercept, and with a step size of 0.001 over 1000 iterations the weight has barely moved away from zero.
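One way to get a more sensible fit with the same RDD-based API is to build the algorithm object directly, enable intercept fitting, and give the optimizer room to converge. This is only a sketch under those assumptions; the step size and iteration count are illustrative and may still need tuning (or feature standardization, as sketched above) for such a tiny data set:

// Sketch: same algorithm, but with an intercept and more aggressive SGD settings.
val lr = new LinearRegressionWithSGD()
lr.setIntercept(true)          // the static train() methods leave this off, hence intercept = 0.0
lr.optimizer
  .setNumIterations(10000)     // many more passes than the original 1000
  .setStepSize(0.1)            // larger step size so the weights can actually move
val tunedModel = lr.run(parsedData)
println(tunedModel.weights + " , " + tunedModel.intercept)
println(tunedModel.predict(Vectors.dense(0.19)))   // ideally much closer to ~53, though convergence still depends on the settings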
For comparison, the result of running the same data through R: