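In RandomForest modeling, the input features include both numeric and string values. Filling missing values uniformly with df.na.fill(0.0) before training leads to a NullPointerException or to "Cannot have an empty string for name". The reason is that na.fill(0.0) only fills numeric columns, so nulls and empty strings in the string columns reach the model untouched.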
If the string features are not needed, simply leave them out of the RFormula:
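import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.feature.RFormula

// splitData and getRFModel are the author's own helpers (not shown here)
val (trainingData, testData) = splitData(featsDFWithLabel, trainingSampleRatio)
val formula = new RFormula()
  // "label ~ . - col" uses every column except label and the excluded ones as features
  .setFormula("label ~ . - user_id - relation_type - industry - is_phonenum - relation_type_definite - type - category - mark - " +
    "name - job_first_level - job_second_level - result - duration - phone_label - dt")
  .setFeaturesCol("features")
  .setLabelCol("label")
val pipelineModel: PipelineModel = getRFModel(formula, trainingData)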
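Writing a long exclusion list inline as one formula string is error-prone. A minimal sketch, assuming the excluded columns are kept in a list: the hypothetical helper excludeFormula builds the same "label ~ . - col - col ..." string from that list in plain Scala, with no Spark dependency.

```scala
object FormulaBuilder {
  // Hypothetical helper: "label ~ ." followed by "- col" for every excluded column
  def excludeFormula(label: String, excluded: Seq[String]): String =
    (s"$label ~ ." +: excluded.map(c => s"- $c")).mkString(" ")

  def main(args: Array[String]): Unit = {
    val excluded = Seq("user_id", "relation_type", "industry", "is_phonenum",
      "relation_type_definite", "type", "category", "mark", "name",
      "job_first_level", "job_second_level", "result", "duration",
      "phone_label", "dt")
    println(excludeFormula("label", excluded))
  }
}
```

You can pass the result straight to new RFormula().setFormula(...). Adding or removing a column then means editing the list, not the string.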
If the string features are needed, they have to be handled separately:
// Nulls in the listed string columns become "empty"; nulls in numeric columns become 0.0
val trainingDFNew = trainingDF
  .na.fill(Map("industry" -> "empty", "category" -> "empty", "phone_label" -> "empty"))
  .na.fill(0.0)
  // na.fill does not touch "" (it is not null), so map empty strings explicitly
  .na.replace(Seq("industry", "category", "phone_label"), Map("" -> "empty"))
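The per-column fill above names every string column by hand. Below is a sketch of a more general variant. It assumes Spark 2.x/3.x with spark-sql on the classpath, and the helper name cleanForRFormula is hypothetical. The helper reads the schema and fills nulls with a placeholder in string columns and with 0.0 in numeric columns. It then maps "" to the placeholder in all string columns. Note that it also fills a numeric label column. If nulls in the label should be filtered out instead, drop the label from numericCols.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{NumericType, StringType}

object FillByType {
  // Hypothetical helper: type-driven null/empty cleaning before RFormula
  def cleanForRFormula(df: DataFrame, placeholder: String = "empty"): DataFrame = {
    val stringCols = df.schema.fields.filter(_.dataType == StringType).map(_.name).toSeq
    val numericCols = df.schema.fields.filter(_.dataType.isInstanceOf[NumericType]).map(_.name).toSeq

    // Fill value per column: placeholder for strings, 0.0 for numbers
    // (Spark casts 0.0 to each numeric column's own type)
    val fillValues: Map[String, Any] =
      stringCols.map(c => c -> placeholder).toMap ++ numericCols.map(c => c -> 0.0).toMap

    // na.fill only replaces nulls; "" is a real value and needs na.replace
    df.na.fill(fillValues)
      .na.replace(stringCols, Map("" -> placeholder))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("fill-by-type").getOrCreate()
    import spark.implicits._
    val raw = Seq[(String, Option[Double])](("a", Some(1.0)), ("", None), (null, Some(2.0)))
      .toDF("industry", "x")
    cleanForRFormula(raw).show()
    spark.stop()
  }
}
```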
Error raised when empty strings are left in the string columns:
ERROR ApplicationMaster: User class threw exception: java.lang.IllegalArgumentException: requirement failed: Cannot have an empty string for name.
java.lang.IllegalArgumentException: requirement failed: Cannot have an empty string for name.
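If a string column contains empty strings, they must be handled too, otherwise one-hot encoding fails with the error above. The empty string becomes a category label, and the encoder cannot create an output attribute with an empty name.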
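Before training, it can help to check which string columns still contain nulls or empty strings. Per the text above, these are the values behind the NullPointerException and the "Cannot have an empty string for name" error. Below is a minimal sketch, assuming Spark 2.x/3.x; the helper name blankCounts is hypothetical. In one aggregation pass, it counts the null or exactly-empty values in every string column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, when}
import org.apache.spark.sql.types.StringType

object BlankCheck {
  // Hypothetical helper: per string column, number of rows that are null or ""
  def blankCounts(df: DataFrame): Map[String, Long] = {
    val stringCols = df.schema.fields.filter(_.dataType == StringType).map(_.name).toSeq
    if (stringCols.isEmpty) Map.empty
    else {
      // when(...) without otherwise() yields null for non-matching rows, and count() skips nulls
      val aggs = stringCols.map(c => count(when(col(c).isNull || col(c) === "", true)).as(c))
      val row = df.agg(aggs.head, aggs.tail: _*).head()
      stringCols.map(c => c -> row.getAs[Long](c)).toMap
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("blank-check").getOrCreate()
    import spark.implicits._
    val df = Seq[(String, String)](("a", "x"), ("", "y"), (null, "")).toDF("industry", "category")
    // Columns with a non-zero count still need na.fill / na.replace before encoding
    blankCounts(df).foreach { case (c, n) => println(s"$c: $n") }
    spark.stop()
  }
}
```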