Excerpted from: http://blog.csdn.net/sparkexpert/article/details/52871000
See also: https://stackoverflow.com/questions/39517980/spark-error-unable-to-find-encoder-for-type-stored-in-a-dataset
Now that the new version of Spark has stabilized, I recently planned to upgrade our existing framework to Spark 2.0. It was quite exciting, especially since SQL really did get a lot faster.
However, one operation got stuck during the upgrade: dataframe.map. It ran fine on Spark 1.x, but on Spark 2.0 it no longer compiles.
The reported error was essentially:
error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases. resDf_upd.map(row => {
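To make the failure concrete, here is a minimal sketch that hits this error on Spark 2.0 (the name resDf_upd and its columns are my own stand-ins, not from the original post):

// Minimal reproduction sketch: in Spark 2.0 a DataFrame is just a Dataset[Row],
// so map needs an Encoder for the result type.
import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder().appName("encoder-demo").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for resDf_upd
val resDf_upd = Seq((1, "a"), (2, "b")).toDF("id", "name")

// On Spark 1.x, DataFrame.map returned an RDD[Row] and this compiled; on Spark 2.0 the
// line below fails with "Unable to find encoder for type stored in a Dataset",
// because there is no implicit Encoder[Row] in scope:
// resDf_upd.map(row => Row(row.getInt(0) + 1, row.getString(1)))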
There really is not much material about this problem online. My guess was that it is a new requirement that appeared once Dataset unified DataFrame and RDD.
Looking through the official Spark documentation, I found the following description:
Dataset is Spark SQL’s strongly-typed API for working with structured data, i.e. records with a known schema.
Datasets are lazy and structured query expressions are only triggered when an action is invoked. Internally, a Dataset represents a logical plan that describes the computation query required to produce the data (for a given Spark SQL session).
A Dataset is a result of executing a query expression against data storage like files, Hive tables or JDBC databases. The structured query expression can be described by a SQL query, a Column-based SQL expression or a Scala/Java lambda function. And that is why Dataset operations are available in three variants.
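As an illustration of those three variants, here is a small sketch with made-up data (reusing the spark session and import spark.implicits._ from the snippet above):

case class Person(name: String, age: Long)
val people = Seq(Person("Justin", 19), Person("Andy", 30)).toDS()

// 1. A SQL query
people.createOrReplaceTempView("people")
val viaSql = spark.sql("SELECT name FROM people WHERE age < 21")

// 2. A Column-based SQL expression
val viaColumns = people.filter($"age" < 21).select($"name")

// 3. A Scala lambda over the typed records
val viaLambda = people.filter(p => p.age < 21).map(p => p.name)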
From this it is clear that, in order to operate on a Dataset, the corresponding Encoder has to be supplied. In particular, the example given on the official site:
// No pre-defined encoders for Dataset[Map[K,V]], define explicitly
implicit val mapEncoder = org.apache.spark.sql.Encoders.kryo[Map[String, Any]]
// Primitive types and case classes can be also defined as
// implicit val stringIntMapEncoder: Encoder[Map[String, Any]] = ExpressionEncoder()

// row.getValuesMap[T] retrieves multiple columns at once into a Map[String, T]
teenagersDF.map(teenager => teenager.getValuesMap[Any](List("name", "age"))).collect()
// Array(Map("name" -> "Justin", "age" -> 19))
This shows that before calling map you first have to define an Encoder.
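For completeness, here are two ways to satisfy that requirement for the earlier hypothetical resDf_upd (my own sketch, not from the original post), instead of falling back to the RDD:

// (a) Map into a case class; its Encoder is derived via import spark.implicits._
case class Rec(id: Int, name: String)
val asCaseClass = resDf_upd.map(row => Rec(row.getInt(0), row.getString(1).toUpperCase))

// (b) Keep returning Row, supplying an explicit Encoder[Row] built from the schema
//     (RowEncoder lives in org.apache.spark.sql.catalyst.encoders in Spark 2.x)
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
implicit val rowEncoder = RowEncoder(resDf_upd.schema)
val asRows = resDf_upd.map(row => Row(row.getInt(0), row.getString(1).toUpperCase))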
That would add a lot of heavy work to the upgrade. To keep things simpler, fortunately a Dataset also provides a conversion to an RDD, so it is enough to change the original dataframe.map
into dataframe.rdd.map.
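In context, the workaround looks roughly like this (again with the hypothetical resDf_upd). Going through the RDD sidesteps the Encoder requirement, at the cost of losing Catalyst/Tungsten optimizations for that step; you can rebuild a DataFrame from the result if needed:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// dataframe.rdd.map: the function works on plain Rows, no Encoder needed
val mappedRdd = resDf_upd.rdd.map(row => Row(row.getInt(0) + 1, row.getString(1)))

// Optional: convert back to a DataFrame by supplying the schema explicitly
val schema = StructType(Seq(StructField("id", IntegerType), StructField("name", StringType)))
val backToDf = spark.createDataFrame(mappedRdd, schema)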