2016-06-16

Saving a Spark ML model to HDFS

I am trying to save a model created with the Spark ML library to HDFS. However, it is giving me an error:

    Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.ml.PipelineModel.save(Ljava/lang/String;)V
        at com.sf.prediction$.main(prediction.scala:61)
        at com.sf.prediction.main(prediction.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

These are my dependencies:

<dependency> 
     <groupId>org.scalatest</groupId> 
     <artifactId>scalatest_2.10</artifactId> 
     <version>2.1.7</version> 
     <scope>test</scope> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.maven.plugins</groupId> 
     <artifactId>maven-shade-plugin</artifactId> 
     <version>2.4.3</version> 
     <type>maven-plugin</type> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-core_2.10</artifactId> 
     <version>1.6.0</version> 
    </dependency> 
    <dependency> 
     <groupId>org.scala-lang</groupId> 
     <artifactId>scala-parser-combinators</artifactId> 
     <version>2.11.0-M4</version> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-sql_2.10</artifactId> 
     <version>1.6.0</version> 
    </dependency> 

    <dependency> 
     <groupId>org.apache.commons</groupId> 
     <artifactId>commons-csv</artifactId> 
     <version>1.2</version> 
    </dependency> 

    <dependency> 
     <groupId>com.databricks</groupId> 
     <artifactId>spark-csv_2.10</artifactId> 
     <version>1.4.0</version> 
    </dependency> 

    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-hive_2.10</artifactId> 
     <version>1.6.1</version> 
    </dependency> 

    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-mllib_2.10</artifactId> 
     <version>1.6.0</version> 
    </dependency> 

I would also like to save the dataframe generated from the model as CSV:

    model.transform(df).select("features","label","prediction").show() 

Here is my code:
import org.apache.spark.SparkConf 
import org.apache.spark.SparkContext 
import org.apache.spark.sql.hive.HiveContext 

import org.apache.spark.ml.Pipeline 
import org.apache.spark.ml.PipelineModel 
import org.apache.spark.ml.classification.LogisticRegression 
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler} 

object prediction { 
    def main(args: Array[String]): Unit = { 

    val conf = new SparkConf() 
      .setMaster("local[2]") 
      .setAppName("conversion") 
    val sc = new SparkContext(conf) 

    val hiveContext = new HiveContext(sc) 

    val df = hiveContext.sql("select * from prediction_test") 
    df.show() 
    val credit_indexer = new StringIndexer().setInputCol("transaction_credit_card").setOutputCol("creditCardIndex").fit(df) 
    val category_indexer = new StringIndexer().setInputCol("transaction_category").setOutputCol("categoryIndex").fit(df) 
    val location_flag_indexer = new StringIndexer().setInputCol("location_flag").setOutputCol("locationIndex").fit(df) 
    val label_indexer = new StringIndexer().setInputCol("fraud").setOutputCol("label").fit(df) 

    val assembler = new VectorAssembler().setInputCols(Array("transaction_amount", "creditCardIndex","categoryIndex","locationIndex")).setOutputCol("features") 
    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01) 
    val pipeline = new Pipeline().setStages(Array(credit_indexer, category_indexer, location_flag_indexer, label_indexer, assembler, lr)) 

    val model = pipeline.fit(df) 

    pipeline.save("/user/f42h/prediction/pipeline") 
    model.save("/user/f42h/prediction/model") 
// val sameModel = PipelineModel.load("/user/bob/prediction/model") 
    model.transform(df).select("features","label","prediction") 

    } 
} 
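
For the CSV part, since spark-csv_2.10 is already among the dependencies, the transformed dataframe can be written out along these lines (a sketch against the Spark 1.6 DataFrameWriter / spark-csv 1.x API; the output path is just an example). Note that the `features` column is an ML `Vector` with no CSV representation, so only scalar columns should be written:

```scala
// Write the scalar columns out as CSV via the Databricks spark-csv data source.
// "features" is a Vector and cannot be serialized to CSV, so it is dropped here.
model.transform(df)
  .select("label", "prediction")
  .write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("/user/f42h/prediction/output")
```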
Are you using the 2.0.0 artifacts? – BenFradet

I think so. I added my dependencies in the pom file. – Defcon

Answer

You are using Spark 1.6.0, and as far as I know, save/load for ML models is only available from Spark 2.0 onwards, which is why the `save(Ljava/lang/String;)V` method cannot be resolved at runtime. In the meantime you can use the 2.0.0-preview artifacts: http://search.maven.org/#search%7Cga%7C1%7Cg%3Aorg.apache.spark%20v%3A2.0.0-preview
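
With the 2.0.0-preview artifacts (for example `spark-mllib_2.10` at version `2.0.0-preview`), `PipelineModel` picks up `save`/`load` through `MLWritable`/`MLReadable`, so the code in the question should work as written. A minimal sketch, assuming `model` and `df` as defined in the question:

```scala
import org.apache.spark.ml.PipelineModel

// Persist the fitted pipeline model to HDFS...
model.save("/user/f42h/prediction/model")
// ...and read it back later for scoring.
val sameModel = PipelineModel.load("/user/f42h/prediction/model")
sameModel.transform(df).select("features", "label", "prediction").show()
```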

So basically a stable release of Spark 2.0 is not out yet? – Defcon

Not yet, but it should be out soon. – BenFradet

They are currently cutting RC1. – BenFradet
