메뉴 건너뛰기

Cloudera, BigData, Semantic IoT, Hadoop, NoSQL

Cloudera CDH/CDP 및 Hadoop EcoSystem, Semantic IoT등의 개발/운영 기술을 정리합니다. gooper@gooper.com로 문의 주세요.


StringBuffer의 값을 toString()을 이용하여 문자열로 변환할때 "java.lang.OutOfMemoryError: Java heap space"가 발생하는데 이것은 StringBuffer.toString()하는 과정에서 값을 복사하는데 이때 heap메모리가 부족해서 발생하는 오류이다.

이때는 spark-submit에서 --driver-memory 5g처럼 지정하는 메모리를 크게 증가시켜서 -Xmx값을 증가시켜준다.


------------------오류내용------------------------

[2018-02-01 10:12:40,253] [internal.Logging$class] [logError(#70)] [ERROR] Task 0 in stage 20.0 failed 1 times; aborting job
[2018-02-01 10:12:40,267] [internal.Logging$class] [logError(#91)] [ERROR] Error running job streaming job 1517447030000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 20.0 failed 1 times, most recent failure: Lost task 0.0 in stage 20.0 (TID 20, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3664)
        at java.lang.StringBuffer.toString(StringBuffer.java:671)
        at com.pineone.icbms.sda.sf.TripleService.sendTripleFileToHalyard(TripleService.java:500)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe.sendTriples(AvroOneM2MDataSparkSubscribe.java:296)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe.access$100(AvroOneM2MDataSparkSubscribe.java:34)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe$ConsumerT.go(AvroOneM2MDataSparkSubscribe.java:202)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe$1.call(AvroOneM2MDataSparkSubscribe.java:101)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe$1.call(AvroOneM2MDataSparkSubscribe.java:93)
        at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:393)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
        at scala.collection.AbstractIterator.to(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
        at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1354)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
        at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$print$2$$anonfun$foreachFunc$3$1.apply(DStream.scala:734)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$print$2$$anonfun$foreachFunc$3$1.apply(DStream.scala:733)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:256)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:256)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:256)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:255)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3664)
        at java.lang.StringBuffer.toString(StringBuffer.java:671)
        at com.pineone.icbms.sda.sf.TripleService.sendTripleFileToHalyard(TripleService.java:500)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe.sendTriples(AvroOneM2MDataSparkSubscribe.java:296)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe.access$100(AvroOneM2MDataSparkSubscribe.java:34)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe$ConsumerT.go(AvroOneM2MDataSparkSubscribe.java:202)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe$1.call(AvroOneM2MDataSparkSubscribe.java:101)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe$1.call(AvroOneM2MDataSparkSubscribe.java:93)
        at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:393)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
        at scala.collection.AbstractIterator.to(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
        ... 3 more

번호 제목 날짜 조회 수
506 [우분투] suppoie 채굴 프로세스 발생시 자동으로 삭제하는 shell프로그램 2018.04.01 1347
505 Impala daemon기동시 "Could not create temporary timezone file"오류 발생시 조치사항 2018.03.29 1442
504 각 서버에 설치되는 cloudera서비스 프로그램 목록(CDH 5.14.0의 경우) 2018.03.29 1081
503 Cloudera설치중 실패로 여러번 설치하는 과정에 "Running in non-interactive mode, and data appears to exist in Storage Directory /dfs/nn. Not formatting." 오류가 발생시 조치하는 방법 2018.03.29 1507
502 Cloudera설치중에 "Error, CM server guid updated"오류 발생시 조치방법 2018.03.29 677
501 Cloudera가 사용하는 서비스별 포트 2018.03.29 1470
500 Cloudera가 사용하는 서비스별 디렉토리 2018.03.29 1067
499 cloudera-scm-agent 설정파일 위치및 재시작 명령문 2018.03.29 1392
498 [CentOS] 네트워크 설정 2018.03.26 1054
497 Components of the Impala Server 2018.03.21 917
496 HDFS Balancer설정및 수행 2018.03.21 1108
495 hadoop 클러스터 실행 스크립트 정리 2018.03.20 2303
494 HA(Namenode, ResourceManager, Kerberos) 및 보안(Zookeeper, Hadoop) 2018.03.16 581
493 자주쓰는 유용한 프로그램 2018.03.16 1866
492 에러 추적(Error Tracking) 및 로그 취합(logging aggregation) 시스템인 Sentry 설치 2018.03.14 753
491 update 샘플 2018.03.12 1819
490 이미지 관리 오픈소스 목록 2018.03.11 1049
489 Scala에서 countByWindow를 이용하기(예제) 2018.03.08 1437
488 Scala를 이용한 Streaming예제 2018.03.08 1379
487 scala application 샘플소스(SparkSession이용) 2018.03.07 1547
위로