When consuming Kafka data in Spark with KafkaUtils.createStream and the receiver's StorageLevel set to StorageLevel.MEMORY_ONLY(), an error of the form "Could not compute split, block input-0-1517397051800 not found" can occur. The cause is that, when executor memory runs low, Spark simply drops the received blocks, so the block is no longer available when the batch job later tries to compute its split.

In that case, change StorageLevel.MEMORY_ONLY() to StorageLevel.MEMORY_AND_DISK_SER() so that received blocks are serialized and spilled to disk instead of being dropped.



------------- Source code (excerpt) -----

// Keep received blocks with MEMORY_AND_DISK_SER so they spill to disk instead of being dropped.
JavaPairReceiverInputDStream<byte[], byte[]> kafkaStream = KafkaUtils.createStream(
        jssc, byte[].class, byte[].class, kafka.serializer.DefaultDecoder.class, kafka.serializer.DefaultDecoder.class,
        conf, topic, StorageLevel.MEMORY_AND_DISK_SER());
JavaDStream<byte[]> lines = kafkaStream.map(tuple2 -> tuple2._2());
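
For reference, below is a minimal, self-contained sketch of the surrounding driver setup, assuming the receiver-based Kafka 0.8 integration (spark-streaming-kafka). The kafkaParams values (zookeeper.connect, group.id), the topic name "sample-topic", and the 10-second batch interval are illustrative assumptions and are not taken from the original program.

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaReceiverStorageLevelExample {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("KafkaReceiverStorageLevelExample");
        JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(10));

        // Consumer parameters for the receiver-based (Kafka 0.8) API; values are placeholders.
        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("zookeeper.connect", "localhost:2181");
        kafkaParams.put("group.id", "sample-consumer-group");

        // topic name -> number of receiver threads
        Map<String, Integer> topics = new HashMap<>();
        topics.put("sample-topic", 1);

        // MEMORY_AND_DISK_SER lets received blocks spill to disk under memory pressure,
        // avoiding "Could not compute split, block input-... not found".
        JavaPairReceiverInputDStream<byte[], byte[]> kafkaStream = KafkaUtils.createStream(
                jssc, byte[].class, byte[].class,
                kafka.serializer.DefaultDecoder.class, kafka.serializer.DefaultDecoder.class,
                kafkaParams, topics, StorageLevel.MEMORY_AND_DISK_SER());

        JavaDStream<byte[]> lines = kafkaStream.map(tuple2 -> tuple2._2());
        lines.count().print();

        jssc.start();
        jssc.awaitTermination();
    }
}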


----------------------------------- Error message ------------------

[2018-01-31 20:17:26,404] [internal.Logging$class] [logError(#70)] [ERROR] Task 0 in stage 1020.0 failed 1 times; aborting job
[2018-01-31 20:17:26,404] [internal.Logging$class] [logError(#91)] [ERROR] Error running job streaming job 1517397060000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1020.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1020.0 (TID 1020, localhost, executor driver): java.lang.Exception: Could not compute split, block input-0-1517397051800 not found
        at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:50)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
        at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1354)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
        at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$print$2$$anonfun$foreachFunc$3$1.apply(DStream.scala:734)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$print$2$$anonfun$foreachFunc$3$1.apply(DStream.scala:733)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:256)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:256)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:256)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:255)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Exception: Could not compute split, block input-0-1517397051800 not found
        at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:50)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
        ... 3 more
[2018-01-31 20:17:26,415] [onem2m.AvroOneM2MDataSparkSubscribe$ConsumerT] [go(#142)] [DEBUG] count data from kafka broker stream in AvroOneM2MDataSparkSubscribe: 39981
[2018-01-31 20:17:29,039] [sf.QueryServiceFactory] [create(#28)] [DEBUG] query gubun : FUSEKISPARQL
[2018-01-31 20:17:29,040] [sf.QueryCommon] [makeFinal(#44)] [DEBUG] Count : 0 , Vals : [] 
[2018-01-31 20:17:29,040] [sf.SparqlFusekiQueryImpl] [runModifySparql(#162)] [DEBUG] runModifySparql() on DatWarehouse server start.................................. 
[2018-01-31 20:17:29,040] [sf.SparqlFusekiQueryImpl] [runModifySparql(#165)] [DEBUG] try (first).................................. 
[2018-01-31 20:17:29,042] [sf.SparqlFusekiQueryImpl] [runModifySparql(#207)] [DEBUG] runModifySparql() on DataWarehouse server end.................................. 
[2018-01-31 20:17:29,042] [sf.SparqlFusekiQueryImpl] [runModifySparql(#212)] [DEBUG] runModifySparql() on DataMart server start.................................. 
[2018-01-31 20:17:29,043] [sf.SparqlFusekiQueryImpl] [runModifySparql(#224)] [DEBUG] runModifySparql() on DataMart server end.................................. 
[2018-01-31 20:17:29,044] [sf.QueryCommon] [makeFinal(#44)] [DEBUG] Count : 0 , Vals : [] 
[2018-01-31 20:17:29,044] [sf.SparqlFusekiQueryImpl] [runModifySparql(#162)] [DEBUG] runModifySparql() on DatWarehouse server start.................................. 
[2018-01-31 20:17:29,044] [sf.SparqlFusekiQueryImpl] [runModifySparql(#165)] [DEBUG] try (first).................................. 
[2018-01-31 20:17:29,045] [sf.SparqlFusekiQueryImpl] [runModifySparql(#207)] [DEBUG] runModifySparql() on DataWarehouse server end.................................. 
[2018-01-31 20:17:29,045] [sf.SparqlFusekiQueryImpl] [runModifySparql(#212)] [DEBUG] runModifySparql() on DataMart server start.................................. 
[2018-01-31 20:17:29,046] [sf.SparqlFusekiQueryImpl] [runModifySparql(#224)] [DEBUG] runModifySparql() on DataMart server end.................................. 
[2018-01-31 20:17:29,047] [sf.QueryCommon] [makeFinal(#44)] [DEBUG] Count : 0 , Vals : [] 
[2018-01-31 20:17:29,047] [sf.SparqlFusekiQueryImpl] [runModifySparql(#212)] [DEBUG] runModifySparql() on DataMart server start.................................. 
[2018-01-31 20:17:29,049] [sf.SparqlFusekiQueryImpl] [runModifySparql(#224)] [DEBUG] runModifySparql() on DataMart server end.................................. 
[2018-01-31 20:17:29,049] [sf.TripleService] [makeTripleFile(#333)] [INFO] makeTripleFile start==========================>
[2018-01-31 20:17:29,049] [sf.TripleService] [makeTripleFile(#334)] [DEBUG] makeTripleFile ========triple_path_file=================>/svc/apps/sda/triples/20180131/AvroOneM2MDataSparkSubscribe_TT20180131T201100S0000000991_WRK20180131T201729.nt
[2018-01-31 20:17:29,170] [sf.TripleService] [makeTripleFile(#346)] [INFO] makeTripleFile end==========================>
[2018-01-31 20:17:29,170] [onem2m.AvroOneM2MDataSparkSubscribe] [sendTriples(#288)] [INFO] Sending triples in com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe to DW start.......................
[2018-01-31 20:17:29,170] [sf.TripleService] [sendTripleFileToDW(#382)] [INFO] sendTripleFile to DW start==========================>
[2018-01-31 20:17:29,170] [sf.TripleService] [sendTripleFileToDW(#383)] [DEBUG] sendTripleFile ==============triple_path_file============>/svc/apps/sda/triples/20180131/AvroOneM2MDataSparkSubscribe_TT20180131T201100S0000000991_WRK20180131T201729.nt
[2018-01-31 20:17:29,171] [sf.TripleService] [sendTripleFileToDW(#396)] [DEBUG] sendTripleFile ==============args============>/svc/apps/sda/bin/fuseki/bin/s-post http://166.104.112.43:23030/icbms default /svc/apps/sda/triples/20180131/AvroOneM2MDataSparkSubscribe_TT20180131T201100S0000000991_WRK20180131T201729.nt 
[2018-01-31 20:17:29,171] [sf.TripleService] [sendTripleFileToDW(#399)] [DEBUG] try (first).......................
[2018-01-31 20:17:36,950] [util.Utils] [runShell(#737)] [DEBUG] Thread stdMsgT Status : TERMINATED
[2018-01-31 20:17:36,951] [util.Utils] [runShell(#738)] [DEBUG] Thread errMsgT Status : TERMINATED
[2018-01-31 20:17:36,951] [util.Utils] [runShell(#743)] [DEBUG] notTimeOver ==========================>true
[2018-01-31 20:17:36,951] [sf.TripleService] [sendTripleFileToDW(#402)] [DEBUG] resultStr in TripleService.sendTripleFileToDW() == > [, ]
[2018-01-31 20:17:36,951] [sf.TripleService] [sendTripleFileToDW(#433)] [INFO] sendTripleFile to DW  end==========================>
[2018-01-31 20:17:36,951] [onem2m.AvroOneM2MDataSparkSubscribe] [sendTriples(#290)] [INFO] Sending triples in com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe to DW end.......................
[2018-01-31 20:17:36,951] [onem2m.AvroOneM2MDataSparkSubscribe] [sendTriples(#293)] [INFO] Sending triples in com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe to Halyard start.......................
[2018-01-31 20:17:36,952] [sf.TripleService] [sendTripleFileToHalyard(#486)] [INFO] sendTripleFile to Halyard  start==========================>
[2018-01-31 20:17:37,294] [sf.QueryServiceFactory] [create(#31)] [DEBUG] query gubun : HALYARDSPARQL
[2018-01-31 20:17:37,317] [sf.SparqlHalyardQueryImpl] [insertByPost(#189)] [DEBUG] ------------------------insertByPost-----start-----------------------
[2018-01-31 20:17:37,317] [sf.SparqlHalyardQueryImpl] [insertByPost(#198)] [DEBUG] ------------------------insertByPost-----end-----------------------
