메뉴 건너뛰기

Cloudera, BigData, Semantic IoT, Hadoop, NoSQL

Cloudera CDH/CDP 및 Hadoop EcoSystem, Semantic IoT등의 개발/운영 기술을 정리합니다. gooper@gooper.com로 문의 주세요.


LA가 전송한 로그 파일을 구별하여 각각의 파일로 저장하는 경우의 설정방법

 

 가. 수집되는 파일 :  /home/hadoop/log_data/log1.log, /home/hadoop/log_data/log1.log

 나. 저장되는 파일 : /home/hadoop/save_data/fs01, /home/hadoop/save_data/fs01

 다. 파일의 구분키 : state

 라. 구분값 : SMS, VOICE

 

1. 로그를 수집하는 LC의 flume설정파일(hadoop@bigdata-host:~/flume/conf$ cat flume-conf-multi.properties)

lc01.sources = avroGenSrc_src01
lc01.channels = fileChannel_fc01 fileChannel_fc02
lc01.sources.avroGenSrc_src01.selector.type=multiplexing
lc01.sources.avroGenSrc_src01.selector.header = state
lc01.sources.avroGenSrc_src01.selector.mapping.SMS=fileChannel_fc01
lc01.sources.avroGenSrc_src01.selector.mapping.VOICE=fileChannel_fc02

lc01.sinks = fileSink_fs01 fileSink_fs02

# For each one of the sources, the type is defined
lc01.sources.avroGenSrc_src01.type = avro
lc01.sources.avroGenSrc_src01.bind = localhost
lc01.sources.avroGenSrc_src01.port = 5555

# The channel can be defined as follows.
lc01.sources.avroGenSrc_src01.channels = fileChannel_fc01 fileChannel_fc02

# Each sink's type must be defined
lc01.sinks.fileSink_fs01.type = file_roll
lc01.sinks.fileSink_fs01.sink.directory=/home/hadoop/save_data/fs01
lc01.sinks.fileSink_fs01.sink.rollInterval = 10
lc01.sinks.fileSink_fs01.sink.batchSize = 10

#Specify the channel the sink should use
lc01.sinks.fileSink_fs01.channel = fileChannel_fc01

# Each channel's type is defined.
lc01.channels.fileChannel_fc01.type = file
lc01.channels.fileChannel_fc01.maxFileSize = 214643507
lc01.channels.fileChannel_fc01.checkpointDir = /home/hadoop/flume/fc01/checkpoint
lc01.channels.fileChannel_fc01.dataDirs = /home/hadoop/flume/fc01/data

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
lc01.channels.fileChannel_fc01.capacity = 100
lc01.channels.fileChannel_fc01.transactionCapacity = 10


# another setting
#lc02.sources = avroGenSrc_src02
#lc02.channels = fileChannel_fc02
#lc02.sinks = fileSink_fs02

# For each one of the sources, the type is defined
#lc02.sources.avroGenSrc_src02.type = avro
#lc02.sources.avroGenSrc_src02.bind = localhost
#lc02.sources.avroGenSrc_src02.port = 4444

# The channel can be defined as follows.
#lc01.sources.avroGenSrc_src01.channels = fileChannel_fc02
# Each sink's type must be defined
lc01.sinks.fileSink_fs02.type = file_roll
lc01.sinks.fileSink_fs02.sink.directory=/home/hadoop/save_data/fs02
lc01.sinks.fileSink_fs02.sink.rollInterval = 10
lc01.sinks.fileSink_fs02.sink.batchSize = 10

#Specify the channel the sink should use
lc01.sinks.fileSink_fs02.channel = fileChannel_fc02

# Each channel's type is defined.
lc01.channels.fileChannel_fc02.type = file
lc01.channels.fileChannel_fc02.maxFileSize = 214643507
lc01.channels.fileChannel_fc02.checkpointDir = /home/hadoop/flume/fc02/checkpoint
lc01.channels.fileChannel_fc02.dataDirs = /home/hadoop/flume/fc02/data
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
lc01.channels.fileChannel_fc02.capacity = 100
lc01.channels.fileChannel_fc02.transactionCapacity =10

--------------------------------

2. 로그를 전송하는 LA의 설정정보(hadoop@bigdata-host:~/flume/conf$ cat flume-conf-multi-agent.properties)

: la01과 la02의 두개 설정정보가 같이 들어있고 각각을 기동하여 2개의 LA가 파일을 읽어 들어는것으로 가정함


la01.sources = execGenSrc_la01
la01.channels = fileChannel_la01
la01.sinks = avroSink_la01

# For each one of the sources, the type is defined
la01.sources.execGenSrc_la01.type = exec
la01.sources.execGenSrc_la01.command = tail -f /home/hadoop/log_data/log1.log
la01.sources.execGenSrc_la01.batchSize = 10

la01.sources.execGenSrc_la01.interceptors = i1
la01.sources.execGenSrc_la01.interceptors.i1.type=static
la01.sources.execGenSrc_la01.interceptors.i1.key=state
la01.sources.execGenSrc_la01.interceptors.i1.value = SMS

# The channel can be defined as follows.
la01.sources.execGenSrc_la01.channels = fileChannel_la01

# Each sink's type must be defined
la01.sinks.avroSink_la01.type = avro
la01.sinks.avroSink_la01.hostname=localhost
la01.sinks.avroSink_la01.port=5555
la01.sinks.avroSink_la01.batch-size = 10

#Specify the channel the sink should use
la01.sinks.avroSink_la01.channel = fileChannel_la01

# Each channel's type is defined.
la01.channels.fileChannel_la01.type = file
la01.channels.fileChannel_la01.maxFileSize = 214643507
la01.channels.fileChannel_la01.checkpointDir = /home/hadoop/flume/la01/checkpoint
la01.channels.fileChannel_la01.dataDirs = /home/hadoop/flume/la01/data

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
la01.channels.fileChannel_la01.capacity = 10000
la01.channels.fileChannel_la01.transctionCapacity = 10000


#another logagent conf....
la02.sources = execGenSrc_la02
la02.channels = fileChannel_la02
la02.sinks = avroSink_la02

# For each one of the sources, the type is defined
la02.sources.execGenSrc_la02.type = exec
la02.sources.execGenSrc_la02.command = tail -f /home/hadoop/log_data/log2.log
la02.sources.execGenSrc_la02.batchSize = 10
la02.sources.execGenSrc_la02.interceptors = i2
la02.sources.execGenSrc_la02.interceptors.i2.type=static
la02.sources.execGenSrc_la02.interceptors.i2.key= state
la02.sources.execGenSrc_la02.interceptors.i2.value = VOICE


# The channel can be defined as follows.
la02.sources.execGenSrc_la02.channels = fileChannel_la02

# Each sink's type must be defined
la02.sinks.avroSink_la02.type = avro
la02.sinks.avroSink_la02.hostname=localhost
la02.sinks.avroSink_la02.port=5555
la02.sinks.avroSink_la02.batch-size = 10

#Specify the channel the sink should use
la02.sinks.avroSink_la02.channel = fileChannel_la02

# Each channel's type is defined.
la02.channels.fileChannel_la02.type = file
la02.channels.fileChannel_la02.maxFileSize = 214643507
la02.channels.fileChannel_la02.checkpointDir = /home/hadoop/flume/la02/checkpoint
la02.channels.fileChannel_la02.dataDirs = /home/hadoop/flume/la02/data

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
la02.channels.fileChannel_la02.capacity = 10000
la02.channels.fileChannel_la02.transctionCapacity = 10000

번호 제목 날짜 조회 수
50 JobHistory 서버 기동시 HDFS상에 특정 폴더를 생성할 수 없어서 기동하지 못하는 경우 조치 2018.05.29 5651
49 hbase shell 필드 검색 방법 2015.05.24 5702
48 System Properties Comparison Elasticsearch vs. Hive vs. Jena file 2016.03.10 5714
47 solr 6.2에 한글 형태소 분석기(arirang 6.x) 적용 및 테스트 file 2017.06.27 5749
46 "java.net.NoRouteToHostException: 호스트로 갈 루트가 없음" 오류시 확인및 조치할 사항 2016.04.01 5809
45 HBASE Client API : 기본 기능 정리 file 2013.04.01 5852
44 [CDP7.1.7]impala-shell을 이용하여 kudu table에 insert/update수행시 발생하는 오류(Transport endpoint is not connected (error 107)) 발생시 확인할 내용 2023.11.30 5873
43 Hbase Shell 명령 정리 2013.04.01 5924
42 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from...오류해결방법 2015.06.16 5943
41 HBase 설치하기 – Fully-distributed 2013.03.12 6012
40 Hadoop Cluster 설치 (Hadoop+Zookeeper+Hbase) file 2013.03.07 6097
39 checking for termcap functions library... configure: error: No curses/termcap library found 2013.03.08 6097
38 protege 설명및 사용법 file 2017.04.04 6116
37 특정파일이 생성되어야 action이 실행되는 oozie job만들기(coordinator.xml) 2014.05.20 6260
36 [impala]insert into db명.table명 select a, b from db명.table명 쿼리 수행시 "Memory limit exceeded: Failed to allocate memory for Parquet page index"오류 조치 방법 2023.05.31 6326
35 원보드pc인 bananapi를 이용하여 hadoop 클러스터 구성하기(준비물) file 2014.05.29 6431
34 hadoop및 ecosystem에서 사용되는 명령문 정리 2014.05.28 6435
» 다수의 로그 에이전트로 부터 로그를 받아 각각의 파일로 저장하는 방법(interceptor및 multiplexing) 2014.04.04 6439
32 hadoop 2.6.0 기동(에코시스템 포함)및 wordcount 어플리케이션을 이용한 테스트 2015.05.05 6517
31 Last transaction was partial에 따른 Unable to load database on disk오류 발생시 조치사항 2018.08.03 6652
위로