Cloudera CDH/CDP 및 Hadoop EcoSystem, Semantic IoT등의 개발/운영 기술을 정리합니다. gooper@gooper.com로 문의 주세요.
LA가 전송한 로그 파일을 구별하여 각각의 파일로 저장하는 경우의 설정방법
가. 수집되는 파일 : /home/hadoop/log_data/log1.log, /home/hadoop/log_data/log1.log
나. 저장되는 파일 : /home/hadoop/save_data/fs01, /home/hadoop/save_data/fs01
다. 파일의 구분키 : state
라. 구분값 : SMS, VOICE
1. 로그를 수집하는 LC의 flume설정파일(hadoop@bigdata-host:~/flume/conf$ cat flume-conf-multi.properties)
lc01.sources = avroGenSrc_src01
lc01.channels = fileChannel_fc01 fileChannel_fc02
lc01.sources.avroGenSrc_src01.selector.type=multiplexing
lc01.sources.avroGenSrc_src01.selector.header = state
lc01.sources.avroGenSrc_src01.selector.mapping.SMS=fileChannel_fc01
lc01.sources.avroGenSrc_src01.selector.mapping.VOICE=fileChannel_fc02
lc01.sinks = fileSink_fs01 fileSink_fs02
# For each one of the sources, the type is defined
lc01.sources.avroGenSrc_src01.type = avro
lc01.sources.avroGenSrc_src01.bind = localhost
lc01.sources.avroGenSrc_src01.port = 5555
# The channel can be defined as follows.
lc01.sources.avroGenSrc_src01.channels = fileChannel_fc01 fileChannel_fc02
# Each sink's type must be defined
lc01.sinks.fileSink_fs01.type = file_roll
lc01.sinks.fileSink_fs01.sink.directory=/home/hadoop/save_data/fs01
lc01.sinks.fileSink_fs01.sink.rollInterval = 10
lc01.sinks.fileSink_fs01.sink.batchSize = 10
#Specify the channel the sink should use
lc01.sinks.fileSink_fs01.channel = fileChannel_fc01
# Each channel's type is defined.
lc01.channels.fileChannel_fc01.type = file
lc01.channels.fileChannel_fc01.maxFileSize = 214643507
lc01.channels.fileChannel_fc01.checkpointDir = /home/hadoop/flume/fc01/checkpoint
lc01.channels.fileChannel_fc01.dataDirs = /home/hadoop/flume/fc01/data
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
lc01.channels.fileChannel_fc01.capacity = 100
lc01.channels.fileChannel_fc01.transactionCapacity = 10
# another setting
#lc02.sources = avroGenSrc_src02
#lc02.channels = fileChannel_fc02
#lc02.sinks = fileSink_fs02
# For each one of the sources, the type is defined
#lc02.sources.avroGenSrc_src02.type = avro
#lc02.sources.avroGenSrc_src02.bind = localhost
#lc02.sources.avroGenSrc_src02.port = 4444
# The channel can be defined as follows.
#lc01.sources.avroGenSrc_src01.channels = fileChannel_fc02
# Each sink's type must be defined
lc01.sinks.fileSink_fs02.type = file_roll
lc01.sinks.fileSink_fs02.sink.directory=/home/hadoop/save_data/fs02
lc01.sinks.fileSink_fs02.sink.rollInterval = 10
lc01.sinks.fileSink_fs02.sink.batchSize = 10
#Specify the channel the sink should use
lc01.sinks.fileSink_fs02.channel = fileChannel_fc02
# Each channel's type is defined.
lc01.channels.fileChannel_fc02.type = file
lc01.channels.fileChannel_fc02.maxFileSize = 214643507
lc01.channels.fileChannel_fc02.checkpointDir = /home/hadoop/flume/fc02/checkpoint
lc01.channels.fileChannel_fc02.dataDirs = /home/hadoop/flume/fc02/data
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
lc01.channels.fileChannel_fc02.capacity = 100
lc01.channels.fileChannel_fc02.transactionCapacity =10
--------------------------------
2. 로그를 전송하는 LA의 설정정보(hadoop@bigdata-host:~/flume/conf$ cat flume-conf-multi-agent.properties)
: la01과 la02의 두개 설정정보가 같이 들어있고 각각을 기동하여 2개의 LA가 파일을 읽어 들어는것으로 가정함
la01.sources = execGenSrc_la01
la01.channels = fileChannel_la01
la01.sinks = avroSink_la01
# For each one of the sources, the type is defined
la01.sources.execGenSrc_la01.type = exec
la01.sources.execGenSrc_la01.command = tail -f /home/hadoop/log_data/log1.log
la01.sources.execGenSrc_la01.batchSize = 10
la01.sources.execGenSrc_la01.interceptors = i1
la01.sources.execGenSrc_la01.interceptors.i1.type=static
la01.sources.execGenSrc_la01.interceptors.i1.key=state
la01.sources.execGenSrc_la01.interceptors.i1.value = SMS
# The channel can be defined as follows.
la01.sources.execGenSrc_la01.channels = fileChannel_la01
# Each sink's type must be defined
la01.sinks.avroSink_la01.type = avro
la01.sinks.avroSink_la01.hostname=localhost
la01.sinks.avroSink_la01.port=5555
la01.sinks.avroSink_la01.batch-size = 10
#Specify the channel the sink should use
la01.sinks.avroSink_la01.channel = fileChannel_la01
# Each channel's type is defined.
la01.channels.fileChannel_la01.type = file
la01.channels.fileChannel_la01.maxFileSize = 214643507
la01.channels.fileChannel_la01.checkpointDir = /home/hadoop/flume/la01/checkpoint
la01.channels.fileChannel_la01.dataDirs = /home/hadoop/flume/la01/data
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
la01.channels.fileChannel_la01.capacity = 10000
la01.channels.fileChannel_la01.transctionCapacity = 10000
#another logagent conf....
la02.sources = execGenSrc_la02
la02.channels = fileChannel_la02
la02.sinks = avroSink_la02
# For each one of the sources, the type is defined
la02.sources.execGenSrc_la02.type = exec
la02.sources.execGenSrc_la02.command = tail -f /home/hadoop/log_data/log2.log
la02.sources.execGenSrc_la02.batchSize = 10
la02.sources.execGenSrc_la02.interceptors = i2
la02.sources.execGenSrc_la02.interceptors.i2.type=static
la02.sources.execGenSrc_la02.interceptors.i2.key= state
la02.sources.execGenSrc_la02.interceptors.i2.value = VOICE
# The channel can be defined as follows.
la02.sources.execGenSrc_la02.channels = fileChannel_la02
# Each sink's type must be defined
la02.sinks.avroSink_la02.type = avro
la02.sinks.avroSink_la02.hostname=localhost
la02.sinks.avroSink_la02.port=5555
la02.sinks.avroSink_la02.batch-size = 10
#Specify the channel the sink should use
la02.sinks.avroSink_la02.channel = fileChannel_la02
# Each channel's type is defined.
la02.channels.fileChannel_la02.type = file
la02.channels.fileChannel_la02.maxFileSize = 214643507
la02.channels.fileChannel_la02.checkpointDir = /home/hadoop/flume/la02/checkpoint
la02.channels.fileChannel_la02.dataDirs = /home/hadoop/flume/la02/data
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
la02.channels.fileChannel_la02.capacity = 10000
la02.channels.fileChannel_la02.transctionCapacity = 10000
댓글 0
번호 | 제목 | 날짜 | 조회 수 |
---|---|---|---|
42 | json 값 다루기 | 2014.04.17 | 1425 |
41 | 통계자료 구할수 있는 곳 | 2014.04.16 | 2044 |
40 | column family삭제시 Column family 'delete' does not exist오류 발생하는 경우 | 2014.04.14 | 1058 |
39 | hive에서 생성된 external table에서 hbase의 table에 값 insert하기 | 2014.04.11 | 1865 |
38 | Oozie 설치, 환경설정 및 테스트 | 2014.04.08 | 1737 |
» | 다수의 로그 에이전트로 부터 로그를 받아 각각의 파일로 저장하는 방법(interceptor및 multiplexing) | 2014.04.04 | 4202 |
36 | external partition table생성및 data확인 | 2014.04.03 | 1590 |
35 | 동일서버에서 LA와 LC동시에 기동하여 테스트 | 2014.04.01 | 1105 |
34 | 의사분산모드에서 presto설치하기 | 2014.03.31 | 3306 |
33 | Hive Query Examples from test code (2 of 2) | 2014.03.26 | 11535 |
32 | Hive Query Examples from test code (1 of 2) | 2014.03.26 | 1389 |
31 | hadoop설치시 오류 | 2013.12.18 | 2731 |
30 | centsOS vsftpd설치하기 | 2013.12.17 | 1935 |
29 | ubuntu에 hadoop 2.0.5설치하기 | 2013.12.16 | 2011 |
28 | centos 5.X에 hadoop 2.0.5 alpha 설치 | 2013.12.16 | 1739 |
27 | hbase에 필요한 jar들 | 2013.04.01 | 2247 |
26 | Hive java connection 설정 | 2013.04.01 | 2310 |
25 | Hbase Shell 명령 정리 | 2013.04.01 | 3466 |
24 | HBASE Client API : 기본 기능 정리 | 2013.04.01 | 3781 |
23 | 하둡 분산 파일 시스템을 기반으로 색인하고 검색하기 | 2013.03.15 | 5763 |