
Cloudera, BigData, Semantic IoT, Hadoop, NoSQL

Notes on developing and operating Cloudera CDH/CDP, the Hadoop ecosystem, Semantic IoT, and related technologies. Send questions to gooper@gooper.com.


How to configure Flume so that log files sent by the LAs are told apart and each is saved to its own file

 

 a. Files collected : /home/hadoop/log_data/log1.log, /home/hadoop/log_data/log2.log

 b. Directories the data is saved to : /home/hadoop/save_data/fs01, /home/hadoop/save_data/fs02

 c. Header key used to tell the files apart : state

 d. Header values : SMS, VOICE

 

1. Flume configuration file of the LC that collects the logs (hadoop@bigdata-host:~/flume/conf$ cat flume-conf-multi.properties)

lc01.sources = avroGenSrc_src01
lc01.channels = fileChannel_fc01 fileChannel_fc02
lc01.sources.avroGenSrc_src01.selector.type=multiplexing
lc01.sources.avroGenSrc_src01.selector.header = state
lc01.sources.avroGenSrc_src01.selector.mapping.SMS=fileChannel_fc01
lc01.sources.avroGenSrc_src01.selector.mapping.VOICE=fileChannel_fc02

lc01.sinks = fileSink_fs01 fileSink_fs02

# For each one of the sources, the type is defined
lc01.sources.avroGenSrc_src01.type = avro
lc01.sources.avroGenSrc_src01.bind = localhost
lc01.sources.avroGenSrc_src01.port = 5555

# The channel can be defined as follows.
lc01.sources.avroGenSrc_src01.channels = fileChannel_fc01 fileChannel_fc02

# Each sink's type must be defined
lc01.sinks.fileSink_fs01.type = file_roll
lc01.sinks.fileSink_fs01.sink.directory=/home/hadoop/save_data/fs01
lc01.sinks.fileSink_fs01.sink.rollInterval = 10
lc01.sinks.fileSink_fs01.sink.batchSize = 10

#Specify the channel the sink should use
lc01.sinks.fileSink_fs01.channel = fileChannel_fc01

# Each channel's type is defined.
lc01.channels.fileChannel_fc01.type = file
lc01.channels.fileChannel_fc01.maxFileSize = 214643507
lc01.channels.fileChannel_fc01.checkpointDir = /home/hadoop/flume/fc01/checkpoint
lc01.channels.fileChannel_fc01.dataDirs = /home/hadoop/flume/fc01/data

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
lc01.channels.fileChannel_fc01.capacity = 100
lc01.channels.fileChannel_fc01.transactionCapacity = 10


# another setting
#lc02.sources = avroGenSrc_src02
#lc02.channels = fileChannel_fc02
#lc02.sinks = fileSink_fs02

# For each one of the sources, the type is defined
#lc02.sources.avroGenSrc_src02.type = avro
#lc02.sources.avroGenSrc_src02.bind = localhost
#lc02.sources.avroGenSrc_src02.port = 4444

# The channel can be defined as follows.
#lc01.sources.avroGenSrc_src01.channels = fileChannel_fc02
# Each sink's type must be defined
lc01.sinks.fileSink_fs02.type = file_roll
lc01.sinks.fileSink_fs02.sink.directory=/home/hadoop/save_data/fs02
lc01.sinks.fileSink_fs02.sink.rollInterval = 10
lc01.sinks.fileSink_fs02.sink.batchSize = 10

#Specify the channel the sink should use
lc01.sinks.fileSink_fs02.channel = fileChannel_fc02

# Each channel's type is defined.
lc01.channels.fileChannel_fc02.type = file
lc01.channels.fileChannel_fc02.maxFileSize = 214643507
lc01.channels.fileChannel_fc02.checkpointDir = /home/hadoop/flume/fc02/checkpoint
lc01.channels.fileChannel_fc02.dataDirs = /home/hadoop/flume/fc02/data
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
lc01.channels.fileChannel_fc02.capacity = 100
lc01.channels.fileChannel_fc02.transactionCapacity = 10
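
The selector above maps only the state values SMS and VOICE. As far as I understand the multiplexing selector, an event whose state header is missing or has some other value goes to the channels listed in selector.default, and is not written to any channel when no default is set. The fragment below is a minimal sketch of adding such a catch-all. Reusing fileChannel_fc01 as the default is only an assumption for illustration; a dedicated channel and sink for unmatched events may be a better fit.

```properties
# Hypothetical addition to flume-conf-multi.properties (not in the original setup):
# events whose "state" header matches neither SMS nor VOICE are sent here.
lc01.sources.avroGenSrc_src01.selector.default = fileChannel_fc01
```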

--------------------------------

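2. Configuration of the LAs that send the logs (hadoop@bigdata-host:~/flume/conf$ cat flume-conf-multi-agent.properties)

: The settings for la01 and la02 are kept in the same file. Each agent is started separately, so two LAs are assumed to be reading the log files.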


la01.sources = execGenSrc_la01
la01.channels = fileChannel_la01
la01.sinks = avroSink_la01

# For each one of the sources, the type is defined
la01.sources.execGenSrc_la01.type = exec
la01.sources.execGenSrc_la01.command = tail -f /home/hadoop/log_data/log1.log
la01.sources.execGenSrc_la01.batchSize = 10

la01.sources.execGenSrc_la01.interceptors = i1
la01.sources.execGenSrc_la01.interceptors.i1.type=static
la01.sources.execGenSrc_la01.interceptors.i1.key=state
la01.sources.execGenSrc_la01.interceptors.i1.value = SMS

# The channel can be defined as follows.
la01.sources.execGenSrc_la01.channels = fileChannel_la01

# Each sink's type must be defined
la01.sinks.avroSink_la01.type = avro
la01.sinks.avroSink_la01.hostname=localhost
la01.sinks.avroSink_la01.port=5555
la01.sinks.avroSink_la01.batch-size = 10

#Specify the channel the sink should use
la01.sinks.avroSink_la01.channel = fileChannel_la01

# Each channel's type is defined.
la01.channels.fileChannel_la01.type = file
la01.channels.fileChannel_la01.maxFileSize = 214643507
la01.channels.fileChannel_la01.checkpointDir = /home/hadoop/flume/la01/checkpoint
la01.channels.fileChannel_la01.dataDirs = /home/hadoop/flume/la01/data

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
la01.channels.fileChannel_la01.capacity = 10000
la01.channels.fileChannel_la01.transactionCapacity = 10000


#another logagent conf....
la02.sources = execGenSrc_la02
la02.channels = fileChannel_la02
la02.sinks = avroSink_la02

# For each one of the sources, the type is defined
la02.sources.execGenSrc_la02.type = exec
la02.sources.execGenSrc_la02.command = tail -f /home/hadoop/log_data/log2.log
la02.sources.execGenSrc_la02.batchSize = 10
la02.sources.execGenSrc_la02.interceptors = i2
la02.sources.execGenSrc_la02.interceptors.i2.type=static
la02.sources.execGenSrc_la02.interceptors.i2.key = state
la02.sources.execGenSrc_la02.interceptors.i2.value = VOICE


# The channel can be defined as follows.
la02.sources.execGenSrc_la02.channels = fileChannel_la02

# Each sink's type must be defined
la02.sinks.avroSink_la02.type = avro
la02.sinks.avroSink_la02.hostname=localhost
la02.sinks.avroSink_la02.port=5555
la02.sinks.avroSink_la02.batch-size = 10

#Specify the channel the sink should use
la02.sinks.avroSink_la02.channel = fileChannel_la02

# Each channel's type is defined.
la02.channels.fileChannel_la02.type = file
la02.channels.fileChannel_la02.maxFileSize = 214643507
la02.channels.fileChannel_la02.checkpointDir = /home/hadoop/flume/la02/checkpoint
la02.channels.fileChannel_la02.dataDirs = /home/hadoop/flume/la02/data

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
la02.channels.fileChannel_la02.capacity = 10000
la02.channels.fileChannel_la02.transactionCapacity = 10000
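
Both LAs read their log with tail -f, which keeps following the file that was open at start, so lines can stop arriving once the log is rotated or recreated. The fragment below is a sketch of a common alternative, assuming the logs may be rotated: tail -F follows the file by name, and the exec source's restart property asks Flume to rerun the command if it exits. Whether this is needed depends on how log1.log and log2.log are actually written, which the original setup does not say.

```properties
# Hypothetical variant of the exec sources (assumes log rotation can happen).
# tail -F reopens the file by name after it is rotated or recreated.
la01.sources.execGenSrc_la01.command = tail -F /home/hadoop/log_data/log1.log
la02.sources.execGenSrc_la02.command = tail -F /home/hadoop/log_data/log2.log

# Rerun the tail command if the process dies (the default is false).
la01.sources.execGenSrc_la01.restart = true
la02.sources.execGenSrc_la02.restart = true
```

Note that the exec source cannot tell tail about lines that failed to reach the channel, so this setup may still lose events if an agent is stopped or a channel fills up.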
