
Cloudera, BigData, Semantic IoT, Hadoop, NoSQL

This blog collects development and operations notes on Cloudera CDH/CDP, the Hadoop ecosystem, Semantic IoT, and related topics. Please send inquiries to gooper@gooper.com.


How to configure Flume so that the log files sent by LAs are told apart and each is saved to its own file

 

 a. Collected files: /home/hadoop/log_data/log1.log, /home/hadoop/log_data/log2.log

 b. Output files: /home/hadoop/save_data/fs01, /home/hadoop/save_data/fs02

 c. Key that distinguishes the files (event header): state

 d. Values of the key: SMS (saved to fs01), VOICE (saved to fs02)
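The routing rule above can be modelled in a few lines. This is a hypothetical Python simulation of what the multiplexing channel selector on lc01 does, not Flume code: the value of the `state` header picks a channel, and each channel is drained by one file_roll sink writing to its own directory. The names `MAPPING`, `CHANNEL_TO_DIR`, and `route` are illustrative assumptions.

```python
# Hypothetical model of the lc01 multiplexing channel selector (not Flume code).
from typing import Dict, List

SELECTOR_HEADER = "state"  # matches selector.header = state

# selector.mapping.<value> = <channel(s)>
MAPPING: Dict[str, List[str]] = {
    "SMS": ["fileChannel_fc01"],
    "VOICE": ["fileChannel_fc02"],
}

# Each channel feeds one file_roll sink; its sink.directory is recorded here.
CHANNEL_TO_DIR: Dict[str, str] = {
    "fileChannel_fc01": "/home/hadoop/save_data/fs01",
    "fileChannel_fc02": "/home/hadoop/save_data/fs02",
}


def route(headers: Dict[str, str]) -> List[str]:
    """Return the output directories an event with these headers would reach."""
    value = headers.get(SELECTOR_HEADER, "")
    # No selector.default is modelled, so unknown or missing values map to no channel.
    channels = MAPPING.get(value, [])
    return [CHANNEL_TO_DIR[c] for c in channels]
```

In this model an event whose `state` header is missing or unmapped reaches no channel, since the configuration below sets no `selector.default`; in Flume, `selector.default` is the usual place to send such events if they must be kept.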

 

1. Flume configuration of the LC that collects the logs (hadoop@bigdata-host:~/flume/conf$ cat flume-conf-multi.properties)

lc01.sources = avroGenSrc_src01
lc01.channels = fileChannel_fc01 fileChannel_fc02
lc01.sources.avroGenSrc_src01.selector.type=multiplexing
lc01.sources.avroGenSrc_src01.selector.header = state
lc01.sources.avroGenSrc_src01.selector.mapping.SMS=fileChannel_fc01
lc01.sources.avroGenSrc_src01.selector.mapping.VOICE=fileChannel_fc02

lc01.sinks = fileSink_fs01 fileSink_fs02

# For each one of the sources, the type is defined
lc01.sources.avroGenSrc_src01.type = avro
lc01.sources.avroGenSrc_src01.bind = localhost
lc01.sources.avroGenSrc_src01.port = 5555

# The channel can be defined as follows.
lc01.sources.avroGenSrc_src01.channels = fileChannel_fc01 fileChannel_fc02

# Each sink's type must be defined
lc01.sinks.fileSink_fs01.type = file_roll
lc01.sinks.fileSink_fs01.sink.directory=/home/hadoop/save_data/fs01
lc01.sinks.fileSink_fs01.sink.rollInterval = 10
lc01.sinks.fileSink_fs01.sink.batchSize = 10

#Specify the channel the sink should use
lc01.sinks.fileSink_fs01.channel = fileChannel_fc01

# Each channel's type is defined.
lc01.channels.fileChannel_fc01.type = file
lc01.channels.fileChannel_fc01.maxFileSize = 214643507
lc01.channels.fileChannel_fc01.checkpointDir = /home/hadoop/flume/fc01/checkpoint
lc01.channels.fileChannel_fc01.dataDirs = /home/hadoop/flume/fc01/data

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
lc01.channels.fileChannel_fc01.capacity = 100
lc01.channels.fileChannel_fc01.transactionCapacity = 10


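# Second sink/channel pair for VOICE events (fileSink_fs02 / fileChannel_fc02).
# The commented-out lc02 lines are an unused alternative (a separate agent on port 4444).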
#lc02.sources = avroGenSrc_src02
#lc02.channels = fileChannel_fc02
#lc02.sinks = fileSink_fs02

# For each one of the sources, the type is defined
#lc02.sources.avroGenSrc_src02.type = avro
#lc02.sources.avroGenSrc_src02.bind = localhost
#lc02.sources.avroGenSrc_src02.port = 4444

# The channel can be defined as follows.
#lc01.sources.avroGenSrc_src01.channels = fileChannel_fc02
# Each sink's type must be defined
lc01.sinks.fileSink_fs02.type = file_roll
lc01.sinks.fileSink_fs02.sink.directory=/home/hadoop/save_data/fs02
lc01.sinks.fileSink_fs02.sink.rollInterval = 10
lc01.sinks.fileSink_fs02.sink.batchSize = 10

#Specify the channel the sink should use
lc01.sinks.fileSink_fs02.channel = fileChannel_fc02

# Each channel's type is defined.
lc01.channels.fileChannel_fc02.type = file
lc01.channels.fileChannel_fc02.maxFileSize = 214643507
lc01.channels.fileChannel_fc02.checkpointDir = /home/hadoop/flume/fc02/checkpoint
lc01.channels.fileChannel_fc02.dataDirs = /home/hadoop/flume/fc02/data
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
lc01.channels.fileChannel_fc02.capacity = 100
lc01.channels.fileChannel_fc02.transactionCapacity =10
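The routing only works if every channel named in a `selector.mapping` entry is listed in `lc01.channels` and is read by some sink, so a quick consistency check can catch wiring mistakes before the agent is started. The sketch below is hypothetical: its parser is deliberately simplified and only understands `key = value` lines and `#` comments, not the `:` separators or line continuations that Java properties files also allow. `parse_properties` and `check_multiplexing` are illustrative names.

```python
# Hypothetical consistency check for a Flume multiplexing selector configuration.
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Simplified parser: 'key = value' lines only, '#' comments and blanks skipped."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_multiplexing(props: Dict[str, str], agent: str, source: str) -> List[str]:
    """Return a list of problems; an empty list means the wiring looks consistent."""
    problems: List[str] = []
    declared = set(props.get(f"{agent}.channels", "").split())
    # Channels consumed by a sink: <agent>.sinks.<sinkName>.channel = <channel>
    drained = {
        value
        for key, value in props.items()
        if key.startswith(f"{agent}.sinks.") and key.endswith(".channel")
    }
    prefix = f"{agent}.sources.{source}.selector.mapping."
    for key, value in props.items():
        if not key.startswith(prefix):
            continue
        for channel in value.split():
            if channel not in declared:
                problems.append(f"{key}: channel {channel} is not in {agent}.channels")
            elif channel not in drained:
                problems.append(f"{key}: no sink reads from {channel}")
    return problems
```

Run against the LC file above with agent `lc01` and source `avroGenSrc_src01`, the check should return an empty list; removing the `fileSink_fs02.channel` line, for instance, would be reported as a VOICE mapping with no sink.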

--------------------------------

2. Configuration of the LAs that send the logs (hadoop@bigdata-host:~/flume/conf$ cat flume-conf-multi-agent.properties)

: This file contains both the la01 and la02 configurations. It is assumed that each one is started separately, so that two LAs read the log files.
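Each agent name in a properties file is started as its own flume-ng process, selected with `--name`. The hypothetical helper below only builds and prints the commands; it assumes it is run from `~/flume`, so that `conf/` holds both properties files, and the `-Dflume.root.logger=INFO,console` option (console logging) is optional. `flume_ng_command` and `AGENTS` are illustrative names.

```python
# Hypothetical helper that prints one flume-ng command per agent.
import shlex
from typing import List


def flume_ng_command(agent: str, conf_file: str, conf_dir: str = "conf") -> List[str]:
    """Build the argv for starting one named agent from a properties file."""
    return [
        "flume-ng", "agent",
        "--conf", conf_dir,          # directory with flume-env.sh / log4j settings
        "--conf-file", conf_file,    # properties file that defines the agent
        "--name", agent,             # which agent prefix in that file to run
        "-Dflume.root.logger=INFO,console",
    ]


# Collector first, then the two log agents that share one properties file.
AGENTS = [
    ("lc01", "conf/flume-conf-multi.properties"),
    ("la01", "conf/flume-conf-multi-agent.properties"),
    ("la02", "conf/flume-conf-multi-agent.properties"),
]

for name, path in AGENTS:
    print(shlex.join(flume_ng_command(name, path)))
```

Starting lc01 first means its Avro source on port 5555 is already listening when the Avro sinks of la01 and la02 try to connect.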


la01.sources = execGenSrc_la01
la01.channels = fileChannel_la01
la01.sinks = avroSink_la01

# For each one of the sources, the type is defined
la01.sources.execGenSrc_la01.type = exec
la01.sources.execGenSrc_la01.command = tail -f /home/hadoop/log_data/log1.log
la01.sources.execGenSrc_la01.batchSize = 10

la01.sources.execGenSrc_la01.interceptors = i1
la01.sources.execGenSrc_la01.interceptors.i1.type=static
la01.sources.execGenSrc_la01.interceptors.i1.key=state
la01.sources.execGenSrc_la01.interceptors.i1.value = SMS

# The channel can be defined as follows.
la01.sources.execGenSrc_la01.channels = fileChannel_la01

# Each sink's type must be defined
la01.sinks.avroSink_la01.type = avro
la01.sinks.avroSink_la01.hostname=localhost
la01.sinks.avroSink_la01.port=5555
la01.sinks.avroSink_la01.batch-size = 10

#Specify the channel the sink should use
la01.sinks.avroSink_la01.channel = fileChannel_la01

# Each channel's type is defined.
la01.channels.fileChannel_la01.type = file
la01.channels.fileChannel_la01.maxFileSize = 214643507
la01.channels.fileChannel_la01.checkpointDir = /home/hadoop/flume/la01/checkpoint
la01.channels.fileChannel_la01.dataDirs = /home/hadoop/flume/la01/data

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
la01.channels.fileChannel_la01.capacity = 10000
la01.channels.fileChannel_la01.transactionCapacity = 10000


# Configuration of the second log agent (la02)
la02.sources = execGenSrc_la02
la02.channels = fileChannel_la02
la02.sinks = avroSink_la02

# For each one of the sources, the type is defined
la02.sources.execGenSrc_la02.type = exec
la02.sources.execGenSrc_la02.command = tail -f /home/hadoop/log_data/log2.log
la02.sources.execGenSrc_la02.batchSize = 10
la02.sources.execGenSrc_la02.interceptors = i2
la02.sources.execGenSrc_la02.interceptors.i2.type=static
la02.sources.execGenSrc_la02.interceptors.i2.key= state
la02.sources.execGenSrc_la02.interceptors.i2.value = VOICE


# The channel can be defined as follows.
la02.sources.execGenSrc_la02.channels = fileChannel_la02

# Each sink's type must be defined
la02.sinks.avroSink_la02.type = avro
la02.sinks.avroSink_la02.hostname=localhost
la02.sinks.avroSink_la02.port=5555
la02.sinks.avroSink_la02.batch-size = 10

#Specify the channel the sink should use
la02.sinks.avroSink_la02.channel = fileChannel_la02

# Each channel's type is defined.
la02.channels.fileChannel_la02.type = file
la02.channels.fileChannel_la02.maxFileSize = 214643507
la02.channels.fileChannel_la02.checkpointDir = /home/hadoop/flume/la02/checkpoint
la02.channels.fileChannel_la02.dataDirs = /home/hadoop/flume/la02/data

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
la02.channels.fileChannel_la02.capacity = 10000
la02.channels.fileChannel_la02.transactionCapacity = 10000
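Flume components generally read only the keys they know, so a misspelled key such as `transctionCapacity` may be ignored without an error and the default used instead. The sketch below flags channel keys that are not in an allow-list and suggests the closest known key. It is hypothetical: the allow-list only holds the file-channel keys used in this post and should be extended from the Flume User Guide before relying on it; `unknown_channel_keys` is an illustrative name.

```python
# Hypothetical check for misspelled file-channel keys in a parsed properties dict.
import difflib
from typing import Dict, List

# Assumption: only the file-channel keys that appear in this post.
KNOWN_FILE_CHANNEL_KEYS = [
    "type", "checkpointDir", "dataDirs",
    "capacity", "transactionCapacity", "maxFileSize",
]


def unknown_channel_keys(props: Dict[str, str], agent: str) -> List[str]:
    """Warn about <agent>.channels.<name>.<key> entries whose key is not recognised."""
    warnings: List[str] = []
    prefix = f"{agent}.channels."
    for key in props:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]  # "<channelName>.<property>"
        if "." not in rest:
            continue
        prop = rest.split(".", 1)[1]
        if prop not in KNOWN_FILE_CHANNEL_KEYS:
            guess = difflib.get_close_matches(prop, KNOWN_FILE_CHANNEL_KEYS, n=1)
            hint = f" (did you mean {guess[0]}?)" if guess else ""
            warnings.append(f"{key}: unrecognised key{hint}")
    return warnings
```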
