Cloudera CDH/CDP 및 Hadoop EcoSystem, Semantic IoT등의 개발/운영 기술을 정리합니다. gooper@gooper.com로 문의 주세요.
1. Sqoop를 다운받아 압축을 푼다
*다운로드 받을 라이브러리가 사용하려는 하둡의 메이저 버전과 대응 되는지 확인 필요
wget http://mirror.apache-kr.org/sqoop/1.4.4/sqoop-1.4.4.bin__hadoop-1.0.0.tar.gz
* hadoop2의 경우는 sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz를 받아서 사용할것..
tar zxvf sqoop-1.4.4.bin__hadoop-1.0.0.tar.gz
ln -s sqoop-1.4.4.bin__hadoop-1.0.0. sqoop
2. Sqoop환경변수 설정
vi ~/.bashrc
export SQOOP_HOME=/home/hadoop/sqoop
export SQOOP_CONF_DIR=/home/hadoop/sqoop/conf
export PATH=$PATH:$SQOOP_HOME/bin
변경사항 반영
source .bashrc
2-1. sqoop lib디렉토리 설정
hadoop의 share디렉토리 밑에 있는 lib파일및 폴더를 모두 sqoop의 lib디렉토리 밑으로 복사한다.
3. Sqoop 환경설정
cd $SQOOP_HOME/conf
cp sqoop-env-template.sh sqoop-env.sh
vi sqoop-env.sh
------------------------------------------------------------
# Set Hadoop-specific environment variables here.
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/hadoop
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop/lib
#set the path to where bin/hbase is available
export HBASE_HOME=/home/hadoop/hbase
#Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/hive
#Set the path for where zookeper config dir is
export ZOOCFGDIR=/home/hadoop/zookeeper/conf
----------------------------------------------------------
sqoop기동 확인
hadoop@bigdata-host:~/sqoop/conf$ sqoop
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
/home/hadoop/sqoop/bin/sqoop: line 101: /home/hadoop/hadoop/bin/bin/hadoop: No such file or directory
이 발생하게 되면 sqoop설치 디렉토리에 있는 sqoop-1.4.4.jar를 hadoop/lib에 복사한다.
4. 테스트(mysql)
1) mysql jdbc 드라이버 설치
http://dev.mysql.com/downloads/connector/j/
tar zxvf mysql-connector-java-5.1.26.tar.gz
cp mysql-connector-java-5.1.26/mysql-connector-java-5.1.26-bin.jar $SQOOP_HOME/lib
* mysql에 bigdata계정 설정 및 sqoop테스트용 database생성 및 할당
mysql -u root (암호 걸려있을 경우 -p)
(암호 있을경우 입력)
CREATE USER 'bigdata'@'localhost';
create database sqoop_test;
GRANT ALL PRIVILEGES ON sqoop_test.* TO 'bigdata'@'localhost';
2) sqoop테스트용 테이블 생성 및 example데이터 입력
mysql -u bigdata
use sqoop_test;
create table employee(id INT, name VARCHAR(20)); --import 테스트
INSERT INTO employee(id,name) VALUES(1, "kim");
INSERT INTO employee(id,name) VALUES(2, "lee");
INSERT INTO employee(id,name) VALUES(3, "park");
create table employee_ex(id INT, name VARCHAR(20)); --export 테스트
3) hadoop 기동
start-all.sh
4) import 테스트(mysql에서 hdoop으로 저장)
sqoop import --connect jdbc:mysql://localhost/sqoop_test --table employee --username bigdata -m 1
=====>hadoop에서 확인<============
hadoop@bigdata-host:~/sqoop$ hadoop fs -lsr employee
-rw-r--r-- 1 hadoop supergroup 0 2014-04-21 17:34 /user/hadoop/employee/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2014-04-21 17:34 /user/hadoop/employee/_logs
drwxr-xr-x - hadoop supergroup 0 2014-04-21 17:34 /user/hadoop/employee/_logs/history
-rw-r--r-- 1 hadoop supergroup 9165 2014-04-21 17:34 /user/hadoop/employee/_logs/history/job_201404211042_0001_1398069253093_hadoop_employee.jar
-rw-r--r-- 1 hadoop supergroup 59274 2014-04-21 17:34 /user/hadoop/employee/_logs/history/job_201404211042_0001_conf.xml
-rw-r--r-- 1 hadoop supergroup 19 2014-04-21 17:34 /user/hadoop/employee/part-m-00000
hadoop@bigdata-host:~/sqoop$ hadoop fs -cat employee/part-m-00000
1,kim
2,lee
3,park
------------------------------------------
5) export 테스트(hadoop에서 mysql의 테이블에 저장)
sqoop export -connect jdbc:mysql://localhost/sqoop_test -table employee_ex -export-dir employee --username bigdata -m 1
5. postgresql
가. cp postgresql-9.3-1103.jdbc4.jar $SQOOP_HOME/lib
나.[hadoop@master]$ sqoop list-databases --connect jdbc:postgresql://node1:5432/hive_db --username hive_user --password hive_pass
Warning: /hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/05/21 13:42:36 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
15/05/21 13:42:36 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/05/21 13:42:36 INFO manager.SqlManager: Using default fetchSize of 1000
template1
template0
postgres
psmgdb
hive_db
oozie_db