This blog covers development and operations techniques for Cloudera CDH/CDP, the Hadoop ecosystem, Semantic IoT, and related topics. Contact: gooper@gooper.com
python - How to fix "ImportError: No module named pyspark" or "ImportError: No module named py4j.protocol" when running python test.py
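If running the Python program test.py with python fails with "ImportError: No module named pyspark" or "ImportError: No module named py4j.protocol", set the environment variables below (for example, in /etc/profile). Then run source /etc/profile, or log in again, so the current shell picks them up.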
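# Spark installation directory (adjust to where Spark is actually installed)
export SPARK_HOME=$HOME/spark
# Add PySpark and the bundled py4j source zip to Python's module search path;
# ${PYTHONPATH:+:...} appends the old value only if it was non-empty
export PYTHONPATH=${SPARK_HOME}/python/:$(echo ${SPARK_HOME}/python/lib/py4j-*-src.zip)${PYTHONPATH:+:${PYTHONPATH}}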
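If editing /etc/profile is not an option, the same two path entries can be added from inside the script before pyspark is imported. The following is a minimal sketch and was not part of the original setup. The function name add_spark_to_syspath is hypothetical. The sketch assumes the standard Spark layout used by the export above ($SPARK_HOME/python plus python/lib/py4j-*-src.zip) and falls back to ~/spark when SPARK_HOME is unset.

```python
import glob
import os
import sys


def add_spark_to_syspath(spark_home=None):
    # Hypothetical helper (not from the original post). Assumes the standard
    # Spark layout: $SPARK_HOME/python and $SPARK_HOME/python/lib/py4j-*-src.zip.
    if spark_home is None:
        # Same default as the export above: $SPARK_HOME, else ~/spark
        spark_home = os.environ.get("SPARK_HOME", os.path.expanduser("~/spark"))
    python_dir = os.path.join(spark_home, "python")
    # The py4j version in the file name differs between Spark releases;
    # normally exactly one zip exists, otherwise the last in lexical order is used.
    zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    if not os.path.isdir(python_dir) or not zips:
        raise FileNotFoundError("pyspark/py4j not found under " + spark_home)
    added = [python_dir, zips[-1]]
    # Insert in reverse so python_dir ends up first, matching the PYTHONPATH order.
    for path in reversed(added):
        if path not in sys.path:
            sys.path.insert(0, path)
    return added
```

Call add_spark_to_syspath() at the top of test.py, before from pyspark import SparkContext. The environment-variable approach is still preferable when several scripts need Spark.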
----------------Error #1--------------------------
-bash-4.1$ python test.py
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    from pyspark import SparkContext
ImportError: No module named pyspark
----------------Error #2--------------------------
-bash-4.1$ python test.py
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    from pyspark import SparkContext
  File "$HOME/spark/python/pyspark/__init__.py", line 44, in <module>
    from pyspark.context import SparkContext
  File "$HOME/spark/python/pyspark/context.py", line 29, in <module>
    from py4j.protocol import Py4JError
ImportError: No module named py4j.protocol
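The two tracebacks fail at different points. In #1, Python cannot find pyspark at all, so $SPARK_HOME/python is not on the module search path. In #2, pyspark is found but its own import of py4j fails, so the py4j zip is missing from the path.

A small check like the sketch below can report which case applies. It asks importlib where each top-level package would be loaded from, without actually importing it. The sketch assumes Python 3, and the name check_spark_imports is hypothetical.

```python
import importlib.util


def check_spark_imports(modules=("pyspark", "py4j")):
    # Hypothetical diagnostic (not from the original post).
    # pyspark -> None corresponds to error #1; pyspark found but py4j -> None
    # corresponds to error #2.
    report = {}
    for name in modules:
        # find_spec on a top-level name returns None when it cannot be found
        spec = importlib.util.find_spec(name)
        report[name] = None if spec is None else (spec.origin or "<namespace package>")
    return report


if __name__ == "__main__":
    for name, origin in check_spark_imports().items():
        print(f"{name}: {origin or 'NOT FOUND'}")
```

Run it with the same python binary and shell environment used for test.py. Once the exports are in effect, both entries should point under $SPARK_HOME/python.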
-----------------test.py program-------------------
-bash-4.1$ cat test.py
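from pyspark import SparkContext
# Run locally with 2 worker threads
sc = SparkContext("local[2]", "Test App")
# Read the CSV and keep the first two fields of each line as a pair
data = sc.textFile("ml/UserPurchaseHistory.csv").map(lambda line: line.split(",")).map(lambda record: (record[0], record[1]))
# count() is an action: it runs the job once and returns the number of records
purchase_number = data.count()
print(purchase_number)
sc.stop()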
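To cross-check the number printed by test.py without Spark, the two map steps can be replayed in plain Python. This is an illustrative sketch that assumes the CSV fits in memory, and to_pairs is a hypothetical name. As in the Spark job, a line without a comma makes record[1] raise IndexError when the records are evaluated.

```python
def to_pairs(lines):
    # Hypothetical plain-Python replay of test.py's pipeline:
    # line.split(",") followed by (record[0], record[1]) for every line.
    return [(record[0], record[1]) for record in (line.split(",") for line in lines)]
```

For example, read the file with open("ml/UserPurchaseHistory.csv").read().splitlines() and take len(to_pairs(...)). The result should normally match the count printed by test.py, since textFile also yields one record per line.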