Hello,
I have built my personal cluster with Hadoop 2.6.0 and Spark 1.4.1 in shared mode, and they work fine. I use IPython and PySpark along with the rest of my ecosystem.
On Windows, under Eclipse, I can launch Scala/Java Spark/Hadoop programs locally or remotely, and I had Python 3.4 / IPython Notebook working to play with PySpark.
But because of many library compatibility issues between Python 3.4 and 2.7 on my Windows PC, I had to reinstall Anaconda with Python 2.7 / IPython 3.2.0 (much like Scala 2.10 for Spark, the rest of the Hadoop ecosystem, e.g. the Hive connector, is not compatible with what Spark requires).
When I start ipython notebook --profile=pyspark, I get a kernel error under Jupyter. It seems something is different in how the child process is created from kernel.json with IPython 3.2.0 and PySpark.
If somebody has an idea, I would be grateful; I am out of ideas myself.
Error log from the IPython Notebook kernel
Code: [Select]
Traceback (most recent call last):
  File "C:\Anaconda\lib\site-packages\IPython\html\base\handlers.py", line 394, in wrapper
    result = yield gen.maybe_future(method(self, *args, **kwargs))
  File "C:\Anaconda\lib\site-packages\IPython\html\services\sessions\handlers.py", line 53, in post
    model = sm.create_session(path=path, kernel_name=kernel_name)
  File "C:\Anaconda\lib\site-packages\IPython\html\services\sessions\sessionmanager.py", line 66, in create_session
    kernel_name=kernel_name)
  File "C:\Anaconda\lib\site-packages\IPython\html\services\kernels\kernelmanager.py", line 84, in start_kernel
    kernel_name=kernel_name, **kwargs)
  File "C:\Anaconda\lib\site-packages\IPython\kernel\multikernelmanager.py", line 112, in start_kernel
    km.start_kernel(**kwargs)
  File "C:\Anaconda\lib\site-packages\IPython\kernel\manager.py", line 240, in start_kernel
    **kw)
  File "C:\Anaconda\lib\site-packages\IPython\kernel\manager.py", line 189, in _launch_kernel
    return launch_kernel(kernel_cmd, **kw)
  File "C:\Anaconda\lib\site-packages\IPython\kernel\launcher.py", line 202, in launch_kernel
    proc = Popen(cmd, **kwargs)
  File "C:\Anaconda\lib\subprocess.py", line 710, in __init__
    errread, errwrite)
  File "C:\Anaconda\lib\subprocess.py", line 958, in _execute_child
    startupinfo)
TypeError: environment can only contain strings
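From what I can tell, this TypeError comes from subprocess on Python 2.7 under Windows: Popen only accepts an env dict whose keys and values are byte strings (str), and the "env" block of kernel.json apparently ends up as unicode when the notebook server builds the child environment. Here is a minimal sketch (just my own check, the SPARK_HOME value is only an example) that seems to reproduce the same error on Python 2.7 / Windows:
Code: [Select]
import os
import subprocess
import sys

# On Python 2.7 / Windows, Popen requires env keys and values to be
# byte strings (str); a unicode value triggers the same TypeError as above.
env = dict(os.environ)  # copy of the current environment (str values)
env[u'SPARK_HOME'] = u'C:/bigdatadev/spark-1.4.1-bin-hadoop2.6/'  # unicode value, like the kernel.json "env" block

try:
    subprocess.Popen([sys.executable, '-c', 'print("ok")'], env=env)
except TypeError as e:
    print('Popen failed: %s' % e)  # "environment can only contain strings"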
kernel.json (this file works fine under Python 3.4 on Windows)
Quote
{
  "display_name": "pySpark (Spark 1.4.1)",
  "language": "python",
  "argv": [
    "c:/Anaconda/python",
    "-m",
    "IPython.kernel",
    "--profile=pyspark",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "SPARK_HOME": "C:/bigdatadev/spark-1.4.1-bin-hadoop2.6/",
    "PYSPARK_SUBMIT_ARGS": "--master local[2] pyspark-shell",
    "PYTHONPATH": "C:/bigdatadev/spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip"
  }
}
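One idea I want to try (not a confirmed fix, and the file name pyspark_kernel_wrapper.py is just my own invention): drop the "env" block from kernel.json and point "argv" at a small wrapper that sets the variables itself as plain byte strings before starting the normal IPython kernel, so the notebook server never has to pass a unicode environment to Popen:
Code: [Select]
# pyspark_kernel_wrapper.py -- referenced from the kernel.json "argv" in place
# of "-m IPython.kernel", with the "env" block removed (untested idea).
import os
import runpy

# Set the Spark variables here as plain str, so they never go through the
# kernel.json "env" block (which seems to arrive as unicode in Popen).
os.environ['SPARK_HOME'] = 'C:/bigdatadev/spark-1.4.1-bin-hadoop2.6/'
os.environ['PYSPARK_SUBMIT_ARGS'] = '--master local[2] pyspark-shell'
# PYTHONPATH set here only affects child processes (Spark workers);
# sys.path for this interpreter is handled in 00-pyspark-startup.py below.
os.environ['PYTHONPATH'] = 'C:/bigdatadev/spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip'

# Hand over to the regular kernel entry point; the remaining arguments
# (--profile=pyspark -f {connection_file}) are passed through from kernel.json.
runpy.run_module('IPython.kernel', run_name='__main__', alter_sys=True)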
00-pyspark-startup.py
Code: [Select]
import os
import sys

# Locate the Spark installation from the environment set in kernel.json
spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

# Make sure spark-submit is given the pyspark-shell argument
pyspark_submit_args = os.environ.get('PYSPARK_SUBMIT_ARGS', '')
if 'pyspark-shell' not in pyspark_submit_args:
    pyspark_submit_args += ' pyspark-shell'
os.environ['PYSPARK_SUBMIT_ARGS'] = pyspark_submit_args

# Put the PySpark and Py4J sources on sys.path, then run pyspark/shell.py
# so the SparkContext (sc) is created in this kernel
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))
shell_py = os.path.join(spark_home, 'python/pyspark/shell.py')
exec(compile(open(shell_py).read(), shell_py, 'exec'))
Thanks a lot for your remarks; I will keep searching.
KR