
Reply To: Pyspark support for external machine learning libraries


From the SparkContext.addFile documentation:

addFile(self, path)
Add a file to be downloaded with this Spark job on every node. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS or FTP URI.

To access the file in Spark jobs, use SparkFiles.get(path) to find its download location.

>>> from pyspark import SparkFiles
>>> path = os.path.join(tempdir, "test.txt")
>>> with open(path, "w") as testFile:
...    testFile.write("100")
>>> sc.addFile(path)
>>> def func(iterator):
...    with open(SparkFiles.get("test.txt")) as testFile:
...        fileVal = int(testFile.readline())
...        return [x * fileVal for x in iterator]
>>> sc.parallelize([1, 2, 3, 4]).mapPartitions(func).collect()
[100, 200, 300, 400]
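The partition function in that doctest is plain Python, so the pattern can be sketched and checked locally without a cluster. In the sketch below, SparkFiles.get("test.txt") is replaced with a local temp-file path (an assumption for illustration only; on a real cluster you would call sc.addFile and resolve the path with SparkFiles.get inside the task):

```python
import os
import tempfile

# Write a multiplier file, mimicking the file that sc.addFile
# would ship to every worker node.
tempdir = tempfile.mkdtemp()
path = os.path.join(tempdir, "test.txt")
with open(path, "w") as f:
    f.write("100")

def func(iterator):
    # On a cluster this path would come from SparkFiles.get("test.txt");
    # here we read the local copy directly.
    with open(path) as testFile:
        fileVal = int(testFile.readline())
        return [x * fileVal for x in iterator]

# Stand-in for sc.parallelize([1, 2, 3, 4]).mapPartitions(func).collect()
# with a single partition:
print(func(iter([1, 2, 3, 4])))  # [100, 200, 300, 400]
```

For shipping external Python libraries themselves (the original question), SparkContext also provides addPyFile, which adds a .py or .zip dependency to the workers' import path; the same SparkFiles mechanics apply.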

