Channel: Hortonworks » All Replies

Pig script using streaming_python works in local mode but not map reduce mode

Hi, I am able to run my Pig script in local mode, but it fails when I run it in mapreduce mode.

My Pig script is:

REGISTER 'pyudf.py' USING streaming_python AS myfuncs;
a = LOAD 'nyse_daily_price' USING org.apache.hcatalog.pig.HCatLoader();
stocks = GROUP a BY stock_symbol;
res = FOREACH stocks GENERATE group, FLATTEN(myfuncs.summary_stats(a.stock_price_adj_close));
STORE res INTO 'stock_stats_py_output';

The Python script, pyudf.py, is:

from pig_util import outputSchema
import numpy as np

@outputSchema("(Q25:double, MEDIAN:double, Q75:double, Q99:double)")
def summary_stats(input):
    # Pig passes the bag as a list of one-field tuples; unwrap each to a float.
    input = [float(item[0]) for item in input]
    feature_25 = np.percentile(input, 25)
    feature_50 = np.median(input)
    feature_75 = np.percentile(input, 75)
    feature_99 = np.percentile(input, 99)
    return feature_25, feature_50, feature_75, feature_99
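
The post does not include the error message, so the cause is unknown. One possibility that may be worth ruling out: in mapreduce mode the UDF runs under the Python interpreter on the cluster's task nodes rather than on the machine that submits the job, so a module available locally (numpy here) could be missing there. The sketch below is only an illustration under that assumption. The file name check_modules.py and the helper missing_modules are hypothetical and not part of the original script.

```python
# check_modules.py: hypothetical helper, not part of the original post.
# Run it with the same Python interpreter that the task nodes use.


def missing_modules(names):
    """Return the module names that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # numpy is the third-party module imported by pyudf.py.
    print("missing: %s" % missing_modules(["numpy"]))
```

Running this on a worker node and getting an empty list would suggest the failure lies elsewhere, for example in how the script or its data reaches the cluster.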

