Channel: Hortonworks » All Replies

Optimize Pig Replace


In my project, I need to clean the data in HDFS so that it is final and usable for discovery. Some of the cleaning tasks are:
1) Cleaning up multiple whitespace characters, etc.

The Pig script I use is:

RAW_DATA = LOAD '$inputPath' USING PigStorage('\n') AS (record:chararray);
CLEANED_SPECIAL = FOREACH RAW_DATA GENERATE REPLACE(record, '$regex', ' ') AS record;
CLEANED_SPECIAL = FOREACH CLEANED_SPECIAL GENERATE TRIM(REPLACE(record, '\\|null', '\\|')) AS record;
CLEANED_SPACE = FOREACH CLEANED_SPECIAL GENERATE REPLACE(record, '\\s+', ' ') AS record;
CLEANED_SPACE = FOREACH CLEANED_SPACE GENERATE REPLACE(record, '\\s+\\|', '\\|') AS record;
CLEANED_SPACE = FOREACH CLEANED_SPACE GENERATE TRIM(REPLACE(record, '\\|\\s+', '\\|')) AS record;

My question is: is there a way to merge these REPLACE calls into a single one?
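
To make the question concrete, here is a rough sketch (in Python, used only to check the regex logic, not as a Pig solution) of how some of the passes might be folded together. It assumes the two pipe-related whitespace passes can be replaced by a single pattern, \s*\|\s*, applied before the general whitespace collapse, with one trim at the end. SPECIAL is a hypothetical stand-in for $regex, whose real value is not shown in this post, and the function names are made up for illustration. Python's regex engine and strip() are not identical to what Pig's REPLACE and TRIM do on the JVM, so this is a sanity check on ordinary text, not a proof.

```python
import re

# Hypothetical stand-in for the $regex parameter (its real value is not given).
SPECIAL = r"[#@~]"


def clean_stepwise(record: str) -> str:
    """Mirror the five REPLACE passes of the script, in the same order."""
    record = re.sub(SPECIAL, " ", record)
    record = re.sub(r"\|null", "|", record).strip()
    record = re.sub(r"\s+", " ", record)
    record = re.sub(r"\s+\|", "|", record)
    record = re.sub(r"\|\s+", "|", record).strip()
    return record


def clean_merged(record: str) -> str:
    """Same intent with four passes and a single trim at the end."""
    record = re.sub(SPECIAL, " ", record)
    record = re.sub(r"\|null", "|", record)
    # Remove whitespace on both sides of every pipe in one pass.
    record = re.sub(r"\s*\|\s*", "|", record)
    # Collapse the remaining whitespace runs, then trim the ends.
    return re.sub(r"\s+", " ", record).strip()


if __name__ == "__main__":
    sample = "  a  |null|  b #  c |  d  "
    print(clean_stepwise(sample))
    print(clean_merged(sample))
```

If this holds for the real data, the same shape could presumably be written in Pig as nested REPLACE calls inside one FOREACH ... GENERATE, with a single TRIM around the outermost call. Whether that actually runs faster than the chained FOREACH statements would need to be measured.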

