Channel: Hortonworks » All Replies

Optimize Pig Replace

In my project, I need to clean the data in HDFS to make it final and usable for discovery. Some of these tasks are:
1) Cleaning up multiple white spaces, etc.

The Pig script I use is:

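-- Load each line as a single chararray field
RAW_DATA = LOAD '$inputPath' USING PigStorage('\n') AS (record:chararray);
-- Replace the special characters matched by the regex parameter with a space
CLEANED_SPECIAL = FOREACH RAW_DATA GENERATE REPLACE(record, '$regex', ' ') AS record;
-- Turn '|null' into '|' and trim the result
CLEANED_SPECIAL = FOREACH CLEANED_SPECIAL GENERATE TRIM(REPLACE(record, '\\|null', '\\|')) AS record;
-- Collapse runs of white space into a single space
CLEANED_SPACE = FOREACH CLEANED_SPECIAL GENERATE REPLACE(record, '\\s+', ' ') AS record;
-- Remove white space before and after the '|' delimiter, then trim
CLEANED_SPACE = FOREACH CLEANED_SPACE GENERATE REPLACE(record, '\\s+\\|', '\\|') AS record;
CLEANED_SPACE = FOREACH CLEANED_SPACE GENERATE TRIM(REPLACE(record, '\\|\\s+', '\\|')) AS record;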

My question is: is there a way to merge these REPLACE actions into a single one?
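One possible sketch, not a definitive answer: nest the REPLACE calls inside a single FOREACH, and fold the two "white space around '|'" passes into one pattern, \s*\|\s*. Because the earlier pass has already collapsed white space to single spaces, removing white space on both sides of each '|' in one pass should give the same result as the two separate passes. The sketch assumes the same inputPath and regex parameters as the script above; the alias CLEANED is hypothetical.

```pig
-- Hypothetical single-FOREACH version of the cleanup script.
-- Uses the same inputPath and regex parameters as the original script.
RAW_DATA = LOAD '$inputPath' USING PigStorage('\n') AS (record:chararray);

-- Read the nested calls inside out:
--   1. characters matched by the regex parameter -> ' '
--   2. '|null' -> '|', then TRIM
--   3. runs of white space -> one space
--   4. white space on either side of '|' removed, then TRIM
CLEANED = FOREACH RAW_DATA GENERATE
    TRIM(REPLACE(
        REPLACE(
            TRIM(REPLACE(REPLACE(record, '$regex', ' '), '\\|null', '\\|')),
            '\\s+', ' '),
        '\\s*\\|\\s*', '\\|')) AS record;
```

This still makes four REPLACE calls: a single regex cannot easily cover every step because the replacements differ (' ' in some steps, '|' in others). Pig's logical optimizer also has a MergeForEach rule that may already combine consecutive FOREACH statements, so the main gain here is likely readability rather than speed.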

