In my project, I need to clean the data in HDFS to make it final and usable for discovery. Some of these tasks are:
1) Cleaning up multiple whitespace characters, etc.
The Pig script used is:
RAW_DATA = LOAD '$inputPath' USING PigStorage('\n') AS (record:chararray);
CLEANED_SPECIAL = FOREACH RAW_DATA GENERATE REPLACE(record, '$regex', ' ') AS record;
CLEANED_SPECIAL = FOREACH CLEANED_SPECIAL GENERATE TRIM(REPLACE(record, '\\|null', '\\|')) AS record;
CLEANED_SPACE = FOREACH CLEANED_SPECIAL GENERATE REPLACE(record, '\\s+', ' ') AS record;
CLEANED_SPACE = FOREACH CLEANED_SPACE GENERATE REPLACE(record, '\\s+\\|', '\\|') AS record;
CLEANED_SPACE = FOREACH CLEANED_SPACE GENERATE TRIM(REPLACE(record, '\\|\\s+', '\\|')) AS record;
My question is: is there a way to merge these REPLACE calls into a single statement?
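
One direction that might work, offered as an untested sketch rather than a confirmed answer: REPLACE and TRIM each take a chararray and return a chararray, so the five steps can in principle be nested inside a single FOREACH ... GENERATE, with the innermost call running first. The alias CLEANED is a made-up name for illustration; the two parameters and all patterns are copied unchanged from the script above.

```pig
RAW_DATA = LOAD '$inputPath' USING PigStorage('\n') AS (record:chararray);

-- Same five steps as the original script, nested innermost-first:
--   1. replace matches of the user-supplied regex with a space
--   2. turn "|null" into "|", then trim
--   3. collapse runs of whitespace into one space
--   4. drop whitespace before each pipe
--   5. drop whitespace after each pipe, then trim
CLEANED = FOREACH RAW_DATA GENERATE
    TRIM(
      REPLACE(
        REPLACE(
          REPLACE(
            TRIM(REPLACE(REPLACE(record, '$regex', ' '), '\\|null', '\\|')),
            '\\s+', ' '),
          '\\s+\\|', '\\|'),
        '\\|\\s+', '\\|')
    ) AS record;
```

This keeps the same order of operations, so the output should match the multi-statement version; it only removes the intermediate aliases. Steps 4 and 5 could probably also be folded into one pattern such as '\\s*\\|\\s*', but that is an assumption worth checking against sample data first.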