Pig’s SequenceFileLoader is unable to read textual data from a sequence file generated by Sqoop. It returns (0,) instead of keys and contents. Any ideas of why it loads incorrectly? Am I missing any steps?
Details:
I create a simple two-column table (seq_proto) in MySQL with columns (int, char(10)) and insert a single row, namely (768, mango).
I use Sqoop to import the data from seq_proto into HDFS as a sequence file. (I validate the import by exporting the sequence file back into MySQL and inspecting its contents.)
sqoop import --connect jdbc:mysql://localhost/test --table seq_proto --username root --as-sequencefile -m 1 --target-dir /user/hdfs/seq_proto
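A quicker check than the export round trip would be to look at the file header, since a Hadoop sequence file begins with the magic bytes SEQ, a version byte, and then the names of the key and value classes. What follows is only a rough sketch: the helper name seqfile_header_strings is made up for illustration, the path is the one from the import command, and the 300-byte cutoff is an arbitrary guess at how much of the file covers the header.

```shell
# Hypothetical helper (not part of Hadoop, Sqoop, or Pig): reads the start of
# a sequence file on stdin and prints each run of printable characters on its
# own line, so the key and value class names in the header become readable.
seqfile_header_strings() {
  # 300 bytes is an assumed upper bound for the header, not a documented size.
  head -c 300 | LC_ALL=C tr -c '[:print:]' '\n' | grep -v '^$'
}

# Usage, assuming the target directory from the import command:
#   hdfs dfs -cat /user/hdfs/seq_proto/part-m-00000 | seqfile_header_strings
```

If the value class shown is the generated seq_proto record class rather than a standard Writable such as Text, that may be relevant to what the loader is able to decode.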
I compile and package the seq_proto.jar file, then upload seq_proto.jar and sqoop-1.4.4.2.1.1.0-385.jar to HDFS (in the folder /lib/pig/).
I execute the following Pig script.
REGISTER piggybank.jar
REGISTER /lib/pig/seq_proto.jar
REGISTER /lib/pig/sqoop-1.4.4.2.1.1.0-385.jar
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
a = LOAD '/user/hdfs/seq_proto/part-m-00000' using SequenceFileLoader AS (id:int, value:chararray);
describe a;
dump a;
The script executes successfully and I get the following output:
a: {id: int,value: chararray}
(0,)
I am using the Hortonworks Sandbox 2.1 for all of the above.
Thanks and I look forward to your reply.