Context: I am migrating my MR jobs on HBase from CDH 2.0.0-cdh4.5.0 (Hadoop1) to HDP 2.2.0.0-2041 (YARN).
After minor changes, the code was compiled against HDP 2.2.0.0-2041.
Problem: I am trying to run an Oozie workflow that executes a series of MR jobs after creating a scan on HBase. The scan is
created programmatically, then serialised and deserialised before it is handed to the mapper to fetch batches from HBase.
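To make the hand-off concrete, here is a minimal sketch of the string round trip, assuming the serialised scan bytes are carried as Base64 text (which is how I understand the HBase MapReduce utilities pass a scan through the job configuration). The class name, method names, and the byte payload are hypothetical stand-ins; no HBase classes are used, so this only illustrates the encode/decode symmetry, not the actual Scan protobuf.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class ScanStringRoundTrip {

    // Encode serialised bytes as text so they can be stored in a job configuration.
    static String toConfigString(byte[] serialisedScan) {
        return Base64.getEncoder().encodeToString(serialisedScan);
    }

    // Decode the configuration value back into the original bytes.
    static byte[] fromConfigString(String configValue) {
        return Base64.getDecoder().decode(configValue);
    }

    public static void main(String[] args) {
        // Hypothetical payload standing in for the protobuf bytes of a Scan.
        byte[] original = {0x0a, 0x03, 'r', 'o', 'w', (byte) 0xff, 0x00};

        String text = toConfigString(original);
        byte[] restored = fromConfigString(text);
        System.out.println(Arrays.equals(original, restored)); // prints "true"

        // Skipping the Base64 decode hands the parser the wrong bytes.
        byte[] notDecoded = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(original, notDecoded)); // prints "false"
    }
}
```

If the writer and the reader disagree on any step (the encoding, or the serialisation format of the bytes themselves), the parser sees bytes it cannot interpret.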
Issue: When TableInputFormat internally tries to deserialise the scan string, it throws an error indicating that
Google protobuf was unable to parse the string. The stack trace is as follows.
Exception in thread "main" java.io.IOException: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
at com.flipkart.yarn.test.TestScanSerialiseDeserialise.convertStringToScan(TestScanSerialiseDeserialise.java:37)
at com.flipkart.yarn.test.TestScanSerialiseDeserialise.main(TestScanSerialiseDeserialise.java:25)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
at com.google.protobuf.CodedInputStream.readGroup(CodedInputStream.java:241)
at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:488)
at com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Scan.<init>(ClientProtos.java:13718)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Scan.<init>(ClientProtos.java:13676)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Scan$1.parsePartialFrom(ClientProtos.java:13868)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Scan$1.parsePartialFrom(ClientProtos.java:13863)
at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:141)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Scan.parseFrom(ClientProtos.java:14555)
at com.flipkart.yarn.test.TestScanSerialiseDeserialise.convertStringToScan(TestScanSerialiseDeserialise.java:35)
... 1 more
Reproducible: I can reproduce this with the sample code attached at https://drive.google.com/file/d/0B5-H2DFQJJZeNWllejlVSjRMbDA/view?usp=sharing
Possible causes: I suspect that I either failed to supply some dependency or there is a version mismatch among the
underlying jars.
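One way to check the dependency hypothesis is to print which jar each relevant class is actually loaded from at runtime. The following is a rough sketch using only standard Java reflection; the class name JarLocator and the "(bootstrap or unknown)" label are hypothetical, and the two default class names are simply the ones that appear in the stack trace and scan code.

```java
import java.security.CodeSource;

public class JarLocator {

    // Return the jar or directory a class was loaded from, if the JVM reports one.
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) {
        // Class names to inspect; pass others on the command line to override.
        String[] names = args.length > 0 ? args : new String[] {
            "com.google.protobuf.Message",
            "org.apache.hadoop.hbase.client.Scan"
        };
        for (String name : names) {
            try {
                Class<?> cls = Class.forName(name);
                System.out.println(name + " -> " + locationOf(cls));
            } catch (ClassNotFoundException | LinkageError e) {
                System.out.println(name + " -> not loadable: " + e);
            }
        }
    }
}
```

Running this with the same classpath as the failing job should show whether protobuf and HBase classes come from the HDP jars or from leftover older ones.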
I would appreciate any help in solving this.