Hello,
I am new to Hive, Hortonworks, and Hadoop. I am trying to understand the correct approach for pushing multiple XML files into HDFS and managing them there.
I have seen various ways to do this and would like to know which approach is commonly adopted, and why: Hive or Pig? I need to process thousands of XML files periodically
and push them into HDFS. Do I need to write my own MapReduce jobs?
Typical format:
<SITE Name="clark" ID="18">
  <POINT ID="5001" PointType="Asset" Name="ControlGroup">
    <ITEM ID="100" Name="groupmode" Type="Integer">
      <ValueTag Value="1" Time="141823865"/>
      <ValueTag Value="1" Time="141823870"/>
    </ITEM>
  </POINT>
</SITE>
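For context, here is a minimal sketch of how I can flatten one of these files locally with plain Python and the standard-library ElementTree parser. The element and attribute names come from the sample above; the embedded sample string and the `extract_values` helper are just my own illustration, not anything from Hadoop:

```python
import xml.etree.ElementTree as ET

# Sample document in the format shown above (normally read from a file).
SAMPLE = """<SITE Name="clark" ID="18">
  <POINT ID="5001" PointType="Asset" Name="ControlGroup">
    <ITEM ID="100" Name="groupmode" Type="Integer">
      <ValueTag Value="1" Time="141823865"/>
      <ValueTag Value="1" Time="141823870"/>
    </ITEM>
  </POINT>
</SITE>"""

def extract_values(xml_text):
    """Flatten one SITE document into (site, point_id, item, value, time) rows."""
    root = ET.fromstring(xml_text)
    rows = []
    for point in root.findall("POINT"):
        for item in point.findall("ITEM"):
            for tag in item.findall("ValueTag"):
                rows.append((
                    root.get("Name"),      # site name, e.g. "clark"
                    point.get("ID"),       # point id, e.g. "5001"
                    item.get("Name"),      # item name, e.g. "groupmode"
                    int(tag.get("Value")),
                    int(tag.get("Time")),
                ))
    return rows

print(extract_values(SAMPLE))
# → [('clark', '5001', 'groupmode', 1, 141823865), ('clark', '5001', 'groupmode', 1, 141823870)]
```

Parsing a single file like this is easy; what I am unsure about is how this translates to thousands of files on HDFS, and whether Hive, Pig, or custom MapReduce is the right place to do this kind of flattening.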