Channel: Hortonworks » All Replies

LOAD vs INSERT OVERWRITE PRODUCE DIFFERENT TABLE SIZES!


Hi everyone,

I would like to know whether a problem I am facing with Apache Hive in Azure HDInsight occurs in Hortonworks as well. The explanation follows:

The size of the table doubles if I load the data with INSERT OVERWRITE instead of LOAD. More specifically:

I created a table “item” and loaded the data from item.dat (approx. 28MB). The file item.dat is then moved to hive/warehouse and, of course, its size remains the same.
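For reference, the load step can be sketched roughly as follows. The column list, the '|' delimiter, and the source path are assumptions made only for illustration, since the actual definition of item is not shown here.

```sql
-- Hypothetical schema: the real columns of item are not given in this post.
CREATE TABLE item (
  i_item_sk BIGINT,
  i_item_id STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'   -- delimiter is an assumption
STORED AS TEXTFILE;

-- LOAD DATA INPATH moves the file into the table's warehouse directory
-- as-is, without rewriting it, so the size on disk does not change.
-- The source path below is hypothetical.
LOAD DATA INPATH '/example/data/item.dat' INTO TABLE item;
```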

Now, if I create another table “item2” with the same definition as item and then load the data from item into item2 with the following command:

INSERT OVERWRITE TABLE item2 SELECT * FROM item

the size of table item2 is double that of item (approx. 55MB).
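To make the comparison reproducible, here is a minimal sketch of the steps and of some checks that can be run from the Hive CLI. The use of CREATE TABLE ... LIKE and the warehouse paths are assumptions; the real paths are shown in the Location field of DESCRIBE FORMATTED.

```sql
-- One way to create item2 with the same definition as item (assumption:
-- this is what "same as item" means here).
CREATE TABLE item2 LIKE item;
INSERT OVERWRITE TABLE item2 SELECT * FROM item;

-- Compare storage format, SerDe, and location of the two tables.
DESCRIBE FORMATTED item;
DESCRIBE FORMATTED item2;

-- Compare the files on disk (sizes and number of files).
-- The paths are hypothetical; use the Location values printed above.
dfs -du -h /hive/warehouse/item;
dfs -du -h /hive/warehouse/item2;

-- Compare row counts, to rule out a difference in the data itself.
SELECT COUNT(*) FROM item;
SELECT COUNT(*) FROM item2;
```

If the row counts match but the sizes differ, the difference would have to come from how the rows are written out rather than from the number of rows.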

Why does this happen? And is there any way to avoid it?

The situation gets worse as the size of the data grows.

P.S. This is only to illustrate the problem. In practice I am interested in pre-joining tables, but INSERT OVERWRITE increases the size of the joined table drastically (actual problem: 4GB joined with 28MB gives 18GB).

Thank you!

