
Hive Data Storage

We are inserting data into a Hive table, and I've discovered that each insert generates a new file in the table's directory. Because some of our tables see little activity, their inserts rarely write enough data to come anywhere near the HDFS block size. All of these small files take up memory on the NameNode and slow down query performance. Is there a way for Hive to consolidate these files?

I know that we can do this with the RCFile format by using the concatenate command (alter table tablename concatenate), but I would prefer to use ORC, as we would like to eventually use ACID transactions.
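
For reference, the concatenate statement mentioned above would look roughly like this in HiveQL. This is only a sketch: tablename is a placeholder, and the partition column dt and its value are made up for illustration rather than taken from our schema.

```sql
-- Merge the small files of an unpartitioned table into fewer, larger files.
-- "tablename" is a placeholder table name.
ALTER TABLE tablename CONCATENATE;

-- For a partitioned table, the statement is issued per partition.
-- The partition column "dt" and its value are hypothetical.
ALTER TABLE tablename PARTITION (dt = '2015-01-01') CONCATENATE;
```

Comparing the file count in the table's HDFS directory before and after running it is the simplest way to confirm that the files were actually merged.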

The Sqoop merge command is also a possibility.

Am I missing something, or are these my only real options for consolidating these files?

Thanks!

