Channel: Hortonworks » All Replies

Reply To: hive time series data partitioning


Thejas,
If we create a partition per day, there will be about 365 partitions per year. Over 10 years that comes to roughly 10 * 365 = 3,650 partitions.
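To sanity-check the partition count, here is a minimal Python sketch that counts partitions for an exact calendar range at a few granularities. The 2004 to 2014 range and the function name `count_partitions` are assumptions for illustration only, and the month and year counts assume the range starts and ends on the first of a year.

```python
from datetime import date


def count_partitions(start: date, end: date) -> dict:
    """Count partitions needed for the half-open range [start, end).

    Assumes start and end both fall on January 1st, so the month and
    year counts are exact.
    """
    days = (end - start).days
    months = (end.year - start.year) * 12 + (end.month - start.month)
    years = end.year - start.year
    return {"daily": days, "monthly": months, "yearly": years}


# Hypothetical 10-year range; it contains three leap days (2004, 2008, 2012).
counts = count_partitions(date(2004, 1, 1), date(2014, 1, 1))
print(counts)  # {'daily': 3653, 'monthly': 120, 'yearly': 10}
```

Monthly partitioning over the same range would need 120 partitions instead of several thousand.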

When we query 10 years' worth of data, Hive will spend more time just locating the partitions. It will also put a heavy load on the NameNode, which has to keep track of the locations of all the partitions.

Also, the partitions may not all hold the same amount of data. For example, on Saturdays and Sundays I would have only a few KB of data, and it is not worth creating a partition for a few KB.

If I had range partitioning, I could collapse partitions once in a while, combining multiple partitions into one.
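As a rough illustration of the collapsing idea, here is a Python sketch that greedily merges consecutive daily partitions into date ranges until each range reaches a minimum size. It is planning logic only and does not call Hive. The function name `collapse`, the 128 MB threshold, and the sample sizes are all assumptions made up for the example.

```python
from datetime import date, timedelta


def collapse(daily_sizes, min_bytes):
    """Greedily merge consecutive days into ranges of at least min_bytes.

    daily_sizes: list of (day, size_in_bytes) tuples sorted by day.
    Returns a list of (first_day, last_day, total_bytes) tuples.
    """
    ranges = []
    start, last, total = None, None, 0
    for day, size in daily_sizes:
        if start is None:
            start = day
        last = day
        total += size
        if total >= min_bytes:
            ranges.append((start, last, total))
            start, total = None, 0
    if start is not None:
        # Leftover tail is below the threshold: fold it into the previous
        # range if there is one, otherwise keep it as its own range.
        if ranges:
            prev_start, _, prev_total = ranges.pop()
            ranges.append((prev_start, last, prev_total + total))
        else:
            ranges.append((start, last, total))
    return ranges


MB = 1024 * 1024
KB = 1024
monday = date(2014, 1, 6)  # hypothetical week, Monday to Sunday
sizes = [200 * MB] * 5 + [4 * KB] * 2  # busy weekdays, tiny weekend
week = [(monday + timedelta(days=i), sizes[i]) for i in range(7)]

merged = collapse(week, min_bytes=128 * MB)
for first, last, total in merged:
    print(first, last, total)
```

With this sample week, each weekday stays on its own and the two weekend days are folded into Friday's range, so seven daily partitions become five.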

