Hive time series data partitioning

February 6, 2015, 11:57 pm

Hive supports list partitioning, but what is the best way to partition time series data?

To be specific: every day I load about 2 GB of data into a table, and 90 percent of the time I am only interested in today's data. It is worth partitioning the data on a daily basis, but as the number of partitions grows there is no way to collapse older partitions unless I write some synthetic logic of my own to read the partition data. Please note that I don't drop any partitions.
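To illustrate what I mean by partition collapsing, here is a minimal Python sketch. It is not Hive code; the `partition_key` helper, the `dt=` key format, and the 30-day cutoff are all hypothetical and only there to show the idea of keeping recent data in daily partitions while folding older days into one partition per month.

```python
from datetime import date, timedelta


def partition_key(day: date, today: date, keep_daily: int = 30) -> str:
    """Hypothetical mapping: daily key for recent days, monthly key for older ones."""
    if (today - day).days < keep_daily:
        return day.strftime("dt=%Y-%m-%d")
    return day.strftime("dt=%Y-%m")


# One year of daily loads ending on an example date.
today = date(2015, 2, 6)
days = [today - timedelta(days=n) for n in range(365)]

# Plain daily partitioning: one partition per day.
daily_keys = {d.strftime("dt=%Y-%m-%d") for d in days}

# Collapsed scheme: daily for the last 30 days, monthly before that.
collapsed_keys = {partition_key(d, today) for d in days}

print(len(daily_keys), len(collapsed_keys))
```

With plain daily partitioning the partition count grows by one every day, while in the collapsed scheme it grows by roughly one per month once data ages past the cutoff. The sketch only covers the key mapping; actually moving old rows into the coarser partitions is the part I would otherwise have to build myself.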

This is probably a common problem for anyone working with time series data, so I would like some ideas on how best to tackle it.

Thanks,
Lokesh
