Hive supports list partitioning, but what is the best way to partition time-series data?
To be more specific: every day I load about 2 GB of data into a table, and 90 percent of the time I am only interested in today's data. It's worth partitioning the data daily, but as the number of partitions grows there is no way to collapse partitions unless I write some synthetic logic to read the partition data. Please note that I don't drop any partitions.
This may be a common problem for everyone who has time-series data; I just want some ideas on how best to tackle it.
Thanks,
Lokesh