Hello,
I have a list of values for a specific field (state), sorted by date. I want to display only the rows whose state has changed from the previous date.
Example input:
date state
2013-01-15 04:15:07.602 ON
2013-01-15 05:15:08.502 ON
2013-01-15 06:15:08.502 OFF
2013-01-15 07:15:08.502 ON
2013-01-15 08:15:08.502 ON
...
Expected output:
date state
2013-01-15 04:15:07.602 ON
2013-01-15 06:15:08.502 OFF
2013-01-15 07:15:08.502 ON
My HiveQL query looks like this:
select date, state from demo_bd where statechanged(state) sort by date
“statechanged” is my Java UDF; it returns true only if the current state is different from the previous one. This function works fine in Java.
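To illustrate the logic described above, here is a minimal sketch in plain Java. It is not the actual UDF code: the class name StateChangeFilter and its methods are hypothetical, and the Hive UDF base class is left out so that the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the stateful logic; not the actual Hive UDF.
class StateChangeFilter {
    // State seen on the previous call; null before the first row.
    private String previous = null;

    // Returns true when `state` differs from the state passed on the previous call.
    boolean evaluate(String state) {
        boolean changed = (state == null) ? previous != null : !state.equals(previous);
        previous = state;
        return changed;
    }

    // Feeds the rows to one instance, in the given order, and keeps the changed ones.
    static List<String> filter(List<String> states) {
        StateChangeFilter f = new StateChangeFilter();
        List<String> kept = new ArrayList<>();
        for (String s : states) {
            if (f.evaluate(s)) {
                kept.add(s);
            }
        }
        return kept;
    }
}
```

Called with the states from the example input (ON, ON, OFF, ON, ON), filter keeps ON, OFF, ON. The result relies on a single instance seeing every row, in date order.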
My problem is that it seems to work for the first few hundred values, but then it fails: sometimes (not every time) I get the same state for two adjacent dates.
I really don’t see where the problem comes from. Is it related to the way, and the order, in which Hive processes the data?
Any help is really appreciated.
Thank you.