For each user, we need to compute the difference between the counters of two consecutive events (ordered by time_stamp):
Input:
user_id, counter, time_stamp
USER_1, 30000, 2015_03_01
USER_2, 20000, 2015_03_01
USER_1, 40000, 2015_03_02
USER_2, 30000, 2015_03_02
USER_1, 50000, 2015_03_03
USER_2, 40000, 2015_03_03
Output:
user_id, Delta in counter, time_stamp
USER_1, NULL, 2015_03_01
USER_1, 10000, 2015_03_02
USER_1, 10000, 2015_03_03
USER_2, NULL, 2015_03_01
USER_2, 10000, 2015_03_02
USER_2, 10000, 2015_03_03
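
To make the intended semantics concrete, here is a minimal sketch of the computation in plain Python (not Pig or Hive). The function name counter_deltas is just illustrative, and it assumes the time_stamp strings sort chronologically, which holds for the YYYY_MM_DD format above:

```python
from itertools import groupby
from operator import itemgetter


def counter_deltas(rows):
    """Return (user_id, delta, time_stamp) for rows of (user_id, counter, time_stamp).

    The first event of each user has no predecessor, so its delta is None (NULL).
    """
    # Sort by user, then by timestamp, so each user's events are adjacent and in order.
    ordered = sorted(rows, key=itemgetter(0, 2))
    result = []
    for user_id, events in groupby(ordered, key=itemgetter(0)):
        previous = None
        for _, counter, time_stamp in events:
            delta = None if previous is None else counter - previous
            result.append((user_id, delta, time_stamp))
            previous = counter
    return result


rows = [
    ("USER_1", 30000, "2015_03_01"),
    ("USER_2", 20000, "2015_03_01"),
    ("USER_1", 40000, "2015_03_02"),
    ("USER_2", 30000, "2015_03_02"),
    ("USER_1", 50000, "2015_03_03"),
    ("USER_2", 40000, "2015_03_03"),
]

for row in counter_deltas(rows):
    print(row)
```

In other words, this is a per-user "previous row" (lag) lookup followed by a subtraction.
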
Any hints on how to do this (or something close) in Pig?
Is Pig the right tool for this, or would Hive be a better fit?