Let's say I have a base table, Base, in Hive, and my incremental data is stored in a table called Stage.
Table Base has 4 columns: id, name, address, age
Table Stage also has the same 4 columns.
id is the key column.
Now I want to identify whether the rows in the Stage table are inserts, updates, or deletes, and apply the changes to the Base table.
Note:
The above example is just for illustrating the use case.
The data in the Base table grows significantly over time, so I need a solution that works at that scale.
Can you please suggest some options for doing this in Hive?
I understand that Hive 0.14 supports CRUD operations by means of transactions, but I think that is the second step in the solution; the first step is to identify the changes.
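
To make the identification step concrete, here is a rough sketch of the kind of query I have in mind. It is only an illustration under some assumptions: id is unique in both tables, and Stage carries no delete marker (the tables as described have none), so this can only separate inserts from updates and unchanged rows. How deletes would be recognized is still open. The change_type column and its values are names made up for the example.

```sql
-- Sketch only: classify each Stage row against Base on the key column id.
-- Assumes id is unique in both tables. Deletes cannot be detected here
-- because Stage has no column that marks a row as deleted.
SELECT
  s.id,
  s.name,
  s.address,
  s.age,
  CASE
    WHEN b.id IS NULL THEN 'INSERT'            -- key not present in Base yet
    WHEN NOT (s.name    <=> b.name
          AND s.address <=> b.address
          AND s.age     <=> b.age) THEN 'UPDATE'  -- key exists, a value differs
    ELSE 'UNCHANGED'                           -- key exists, all values equal
  END AS change_type                           -- hypothetical column name
FROM Stage s
LEFT OUTER JOIN Base b
  ON s.id = b.id;
```

The `<=>` operator is Hive's NULL-safe equality, used so that a NULL on both sides counts as equal rather than as a change. The join touches all of Base on every run, which is exactly the part I am worried about as Base grows.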