We upgraded our cluster to HDP 2.2.0.0 with Hive 0.14 and noticed that a LEFT JOIN produces wrong results. Here is a simple query:
WITH t1 AS
(SELECT '12345' AS id, '11111' AS ones, 'john' AS name),
t2 AS
(SELECT '11111' AS ones, 'select' AS type),
t3 AS
(SELECT '12345' AS id, 0 AS duration)
SELECT * FROM t1 JOIN t2 ON t1.ones = t2.ones JOIN t3 ON t1.id = t3.id;
The output is correct:
t1.id t1.ones t1.name t2.ones t2.type t3.id t3.duration
12345 11111 john 11111 select 12345 0
But if I change "JOIN t3" to "LEFT JOIN t3", the output becomes:
t1.id t1.ones t1.name t2.ones t2.type t3.id t3.duration
12345 11111 john 11111 12345 12345 0
Note that the t2.type column changed from "select" to "12345", which is wrong.
Is this a known bug?
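For anyone who wants to reproduce this, here is the full failing query spelled out. It is the same as the first query except that the join on t3 is a LEFT JOIN; the comments restate what I see on our cluster, and I have not checked other Hive versions.

```sql
-- Same CTEs as the first query; only the second join changes.
WITH t1 AS
(SELECT '12345' AS id, '11111' AS ones, 'john' AS name),
t2 AS
(SELECT '11111' AS ones, 'select' AS type),
t3 AS
(SELECT '12345' AS id, 0 AS duration)
SELECT * FROM t1
JOIN t2 ON t1.ones = t2.ones
LEFT JOIN t3 ON t1.id = t3.id;
-- Expected:                 t2.type = 'select'
-- Observed on Hive 0.14:    t2.type = '12345'
```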