
Tez on HDP 2.2

Hi,

First of all, sorry if I’m not totally clear; I’m new to this forum and to HDP. I’m setting up a Hadoop environment with HDP 2.2, and I used Sqoop to import a MySQL database into Hive/HCatalog and HBase. Most of the tables are in Hive; only the biggest one (50.4 GB, 40+ million rows) is in HBase. Hive and HBase are integrated.

I tested some queries using the Tez engine on Hive tables of 2.2 GB: basic queries with a couple of joins. These queries took around 15 minutes, while the same ones in MySQL were a lot quicker. I don’t know if I’m following the correct approach, but it also surprised me that they were using 100% of the YARN memory (4 GB). In fact, when I ran one query from Hue and another from the console at the same time, the second one didn’t start until the first had finished.

The cluster currently has just two DataNodes, of 2 GB and 8 GB respectively. It is expected to be expanded, and a significant increase in data volume is expected in the near future. I enabled vectorization and cost-based query optimization in Hive, but the files aren’t ORC. Is there anything else I can do to improve performance before I start testing against the big table and trying more complicated queries?

Thanks in advance.

