Hello,
Is there any way to tune Tez for better performance? I have a dataset of 30.5 GB and my joins are taking over 200 seconds. This is the query:
SELECT d.per_id, count(*), sum(lo_ioh_ext_cost_dlrs)
FROM dim_period d
JOIN fct_ioh_Day_str_pln f ON (d.per_id = f.per_id)
WHERE d.per_id IN (13879, 13880, 13881, 13882)
GROUP BY d.per_id
ORDER BY d.per_id
When I run this query, I noticed that the node I am running it from is not using much of the machine's capacity, even though max capacity is set to 75%.
CPU and memory utilization are both low.
Can someone help? Or is 200+ seconds normal for a 6-node cluster when joining on a 30 GB dataset?
Please note that I am using ORC tables that are partitioned and bucketed.
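For context, here is a minimal sketch of the session settings that are commonly checked for a join like this, assuming Hive on Tez. The property names are standard Hive settings, but every value shown is an illustrative assumption, not this cluster's actual configuration. The EXPLAIN at the end prints the plan, which shows whether the join is running as a map join or a shuffle join.

```sql
-- Illustrative values only; adjust to the cluster's real memory and data sizes.
SET hive.execution.engine=tez;
SET hive.vectorized.execution.enabled=true;   -- vectorized execution, supported for ORC
SET hive.cbo.enable=true;                     -- cost-based optimizer; depends on table/column statistics
SET hive.auto.convert.join=true;              -- allow a map-side join when one table is small enough
SET hive.auto.convert.join.noconditionaltask.size=268435456;  -- assumed 256 MB threshold, in bytes
SET hive.tez.container.size=4096;             -- assumed 4 GB Tez containers, in MB

-- Inspect the plan for the query from this post.
EXPLAIN
SELECT d.per_id, count(*), sum(lo_ioh_ext_cost_dlrs)
FROM dim_period d
JOIN fct_ioh_Day_str_pln f ON (d.per_id = f.per_id)
WHERE d.per_id IN (13879, 13880, 13881, 13882)
GROUP BY d.per_id
ORDER BY d.per_id;
```

Posting the EXPLAIN output and the current values of these settings would make it easier to tell whether 200+ seconds is expected here.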
Thanks