Hi,
I am running on a 10-node HDP 2.2 cluster, using Tez and YARN.
The Hive version is 0.14.
I have a 90 million row table stored in a plain-text CSV file of about 10 GB.
When trying to insert into an ORC partitioned table using the statement:
insert overwrite table 2h2 partition (dt) select *, TIME_STAMP from 2h_tmp;
where dt is the dynamic partition key.
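For context, the two tables look roughly like this (a hypothetical sketch with made-up column names; the real column list is longer, but the storage formats and partitioning are as described, and the last select expression, TIME_STAMP, feeds the dynamic dt partition):

-- hypothetical sketch only; actual column names and types differ
create table `2h_tmp` (
  c1 string,
  c2 string,
  time_stamp string
)
row format delimited fields terminated by ','
stored as textfile;

create table `2h2` (
  c1 string,
  c2 string,
  time_stamp string
)
partitioned by (dt string)
stored as orc;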
Tez allocates only one reducer to the job, which results in a 6 hour run.
I expect about 120 partitions to be created.
How can I increase the number of reducers to speed up this job?
Is this related to https://issues.apache.org/jira/browse/HIVE-7158? It is marked as resolved for Hive 0.14.
I am running with the default values:
hive.tez.auto.reducer.parallelism
Default Value: false
Added In: Hive 0.14.0 with HIVE-7158
hive.tez.max.partition.factor
Default Value: 2
Added In: Hive 0.14.0 with HIVE-7158
hive.tez.min.partition.factor
Default Value: 0.25
Added In: Hive 0.14.0 with HIVE-7158
and I have set:
hive.exec.dynamic.partition=true;
hive.exec.dynamic.partition.mode=nonstrict;
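Put together, the session looks roughly like this (a minimal sketch: the settings listed above at their current values, followed by the insert statement):

set hive.tez.auto.reducer.parallelism=false;    -- default
set hive.tez.max.partition.factor=2;            -- default
set hive.tez.min.partition.factor=0.25;         -- default
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table 2h2 partition (dt)
select *, TIME_STAMP from 2h_tmp;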