Hi all,
I have 8 CSV input files, each 150 MB in size.
I have loaded them into a Hive table stored as TEXTFILE.
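For context, the table was created along these lines (sketch from memory: the real DDL has more columns, my_field_01 is the only name taken from the queries below, and the delimiter and path are assumptions):

create table data35text (
  my_field_01 string
  -- remaining columns omitted
)
row format delimited fields terminated by ','
stored as textfile;
load data inpath '/tmp/data35/file1.csv' into table data35text;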
I am now running various queries against the table, but I never get more than 8 mappers. Whatever I try, I cannot get above that.
I am running Hive on Tez on HDP 2.3 on RHEL 6.6.
Among other things, I have tried the following:
set tez.grouping.min-size=1048576;
set tez.grouping.max-size=2097152;
select count(*) from data35text where my_field_01 is null;
I still get 8 mappers.
I also tried lowering the MapReduce split size:
set mapreduce.input.fileinputformat.split.maxsize=1048576;
select count(*) from data35text where my_field_01 is null;
That did not help either.
Requesting 20 mappers explicitly with the following parameter did not work either:
set mapreduce.job.maps=20;
select count(*) from data35text where my_field_01 is null;
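As a sanity check, running set with only the property name prints its current session value, so a typo in a property name can be ruled out that way, e.g.:
set tez.grouping.max-size;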
Whatever I do, I always get at most 8 mappers, which is exactly the number of CSV files behind the Hive table (inside the Hive warehouse directory).
Does anyone have any ideas? I know, of course, that these files are quite small (I have the standard 128 MB DFS block size), but I thought it was possible to force more mappers when the files are small…
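For scale: 8 files x 150 MB is about 1.2 GB of input, so with tez.grouping.max-size at 2 MB I would naively expect on the order of 1200/2 = 600 grouped splits, not 8.

In case it matters, a combined run with the input format pinned explicitly would look like this (sketch only: hive.input.format is my assumption here, I do not know whether it is even relevant under Tez, and the sizes are the same arbitrary 1-2 MB values as above):

set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set tez.grouping.min-size=1048576;
set tez.grouping.max-size=2097152;
select count(*) from data35text where my_field_01 is null;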