Channel: Hortonworks » All Replies

Hive: Java heap size | How to select optimal value


Hi there,

I have a problem with a Hive query. The second MapReduce stage (out of two) fails at approx. 60-70% completion with the error message "Java heap space". The output of the EXPLAIN statement is:

Explain
STAGE DEPENDENCIES:
Stage-6 is a root stage, consists of Stage-1
Stage-1
Stage-2 depends on stages: Stage-1
Stage-0 depends on stages: Stage-2
Stage-3 depends on stages: Stage-0
STAGE PLANS:
Stage: Stage-6
Conditional Operator
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: detailed_connections
filterExpr: (((query_year = 2013) and (query_month = 2)) and (query_day = 5)) (type: boolean)
Statistics: Num rows: 325317762 Data size: 199527782283 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: omitted
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13
Statistics: Num rows: 325317762 Data size: 199527782283 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: bigint)
sort order: +
Map-reduce partition columns: _col0 (type: bigint)
Statistics: Num rows: 325317762 Data size: 199527782283 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: int), _col2 (type: tinyint), _col3 (type: varchar(2)), _col4 (type: varchar(2)), _col5 (type: int), _col6 (type: int), _col7 (type: varchar(1)), _col8 (type: int), _col9 (type: varchar(1)), _col10 (type: varchar(3)), _col11 (type: varchar(3)), _col12 (type: timestamp), _col13 (type: timestamp)
TableScan
alias: search_results
filterExpr: (((query_year = 2013) and (query_month = 2)) and (query_day = 5)) (type: boolean)
Statistics: Num rows: 45519226 Data size: 58144930020 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: omitted
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13
Statistics: Num rows: 45519226 Data size: 58144930020 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: bigint)
sort order: +
Map-reduce partition columns: _col0 (type: bigint)
Statistics: Num rows: 45519226 Data size: 58144930020 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: timestamp), _col2 (type: varchar(2)), _col3 (type: varchar(6)), _col4 (type: varchar(10)), _col5 (type: int), _col6 (type: varchar(10)), _col7 (type: int), _col8 (type: varchar(255)), _col9 (type: varchar(255)), _col10 (type: decimal(12,2)), _col11 (type: decimal(12,2)), _col12 (type: decimal(12,2)), _col13 (type: decimal(12,2))
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
condition expressions:
0 {KEY.reducesinkkey0} {VALUE._col0} {VALUE._col1} {VALUE._col2} {VALUE._col3} {VALUE._col4} {VALUE._col5} {VALUE._col6} {VALUE._col7} {VALUE._col8} {VALUE._col9} {VALUE._col10} {VALUE._col11} {VALUE._col12}
1 {VALUE._col0} {VALUE._col1} {VALUE._col2} {VALUE._col3} {VALUE._col4} {VALUE._col5} {VALUE._col6} {VALUE._col7} {VALUE._col8} {VALUE._col9} {VALUE._col10} {VALUE._col11} {VALUE._col12}
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27
Statistics: Num rows: 357849545 Data size: 219480565268 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: bigint), 2013 (type: int), 2 (type: int), 5 (type: int), _col1 (type: timestamp), _col2 (type: varchar(2)), _col4 (type: varchar(10)), _col6 (type: varchar(10)), _col8 (type: varchar(255)), _col9 (type: varchar(255)), _col10 (type: decimal(12,2)), _col11 (type: decimal(12,2)), _col12 (type: decimal(12,2)), _col13 (type: decimal(12,2)), _col15 (type: int), _col16 (type: tinyint), _col18 (type: varchar(2)), _col19 (type: int), _col20 (type: int), _col21 (type: varchar(1)), _col22 (type: int), _col23 (type: varchar(1)), _col24 (type: varchar(3)), _col25 (type: varchar(3)), _col26 (type: timestamp), _col27 (type: timestamp), _col3 (type: varchar(6)), _col17 (type: varchar(2)), _col5 (type: int), _col7 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29
Statistics: Num rows: 357849545 Data size: 219480565268 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: true
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-2
Map Reduce
Map Operator Tree:
TableScan
Reduce Output Operator
key expressions: UDFToShort(_col1) (type: smallint), UDFToByte(_col2) (type: tinyint), UDFToByte(_col3) (type: tinyint), _col0 (type: bigint)
sort order: ++++
Map-reduce partition columns: UDFToShort(_col1) (type: smallint), UDFToByte(_col2) (type: tinyint), UDFToByte(_col3) (type: tinyint), _col0 (type: bigint)
Statistics: Num rows: 357849545 Data size: 219480565268 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: bigint), _col1 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type: timestamp), _col5 (type: varchar(2)), _col6 (type: varchar(10)), _col7 (type: varchar(10)), _col8 (type: varchar(255)), _col9 (type: varchar(255)), _col10 (type: decimal(12,2)), _col11 (type: decimal(12,2)), _col12 (type: decimal(12,2)), _col13 (type: decimal(12,2)), _col14 (type: int), _col15 (type: tinyint), _col16 (type: varchar(2)), _col17 (type: int), _col18 (type: int), _col19 (type: varchar(1)), _col20 (type: int), _col21 (type: varchar(1)), _col22 (type: varchar(3)), _col23 (type: varchar(3)), _col24 (type: timestamp), _col25 (type: timestamp), _col26 (type: varchar(6)), _col27 (type: varchar(2)), _col28 (type: int), _col29 (type: int)
Reduce Operator Tree:
Extract
Statistics: Num rows: 357849545 Data size: 219480565268 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: bigint), UDFToShort(_col1) (type: smallint), UDFToByte(_col2) (type: tinyint), UDFToByte(_col3) (type: tinyint), _col4 (type: timestamp), _col5 (type: varchar(2)), _col6 (type: varchar(10)), _col7 (type: varchar(10)), _col8 (type: varchar(255)), _col9 (type: varchar(255)), _col10 (type: decimal(12,2)), _col11 (type: decimal(12,2)), _col12 (type: decimal(12,2)), _col13 (type: decimal(12,2)), UDFToShort(_col14) (type: smallint), _col15 (type: tinyint), _col16 (type: varchar(2)), _col17 (type: int), _col18 (type: int), _col19 (type: varchar(1)), UDFToByte(_col20) (type: tinyint), _col21 (type: varchar(1)), _col22 (type: varchar(3)), _col23 (type: varchar(3)), _col24 (type: timestamp), _col25 (type: timestamp), _col26 (type: varchar(6)), _col27 (type: varchar(2)), _col28 (type: int), _col29 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29
Statistics: Num rows: 357849545 Data size: 219480565268 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: true
Statistics: Num rows: 357849545 Data size: 219480565268 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: omitted
Stage: Stage-0
Move Operator
tables:
partition:
arrival_month
departure_month
marketing_carrier
segment
replace: false
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: omitted
Stage: Stage-3
Stats-Aggr Operator

Could somebody guide me through how to calculate the heap size needed for this query (or how to optimize it so that it completes :))?

Thanks in advance

Cheers
Andrey

PS: the current value for the Hive heap size in the configs is 1024 MB (the default). I have tried 3*1024 MB, but it did not help.
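For reference, the kind of settings I have been experimenting with are the standard Hive-on-MapReduce memory properties; the values below are only illustrative guesses for my cluster, not tested recommendations:

```sql
-- Give each reduce task a bigger container and a matching JVM heap
-- (heap roughly 80% of the container size):
SET mapreduce.reduce.memory.mb=4096;
SET mapreduce.reduce.java.opts=-Xmx3276m;

-- Lower bytes-per-reducer so Hive spawns more reducers,
-- each handling a smaller share of the ~219 GB join output:
SET hive.exec.reducers.bytes.per.reducer=268435456;  -- 256 MB
```

My understanding is that the reducer heap, not the Hive client heap, is what matters for the failing join stage, but corrections are welcome.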

