I’ve been working through the Sentiment tutorial in the sandbox, and I was able to get it working and view the data in Excel. Now I’m trying to adapt the code to use Twitter data that I captured with Java. I believe the data is in the same format as the tutorial’s (raw JSON). I uploaded it manually as .gz files to a new folder using the file browser, then edited the hiveddl.sql script to change the table/view names and the location of the data for the tweets_raw table.

When I run “hive -f hiveddl_sample5.sql” (the script I edited), I get a “Java heap space” error (see the output below). I found some forum posts and tried editing some of the settings, but I still get the error. How can I fix this so that hiveddl_sample5.sql completes? Also, what do I need to restart for setting changes to take effect? I’ve been restarting the whole virtual machine in VirtualBox each time, which takes a while.
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1418895176421_0001, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1418895176421_0001/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1418895176421_0001
Hadoop job information for Stage-2: number of mappers: 3; number of reducers: 1
2014-12-18 01:35:40,980 Stage-2 map = 0%, reduce = 0%
2014-12-18 01:36:31,795 Stage-2 map = 100%, reduce = 100%
Ended Job = job_1418895176421_0001 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1418895176421_0001_m_000001 (and more) from job job_1418895176421_0001
Task with the most failures(4):
-----
Task ID:
task_1418895176421_0001_m_000001
URL:
-----
Diagnostic Messages for this Task:
Error: Java heap space
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 3 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec