Hi folks,
Been struggling with this one for a while now. We have a 16-node cluster running Flume 1.5, and it had been working fine. We tried switching to a compressed sink using .bz2 and then went back to regular text. Ever since we made this change, we're getting the sink errors below. There is nothing in the datanode logs; only these errors keep spitting out in the Flume logs. The .tmp file doesn't get renamed and just sits out on the channel for days. Example:
-rw-r--r-- 3 hdfs hdfs 5488 2015-08-20 18:55 /db/live/mobile_info/year=2015/month=08/day=20/_FlumeData.1440110214141.tmp (from the 20th)
Does anyone have an idea what is wrong here? There are no known network issues that we're aware of, and the cluster isn't hammered with queries all day, so we're not sure what else to try.
Thanks for any help. The only similar case we've found while researching this issue is when someone sets up a 1-2 node cluster and HDFS is still trying to create 3 replicas of each block, but this is a 16-node cluster, so there are plenty of nodes.
Eric
2015-08-24 09:17:37,632 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://nameservice1:8020/db/live/log_data/year=2015/month=08/day=06/_FlumeData.1438818087674.tmp failed. Will retry again in 180 seconds.
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[10.100.55.61:50010, 10.100.55.62:50010], original=[10.100.55.61:50010, 10.100.55.62:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:960)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1026)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1175)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:924)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:486)
2015-08-24 09:17:37,633 INFO org.apache.flume.sink.hdfs.BucketWriter: Close tries incremented
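For reference, the replacement policy named in the exception is an HDFS *client*-side setting, so for Flume it would need to go in the hdfs-site.xml on the Flume agent's classpath, not on the datanodes. The valid values are DEFAULT, ALWAYS, and NEVER. A minimal sketch of what relaxing it might look like (an assumption about our setup, not a confirmed fix, and NEVER weakens durability on a cluster this size, so treat it as a diagnostic step):

```xml
<!-- hdfs-site.xml on the Flume agent's classpath (hypothetical tuning, not a confirmed fix) -->
<configuration>
  <!-- Keep trying to add a replacement datanode when one in the pipeline fails -->
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
    <value>true</value>
  </property>
  <!-- DEFAULT is the shipped behavior; NEVER tells the client to continue
       writing on the surviving datanodes instead of failing the close -->
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>NEVER</value>
  </property>
</configuration>
```

Note the error shows the pipeline already down to two datanodes with no replacement found, which on a 16-node cluster suggests the write pipeline itself is wedged rather than a genuine shortage of nodes, so the config above may only mask the underlying problem.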