Hello,
I am using the AvroMultipleOutputs class to dynamically route data to output files in the reducer, but when the number of output files gets high, the job fails with the following error:
Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /data/pif-dob-categorize/2014/08/26/14/_temporary/1/_temporary/attempt_1408977992653_0120_r_000000_0/HISTORY/20131204/64619-r-00000.avro could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2503)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:…
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1231)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
I checked HDFS and the datanodes and everything seems fine: enough free space, no network partition, etc. More details can be found on SO.
It seems somehow related to the number of files created/opened, since the same job completes fine for smaller amounts of data. What settings can limit the number of output files?
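For reference, here is a minimal sketch of the pattern I'm describing; the field names, schemas, and path layout below are placeholders, not the real job's values:

import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapreduce.AvroMultipleOutputs;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CategorizeReducer
    extends Reducer<Text, AvroValue<GenericRecord>, AvroKey<GenericRecord>, NullWritable> {

  private AvroMultipleOutputs amos;

  @Override
  protected void setup(Context context) {
    amos = new AvroMultipleOutputs(context);
  }

  @Override
  protected void reduce(Text key, Iterable<AvroValue<GenericRecord>> values, Context context)
      throws IOException, InterruptedException {
    for (AvroValue<GenericRecord> value : values) {
      GenericRecord record = value.datum();
      // One output file per (category, date, id) combination -- these
      // field names are placeholders. This is what makes the number of
      // files grow with the data.
      String basePath = record.get("category") + "/"
          + record.get("date") + "/"
          + record.get("id");
      amos.write(new AvroKey<>(record), NullWritable.get(), basePath);
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    amos.close(); // flush and close all dynamically created writers
  }
}

Each distinct base path makes AvroMultipleOutputs open a separate HDFS writer, which is why I suspect the failure scales with the number of distinct paths rather than with the data volume itself.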
Thx