As per the HDFS read process:
First, the client opens a FileSystem object, gets the DataNode addresses of the first two blocks of the file from the NameNode, and then opens an input stream.
Then it opens a connection to the DataNode, reads the first block, and closes the connection to that DataNode.
After reading those blocks, it asks for the next batch of block locations and repeats the streaming until the whole file has been read.
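To show what I mean, here is a minimal sketch of that client-side flow (the path and buffer size are made up for illustration). As I understand it, fs.open() is where the NameNode hands back the first batch of block locations, and the read loop then streams from the DataNodes block by block behind the scenes.

```java
// Minimal sketch of the read flow described above; the path and buffer size
// are hypothetical. fs.open() contacts the NameNode for the first block
// locations, and read() then streams from the DataNodes block by block.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);          // client-side FileSystem object
        Path file = new Path("/data/sample.txt");      // made-up file path

        try (FSDataInputStream in = fs.open(file)) {   // NameNode returns the first block locations here
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                // process buffer[0..n) -- DataNode connections are opened
                // and closed per block internally by the stream
            }
        }
        fs.close();
    }
}
```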
As per the details in the blog and the Definitive Guide, every batch has two blocks.
Is there any property that decides this number? Can we increase/decrease the number of blocks per batch?
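For what it's worth, if such a property exists I would expect to set it on the client Configuration roughly like below. dfs.client.read.prefetch.size is a name I came across, so please treat it as an assumption rather than a confirmed answer; I believe it would be specified in bytes (a multiple of the block size) rather than as a block count.

```java
// Assumption: dfs.client.read.prefetch.size controls how much block-location
// metadata the client asks the NameNode for per call (value in bytes).
// This is my guess from reading around, not a confirmed fact.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class PrefetchSizeSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        long blockSize = 128L * 1024 * 1024;                           // example 128 MB block size
        conf.setLong("dfs.client.read.prefetch.size", 4 * blockSize);  // 4 blocks' worth per batch?
        FileSystem fs = FileSystem.get(conf);
        // ... open and read files as usual ...
        fs.close();
    }
}
```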
Also, I have a question.
Of course, increasing the number of blocks per batch will unnecessarily use memory for holding the address details.
But opening a connection to a DataNode for every block again and again, or communicating with the NameNode again and again, also consumes network bandwidth, doesn't it?
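To make the memory side of that trade-off concrete, here is another small sketch (same made-up path) that lists the block locations the NameNode returns: each entry is only an offset, a length and a few DataNode host names, which is what the client would have to hold in memory if it asked for more blocks per batch.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/sample.txt");    // made-up file path
        FileStatus status = fs.getFileStatus(file);

        // One NameNode call covering the whole file; each BlockLocation is
        // just offsets, lengths and DataNode host names.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    b.getOffset(), b.getLength(), String.join(",", b.getHosts()));
        }
        fs.close();
    }
}
```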