Is there a way to decide the number of blocks per batch when reading an HDFS file?


As per the HDFS read process:

The client first opens a FileSystem object, gets the DataNode addresses of the first two blocks of the file from the NameNode, and then opens an InputStream.

It then opens a connection to a DataNode, reads the first block, and closes that connection.
After reading those blocks, it decides on the next batch, fetches those details from the NameNode, and keeps streaming this way until the whole file has been read (see the sketch below).
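
For context, here is a minimal sketch of the read path described above, using the standard Hadoop FileSystem API (the hdfs:// URI and file path are placeholders, not a real cluster):

import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        // Placeholder cluster URI and file path, for illustration only.
        String uri = "hdfs://namenode:8020/data/sample.txt";

        Configuration conf = new Configuration();
        // FileSystem.get() talks to the NameNode; the DataNode block reads
        // happen later, inside the stream returned by open().
        FileSystem fs = FileSystem.get(URI.create(uri), conf);

        InputStream in = null;
        try {
            // open() returns a stream that fetches block locations from the
            // NameNode in batches and streams each block from a DataNode.
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}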

As per the details from the blog and the Definitive Guide, every batch has two blocks.
Is there any property to decide this number? Can we increase or decrease the number of blocks per batch?
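
If such a setting exists, I would expect it to be a client-side property set on the Configuration before opening the file, something like the sketch below. My guess at the key is dfs.client.read.prefetch.size; I have not verified that it controls the batch size, so please correct me if a different property does.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrefetchSizeExperiment {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Assumption: "dfs.client.read.prefetch.size" (in bytes) caps how much
        // block-location metadata the client asks the NameNode for per call.
        // 512 MB would be 4 blocks at a 128 MB block size.
        conf.setLong("dfs.client.read.prefetch.size", 4L * 128 * 1024 * 1024);

        // Placeholder cluster URI and path, for illustration only.
        String uri = "hdfs://namenode:8020/data/sample.txt";
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        try (FSDataInputStream in = fs.open(new Path(uri))) {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // Block-location batches are fetched lazily as the read advances.
            }
        }
    }
}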

I also have a follow-up question.
Of course, increasing the number of blocks per batch unnecessarily wastes memory on holding the extra address details.
But opening a connection to a DataNode for every block again and again, or going back to the NameNode again and again, also consumes network bandwidth, doesn't it?

