Hello Srinivasarao,
When an HDFS client reads a file, it makes an RPC call to the NameNode called getBlockLocations. Here is the method signature:
@Idempotent public LocatedBlocks getBlockLocations(String src, long offset, long length) throws AccessControlException, FileNotFoundException, UnresolvedLinkException, IOException;
The NameNode returns location information for each block of the source file that falls within the byte range starting at offset and extending through (offset + length).
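To make the byte-range behavior concrete, here is a minimal sketch of the arithmetic involved. It is not the NameNode's actual code: the class BlockRangeSketch and the method blocksInRange are hypothetical names, and the sketch assumes fixed-size blocks and treats (offset + length) as an exclusive end, clamped to the file length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Hadoop API. Assumes every block
// (except possibly the last) is exactly blockSize bytes long.
class BlockRangeSketch {

    // Returns the zero-based indices of the blocks that overlap the byte
    // range [offset, offset + length), clamped to the end of the file.
    static List<Long> blocksInRange(long fileLength, long blockSize,
                                    long offset, long length) {
        List<Long> indices = new ArrayList<>();
        if (blockSize <= 0 || length <= 0 || offset < 0 || offset >= fileLength) {
            return indices; // nothing to report for an empty or invalid range
        }
        // Clamp the exclusive end to the file length without overflowing.
        long end = (length > fileLength - offset) ? fileLength : offset + length;
        long first = offset / blockSize;
        long last = (end - 1) / blockSize;
        for (long i = first; i <= last; i++) {
            indices.add(i);
        }
        return indices;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 1000 MB file with 128 MB blocks, reading the first 256 MB.
        System.out.println(blocksInRange(1000 * mb, 128 * mb, 0, 256 * mb));
    }
}
```

A range that starts in the middle of a block still includes that whole block, because the client needs its location to read any byte in it.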
When the HDFS client makes this RPC call, the length parameter is derived from the configuration property dfs.client.read.prefetch.size. In 2.x versions of Hadoop/HDP, if this property is unspecified in hdfs-site.xml, then the default value is (10 * dfs.blocksize). In other words, the default behavior is to prefetch 10 blocks' worth of location information, assuming the file uses the default block size as defined in dfs.blocksize.
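As a rough illustration of that default, the sketch below works through the numbers. The class and method names (PrefetchSizeSketch, effectivePrefetchSize, blocksCovered) are hypothetical and are not the client's real code, and the 128 MB block size is an assumption about what dfs.blocksize is set to on your cluster.

```java
// Hypothetical helper, not part of the Hadoop API.
class PrefetchSizeSketch {

    // Assumed value of dfs.blocksize (128 MB); check your own hdfs-site.xml.
    static final long ASSUMED_BLOCK_SIZE = 128L * 1024 * 1024;

    // If the prefetch size is not configured (null), fall back to
    // 10 * block size, as described for Hadoop 2.x above.
    static long effectivePrefetchSize(Long configuredPrefetch, long blockSize) {
        return configuredPrefetch != null ? configuredPrefetch : 10L * blockSize;
    }

    // Number of blocks whose locations fit in one prefetch when reading from
    // the start of a file that was written with fileBlockSize (rounded up).
    static long blocksCovered(long prefetchSize, long fileBlockSize) {
        return (prefetchSize + fileBlockSize - 1) / fileBlockSize;
    }

    public static void main(String[] args) {
        long prefetch = effectivePrefetchSize(null, ASSUMED_BLOCK_SIZE);
        System.out.println("prefetch bytes: " + prefetch);
        System.out.println("blocks covered at 128 MB: "
                + blocksCovered(prefetch, ASSUMED_BLOCK_SIZE));
        System.out.println("blocks covered at 256 MB: "
                + blocksCovered(prefetch, 2 * ASSUMED_BLOCK_SIZE));
    }
}
```

With these assumed values, the default works out to 1342177280 bytes, which covers 10 blocks for a file written with 128 MB blocks but only 5 blocks for a file written with 256 MB blocks.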
If you want, you can try tuning dfs.client.read.prefetch.size. The value is expressed in bytes, and ideally it is a multiple of the block size. However, the default value is something that we’ve seen work well in practice. Personally, I have never had reason to tune this.
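If you do decide to experiment, the property goes in the client's hdfs-site.xml like any other. The fragment below is only an illustration: the value 2684354560 is a made-up example (20 * 128 MB, assuming a 128 MB block size), not a recommendation.

```xml
<!-- Illustrative value only: 20 blocks' worth of locations at an assumed 128 MB block size. -->
<property>
  <name>dfs.client.read.prefetch.size</name>
  <value>2684354560</value>
</property>
```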
I hope this helps.
–Chris