Hi there,
we have installed the latest version of HDP (2.2) on Linux via the Ambari automated setup. I have uploaded a couple of TBs of data to test the performance of the cluster. After checking, I noticed that the chunk size of the table files (ORC, compression=ZLIB) is 16 MB. The HDFS parameter dfs.blocksize is set to 134217728 (the default value), which I assumed is in bytes (so, 128 MB per chunk). There is more than enough data to form chunks of 128 MB. Could you please tell me where I am missing the point? Is the chunk size in bits rather than bytes? Or does dfs.blocksize not apply to ORC files with compression?
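To illustrate what I mean by "chunk size", here is a minimal sketch (just for illustration, using the standard Hadoop FileSystem API; the table directory path is a placeholder) that prints the configured dfs.blocksize next to the length and block size reported for each table file:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Configured default block size (dfs.blocksize), in bytes
        System.out.println("dfs.blocksize = " + conf.getLong("dfs.blocksize", 134217728L));

        // Per-file view: actual file length vs. the block size assigned to the file
        Path tableDir = new Path(args[0]); // e.g. the table's warehouse directory
        for (FileStatus status : fs.listStatus(tableDir)) {
            System.out.printf("%s  length=%d bytes  blockSize=%d bytes%n",
                    status.getPath().getName(), status.getLen(), status.getBlockSize());
        }
    }
}
```

In my case the reported file lengths are around 16 MB even though the block size shows 134217728.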
Thanks a lot in advance.
Cheers
Andrey