Hello,
I am currently developing a C# MapReduce job. I would like to get the name of the file currently being processed, but I cannot find out how to do it.
Here are the options I tried:
– With the HDInsight SDK, I tried the MapperContext.InputFileName property: it is always empty.
– Without the SDK, I tried the method described here: https://social.msdn.microsoft.com/Forums/azure/en-us/0e4adee7-2a94-41a7-b5b1-08fb780e5d8f/get-file-name-being-read-in-c-mapper?forum=hdinsight (using the environment variables).
With this second option (without the SDK), I cannot even find the environment variables mentioned in the discussion, so perhaps this has changed since 2012.
For both options, I tried different input configurations: a folder name (to include all the files inside), a folder name plus a placeholder (to include only files with a certain extension), and finally a specific file name. In none of these cases is an environment variable available that gives me this information.
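To make the second option concrete, here is a minimal sketch of the kind of check I mean, not my exact code. It assumes a Hadoop streaming mapper that reads lines from standard input, and it assumes the input file would be exposed under map_input_file or mapreduce_map_input_file (streaming exports job properties as environment variables with dots replaced by underscores). The class and method names are made up for illustration.

```csharp
using System;

public static class InputFileProbe
{
    // Assumed variable names: the job properties "mapreduce.map.input.file"
    // (newer) and "map.input.file" (older), with dots turned into underscores.
    private static readonly string[] Candidates =
    {
        "mapreduce_map_input_file",
        "map_input_file"
    };

    // Returns the first non-empty candidate value, or null if none is set.
    // The lookup is passed in so the logic can be tried outside a cluster.
    public static string FindInputFile(Func<string, string> getEnv)
    {
        foreach (var name in Candidates)
        {
            var value = getEnv(name);
            if (!string.IsNullOrEmpty(value))
            {
                return value;
            }
        }
        return null;
    }

    public static void Main()
    {
        var inputFile = FindInputFile(Environment.GetEnvironmentVariable) ?? "(unknown)";

        // Emit "fileName<TAB>line" for every input line, so the file name
        // travels with each record as the key.
        string line;
        while ((line = Console.ReadLine()) != null)
        {
            Console.WriteLine(inputFile + "\t" + line);
        }
    }
}
```

If neither variable is set, every record is tagged with "(unknown)", which corresponds to the situation I am describing.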
I thought about pre-processing my files (all TXT files) and including the file name in the first row, but I am not really satisfied with this option and would like to know whether anything else can be done.
Do you have any idea how to proceed?
Thanks for any help.
Matt