Channel: Hortonworks » All Replies
Viewing all articles
Browse latest Browse all 3435

Reply To: Hadoop for document archiving


Hello,

 

I am not sure Hadoop is the right choice for you. There are probably more specialised document stores that would suit you better. I would be interested to know why you are looking at Hadoop. Is it just the fact that your documents total over 10 TB? Do you intend to do some Natural Language Processing on the documents (in which case look at GATE and possibly Behemoth)?

Hive is a SQL-like query layer for data stored in Hadoop. I can't quite see how it helps here. The only use I can think of is if you have lots of flat metadata which you query to find documents.

The thing of most help here is Solr. It can act as a text search engine so that you can find documents easily. However, there are other document stores out there. Solr is not really part of Hadoop, but like Elasticsearch it is often used as part of a front end to search and visualise Hadoop-generated data. Solr is bundled with Hortonworks HDP, so perhaps that is a moot point.
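To give a rough idea of what finding documents through Solr could look like, here is a minimal sketch that builds a request against Solr's standard `select` handler and reads the matching documents from the JSON response. The host name, the collection name `documents` and the field name `content` are made up for illustration, and port 8983 is assumed because it is Solr's usual default; your installation may differ.

```python
import json
import urllib.parse
import urllib.request


def solr_select_url(host, collection, query, rows=10, port=8983):
    """Build a URL for Solr's /select handler (host and collection are assumptions)."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"http://{host}:{port}/solr/{collection}/select?{params}"


def search(host, collection, query, rows=10):
    """Run the query and return the list of matching documents."""
    with urllib.request.urlopen(solr_select_url(host, collection, query, rows)) as resp:
        # Solr's JSON response keeps the hits under response -> docs.
        return json.load(resp)["response"]["docs"]
```

Something like `search("localhost", "documents", "content:invoice")` would then return the stored fields of each hit, which could include the HDFS path of the original file if you index it as a field.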

You ask

"How can we use HDFS for this."
"How can I copy file from windows to hdfs"

 

One possibility would be to use the NFS Gateway and treat HDFS as a remote network drive. There are lots of other options, which depend on your requirements and current technical abilities.
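One of those other options is the WebHDFS REST API, which needs nothing on the Windows side beyond an HTTP client. The sketch below assumes WebHDFS is enabled on the cluster, that there is no Kerberos security, and that the NameNode web port is 50070 (the Hadoop 2.x default; newer releases use a different port). The host, user and paths are hypothetical. An upload is two requests: a `PUT` with `op=CREATE` to the NameNode, which answers with a redirect, and then a `PUT` of the file contents to the DataNode URL given in the `Location` header.

```python
import http.client
import os
import urllib.parse


def webhdfs_create_url(namenode, hdfs_path, user, port=50070):
    """Build the NameNode URL for the first step of an upload (op=CREATE)."""
    query = urllib.parse.urlencode(
        {"op": "CREATE", "user.name": user, "overwrite": "false"}
    )
    return f"http://{namenode}:{port}/webhdfs/v1{urllib.parse.quote(hdfs_path)}?{query}"


def _put(url, body=None, headers=None):
    """Send one PUT without following redirects; return (status, Location header)."""
    parts = urllib.parse.urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port)
    try:
        conn.request("PUT", f"{parts.path}?{parts.query}", body=body, headers=headers or {})
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def upload_file(local_path, namenode, hdfs_path, user, port=50070):
    """Copy a local file into HDFS via WebHDFS (unsecured cluster assumed)."""
    # Step 1: ask the NameNode where to write; it replies 307 with a DataNode URL.
    status, location = _put(webhdfs_create_url(namenode, hdfs_path, user, port))
    if status != 307 or not location:
        raise RuntimeError(f"expected a 307 redirect from the NameNode, got {status}")
    # Step 2: send the file contents to the DataNode; 201 means the file was created.
    size = str(os.path.getsize(local_path))
    with open(local_path, "rb") as f:
        status, _ = _put(location, body=f, headers={"Content-Length": size})
    if status != 201:
        raise RuntimeError(f"upload failed with HTTP status {status}")
```

From a Windows machine with Python installed this might be called as `upload_file(r"C:\docs\report.pdf", "namenode.example.com", "/archive/report.pdf", "alice")`. Note that the DataNode host names in the redirect must be resolvable from the client machine.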

 

I would, however, start by searching Google before asking this sort of question. You would find this answer pretty quickly:

 

http://stackoverflow.com/questions/9389339/hadoop-as-document-store-database

Good luck

