Channel: Hortonworks » All Replies

Trouble in Solr while indexing XML files into Hadoop


Hi to all from Istanbul, Turkey,

I’m trying to index XML files (ipod_other.xml from the Lucidworks example files, converted into sequence file format) using the SolrXMLIngestMapper.
I’ve modified schema.xml by adding the fields that appear in ipod_other.xml.

Here’s my command:
hadoop jar jobjar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.SolrXMLIngestMapper -c hdp1 -i /user/hadoop/output/1420812982906sfu/part-r-00000 -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -s http://dc2vmhadappt01:8983/solr

In the end I constantly get the “Didn’t ingest any documents, failing” error.

OR

If I use the DirectoryIngestMapper instead, which I suppose uses Tika,
I do not get any errors, but I cannot see any documents indexed; nothing happens.

hadoop jar jobjar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper -c collection1 -i /user/solr/data/xml/*.xml -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -s http://dc2vmhadappt01:8983/solr
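
To confirm whether anything actually reached a collection after a run, one option is to ask Solr for its document count. This is only a minimal sketch, assuming the standard /select handler with JSON output is enabled; the function names are made up for illustration, and the host and collection in the usage comment are the ones from the commands above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(solr_base, collection):
    """Build a /select URL that matches every document but returns no rows."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{solr_base.rstrip('/')}/{collection}/select?{params}"


def parse_num_found(body):
    """Extract response.numFound from a Solr JSON select response."""
    return json.loads(body)["response"]["numFound"]


def count_documents(solr_base, collection):
    """Query Solr over HTTP; needs a reachable Solr instance."""
    with urlopen(count_url(solr_base, collection)) as resp:
        return parse_num_found(resp.read().decode("utf-8"))


# Usage (performs a real request, so it is left as a comment):
# count_documents("http://dc2vmhadappt01:8983/solr", "hdp1")
```

If the count stays at 0 after a job that reports no error, the documents were not added, or were added but never committed.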

Is anybody out there able to help me with this problem? Any help is appreciated.

Thanks

Here are the additions to schema.xml:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="name" multiValued="true" stored="true" type="text_en" indexed="true"/>
<field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
<field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
<field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="includes" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
<field name="weight" type="float" indexed="true" stored="true"/>
<field name="price" type="float" indexed="true" stored="true"/>
<field name="popularity" type="int" indexed="true" stored="true" />
<field name="inStock" type="boolean" indexed="true" stored="true" />
<field name="store" type="location" indexed="true" stored="true"/>
<dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
<field name="data_source" stored="false" type="text_en" indexed="true"/>

And here is the ipod_other.xml file:

<add> <doc>
<field name="id">F8V7067-APL-KIT</field>
<field name="name">Belkin Mobile Power Cord for iPod w/ Dock</field>
<field name="manu">Belkin</field>
<field name="cat">electronics</field>
<field name="cat">connector</field>
<field name="features">car power adapter, white</field>
<field name="weight">4</field>
<field name="price">19.95</field>
<field name="popularity">1</field>
<field name="inStock">false</field>
<field name="store">45.17614,-93.87341</field>
<field name="manufacturedate_dt">2005-08-01T16:30:25Z</field>
</doc> </add>
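
To rule out a mismatch between the document and the schema additions, the field names can be compared mechanically. The following is a rough sketch only: the function names are hypothetical, it looks at a pasted fragment of field declarations rather than a full schema.xml, it approximates dynamicField matching with shell-style globbing, and it assumes both inputs use plain ASCII double quotes (curly quotes substituted by a forum would not parse).

```python
import fnmatch
import xml.etree.ElementTree as ET


def declared_fields(schema_fragment):
    """Return (exact field names, dynamicField glob patterns)."""
    # Wrap the fragment so several sibling elements parse as one tree.
    root = ET.fromstring(f"<fields>{schema_fragment}</fields>")
    exact = {f.get("name") for f in root.iter("field")}
    patterns = [f.get("name") for f in root.iter("dynamicField")]
    return exact, patterns


def uncovered_fields(add_xml, schema_fragment):
    """List field names used in a Solr <add> document that the fragment does not declare."""
    exact, patterns = declared_fields(schema_fragment)
    used = {f.get("name") for f in ET.fromstring(add_xml).iter("field")}
    return sorted(
        name for name in used
        if name not in exact
        # Approximation: treat a dynamicField name such as *_dt as a glob.
        and not any(fnmatch.fnmatchcase(name, p) for p in patterns)
    )
```

Called with the contents of ipod_other.xml and the field declarations above, it returns the names of any fields in the document that neither a field nor a dynamicField declaration covers.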

