
Trouble in Solr while indexing XML files into Hadoop

Hi to all from Istanbul, Turkey,

I’m trying to index XML files (ipod_other.xml from lucidworks’ example files, converted into sequence file format) using the SolrXMLIngestMapper.
I’ve modified the schema.xml file by adding the fields that appear in ipod_other.xml.

Here’s my command:
hadoop jar jobjar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.SolrXMLIngestMapper -c hdp1 -i /user/hadoop/output/1420812982906sfu/part-r-00000 -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -s http://dc2vmhadappt01:8983/solr

In the end I constantly get the “Didn’t ingest any documents, failing” error.

OR

If I use the DirectoryIngestMapper instead (which uses Tika, I suppose),
I do not get any errors, but no documents are indexed; nothing happens.

hadoop jar jobjar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper -c collection1 -i /user/solr/data/xml/*.xml -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -s http://dc2vmhadappt01:8983/solr
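One way to confirm whether anything reached the index after either job is to ask Solr for the document count of the target collection. The following is a minimal Python sketch, assuming the standard `/select` handler is enabled on the collection; the helper names `count_query_url` and `num_found` and the sample response body are hypothetical, made up for illustration, and not taken from a real run.

```python
import json
from urllib.parse import urlencode


def count_query_url(solr_base, collection):
    """Build a match-all query URL that asks for zero rows, so only the count comes back."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{solr_base.rstrip('/')}/{collection}/select?{params}"


def num_found(response_body):
    """Extract the total hit count from a JSON select response."""
    return json.loads(response_body)["response"]["numFound"]


# Hypothetical response body, shaped like a JSON reply from the select handler.
SAMPLE_BODY = '{"responseHeader": {"status": 0}, "response": {"numFound": 0, "start": 0, "docs": []}}'

url = count_query_url("http://dc2vmhadappt01:8983/solr", "collection1")
print(url)
print(num_found(SAMPLE_BODY))
```

The two commands above write to different collections (`hdp1` and `collection1`), so the count should be checked against whichever name was passed with `-c`.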

Is there anybody out there who can help me with this problem? Any help is appreciated.

Thanks

Here are the additions to the schema.xml:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="name" multiValued="true" stored="true" type="text_en" indexed="true"/>
<field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
<field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
<field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="includes" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
<field name="weight" type="float" indexed="true" stored="true"/>
<field name="price" type="float" indexed="true" stored="true"/>
<field name="popularity" type="int" indexed="true" stored="true" />
<field name="inStock" type="boolean" indexed="true" stored="true" />
<field name="store" type="location" indexed="true" stored="true"/>
<dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
<field name="data_source" stored="false" type="text_en" indexed="true"/>

And here is the ipod_other.xml file:

<add> <doc>
<field name="id">F8V7067-APL-KIT</field>
<field name="name">Belkin Mobile Power Cord for iPod w/ Dock</field>
<field name="manu">Belkin</field>
<field name="cat">electronics</field>
<field name="cat">connector</field>
<field name="features">car power adapter, white</field>
<field name="weight">4</field>
<field name="price">19.95</field>
<field name="popularity">1</field>
<field name="inStock">false</field>
<field name="store">45.17614,-93.87341</field>
<field name="manufacturedate_dt">2005-08-01T16:30:25Z</field>
</doc> </add>
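
Since the schema was edited by hand to match the document, it may be worth checking mechanically that every field name in the document is covered by a `<field>` or `<dynamicField>` declaration. The following is a minimal Python sketch of such a check, not part of the LucidWorks tooling. The function name `uncovered_fields`, the `<schema>` wrapper element, and the extra `colour` field are assumptions added only to make the example self-contained and to show what a mismatch looks like.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Abbreviated schema fragment, wrapped in a <schema> root so it parses on its own.
SCHEMA_FRAGMENT = """
<schema>
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
  <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
  <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
</schema>
"""

# Abbreviated document; "colour" is a hypothetical field that the schema does not declare.
ADD_DOC = """
<add><doc>
  <field name="id">F8V7067-APL-KIT</field>
  <field name="cat">electronics</field>
  <field name="manufacturedate_dt">2005-08-01T16:30:25Z</field>
  <field name="colour">white</field>
</doc></add>
"""


def uncovered_fields(schema_xml, add_xml):
    """Return the sorted document field names that no schema declaration covers."""
    schema = ET.fromstring(schema_xml)
    static_names = {f.get("name") for f in schema.iter("field")}
    dynamic_patterns = [f.get("name") for f in schema.iter("dynamicField")]

    missing = set()
    for field in ET.fromstring(add_xml).iter("field"):
        name = field.get("name")
        if name in static_names:
            continue
        # Dynamic field names such as "*_dt" are treated as glob patterns here.
        if any(fnmatch.fnmatchcase(name, p) for p in dynamic_patterns):
            continue
        missing.add(name)
    return sorted(missing)


print(uncovered_fields(SCHEMA_FRAGMENT, ADD_DOC))
```

An empty result for the real schema.xml and ipod_other.xml would suggest the field declarations are not the cause, and that the input path, sequence file contents, or collection name deserve a closer look.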

