Hi to all from Istanbul, Turkey,
I’m trying to index XML files (ipod_other.xml from lucidworks’ example files, converted into sequence file format), using SolrXMLIngestMapper jars.
I’ve modified the schema.xml file by adding the fields used in the ipod_other.xml file.
Here’s my command:
hadoop jar jobjar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.SolrXMLIngestMapper -c hdp1 -i /user/hadoop/output/1420812982906sfu/part-r-00000 -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -s http://dc2vmhadappt01:8983/solr
In the end I constantly get the “Didn’t ingest any documents, failing” error.
Alternatively, if I use DirectoryIngestMapper (which uses Tika, I suppose),
I do not get any errors, but no documents get indexed; nothing happens:
hadoop jar jobjar com.lucidworks.hadoop.ingest.IngestJob -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.DirectoryIngestMapper -c collection1 -i /user/solr/data/xml/*.xml -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -s http://dc2vmhadappt01:8983/solr
Can anybody help me out with this problem? Any help is appreciated.
Thanks
Here are the additions to the schema.xml:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="name" multiValued="true" stored="true" type="text_en" indexed="true"/>
<field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
<field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
<field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="includes" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
<field name="weight" type="float" indexed="true" stored="true"/>
<field name="price" type="float" indexed="true" stored="true"/>
<field name="popularity" type="int" indexed="true" stored="true" />
<field name="inStock" type="boolean" indexed="true" stored="true" />
<field name="store" type="location" indexed="true" stored="true"/>
<dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
<field name="data_source" stored="false" type="text_en" indexed="true"/>
And here is the ipod_other.xml file:
<add> <doc>
<field name="id">F8V7067-APL-KIT</field>
<field name="name">Belkin Mobile Power Cord for iPod w/ Dock</field>
<field name="manu">Belkin</field>
<field name="cat">electronics</field>
<field name="cat">connector</field>
<field name="features">car power adapter, white</field>
<field name="weight">4</field>
<field name="price">19.95</field>
<field name="popularity">1</field>
<field name="inStock">false</field>
<field name="store">45.17614,-93.87341</field>
<field name="manufacturedate_dt">2005-08-01T16:30:25Z</field>
</doc> </add>
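To rule out a plain schema mismatch, one option is to check that every field name in the document is covered by the schema additions listed above. The following is only a minimal Python sketch and is not part of the LucidWorks tooling. The names `undeclared_fields`, `SCHEMA_FIELDS`, `DYNAMIC_PATTERNS` and `SAMPLE_DOC` are made up for illustration, and it assumes a simple glob match is close enough to how Solr resolves the `*_dt` dynamicField.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Explicit field names copied from the schema.xml additions above.
SCHEMA_FIELDS = {
    "id", "name", "sku", "manu", "cat", "features", "includes",
    "weight", "price", "popularity", "inStock", "store", "data_source",
}

# dynamicField patterns from the same schema additions.
# Assumption: glob matching approximates Solr's dynamicField matching.
DYNAMIC_PATTERNS = ["*_dt"]

# The ipod_other.xml document from the post, with plain ASCII quotes.
SAMPLE_DOC = """<add> <doc>
<field name="id">F8V7067-APL-KIT</field>
<field name="name">Belkin Mobile Power Cord for iPod w/ Dock</field>
<field name="manu">Belkin</field>
<field name="cat">electronics</field>
<field name="cat">connector</field>
<field name="features">car power adapter, white</field>
<field name="weight">4</field>
<field name="price">19.95</field>
<field name="popularity">1</field>
<field name="inStock">false</field>
<field name="store">45.17614,-93.87341</field>
<field name="manufacturedate_dt">2005-08-01T16:30:25Z</field>
</doc> </add>"""


def undeclared_fields(solr_xml, fields, patterns):
    """Return sorted document field names that match neither an explicit
    schema field nor any dynamicField pattern."""
    root = ET.fromstring(solr_xml)
    missing = set()
    for field in root.iter("field"):
        name = field.get("name")
        if name in fields:
            continue
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
            continue
        missing.add(name)
    return sorted(missing)


if __name__ == "__main__":
    # An empty list means every field in the document is declared.
    print(undeclared_fields(SAMPLE_DOC, SCHEMA_FIELDS, DYNAMIC_PATTERNS))
```

If this reports no undeclared fields, the field names themselves are probably not the cause, and the input path or the sequence file contents would be the next thing to look at.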