Hey Saygin,
I haven't used Solr in a while, and I have never indexed XML or sequence files.
I did have to make some changes to the schema.xml file as far as I remember.
The problem I had was that the directory ingest mapper did not know what kind of files it was looking at, whereas in the CSV example I was specifying the CSV class directory.
It was then returning metadata from Apache Tika, and Apache Solr did not know what those fields were, so it fell over.
Once I included the code below in schema.xml, the returned values were handled correctly and the MapReduce job completed successfully:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="text" type="text_en" indexed="true" stored="true" multiValued="true"/>
<field name="data_source" type="text_en" indexed="true" stored="false"/>
<field name="body" type="text_en" indexed="true" stored="true" multiValued="true"/>
<field name="parsing_time" type="text_en" indexed="true" stored="true" multiValued="true"/>
<field name="Content-Length" type="text_en" indexed="true" stored="true" multiValued="true"/>
<field name="Content-Encoding" type="text_en" indexed="true" stored="true" multiValued="true"/>
<field name="parsing" type="text_en" indexed="true" stored="true" multiValued="true"/>
<field name="Content-Type" type="text_en" indexed="true" stored="true" multiValued="true"/>
<copyField source="id" dest="text"/>
<copyField source="body" dest="text"/>
<copyField source="parsing_time" dest="text"/>
<copyField source="Content-Length" dest="text"/>
<copyField source="Content-Encoding" dest="text"/>
<copyField source="parsing" dest="text"/>
<copyField source="Content-Type" dest="text"/>
I am assuming that the XML/sequence file ingest is returning metadata fields that you are not handling in your schema.xml file. The unknown field names should show up in the errors…
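If you don't want to declare every metadata field by hand, a catch-all dynamic field is one possible way to stop Solr rejecting documents with undeclared fields. This is only a sketch modelled on the stock Solr example schema, and it has not been tried against this particular job. The type name "ignored" is just the convention used there, and it assumes a classic schema.xml setup:

```xml
<!-- Assumption: goes inside the <types> section of schema.xml.
     Values sent to a field of this type are neither indexed nor stored. -->
<fieldType name="ignored" class="solr.StrField" indexed="false" stored="false" multiValued="true"/>

<!-- Assumption: goes inside the <fields> section.
     Any field name not matched by an explicit <field> or a more specific
     <dynamicField> falls through to this pattern and is silently dropped. -->
<dynamicField name="*" type="ignored" multiValued="true"/>
```

The trade-off is that unknown metadata is discarded rather than reported, so any field you actually want to search on still needs its own explicit <field> entry like the ones above.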