Package org.apache.nutch.any23
This package uses the Apache Any23 library
for parsing and extracting structured data in RDF format from a
variety of Web documents. The supported formats are listed on the
Apache Any23 website.
Class Summary

Class: Description
Any23IndexingFilter: This implementation of IndexingFilter adds a triple(s) field to the NutchDocument.
Any23ParseFilter: This implementation of HtmlParseFilter uses the Apache Any23 library for parsing and extracting structured data in RDF format from a variety of Web documents.
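
To make the idea of "adding a triple(s) field" to a document more concrete, the following is a minimal, standalone sketch in plain Java. It does not use the Nutch or Any23 APIs: the class `TripleFieldSketch`, the nested `Triple` type, the field name `triples`, and the map standing in for a document are all hypothetical names chosen for illustration, and the sketch assumes every triple's object is an IRI (no literals or blank nodes). It only shows one plausible way extracted RDF statements could be serialized as N-Triples lines and collected under a single multi-valued field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TripleFieldSketch {

    /** Hypothetical field name; the plugin's actual field name is not stated here. */
    static final String FIELD = "triples";

    /** A minimal RDF statement; assumes subject, predicate and object are all IRIs. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        /** Serializes the statement as a single N-Triples line. */
        String toNTriples() {
            return "<" + subject + "> <" + predicate + "> <" + object + "> .";
        }
    }

    /**
     * Appends one field value per triple to the stand-in document.
     * The document is left untouched when there are no triples to add.
     */
    static Map<String, List<String>> addTriples(Map<String, List<String>> doc,
                                                List<Triple> triples) {
        if (triples.isEmpty()) {
            return doc;
        }
        List<String> values = doc.computeIfAbsent(FIELD, k -> new ArrayList<>());
        for (Triple t : triples) {
            values.add(t.toNTriples());
        }
        return doc;
    }

    public static void main(String[] args) {
        // A map of field name to values stands in for an indexable document.
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("url", new ArrayList<>(List.of("http://example.org/page")));

        addTriples(doc, List.of(new Triple(
                "http://example.org/page",
                "http://purl.org/dc/terms/creator",
                "http://example.org/people/alice")));

        System.out.println(doc.get(FIELD));
    }
}
```

In an actual Nutch deployment the parse filter and the indexing filter run at different stages of the crawl, so the triples would be handed from one to the other through the parse result rather than built in a single method as they are in this sketch.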