Package | Description |
---|---|
org.opencms.search.extractors | Contains a generic, low-level framework for extraction of plain text content out of various popular file formats. |
Modifier and Type | Class and Description |
---|---|
class | A_CmsTextExtractor: Base utility class that allows extraction of the indexable "plain" text from a given document format. |
class | CmsExtractorHtml: Extracts the text from an HTML document. |
class | CmsExtractorMsOfficeOLE2: Extracts text data from a VFS resource that is an OLE 2 MS Office document. |
class | CmsExtractorMsOfficeOOXML: Extracts text data from a VFS resource that is an OOXML MS Office document. |
class | CmsExtractorOpenOffice: Extracts the text from OpenOffice documents (.ods, .odf). |
class | CmsExtractorPdf: Extracts the text from a PDF document. |
class | CmsExtractorRtf: Extracts the text from an RTF document. |
Modifier and Type | Method and Description |
---|---|
static I_CmsTextExtractor | CmsExtractorOpenOffice.getExtractor(): Returns an instance of this text extractor. |
static I_CmsTextExtractor | CmsExtractorMsOfficeOOXML.getExtractor(): Returns an instance of this text extractor. |
static I_CmsTextExtractor | CmsExtractorHtml.getExtractor(): Returns an instance of this text extractor. |
static I_CmsTextExtractor | CmsExtractorRtf.getExtractor(): Returns an instance of this text extractor. |
static I_CmsTextExtractor | CmsExtractorPdf.getExtractor(): Returns an instance of this text extractor. |
static I_CmsTextExtractor | CmsExtractorMsOfficeOLE2.getExtractor(): Returns an instance of this text extractor. |
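
Every extractor listed above exposes a static getExtractor() method whose return type is the shared interface rather than the concrete class. The following is a minimal, self-contained sketch of that pattern, not OpenCms code: this page does not show the methods of I_CmsTextExtractor, so the `TextExtractor` interface, its `extractText(byte[])` method, the two toy extractors, and the `forFileName` helper are all hypothetical stand-ins chosen for illustration. It assumes Java 9 or later for `Map.of`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

/** Sketch of the "static getExtractor() returning an interface" pattern. All types are hypothetical stand-ins. */
final class ExtractorSketch {

    /** Hypothetical stand-in interface; NOT the real I_CmsTextExtractor. */
    interface TextExtractor {
        String extractText(byte[] content);
    }

    /** Toy HTML extractor: strips tags with a regex. Real HTML extraction needs a proper parser. */
    static final class HtmlExtractor implements TextExtractor {
        private static final HtmlExtractor INSTANCE = new HtmlExtractor();

        private HtmlExtractor() {
            // instances are only handed out through getExtractor()
        }

        /** Returns the shared instance, typed as the interface. */
        static TextExtractor getExtractor() {
            return INSTANCE;
        }

        @Override
        public String extractText(byte[] content) {
            String html = new String(content, StandardCharsets.UTF_8);
            // replace each tag with a space, then collapse runs of whitespace
            return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
        }
    }

    /** Toy plain-text extractor: decodes the bytes as UTF-8 and trims. */
    static final class PlainExtractor implements TextExtractor {
        private static final PlainExtractor INSTANCE = new PlainExtractor();

        private PlainExtractor() {
        }

        static TextExtractor getExtractor() {
            return INSTANCE;
        }

        @Override
        public String extractText(byte[] content) {
            return new String(content, StandardCharsets.UTF_8).trim();
        }
    }

    /** Hypothetical registry mapping a lower-case file suffix to an extractor. */
    private static final Map<String, TextExtractor> BY_SUFFIX = Map.of(
        "html", HtmlExtractor.getExtractor(),
        "txt", PlainExtractor.getExtractor());

    /** Picks an extractor from the file name suffix; throws if none is registered. */
    static TextExtractor forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String suffix = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        TextExtractor extractor = BY_SUFFIX.get(suffix);
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for: " + fileName);
        }
        return extractor;
    }

    public static void main(String[] args) {
        byte[] doc = "<p>Hello <b>world</b></p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(forFileName("page.HTML").extractText(doc)); // prints "Hello world"
    }
}
```

Because callers only ever see the interface type, code that indexes documents can choose an extractor per format without depending on any concrete class. Whether the real getExtractor() methods return a shared instance or a new one on each call is not stated on this page; the sketch uses a shared instance as an assumption.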