public final class CmsExtractorMsOfficeOOXML extends A_CmsTextExtractor
Supported formats are MS Word (.docx), MS PowerPoint (.pptx) and MS Excel (.xlsx).
The OLE 2 format was introduced in Microsoft Office 97 and remained the default format until Office 2007 replaced it with the new XML-based OOXML format.
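The practical difference between the two formats is visible in the first bytes of a file. OLE 2 compound files begin with the signature D0 CF 11 E0 A1 B1 1A E1, while OOXML documents are ZIP packages whose first local file header begins with "PK\3\4". The following is a minimal sketch that assumes Java 11+ (for `InputStream.readNBytes`). The class `OfficeFormatSniffer` and its `Format` enum are hypothetical and not part of the OpenCms API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Hypothetical helper (not part of OpenCms): distinguishes an OLE 2 compound file
 * (Office 97-2003 .doc/.ppt/.xls) from a ZIP package such as an OOXML
 * .docx/.pptx/.xlsx file by looking at the leading "magic" bytes.
 */
final class OfficeFormatSniffer {

    enum Format { OLE2, ZIP_PACKAGE, UNKNOWN }

    // Signature of an OLE 2 compound file: D0 CF 11 E0 A1 B1 1A E1
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // OOXML documents are ZIP archives; a ZIP local file header begins with "PK\3\4"
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    private OfficeFormatSniffer() {
    }

    /** Classifies the given leading bytes of a file. */
    static Format sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) {
            return Format.OLE2;
        }
        if (startsWith(head, ZIP_MAGIC)) {
            // A ZIP header alone does not prove OOXML; a stricter check would also
            // look for the [Content_Types].xml entry inside the archive.
            return Format.ZIP_PACKAGE;
        }
        return Format.UNKNOWN;
    }

    /** Reads up to 8 bytes from the stream (Java 11+) and classifies them. */
    static Format sniff(InputStream in) throws IOException {
        return sniff(in.readNBytes(OLE2_MAGIC.length));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A stream that yields fewer than eight bytes is simply classified as `UNKNOWN` unless it still carries the four-byte ZIP signature.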
Modifier and Type | Method | Description |
---|---|---|
I_CmsExtractionResult | extractText(java.io.InputStream in) | Extracts the text and meta information from the document on the input stream. |
static I_CmsTextExtractor | getExtractor() | Returns an instance of this text extractor. |
Methods inherited from class A_CmsTextExtractor: combineContentItem, extractText, extractText, extractText, extractText, removeControlChars
public static I_CmsTextExtractor getExtractor()
Returns an instance of this text extractor.
public I_CmsExtractionResult extractText(java.io.InputStream in) throws java.lang.Exception
Description copied from interface: I_CmsTextExtractor
Extracts the text and meta information from the document on the input stream.
The encoding of the input stream is either not required (the document type may have one common default encoding) or the extractor is able to detect the encoding from the provided input stream automatically.
Delivers the same result as calling I_CmsTextExtractor.extractText(InputStream, String) with the encoding String set to null.
Specified by:
extractText in interface I_CmsTextExtractor
Overrides:
extractText in class A_CmsTextExtractor
Parameters:
in - the input stream for the document to extract the text from
Throws:
java.lang.Exception - if the text extraction fails
See Also:
I_CmsTextExtractor.extractText(java.io.InputStream, java.lang.String)
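The `extractText` contract (the single-argument call behaves like passing a null encoding, and the document type may not need an encoding at all) can be illustrated with a minimal, standard-library-only sketch for .docx files. This is not how OpenCms implements the class. The name `MiniDocxExtractor` is hypothetical, it returns a plain `String` rather than an `I_CmsExtractionResult`, and it reads only the main part `word/document.xml`, joining the `<w:t>` text runs of each `<w:p>` paragraph. The encoding argument is ignored because OOXML parts are XML, and the XML parser detects their encoding from the bytes themselves.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Hypothetical, stdlib-only sketch (not the OpenCms implementation): extracts the
 * paragraph text of a .docx package. Only word/document.xml is read; headers,
 * footers, comments and metadata are ignored.
 */
final class MiniDocxExtractor {

    // WordprocessingML main namespace used by <w:p> (paragraph) and <w:t> (text)
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Mirrors the documented contract: same result as extractText(in, null). */
    String extractText(InputStream in) throws Exception {
        return extractText(in, null);
    }

    /**
     * The encoding argument is ignored: OOXML parts are XML, so the parser
     * determines their encoding itself. Closes the given stream.
     */
    String extractText(InputStream in, String encoding) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // Buffer the entry so the XML parser cannot close the ZIP stream
                    return paragraphs(zip.readAllBytes());
                }
            }
        }
        throw new IllegalArgumentException("not a .docx package: word/document.xml missing");
    }

    private static String paragraphs(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Reject DOCTYPE declarations to avoid XML external entity attacks
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml));

        List<String> lines = new ArrayList<>();
        NodeList paras = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paras.getLength(); i++) {
            // Concatenate all text runs of one paragraph into a single line
            NodeList runs = ((Element) paras.item(i)).getElementsByTagNameNS(W_NS, "t");
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < runs.getLength(); j++) {
                sb.append(runs.item(j).getTextContent());
            }
            lines.add(sb.toString());
        }
        return String.join("\n", lines);
    }
}
```

Usage: `new MiniDocxExtractor().extractText(new FileInputStream("report.docx"))`, where `report.docx` is a placeholder file name. Supporting .pptx and .xlsx would require reading their own parts (typically the slide XML under `ppt/slides/` and the shared strings and worksheets under `xl/`), which this sketch does not attempt.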