public final class CmsHtmlStripper
extends java.lang.Object
All tags that are not explicitly allowed via invocation of one of the addPreserve... methods will be missing from the result of the method stripHtml(String).
Instances are reusable but not shareable between threads. If the configuration should be changed between subsequent invocations of stripHtml(String), the method reset() has to be called.
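The preserve-list behaviour described above can be illustrated with a small self-contained sketch. TagWhitelistSketch is a hypothetical stand-in written for this example, not the OpenCms class: it uses a regular expression instead of a real HTML parser, ignores the tidy option, and assumes that the boolean returned by addPreserveTag means "newly added".

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in that only mimics the documented contract of CmsHtmlStripper.
class TagWhitelistSketch {

    // Matches an opening or closing tag and captures its name.
    private static final Pattern TAG = Pattern.compile("</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    // Lower-cased names of the tags to keep (tag names are case insensitive).
    private final Set<String> preserved = new HashSet<>();

    // Assumption of this sketch: returns true if the tag was not registered before.
    boolean addPreserveTag(String tagName) {
        return preserved.add(tagName.toLowerCase(Locale.ROOT));
    }

    // Forgets all preserved tags so the instance can be reused with a new configuration.
    void reset() {
        preserved.clear();
    }

    // Removes every tag that is not preserved; the text between tags is kept.
    String stripHtml(String html) {
        Matcher m = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean keep = preserved.contains(m.group(1).toLowerCase(Locale.ROOT));
            m.appendReplacement(out, keep ? Matcher.quoteReplacement(m.group()) : "");
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        TagWhitelistSketch stripper = new TagWhitelistSketch();
        stripper.addPreserveTag("B");
        // prints: Hello <b>world</b>
        System.out.println(stripper.stripHtml("<p>Hello <b>world</b></p>"));
        stripper.reset();
        // prints: Hello world
        System.out.println(stripper.stripHtml("<p>Hello <b>world</b></p>"));
    }
}
```

Because the preserve set is mutable instance state, such an object should stay confined to one thread, which matches the "reusable but not shareable" note above.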
Constructor | Description |
---|---|
CmsHtmlStripper() | Default constructor that turns echo on and uses the settings for replacing tags. |
CmsHtmlStripper(boolean useTidy) | Creates an instance with control over whether tidy is used. |
Modifier and Type | Method | Description |
---|---|---|
boolean | addPreserveTag(java.lang.String tagName) | Adds a tag that will be preserved by stripHtml(String). |
void | addPreserveTagList(java.util.List<java.lang.String> preserveTags) | Convenience method for adding several tags to preserve. |
void | addPreserveTags(java.lang.String tagList, char separator) | Convenience method for adding several tags to preserve in form of a delimiter-separated String. |
void | reset() | Resets the configuration of the tags to preserve. |
java.lang.String | stripHtml(java.lang.String html) | Extracts the text from the given html content, assuming the given html encoding. |
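public CmsHtmlStripper()
Default constructor that turns echo on and uses the settings for replacing tags.
public CmsHtmlStripper(boolean useTidy)
Creates an instance with control over whether tidy is used.
Parameters:
useTidy - if true, tidy will be used
public boolean addPreserveTag(java.lang.String tagName)
Adds a tag that will be preserved by stripHtml(String).
Parameters:
tagName - the name of the tag to keep (case insensitive)
public void addPreserveTagList(java.util.List<java.lang.String> preserveTags)
Convenience method for adding several tags to preserve.
Parameters:
preserveTags - a List<String> with the case-insensitive tag names of the tags to preserve
See Also:
addPreserveTag(String)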
public void addPreserveTags(java.lang.String tagList, char separator)
Convenience method for adding several tags to preserve in form of a delimiter-separated String.
The String will be split with CmsStringUtil.splitAsList(String, char, boolean), with tagList as the first argument, separator as the second argument, and the third argument set to true (trimming support).
Parameters:
tagList - a delimiter-separated String with case-insensitive tag names to preserve by stripHtml(String)
separator - the delimiter that separates tag names in the tagList argument
See Also:
addPreserveTag(String)
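As a rough illustration of a trimming split over a delimiter-separated tag list, the following plain-Java sketch may help. The class PreserveTagListSketch and its method splitTrimmed are hypothetical names invented for this example; they only approximate the idea and are not the implementation of CmsStringUtil.splitAsList. Skipping empty parts is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, written only to illustrate a trimming split.
class PreserveTagListSketch {

    // Splits tagList on the separator and trims whitespace around each part.
    // Assumption of this sketch: parts that are empty after trimming are skipped.
    static List<String> splitTrimmed(String tagList, char separator) {
        List<String> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= tagList.length(); i++) {
            // A part ends at each separator and at the end of the input.
            if (i == tagList.length() || tagList.charAt(i) == separator) {
                String part = tagList.substring(start, i).trim();
                if (!part.isEmpty()) {
                    result.add(part);
                }
                start = i + 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints: [b, i, strong]
        System.out.println(splitTrimmed("b, i ,strong", ','));
    }
}
```

With trimming in place, a caller can write the tag list with spaces around the separators and still end up with clean tag names.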
public void reset()
Resets the configuration of the tags to preserve.
This is called from the constructor and only has to be called if this instance is reused with a different configuration (of tags to keep).
public java.lang.String stripHtml(java.lang.String html) throws org.htmlparser.util.ParserException
Extracts the text from the given html content, assuming the given html encoding.
Additionally, tags are replaced or removed according to the configuration of this instance.
Parameters:
html - the content to extract the plain text from
Throws:
org.htmlparser.util.ParserException - if something goes wrong