Class AbstractXmlParser

    • Field Detail

      • PATTERN_ENTITY_1

        private static final java.util.regex.Pattern PATTERN_ENTITY_1
        Entity pattern for an HTML entity, e.g. &nbsp;: "<!ENTITY(\\s)+([^>|^\\s]+)(\\s)+\"(\\s)*(&[a-zA-Z]{2,6};)(\\s)*\"(\\s)*>"
        see http://www.w3.org/TR/REC-xml/#NT-EntityDecl.
      • PATTERN_ENTITY_2

        private static final java.util.regex.Pattern PATTERN_ENTITY_2
        Entity pattern for a Unicode entity, e.g. &#38;: "<!ENTITY(\\s)+([^>|^\\s]+)(\\s)+\"(\\s)*(&(#x?[0-9a-fA-F]{1,5};)*)(\\s)*\"(\\s)*>"
        see http://www.w3.org/TR/REC-xml/#NT-EntityDecl.
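        The following is a minimal sketch of how a pattern of this shape could pull entity declarations out of a local doctype. It is not the class's actual code: the class name EntityPatternSketch, the extract helper, and the leading "<!ENTITY(\\s)+([^>" part of the expression are assumptions made for illustration. Group 2 is taken as the entity name and group 5 as its replacement text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityPatternSketch {
    // Assumption: the expression starts with an "<!ENTITY" matcher followed by
    // the entity name group; only the tail is taken from the field description.
    static final Pattern UNICODE_ENTITY = Pattern.compile(
        "<!ENTITY(\\s)+([^>|^\\s]+)(\\s)+\"(\\s)*(&(#x?[0-9a-fA-F]{1,5};)*)(\\s)*\"(\\s)*>");

    /** Returns entity name -> replacement text for every declaration found. */
    static Map<String, String> extract(String doctype) {
        Map<String, String> entities = new LinkedHashMap<>();
        Matcher matcher = UNICODE_ENTITY.matcher(doctype);
        while (matcher.find()) {
            // group(2) is the entity name, group(5) the quoted value
            entities.put(matcher.group(2), matcher.group(5));
        }
        return entities;
    }

    public static void main(String[] args) {
        String doctype = "<!DOCTYPE foo [\n"
            + "  <!ENTITY bar \"&#x160;\">\n"
            + "  <!ENTITY bar1 \"&#x161;\">\n"
            + "]>";
        System.out.println(extract(doctype));
    }
}
```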
      • ignorableWhitespace

        private boolean ignorableWhitespace
      • collapsibleWhitespace

        private boolean collapsibleWhitespace
      • trimmableWhitespace

        private boolean trimmableWhitespace
      • entities

        private java.util.Map<java.lang.String,​java.lang.String> entities
      • validate

        private boolean validate
      • addDefaultEntities

        private boolean addDefaultEntities
        If set, the parser is preloaded with all single-character entities from the XHTML specification. The entity sets used are:
        • http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
        • http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
        • http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
    • Constructor Detail

      • AbstractXmlParser

        public AbstractXmlParser()
    • Method Detail

      • parse

        public void parse​(java.io.Reader source,
                          Sink sink,
                          java.lang.String reference)
                   throws ParseException
        Parses the given source model and emits Doxia events into the given sink.
        Specified by:
        parse in interface Parser
        Parameters:
        source - a non-null reader that provides the source document. You can use the newReader methods of ReaderFactory.
        sink - A sink that consumes the Doxia events.
        reference - the reference
        Throws:
        ParseException - if the model could not be parsed.
      • initXmlParser

        protected void initXmlParser​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser)
                              throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
        Initializes the parser with custom entities or other options.
        Parameters:
        parser - A parser, not null.
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem initializing the parser
      • getAttributesFromParser

        protected SinkEventAttributeSet getAttributesFromParser​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser)
        Converts the attributes of the current start tag of the given parser to a SinkEventAttributeSet.
        Parameters:
        parser - A parser, not null.
        Returns:
        a SinkEventAttributeSet or null if the current parser event is not a start tag.
        Since:
        1.1
      • parseXml

        private void parseXml​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                              Sink sink)
                       throws org.codehaus.plexus.util.xml.pull.XmlPullParserException,
                              MacroExecutionException
        Parse the model from the XmlPullParser into the given sink.
        Parameters:
        parser - A parser, not null.
        sink - the sink to receive the events.
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
        MacroExecutionException - if there's a problem executing a macro
      • handleStartTag

        protected abstract void handleStartTag​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                               Sink sink)
                                        throws org.codehaus.plexus.util.xml.pull.XmlPullParserException,
                                               MacroExecutionException
        Goes through the possible start tags.
        Parameters:
        parser - A parser, not null.
        sink - the sink to receive the events.
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
        MacroExecutionException - if there's a problem executing a macro
      • handleEndTag

        protected abstract void handleEndTag​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                             Sink sink)
                                      throws org.codehaus.plexus.util.xml.pull.XmlPullParserException,
                                             MacroExecutionException
        Goes through the possible end tags.
        Parameters:
        parser - A parser, not null.
        sink - the sink to receive the events.
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
        MacroExecutionException - if there's a problem executing a macro
      • handleText

        protected void handleText​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                  Sink sink)
                           throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
        Handles text events.

        This is the default implementation: if the parser points to a non-empty text element, that text is emitted as a text event into the specified sink.

        Parameters:
        parser - A parser, not null.
        sink - the sink to receive the events. Not null.
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
      • handleCdsect

        protected void handleCdsect​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                    Sink sink)
                             throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
        Handles CDATA sections.

        This is the default implementation: all data is emitted as text events into the specified sink.

        Parameters:
        parser - A parser, not null.
        sink - the sink to receive the events. Not null.
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
      • handleComment

        protected void handleComment​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                     Sink sink)
                              throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
        Handles comments.

        This is the default implementation: all data is emitted as comment events into the specified sink.

        Parameters:
        parser - A parser, not null.
        sink - the sink to receive the events. Not null.
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
      • handleEntity

        protected void handleEntity​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                    Sink sink)
                             throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
        Handles entities.

        This is the default implementation: all entities are resolved and emitted as text events into the specified sink, except:

        • the entities with names #160, nbsp and #x00A0 are emitted as nonBreakingSpace() events.
        Parameters:
        parser - A parser, not null.
        sink - the sink to receive the events. Not null.
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
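        The dispatch described above can be sketched as follows. This is an illustration only, not the Doxia implementation: MiniSink and RecordingSink are hypothetical stand-ins for the real Sink, the replacement map is supplied by the caller, and passing an unresolved entity through in its raw form is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EntityDispatchSketch {
    /** Hypothetical stand-in for the two sink callbacks involved. */
    interface MiniSink {
        void text(String text);
        void nonBreakingSpace();
    }

    /** Records events as strings so the dispatch can be inspected. */
    static final class RecordingSink implements MiniSink {
        final List<String> events = new ArrayList<>();
        @Override public void text(String text) { events.add("text:" + text); }
        @Override public void nonBreakingSpace() { events.add("nbsp"); }
    }

    static void handleEntity(String name, Map<String, String> replacements, MiniSink sink) {
        // The three names singled out in the description become nbsp events.
        if ("#160".equals(name) || "nbsp".equals(name) || "#x00A0".equals(name)) {
            sink.nonBreakingSpace();
            return;
        }
        String resolved = replacements.get(name);
        // Assumption: an unresolved entity is emitted unchanged as "&name;".
        sink.text(resolved != null ? resolved : "&" + name + ";");
    }

    static List<String> demo() {
        Map<String, String> replacements = new HashMap<>();
        replacements.put("copy", "\u00A9");
        RecordingSink sink = new RecordingSink();
        handleEntity("nbsp", replacements, sink);
        handleEntity("copy", replacements, sink);
        handleEntity("#x00A0", replacements, sink);
        return sink.events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```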
      • handleUnknown

        protected void handleUnknown​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                     Sink sink,
                                     int type)
        Handles an unknown event.

        This is the default implementation: all events are emitted as unknown events into the specified sink.

        Parameters:
        parser - the parser to get the event from.
        sink - the sink to receive the event.
        type - the tag event type. This should be one of HtmlMarkup.TAG_TYPE_SIMPLE, HtmlMarkup.TAG_TYPE_START, HtmlMarkup.TAG_TYPE_END or HtmlMarkup.ENTITY_TYPE. It is passed as the first argument of the required parameters to the Sink.unknown(String, Object[], org.apache.maven.doxia.sink.SinkEventAttributes) method.
      • isIgnorableWhitespace

        protected boolean isIgnorableWhitespace()

        Indicates whether whitespace is ignored.

        Returns:
        true if whitespace will be ignored, false otherwise.
        Since:
        1.1
        See Also:
        setIgnorableWhitespace(boolean)
      • setIgnorableWhitespace

        protected void setIgnorableWhitespace​(boolean ignorable)
        Specify that whitespace will be ignored. I.e.:
        <tr> <td/> </tr>
        is equivalent to
        <tr><td/></tr>
        Parameters:
        ignorable - true to ignore whitespace, false otherwise.
        Since:
        1.1
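        As a rough illustration of the effect, the sketch below drops whitespace-only text chunks when the flag is set, so that the text between <tr> and <td/> produces no event. The class and method names are hypothetical, and the real parser may decide this differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IgnorableWhitespaceSketch {
    /** Returns the text chunks that would still be emitted as text events. */
    static List<String> textEvents(List<String> chunks, boolean ignorable) {
        List<String> emitted = new ArrayList<>();
        for (String chunk : chunks) {
            // Assumption: "ignorable" means the chunk holds nothing but whitespace.
            if (ignorable && chunk.trim().isEmpty()) {
                continue;
            }
            emitted.add(chunk);
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Text chunks as they might appear in "<tr> <td>cell</td> </tr>"
        List<String> chunks = Arrays.asList(" ", "cell", " ");
        System.out.println(textEvents(chunks, true));
        System.out.println(textEvents(chunks, false).size());
    }
}
```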
      • isCollapsibleWhitespace

        protected boolean isCollapsibleWhitespace()

        Indicates whether whitespace in text is collapsed.

        Returns:
        true if whitespace in text will be collapsed, false otherwise.
        Since:
        1.1
        See Also:
        setCollapsibleWhitespace(boolean)
      • setCollapsibleWhitespace

        protected void setCollapsibleWhitespace​(boolean collapsible)
        Specify that text will be collapsed. I.e.:
        Text   Text
        is equivalent to
        Text Text
        Parameters:
        collapsible - true to allow collapsible text, false otherwise.
        Since:
        1.1
      • isTrimmableWhitespace

        protected boolean isTrimmableWhitespace()

        Indicates whether text is trimmed.

        Returns:
        true if text will be trimmed, false otherwise.
        Since:
        1.1
        See Also:
        setTrimmableWhitespace(boolean)
      • setTrimmableWhitespace

        protected void setTrimmableWhitespace​(boolean trimmable)
        Specify that text will be trimmed. I.e.:
        <p> Text </p>
        is equivalent to
        <p>Text</p>
        Parameters:
        trimmable - true to allow trimmable text, false otherwise.
        Since:
        1.1
      • getText

        protected java.lang.String getText​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser)

        Returns the current text of the parser, adjusted according to the whitespace settings.

        Parameters:
        parser - A parser, not null.
        Returns:
        the XmlPullParser.getText() taking care of trimmable or collapsible configuration.
        Since:
        1.1
        See Also:
        XmlPullParser.getText(), isCollapsibleWhitespace(), isTrimmableWhitespace()
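        A minimal sketch of how the trimmable and collapsible settings could be applied to a piece of text is shown below. The class name, the normalize helper, and the order of the two steps are assumptions for illustration. The real method reads its input from XmlPullParser.getText().

```java
final class WhitespaceSketch {
    /** Applies the two whitespace settings to the given text. */
    static String normalize(String text, boolean trimmable, boolean collapsible) {
        String result = text;
        if (trimmable) {
            // "<p> Text </p>" is treated like "<p>Text</p>"
            result = result.trim();
        }
        if (collapsible) {
            // "Text   Text" is treated like "Text Text"
            result = result.replaceAll("\\s+", " ");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize(" Text ", true, false) + "]");
        System.out.println("[" + normalize("Text   Text", false, true) + "]");
    }
}
```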
      • getLocalEntities

        protected java.util.Map<java.lang.String,​java.lang.String> getLocalEntities()
        Return the defined entities in a local doctype. I.e.:
         <!DOCTYPE foo [
           <!ENTITY bar "&#x160;">
           <!ENTITY bar1 "&#x161;">
         ]>
         
        Returns:
        a map of the defined entities in a local doctype.
        Since:
        1.1
      • isValidate

        public boolean isValidate()

        Indicates whether the XML content is validated.

        Returns:
        true if XML content will be validated, false otherwise.
        Since:
        1.1
      • setValidate

        public void setValidate​(boolean validate)
        Specifies whether the XML content is validated.
        Parameters:
        validate - true to validate the XML content, false otherwise.
        Since:
        1.1
        See Also:
        AbstractParser.parse(Reader, Sink)
      • getAddDefaultEntities

        public boolean getAddDefaultEntities()
        Indicates whether the parser is loaded with the default XHTML entities.
        Since:
        2.0.0-M4
      • setAddDefaultEntities

        public void setAddDefaultEntities(boolean addDefaultEntities)
        Specifies whether the parser is loaded with the default XHTML entities.
        Since:
        2.0.0-M4
      • addEntity

        private void addEntity​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                               java.lang.String entityName,
                               java.lang.String entityValue)
                        throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
        Add an entity given by entityName and entityValue to entities.
        By default, we exclude the default XML entities: &amp;, &lt;, &gt;, &quot; and &apos;.
        Parameters:
        parser - not null
        entityName - not null
        entityValue - not null
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if any
        See Also:
        XmlPullParser.defineEntityReplacementText(String, String)
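        The exclusion rule can be sketched as follows. This is a simplified illustration, not the actual method: it stores entities in a plain map instead of calling XmlPullParser.defineEntityReplacementText(String, String), and the class name and boolean return value are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class AddEntitySketch {
    // The five entities predefined by XML, which are never redefined.
    private static final Set<String> XML_DEFAULTS =
        new HashSet<>(Arrays.asList("amp", "lt", "gt", "quot", "apos"));

    /** Registers the entity unless it is one of the XML defaults. */
    static boolean addEntity(Map<String, String> entities, String name, String value) {
        if (XML_DEFAULTS.contains(name)) {
            return false; // skipped: the parser already knows this entity
        }
        entities.put(name, value);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> entities = new LinkedHashMap<>();
        addEntity(entities, "amp", "&#38;");
        addEntity(entities, "bar", "&#x160;");
        System.out.println(entities.keySet());
    }
}
```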
      • addLocalEntities

        private void addLocalEntities​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                      java.lang.String text)
                               throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
        Handles entities defined in a local doctype, such as the following:
         <!DOCTYPE foo [
           <!ENTITY bar "&#x160;">
           <!ENTITY bar1 "&#x161;">
         ]>
         
        Parameters:
        parser - not null
        text - not null
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if any
      • addDTDEntities

        private void addDTDEntities​(org.codehaus.plexus.util.xml.pull.XmlPullParser parser,
                                    java.lang.String text)
                             throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
        Handles entities defined in external doctypes, such as the following:
         <!DOCTYPE foo [
           <!-- These are the entity sets for ISO Latin 1 characters for the XHTML -->
           <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN"
                  "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
           %HTMLlat1;
         ]>
         
        Parameters:
        parser - not null
        text - not null
        Throws:
        org.codehaus.plexus.util.xml.pull.XmlPullParserException - if any