Class BaseUTF8Encoding

    • Constructor Detail

      • BaseUTF8Encoding

        protected BaseUTF8Encoding​(int[] EncLen,
                                   int[][] Trans)
    • Method Detail

      • getCharsetName

        public java.lang.String getCharsetName()
        Description copied from class: Encoding
        The name of the equivalent Java Charset for this encoding. Defaults to the name of the encoding. Subclasses can override this to provide a different name.
        Overrides:
        getCharsetName in class UnicodeEncoding
        Returns:
        the name of the equivalent Java Charset for this encoding
      • isNewLine

        public boolean isNewLine​(byte[] bytes,
                                 int p,
                                 int end)
        Description copied from class: AbstractEncoding
        Oniguruma equivalent: onigenc_is_mbc_newline_0x0a. Also used by multibyte encodings.
        Overrides:
        isNewLine in class AbstractEncoding
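        The Oniguruma name suggests a check for a single 0x0a byte at position p. A minimal standalone sketch of that check follows; the class name Utf8NewLineSketch is hypothetical and the body is an assumption based on the name, not the library's source.

```java
final class Utf8NewLineSketch {
    // Assumed behavior: true when the byte at p is LF (0x0a) and p is inside [0, end).
    static boolean isNewLine(byte[] bytes, int p, int end) {
        return p < end && bytes[p] == 0x0a;
    }

    public static void main(String[] args) {
        byte[] text = {'a', '\n', 'b'};
        System.out.println(isNewLine(text, 0, text.length)); // 'a' is not a newline
        System.out.println(isNewLine(text, 1, text.length)); // LF at index 1
    }
}
```

        Because 0x0a never appears inside a multibyte UTF-8 sequence, a single-byte comparison is enough for this encoding.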
      • codeToMbcLength

        public int codeToMbcLength​(int code)
        Description copied from class: Encoding
        Returns the character length in bytes for a given code point. Oniguruma equivalent: code_to_mbclen
        Specified by:
        codeToMbcLength in class Encoding
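        For UTF-8, the length depends only on which range the code point falls in. The following is a minimal standalone sketch of that mapping, not the library's implementation; the names Utf8LengthSketch and utf8Length are hypothetical, and returning -1 for out-of-range input is an assumption (the real method may signal errors differently).

```java
final class Utf8LengthSketch {
    // Number of bytes needed to encode codePoint as UTF-8, or -1 if out of range.
    static int utf8Length(int codePoint) {
        if (codePoint < 0) return -1;
        if (codePoint < 0x80) return 1;       // up to 7 bits: ASCII
        if (codePoint < 0x800) return 2;      // up to 11 bits
        if (codePoint < 0x10000) return 3;    // up to 16 bits
        if (codePoint <= 0x10FFFF) return 4;  // up to the Unicode maximum
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length('A'));     // ASCII letter
        System.out.println(utf8Length(0x20AC));  // EURO SIGN
        System.out.println(utf8Length(0x1F600)); // an emoji outside the BMP
    }
}
```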
      • mbcToCode

        public int mbcToCode​(byte[] bytes,
                             int p,
                             int end)
        Description copied from class: Encoding
        Returns the code point for a character. Oniguruma equivalent: mbc_to_code
        Specified by:
        mbcToCode in class Encoding
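        To illustrate the decoding step, here is a minimal standalone sketch that reads one UTF-8 sequence starting at p and returns its code point. The names Utf8DecodeSketch and decode are hypothetical, and throwing on malformed input is an assumption made for clarity; the library method may handle invalid bytes differently. The sketch does not reject overlong encodings.

```java
final class Utf8DecodeSketch {
    // Decodes the UTF-8 sequence that starts at bytes[p]; end is the exclusive limit.
    static int decode(byte[] bytes, int p, int end) {
        if (p >= end) throw new IllegalArgumentException("empty input");
        int lead = bytes[p] & 0xff;
        if (lead < 0x80) return lead; // single-byte (ASCII) character

        int len, code;
        if ((lead & 0xE0) == 0xC0) { len = 2; code = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; code = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; code = lead & 0x07; }
        else throw new IllegalArgumentException("not a UTF-8 lead byte");

        if (p + len > end) throw new IllegalArgumentException("truncated sequence");
        for (int i = 1; i < len; i++) {
            int b = bytes[p + i] & 0xff;
            if ((b & 0xC0) != 0x80) throw new IllegalArgumentException("bad continuation byte");
            code = (code << 6) | (b & 0x3F); // append the 6 payload bits
        }
        return code;
    }

    public static void main(String[] args) {
        byte[] euro = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        System.out.println(Integer.toHexString(decode(euro, 0, euro.length))); // prints 20ac
    }
}
```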
      • trailS

        static byte trailS​(int code,
                           int shift)
      • trail0

        static byte trail0​(int code)
      • codeToMbc

        public int codeToMbc​(int code,
                             byte[] bytes,
                             int p)
        Description copied from class: Encoding
        Extracts a code point into its multibyte representation.
        Specified by:
        codeToMbc in class Encoding
        Returns:
        the character length for the given code point. Oniguruma equivalent: code_to_mbc
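        The encoding direction can be sketched as follows. The helpers are named after trailS and trail0 from this page, but their bodies are an assumption (each is taken to build one 10xxxxxx continuation byte from six bits of the code point); the class name Utf8EncodeSketch and the exception on out-of-range input are hypothetical and not taken from the library.

```java
import java.nio.charset.StandardCharsets;

final class Utf8EncodeSketch {
    // Assumed helper: continuation byte holding the six bits of code starting at bit 'shift'.
    static byte trailS(int code, int shift) {
        return (byte) (((code >>> shift) & 0x3F) | 0x80);
    }

    // Assumed helper: continuation byte holding the lowest six bits of code.
    static byte trail0(int code) {
        return (byte) ((code & 0x3F) | 0x80);
    }

    // Writes code as UTF-8 into out starting at p and returns the number of bytes written.
    static int encode(int code, byte[] out, int p) {
        if (code < 0 || code > 0x10FFFF) throw new IllegalArgumentException("out of range");
        if (code < 0x80) {
            out[p] = (byte) code;
            return 1;
        }
        if (code < 0x800) {
            out[p] = (byte) (0xC0 | (code >>> 6));
            out[p + 1] = trail0(code);
            return 2;
        }
        if (code < 0x10000) {
            out[p] = (byte) (0xE0 | (code >>> 12));
            out[p + 1] = trailS(code, 6);
            out[p + 2] = trail0(code);
            return 3;
        }
        out[p] = (byte) (0xF0 | (code >>> 18));
        out[p + 1] = trailS(code, 12);
        out[p + 2] = trailS(code, 6);
        out[p + 3] = trail0(code);
        return 4;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[4];
        int n = encode(0x20AC, buf, 0);
        // Round-trip through the JDK decoder as a sanity check.
        System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
    }
}
```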
      • mbcCaseFold

        public int mbcCaseFold​(int flag,
                               byte[] bytes,
                               IntHolder pp,
                               int end,
                               byte[] fold)
        Description copied from class: AbstractEncoding
        onigenc_ascii_mbc_case_fold
        Overrides:
        mbcCaseFold in class UnicodeEncoding
        Parameters:
        flag - case fold flag
        pp - an IntHolder that points at character head
        fold - a buffer into which the case-folded character is extracted. Oniguruma equivalent: mbc_case_fold
      • utf8IsLead

        private static boolean utf8IsLead​(int c)
      • leftAdjustCharHead

        public int leftAdjustCharHead​(byte[] bytes,
                                      int p,
                                      int s,
                                      int end)
        Oniguruma equivalent: utf8_left_adjust_char_head
        Specified by:
        leftAdjustCharHead in class Encoding
        Parameters:
        bytes - byte stream
        p - position
        s - stop
        end - end
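        Adjusting to a character head in UTF-8 amounts to stepping backwards over continuation bytes (10xxxxxx) until a lead byte is found. The sketch below is standalone and hedged: it assumes, following the Oniguruma function of the same name, that p is the lower bound of the scan and s is the position being adjusted. The class name Utf8LeftAdjustSketch and the helper isLead (modelled on utf8IsLead) are hypothetical.

```java
final class Utf8LeftAdjustSketch {
    // A byte is a lead byte (or ASCII) unless it has the continuation pattern 10xxxxxx.
    static boolean isLead(int c) {
        return (c & 0xC0) != 0x80;
    }

    // Moves s left to the first byte of the character containing it, never below p.
    // The end parameter is kept for signature parity and is unused in this sketch.
    static int leftAdjust(byte[] bytes, int p, int s, int end) {
        if (s <= p) return s;
        int q = s;
        while (!isLead(bytes[q] & 0xff) && q > p) q--;
        return q;
    }

    public static void main(String[] args) {
        // "a", EURO SIGN (3 bytes), "b"
        byte[] text = {0x61, (byte) 0xE2, (byte) 0x82, (byte) 0xAC, 0x62};
        System.out.println(leftAdjust(text, 0, 3, text.length)); // prints 1
    }
}
```

        In the usage above, index 3 is the last byte of the three-byte sequence that starts at index 1, so the adjusted position is 1.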
      • isReverseMatchAllowed

        public boolean isReverseMatchAllowed​(byte[] bytes,
                                             int p,
                                             int end)
        Oniguruma equivalent: onigenc_always_true_is_allowed_reverse_match
        Specified by:
        isReverseMatchAllowed in class Encoding