Package org.jcodings.specific
Class BaseUTF8Encoding
- java.lang.Object
  - org.jcodings.Encoding
    - org.jcodings.AbstractEncoding
      - org.jcodings.MultiByteEncoding
        - org.jcodings.unicode.UnicodeEncoding
          - org.jcodings.specific.BaseUTF8Encoding
- All Implemented Interfaces:
java.lang.Cloneable
- Direct Known Subclasses:
NonStrictUTF8Encoding, UTF8Encoding
abstract class BaseUTF8Encoding extends UnicodeEncoding
-
-
Field Summary
Fields (Modifier and Type, Field):
- private static int INVALID_CODE_FE
- private static int INVALID_CODE_FF
- (package private) static boolean USE_INVALID_CODE_SCHEME
- private static int VALID_CODE_LIMIT
-
Constructor Summary
Constructors (Modifier, Constructor):
- protected BaseUTF8Encoding(int[] EncLen, int[][] Trans)
-
Method Summary
All Methods (Modifier and Type, Method, Description):
- int codeToMbc(int code, byte[] bytes, int p): Extracts a code point into its multibyte representation.
- int codeToMbcLength(int code): Returns the character length for a given code point. Oniguruma equivalent: code_to_mbclen
- int[] ctypeCodeRange(int ctype, IntHolder sbOut): utf8_get_ctype_code_range
- java.lang.String getCharsetName(): The name of the equivalent Java Charset for this encoding.
- boolean isNewLine(byte[] bytes, int p, int end): onigenc_is_mbc_newline_0x0a / used also by multibyte encodings
- boolean isReverseMatchAllowed(byte[] bytes, int p, int end): onigenc_always_true_is_allowed_reverse_match
- int leftAdjustCharHead(byte[] bytes, int p, int s, int end): utf8_left_adjust_char_head
- int mbcCaseFold(int flag, byte[] bytes, IntHolder pp, int end, byte[] fold): onigenc_ascii_mbc_case_fold
- int mbcToCode(byte[] bytes, int p, int end): Returns the code point for a character. Oniguruma equivalent: mbc_to_code
- (package private) static byte trail0(int code)
- (package private) static byte trailS(int code, int shift)
- private static boolean utf8IsLead(int c)
-
Methods inherited from class org.jcodings.unicode.UnicodeEncoding
applyAllCaseFold, caseFoldCodesByString, caseMap, ctypeCodeRange, isCodeCType, isInCodeRange, propertyNameToCType
-
Methods inherited from class org.jcodings.MultiByteEncoding
isInRange, length, lengthForTwoUptoFour, mb2CodeToMbc, mb2CodeToMbcLength, mb2IsCodeCType, mb4CodeToMbc, mb4CodeToMbcLength, mb4IsCodeCType, mbnMbcCaseFold, mbnMbcToCode, missing, missing, safeLengthForUptoFour, safeLengthForUptoThree, safeLengthForUptoTwo, strCodeAt, strLength
-
Methods inherited from class org.jcodings.AbstractEncoding
asciiApplyAllCaseFold, asciiCaseFoldCodesByString, asciiMbcCaseFold, isCodeCTypeInternal
-
Methods inherited from class org.jcodings.Encoding
asciiToLower, asciiToUpper, digitVal, equals, getCharset, getIndex, getName, hashCode, isAlnum, isAlpha, isAscii, isAscii, isAsciiCompatible, isBlank, isCntrl, isDigit, isDummy, isFixedWidth, isGraph, isLower, isMbcAscii, isMbcCrnl, isMbcHead, isMbcWord, isNewLine, isPrint, isPunct, isSbWord, isSingleByte, isSpace, isUnicode, isUpper, isUTF8, isWord, isWordGraphPrint, isXDigit, length, load, load, maxLength, maxLengthDistance, mbcodeStartPosition, minLength, odigitVal, prevCharHead, rightAdjustCharHead, rightAdjustCharHeadWithPrev, setDummy, setName, setName, step, stepBack, strByteLengthNull, strLengthNull, strNCmp, toLowerCaseTable, toString, xdigitVal
-
-
-
-
Field Detail
-
USE_INVALID_CODE_SCHEME
static final boolean USE_INVALID_CODE_SCHEME
- See Also:
- Constant Field Values
-
INVALID_CODE_FE
private static final int INVALID_CODE_FE
- See Also:
- Constant Field Values
-
INVALID_CODE_FF
private static final int INVALID_CODE_FF
- See Also:
- Constant Field Values
-
VALID_CODE_LIMIT
private static final int VALID_CODE_LIMIT
- See Also:
- Constant Field Values
-
-
Method Detail
-
getCharsetName
public java.lang.String getCharsetName()
Description copied from class: Encoding
The name of the equivalent Java Charset for this encoding. Defaults to the name of the encoding. Subclasses can override this to provide a different name.
- Overrides:
getCharsetName
in class UnicodeEncoding
- Returns:
- the name of the equivalent Java Charset for this encoding
-
isNewLine
public boolean isNewLine(byte[] bytes, int p, int end)
Description copied from class: AbstractEncoding
onigenc_is_mbc_newline_0x0a / used also by multibyte encodings
- Overrides:
isNewLine
in class AbstractEncoding
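The Oniguruma name suggests a check for the single byte 0x0a at position p. A minimal sketch of that check follows, assuming that is all the method does. The class `NewLineSketch` is hypothetical and is not part of jcodings.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration; not the jcodings implementation.
final class NewLineSketch {

    // Assumption: a newline is the single byte 0x0a located at p, with p < end.
    // In UTF-8 the byte 0x0a never occurs inside a multibyte sequence, so a
    // one-byte comparison is enough.
    static boolean isNewLine(byte[] bytes, int p, int end) {
        return p < end && bytes[p] == (byte) 0x0a;
    }

    public static void main(String[] args) {
        byte[] text = "a\nb".getBytes(StandardCharsets.UTF_8);
        System.out.println(isNewLine(text, 0, text.length)); // false
        System.out.println(isNewLine(text, 1, text.length)); // true
    }
}
```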
-
codeToMbcLength
public int codeToMbcLength(int code)
Description copied from class: Encoding
Returns the character length for a given code point. Oniguruma equivalent: code_to_mbclen
- Specified by:
codeToMbcLength
in class Encoding
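For orientation, the standard UTF-8 length rule can be sketched as below. This is a stand-alone illustration of the encoding form, not the library code: the class `Utf8LengthSketch` is hypothetical, and whatever special handling the private constants INVALID_CODE_FE, INVALID_CODE_FF and VALID_CODE_LIMIT imply is not modeled here.

```java
// Hypothetical stand-in for illustration; not the jcodings implementation.
final class Utf8LengthSketch {

    // Number of bytes standard UTF-8 needs for a Unicode code point.
    // Assumption: out-of-range input is rejected with an exception; the real
    // method may treat such values differently.
    static int codeToMbcLength(int code) {
        if (code < 0 || code > 0x10FFFF) {
            throw new IllegalArgumentException("not a Unicode code point: " + code);
        }
        if (code <= 0x7F) return 1;    // ASCII range
        if (code <= 0x7FF) return 2;   // two-byte range
        if (code <= 0xFFFF) return 3;  // rest of the Basic Multilingual Plane
        return 4;                      // supplementary planes
    }

    public static void main(String[] args) {
        System.out.println(codeToMbcLength(0x41));    // 1
        System.out.println(codeToMbcLength(0x20AC));  // 3
        System.out.println(codeToMbcLength(0x1F600)); // 4
    }
}
```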
-
mbcToCode
public int mbcToCode(byte[] bytes, int p, int end)
Description copied from class: Encoding
Returns the code point for a character. Oniguruma equivalent: mbc_to_code
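A rough sketch of decoding one UTF-8 character that starts at p is shown below. It assumes standard UTF-8 and throws on malformed or truncated input, which is a choice made for the example. How the real method reacts to bad input is not documented on this page. The class `Utf8DecodeSketch` is hypothetical, and overlong forms and surrogates are not rejected.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration; not the jcodings implementation.
final class Utf8DecodeSketch {

    // Decodes the character whose first byte is at p; end is exclusive.
    static int mbcToCode(byte[] bytes, int p, int end) {
        if (p >= end) throw new IllegalArgumentException("no byte at p");
        int lead = bytes[p] & 0xFF;
        if (lead < 0x80) return lead; // single-byte (ASCII) character

        int len;
        int code;
        if ((lead & 0xE0) == 0xC0) { len = 2; code = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; code = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; code = lead & 0x07; }
        else throw new IllegalArgumentException("not a lead byte at " + p);

        if (p + len > end) throw new IllegalArgumentException("truncated sequence");

        // Each continuation byte has the form 10xxxxxx and adds six payload bits.
        for (int i = 1; i < len; i++) {
            int b = bytes[p + i] & 0xFF;
            if ((b & 0xC0) != 0x80) throw new IllegalArgumentException("bad continuation byte");
            code = (code << 6) | (b & 0x3F);
        }
        return code;
    }

    public static void main(String[] args) {
        byte[] euro = "\u20AC".getBytes(StandardCharsets.UTF_8); // E2 82 AC
        System.out.println(Integer.toHexString(mbcToCode(euro, 0, euro.length))); // 20ac
    }
}
```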
-
trailS
static byte trailS(int code, int shift)
-
trail0
static byte trail0(int code)
-
codeToMbc
public int codeToMbc(int code, byte[] bytes, int p)
Description copied from class: Encoding
Extracts a code point into its multibyte representation.
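The encoding direction can be sketched as follows, under two assumptions that this page does not confirm: the returned int is the number of bytes written, and the `trailS` and `trail0` helpers build continuation bytes of the form 10xxxxxx. The class `Utf8EncodeSketch` and the helper bodies are hypothetical reconstructions from the standard UTF-8 layout. Surrogate code points are not rejected.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration; not the jcodings implementation.
final class Utf8EncodeSketch {

    // Assumed role: continuation byte carrying the six bits that start at 'shift'.
    static byte trailS(int code, int shift) {
        return (byte) (((code >>> shift) & 0x3F) | 0x80);
    }

    // Assumed role: continuation byte carrying the lowest six bits.
    static byte trail0(int code) {
        return (byte) ((code & 0x3F) | 0x80);
    }

    // Writes the UTF-8 form of 'code' at bytes[p..] and returns the byte count.
    static int codeToMbc(int code, byte[] bytes, int p) {
        if (code < 0 || code > 0x10FFFF) {
            throw new IllegalArgumentException("not a Unicode code point: " + code);
        }
        int start = p;
        if (code <= 0x7F) {
            bytes[p++] = (byte) code;
        } else if (code <= 0x7FF) {
            bytes[p++] = (byte) (0xC0 | (code >>> 6));
            bytes[p++] = trail0(code);
        } else if (code <= 0xFFFF) {
            bytes[p++] = (byte) (0xE0 | (code >>> 12));
            bytes[p++] = trailS(code, 6);
            bytes[p++] = trail0(code);
        } else {
            bytes[p++] = (byte) (0xF0 | (code >>> 18));
            bytes[p++] = trailS(code, 12);
            bytes[p++] = trailS(code, 6);
            bytes[p++] = trail0(code);
        }
        return p - start;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[4];
        int n = codeToMbc(0x20AC, buf, 0);
        System.out.println(n); // 3
        System.out.println(Arrays.toString(Arrays.copyOf(buf, n))); // [-30, -126, -84]
    }
}
```

The caller is responsible for supplying a buffer with at least four free bytes from p onward.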
-
mbcCaseFold
public int mbcCaseFold(int flag, byte[] bytes, IntHolder pp, int end, byte[] fold)
Description copied from class: AbstractEncoding
onigenc_ascii_mbc_case_fold
- Overrides:
mbcCaseFold
in class UnicodeEncoding
- Parameters:
flag - case fold flag
pp - an IntHolder that points at the character head
fold - a buffer that receives the case-folded character
Oniguruma equivalent: mbc_case_fold
-
ctypeCodeRange
public int[] ctypeCodeRange(int ctype, IntHolder sbOut)
utf8_get_ctype_code_range
- Specified by:
ctypeCodeRange
in class Encoding
-
utf8IsLead
private static boolean utf8IsLead(int c)
-
leftAdjustCharHead
public int leftAdjustCharHead(byte[] bytes, int p, int s, int end)
utf8_left_adjust_char_head
- Specified by:
leftAdjustCharHead
in class Encoding
- Parameters:
bytes - byte stream
p - position
s - stop
end - end
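The general idea behind a left adjustment in UTF-8 is to step backwards over continuation bytes (10xxxxxx) until the first byte of a character is reached. The sketch below illustrates that idea with its own parameter names, `lowerBound` and `pos`, because the descriptions above do not spell out how p and s map onto those roles. The class `Utf8CharHeadSketch` and its `isLead` helper are hypothetical, and `isLead` is only assumed to resemble the private `utf8IsLead`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration; not the jcodings implementation.
final class Utf8CharHeadSketch {

    // Assumption: any byte that is not a continuation byte (10xxxxxx)
    // counts as the head of a character.
    static boolean isLead(int c) {
        return (c & 0xC0) != 0x80;
    }

    // Moves pos left to the nearest character head, never going below lowerBound.
    static int leftAdjust(byte[] bytes, int lowerBound, int pos) {
        int i = pos;
        while (i > lowerBound && !isLead(bytes[i] & 0xFF)) {
            i--;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] text = "a\u20ACb".getBytes(StandardCharsets.UTF_8); // 61 E2 82 AC 62
        System.out.println(leftAdjust(text, 0, 3)); // 1
        System.out.println(leftAdjust(text, 0, 4)); // 4
    }
}
```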
-
isReverseMatchAllowed
public boolean isReverseMatchAllowed(byte[] bytes, int p, int end)
onigenc_always_true_is_allowed_reverse_match
- Specified by:
isReverseMatchAllowed
in class Encoding
-
-