tango.text.Unicode

License:

BSD style: see license.txt

Version:

Initial release: Sept 2007

Authors:

Peter

Provides case mapping functions for Unicode strings. At present it is only about 99% complete, because it does not take conditional case mappings into account. As a result, the Greek letter sigma is not case mapped correctly at the end of a word, and the Lithuanian, Turkish and Azeri locales are not taken into account during case mapping. In all, around 12 characters will not be mapped correctly under some circumstances.

ICU4J does not handle these cases at the moment either.

The unit tests are written against output from ICU4J.

This module tries to minimize memory allocation and usage. You can always pass an output buffer to the case mapping functions; it is used if it is large enough and resized if necessary.
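A minimal sketch of the buffer reuse described above, assuming the D2 port of Tango with the tango.text.Unicode module is available. The helper upperAll and the 64-byte buffer size are illustrative choices, not part of the module. Because the returned slice may share memory with the buffer, each result is copied before the buffer is reused.

```d
import tango.text.Unicode : toUpper;

// Hypothetical helper: upper-cases each word through one reusable buffer.
string[] upperAll(string[] words)
{
    // Allocated once and reused; 64 bytes is plenty for the inputs below.
    char[] scratch = new char[64];
    string[] result;
    foreach (word; words)
    {
        // While scratch is large enough, the returned slice points into it;
        // otherwise toUpper resizes the buffer, which may move it.
        char[] upper = toUpper(word, scratch);
        // The next call overwrites scratch, so store an immutable copy.
        result ~= upper.idup;
    }
    return result;
}

void main()
{
    assert(upperAll(["alpha", "beta", "gamma"]) == ["ALPHA", "BETA", "GAMMA"]);
}
```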
char[] toUpper(const(char)[] input, char[] output = null)
Converts a UTF-8 string to upper case.

Parameters:

input: the string to be case mapped
output: this output buffer will be used unless it is too small

Returns:

the case mapped string
wchar[] toUpper(const(wchar)[] input, wchar[] output = null)
Converts a UTF-16 string to upper case.

Parameters:

input: the string to be case mapped
output: this output buffer will be used unless it is too small

Returns:

the case mapped string
dchar[] toUpper(const(dchar)[] input, dchar[] output = null)
Converts a UTF-32 string to upper case.

Parameters:

input: the string to be case mapped
output: this output buffer will be used unless it is too small

Returns:

the case mapped string
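A short sketch showing that the three toUpper overloads accept UTF-8, UTF-16 and UTF-32 input, assuming tango.text.Unicode is available and the source file is saved as UTF-8. The literal suffixes c, w and d select the overload explicitly.

```d
import tango.text.Unicode : toUpper;

void main()
{
    char[]  u8  = toUpper("café"c);   // UTF-8 overload
    wchar[] u16 = toUpper("café"w);   // UTF-16 overload
    dchar[] u32 = toUpper("café"d);   // UTF-32 overload

    // é (U+00E9) maps to É (U+00C9) in every encoding.
    assert(u8  == "CAFÉ");
    assert(u16 == "CAFÉ"w);
    assert(u32 == "CAFÉ"d);
}
```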
char[] toLower(const(char)[] input, char[] output = null)
Converts a UTF-8 string to lower case.

Parameters:

input: the string to be case mapped
output: this output buffer will be used unless it is too small

Returns:

the case mapped string
wchar[] toLower(const(wchar)[] input, wchar[] output = null)
Converts a UTF-16 string to lower case.

Parameters:

input: the string to be case mapped
output: this output buffer will be used unless it is too small

Returns:

the case mapped string
dchar[] toLower(const(dchar)[] input, dchar[] output = null)
Converts a UTF-32 string to lower case.

Parameters:

input: the string to be case mapped
output: this output buffer will be used unless it is too small

Returns:

the case mapped string
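A sketch of toLower, assuming tango.text.Unicode is available. The second assertion illustrates the final-sigma limitation described in the module introduction: it records the behavior expected from an implementation that skips conditional mappings, not the linguistically correct result, and it would change if conditional mappings were added.

```d
import tango.text.Unicode : toLower;

void main()
{
    // Ordinary letters, including non-ASCII ones, map as expected.
    assert(toLower("CAFÉ"c) == "café");

    // "ΟΔΟΣ": the word-final capital sigma (U+03A3) becomes σ (U+03C3)
    // rather than the final form ς (U+03C2), because conditional case
    // mappings are not applied.
    assert(toLower("\u039F\u0394\u039F\u03A3"c) == "\u03BF\u03B4\u03BF\u03C3");
}
```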
char[] toFold(const(char)[] input, char[] output = null)
Converts a UTF-8 string to folded case. Case folding is used for case-insensitive comparisons.

Parameters:

input: the string to be case mapped
output: this output buffer will be used unless it is too small

Returns:

the case mapped string
wchar[] toFold(const(wchar)[] input, wchar[] output = null)
Converts a UTF-16 string to folded case. Case folding is used for case-insensitive comparisons.

Parameters:

input: the string to be case mapped
output: this output buffer will be used unless it is too small

Returns:

the case mapped string
dchar[] toFold(const(dchar)[] input, dchar[] output = null)
Converts a UTF-32 string to folded case. Case folding is used for case-insensitive comparisons.

Parameters:

input: the string to be case mapped
output: this output buffer will be used unless it is too small

Returns:

the case mapped string
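A sketch of a case-insensitive comparison built on toFold, assuming tango.text.Unicode is available. equalsIgnoreCase is a hypothetical helper; each call allocates two folded copies, so a caller comparing many strings might pass reusable output buffers instead.

```d
import tango.text.Unicode : toFold;

// Hypothetical helper: case-insensitive equality built on case folding.
bool equalsIgnoreCase(const(char)[] a, const(char)[] b)
{
    // Strings that differ only in letter case fold to the same text.
    return toFold(a) == toFold(b);
}

void main()
{
    assert(equalsIgnoreCase("École", "ÉCOLE"));
    assert(!equalsIgnoreCase("Hello", "Help"));
}
```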
bool isDigit(dchar ch)
Determines if a character is a digit. It returns true for decimal digits only.

Parameters:

ch: the character to be inspected
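A sketch that checks whether a string consists only of decimal digits, assuming tango.text.Unicode is available; allDigits is a hypothetical helper. Since the test is based on Unicode character data, it may also accept decimal digits from scripts other than Latin, so code that converts digits to numbers should not assume c - '0' yields the digit value.

```d
import tango.text.Unicode : isDigit;

// Hypothetical helper: true if s is non-empty and every code point is a decimal digit.
bool allDigits(const(char)[] s)
{
    if (s.length == 0)
        return false;
    // A dchar loop variable makes foreach decode UTF-8 into code points.
    foreach (dchar c; s)
        if (!isDigit(c))
            return false;
    return true;
}

void main()
{
    assert(allDigits("2007"));
    assert(!allDigits("12a"));
    assert(!allDigits("½"));   // numeric, but not a decimal digit
}
```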
bool isLetter(int ch)
Determines if a character is a letter.

Parameters:

ch: the character to be inspected
bool isLetterOrDigit(int ch)
Determines if a character is a letter or a decimal digit.

Parameters:

ch: the character to be inspected
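A sketch of an identifier check using isLetter and isLetterOrDigit, assuming tango.text.Unicode is available. The rule in isIdentifier (a letter or underscore first, then letters, decimal digits or underscores) is a hypothetical example, not the identifier rule of any particular language.

```d
import tango.text.Unicode : isLetter, isLetterOrDigit;

// Hypothetical rule: a letter or '_' first, then letters, decimal digits or '_'.
bool isIdentifier(const(char)[] s)
{
    bool first = true;
    foreach (dchar c; s)   // decodes UTF-8 into code points
    {
        bool ok = first ? (c == '_' || isLetter(c))
                        : (c == '_' || isLetterOrDigit(c));
        if (!ok)
            return false;
        first = false;
    }
    // An empty string never clears first, so it is rejected.
    return !first;
}

void main()
{
    assert(isIdentifier("größe_2"));
    assert(!isIdentifier("2fast"));
    assert(!isIdentifier("a-b"));
    assert(!isIdentifier(""));
}
```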
bool isLower(dchar ch)
Determines if a character is a lower case letter.

Parameters:

ch: the character to be inspected
bool isTitle(dchar ch)
Determines if a character is a title case letter. Title case letters combine two letters in one character, with the first in upper case and the second in lower case. Such characters occur in Croatian (for example ǅ, U+01C5) and Greek.

See Also:

http://en.wikipedia.org/wiki/Capitalization

Parameters:

ch: the character to be inspected
bool isUpper(dchar ch)
Determines if a character is an upper case letter.

Parameters:

ch: the character to be inspected
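A sketch that classifies a character with isTitle, isUpper and isLower, assuming tango.text.Unicode is available; caseOf is a hypothetical helper. isTitle is tested first so that a title case letter such as ǅ (U+01C5) is reported as title case regardless of how the other two functions treat it.

```d
import tango.text.Unicode : isLower, isTitle, isUpper;

// Hypothetical helper: names the case category of a single character.
string caseOf(dchar c)
{
    if (isTitle(c)) return "title";
    if (isUpper(c)) return "upper";
    if (isLower(c)) return "lower";
    return "none";
}

void main()
{
    assert(caseOf('A') == "upper");
    assert(caseOf('ä') == "lower");
    assert(caseOf('\u01C5') == "title");   // ǅ
    assert(caseOf('7') == "none");
}
```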
bool isWhitespace(dchar ch)
Determines if a character is a whitespace character. Whitespace characters are the characters in the general categories Zs, Zl and Zp, excluding the no-break spaces, plus the following control characters, which are used as spaces: TAB, VT, LF, FF, CR, FS, GS, RS, US and NL.

WARNING:

Consider isSpace instead; it may be closer to what you expect.

Parameters:

ch: the character to be inspected
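A sketch of word counting with isWhitespace, assuming tango.text.Unicode is available; countWords is a hypothetical helper. Because the no-break spaces are excluded, U+00A0 keeps the two parts of "10\u00A0km" together as one word. isSpace is not used here because its exact character set is not spelled out in this documentation.

```d
import tango.text.Unicode : isWhitespace;

// Hypothetical helper: counts words separated by runs of whitespace.
size_t countWords(const(char)[] s)
{
    size_t words = 0;
    bool inWord = false;
    foreach (dchar c; s)
    {
        if (isWhitespace(c))
            inWord = false;
        else if (!inWord)
        {
            // First non-whitespace character after a gap starts a new word.
            inWord = true;
            ++words;
        }
    }
    return words;
}

void main()
{
    assert(countWords("  one\ttwo\nthree  ") == 3);
    assert(countWords("10\u00A0km") == 1);   // no-break space is not whitespace
}
```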
bool isSpace(dchar ch)
Determines if a character is a space character as specified in the Unicode Standard.

WARNING:

Consider isWhitespace instead; it may be closer to what you expect.

Parameters:

ch: the character to be inspected
bool isPrintable(dchar ch)
Determines if a character is a printable character as specified in the Unicode Standard.

Parameters:

ch: the character to be inspected
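A sketch that replaces non-printable characters with '?', assuming tango.text.Unicode is available; sanitize is a hypothetical helper. It assumes that control characters such as BEL (U+0007) count as non-printable.

```d
import tango.text.Unicode : isPrintable;

// Hypothetical helper: replaces every non-printable code point with '?'.
char[] sanitize(const(char)[] s)
{
    char[] result;
    foreach (dchar c; s)
        // Appending a dchar to a char[] encodes it as UTF-8.
        result ~= isPrintable(c) ? c : '?';
    return result;
}

void main()
{
    assert(sanitize("ok\x07") == "ok?");
    assert(sanitize("café") == "café");
}
```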