Updated: libunistring 1.1-1

Erwin Waterlander waterlan@xs4all.nl
Fri Oct 28 06:41:28 GMT 2022

libunistring (source package)
libunistring5 (runtime library)
libunistring-devel (development library and include files)
libunistring-doc (documentation)


New in 1.1:
* The data tables and algorithms have been updated to Unicode version 15.0.0.

New in 1.0:
* The license has changed from "LGPLv3+ or GPLv2" to "LGPLv3+ or GPLv2+".
* The data tables and algorithms have been updated to Unicode version 14.0.0.
* The functions u8_uctomb, u16_uctomb, u32_uctomb now support strings larger
  than 2 GiB by taking an 'n' argument of type ptrdiff_t (instead of int).
* The functions u*_possible_linebreaks and u*_width_linebreaks now make it
  easier to work with strings that contain CR-LF sequences: In this case,
  in the returned array, it will return UC_BREAK_CR_BEFORE_LF followed by
* There are new properties for recognizing pictographic symbols and
  regional indicators:
    - UC_PROPERTY_EMOJI                  uc_is_property_emoji
    - UC_PROPERTY_EMOJI_PRESENTATION     uc_is_property_emoji_presentation
    - UC_PROPERTY_EMOJI_MODIFIER         uc_is_property_emoji_modifier
    - UC_PROPERTY_EMOJI_MODIFIER_BASE    uc_is_property_emoji_modifier_base
    - UC_PROPERTY_EMOJI_COMPONENT        uc_is_property_emoji_component
    - UC_PROPERTY_EXTENDED_PICTOGRAPHIC  uc_is_property_extended_pictographic
    - UC_PROPERTY_REGIONAL_INDICATOR     uc_is_property_regional_indicator
* Fixed multithread-safety bugs on Cygwin, native Windows, and Haiku.
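
To illustrate the u8_uctomb change noted above (an 'n' argument of type ptrdiff_t), here is a minimal, self-contained sketch. It does not call libunistring. sketch_u8_uctomb and sketch_encode_all are hypothetical names, and the return convention (byte count, -1 for an invalid code point, -2 for a too-small buffer) is an assumption modelled on the documented behaviour of u8_uctomb. The point is that the remaining space is naturally a pointer difference, which is why ptrdiff_t fits better than int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in, NOT the library function: store the UTF-8
   encoding of uc in s, where n bytes are available.
   Returns the number of bytes written, -1 if uc is not a valid
   code point, or -2 if n is too small.  */
static int
sketch_u8_uctomb (uint8_t *s, uint32_t uc, ptrdiff_t n)
{
  int count;

  if (uc < 0x80)
    count = 1;
  else if (uc < 0x800)
    count = 2;
  else if (uc < 0x10000)
    {
      if (uc >= 0xD800 && uc < 0xE000)
        return -1;              /* surrogates are not characters */
      count = 3;
    }
  else if (uc < 0x110000)
    count = 4;
  else
    return -1;                  /* beyond the Unicode range */

  if (n < count)
    return -2;                  /* not enough room in the buffer */

  switch (count)
    {
    case 1:
      s[0] = (uint8_t) uc;
      break;
    case 2:
      s[0] = (uint8_t) (0xC0 | (uc >> 6));
      s[1] = (uint8_t) (0x80 | (uc & 0x3F));
      break;
    case 3:
      s[0] = (uint8_t) (0xE0 | (uc >> 12));
      s[1] = (uint8_t) (0x80 | ((uc >> 6) & 0x3F));
      s[2] = (uint8_t) (0x80 | (uc & 0x3F));
      break;
    default:
      s[0] = (uint8_t) (0xF0 | (uc >> 18));
      s[1] = (uint8_t) (0x80 | ((uc >> 12) & 0x3F));
      s[2] = (uint8_t) (0x80 | ((uc >> 6) & 0x3F));
      s[3] = (uint8_t) (0x80 | (uc & 0x3F));
      break;
    }
  return count;
}

/* Encode ncps code points into buf (bufsize bytes).  The space left is
   computed as 'end - p', which has type ptrdiff_t.
   Returns the total number of bytes written, or -1 on any error.  */
static ptrdiff_t
sketch_encode_all (uint8_t *buf, size_t bufsize,
                   const uint32_t *cps, size_t ncps)
{
  uint8_t *p = buf;
  uint8_t *end = buf + bufsize;
  size_t i;

  assert (buf != NULL && cps != NULL);
  for (i = 0; i < ncps; i++)
    {
      int written = sketch_u8_uctomb (p, cps[i], end - p);
      if (written < 0)
        return -1;
      p += written;
    }
  return p - buf;
}
```

With the real library, the equivalent call would go through <unistr.h> and link against libunistring; the sketch only shows the shape of the interface.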



Text files are nowadays usually encoded in Unicode, and may consist of
very different scripts (from Latin letters to Chinese Hanzi), with
many kinds of special characters: accents, right-to-left writing marks,
hyphens, Roman numerals, and much more. But the POSIX platform APIs for
text do not contain adequate functions for dealing with particular
properties of many Unicode characters. In fact, the POSIX APIs for text
have several assumptions at their base which don't hold for Unicode
text.
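
One such assumption is that one char equals one character. The following self-contained sketch (plain C, not using libunistring; sketch_count_code_points is a hypothetical helper that assumes well-formed UTF-8 input) contrasts the byte count reported by strlen with the number of code points in the same string.

```c
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated UTF-8 string by counting every
   byte that is not a continuation byte (10xxxxxx).
   Assumes the input is well-formed UTF-8.  */
static size_t
sketch_count_code_points (const char *s)
{
  size_t count = 0;

  for (; *s != '\0'; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      count++;
  return count;
}
```

For the UTF-8 string "na\xC3\xAFve" (the word "naïve"), strlen reports 6 bytes while the helper reports 5 code points. Even the code point count is not the number of user-perceived characters once combining marks are involved, which is what the grapheme cluster functions in <unigbrk.h> address.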

This library provides functions for manipulating Unicode strings and for 
manipulating C strings according to the Unicode standard.

homepage: http://www.gnu.org/s/libunistring/
license: LGPLv3+ or GPLv2+


This library consists of the following parts:

<unistr.h>    elementary string functions
<uniconv.h>   conversion from/to legacy encodings
<unistdio.h>  formatted output to strings
<uniname.h>   character names
<unictype.h>  character classification and properties
<uniwidth.h>  string width when using nonproportional fonts
<uniwbrk.h>   word breaks
<unilbrk.h>   line breaking algorithm
<uninorm.h>   normalization (composition and decomposition)
<unicase.h>   case folding
<uniregex.h>  regular expressions (not yet implemented)
<unigbrk.h>   grapheme cluster breaking

Who needs libunistring?

libunistring is for you if your application involves non-trivial text 
processing, such as upper/lower case conversions, line breaking, 
operations on words, or more advanced analysis of text. Text provided by 
the user can, in general, contain characters of all kinds of scripts. 
The text processing functions provided by this library handle all 
scripts and all languages.

libunistring is for you if your application already uses the ISO C /
POSIX <ctype.h> and <wctype.h> functions, and the text it operates on
is provided by the user and can be in any language.
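
A small sketch of why byte-wise <ctype.h> classification falls short for such text: in the default "C" locale, isalpha recognises only the ASCII letters, so the two bytes that encode "é" in UTF-8 are not counted as alphabetic. sketch_count_alpha_bytes is a hypothetical helper written for this illustration; it uses plain ISO C only and does not call libunistring.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the bytes of s that isalpha() classifies as alphabetic.
   The cast to unsigned char avoids undefined behaviour for bytes
   with the high bit set.  */
static size_t
sketch_count_alpha_bytes (const char *s)
{
  size_t count = 0;

  for (; *s != '\0'; s++)
    if (isalpha ((unsigned char) *s))
      count++;
  return count;
}
```

Called on the UTF-8 string "caf\xC3\xA9" ("café") without any prior setlocale call, the helper counts 3 letters, not 4. Classification by Unicode code point, as offered by <unictype.h>, avoids this dependence on bytes and locale.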

libunistring is also for you if your application uses Unicode strings as
its internal in-memory representation.

Erwin Waterlander
