aboutsummaryrefslogtreecommitdiffstats
path: root/src/utf8.c
Commit message (Collapse)AuthorAgeFilesLines
* Update copyright ranges for 2023Lukas Fleischer2023-04-111-1/+1
| | | | Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
* Update copyright ranges for 2022Lukas Fleischer2022-03-111-1/+1
| | | | Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
* Update copyright rangesLukas Fleischer2020-01-301-1/+1
| | | | Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
* Key bindings for UTF-8 encoded charactersLars Henriksen2018-06-031-1/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Internally characters (keys) have two representations: integers and key names. Key names are characters strings, usually the name of the character; e.g., the character A has the representations 65 and "A", and the tab character the representations 9 and "TAB". The function keys_int2str() turns the integer representation of a key/character into the key name. For display purposes the key names are usually confined to have display width at most three. Some curses pseudo-keys have longer key names; e.g., the back-tab character is "KEY_BTAB". A long key name makes a character difficult to recognize in the status bar menu. The key name of a multibyte, UTF-8 encoded character is the conventional Unicode name of the code point; e.g., the character ü has key name "U+00FC" because ü is the code point 0xFC. Most of these look alike in the status bar menu. The patch makes the key name of a multibyte character look like that of a singlebyte character: the character itself, i.e. the key name of the character ü is "ü". The main tool is implementation of a utf8_encode() routine. Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
* Rename utf8_ord() to utf8_decode()Lars Henriksen2018-06-031-2/+2
| | | | | | | Purely for readability and in preparation for the counterpart utf8_encode(). Signed-off-by: Lars Henriksen <LarsHenriksen@get2net.dk> Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
* Update UTF-8 base codeLars Henriksen2017-12-071-17/+8
| | | | | | | | | | UTF-8 encodes characters in one to four bytes (since 2003). Because 0 is a valid code point, the decode function utf8_ord() should return -1, not 0, on error. As a consequence utf8_width() should return 0 for a continuation byte (as it did previously). Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
* Factor out UTF-8 code point decodingLukas Fleischer2017-08-301-16/+15
| | | | Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
* Update copyright rangesLukas Fleischer2017-01-121-1/+1
| | | | Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
* Refactor UTF-8 choppingLukas Fleischer2016-02-261-0/+23
| | | | | | | Add a function that makes sure a string does not exceed a given display size. If the string is too long, dots ("...") are appended. Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
* Update copyright rangesLukas Fleischer2016-01-301-1/+1
| | | | Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
* Update copyright rangesLukas Fleischer2015-02-071-1/+1
| | | | Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
* Use a macro to determine the size of arraysLukas Fleischer2013-05-041-1/+1
| | | | | | | | Use following macro instead of "sizeof(x) / sizeof(x[0])" everywhere: #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
* Use tabs instead of spaces for indentationLukas Fleischer2013-04-141-277/+279
| | | | | | | | | | | This completes our switch to the Linux kernel coding style. Note that we still use deeply nested constructs at some places which need to be fixed up later. Converted using the `Lindent` script from the Linux kernel code base, along with some manual fixes. Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
* Fix braces in if-else statementsLukas Fleischer2013-02-171-1/+2
| | | | | | | | | | From the Linux kernel coding guidelines: Do not unnecessarily use braces where a single statement will do. [...] This does not apply if one branch of a conditional statement is a single statement. Use braces in both branches. Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
* Update copyright rangesLukas Fleischer2013-02-041-1/+1
| | | | | | | Add 2013 to the copyright range for all source and documentation files. Reported-by: Frederic Culot <frederic@culot.org> Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
* Switch to Linux kernel coding styleLukas Fleischer2012-05-211-277/+268
| | | | | | | | | | | | | | Convert our code base to adhere to Linux kernel coding style using Lindent, with the following exceptions: * Use spaces, instead of tabs, for indentation. * Use 2-character indentations (instead of 8 characters). Rationale: We currently have too much levels of indentation. Using 8-character tabs would make huge code parts unreadable. These need to be cleaned up before we can switch to 8 characters. Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
* Update copyright rangesLukas Fleischer2012-03-261-1/+1
| | | | | | Add 2012 to the copyright range for all source and documentation files. Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
* utf8_width() performance improvementsLukas Fleischer2011-07-021-39/+50
| | | | | | | | | * Sort character width lookup table by character ranges. * Use binary search instead of linear search for UTF-8 character width lookups which will speed up utf8_width() (O(log n) instead of O(n)). Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>
* Add basic UTF-8 helper functionsLukas Fleischer2011-06-291-0/+333
Add utf8_width() and utf8_strwidth() which can be used to calculate the display width of a single character or a string, respectively. A lookup table is used to spot double width characters, as well as composing characters. There currently isn't any code to deal with ambigious characters. Signed-off-by: Lukas Fleischer <calcurse@cryptocrack.de>