Key bindings for UTF-8 encoded characters

Internally, characters (keys) have two representations: integers and key names. Key names are character strings, usually the name of the character; e.g., the character A has the representations 65 and "A", and the tab character the representations 9 and "TAB". The function keys_int2str() turns the integer representation of a key/character into the key name.

For display purposes, key names are usually limited to a display width of at most three. Some curses pseudo-keys have longer key names; e.g., the back-tab character is "KEY_BTAB". A long key name makes a character difficult to recognize in the status bar menu.

The key name of a multibyte, UTF-8 encoded character is the conventional Unicode notation for the code point; e.g., the character ü has key name "U+00FC" because ü is the code point 0xFC. Most of these look alike in the status bar menu.

The patch makes the key name of a multibyte character look like that of a single-byte character: the character itself; i.e., the key name of the character ü is "ü". The main tool is a new utf8_encode() routine.

Signed-off-by: Lukas Fleischer <lfleischer@calcurse.org>
author: Lars Henriksen <LarsHenriksen@get2net.dk> 2018-03-26 18:44:08 +0200
committer: Lukas Fleischer <lfleischer@calcurse.org> 2018-06-03 11:26:12 +0200
commit: 7078556f9d055cb46339d436add2a03cc8abbc71 (patch)
tree: 49cc66196b08fffa061987fbba6490ebda4c2a1f /src/utf8.c
parent: 431e4a00e7792d3421c7122c32ca5df81505caf3 (diff)
download: calcurse-7078556f9d055cb46339d436add2a03cc8abbc71.tar.gz
calcurse-7078556f9d055cb46339d436add2a03cc8abbc71.zip
1 files changed, 38 insertions, 1 deletions
diff --git a/src/utf8.c b/src/utf8.c
index 6b04331..b1976af 100644
--- a/src/utf8.c
+++ b/src/utf8.c
@@ -291,7 +291,44 @@ int utf8_decode(const char *s)
 	}
 }
 
-/* Get the width of a UTF-8 character. */
+/*
+ * Encode a Unicode code point.
+ * Return a pointer to the resulting UTF-8 encoded character.
+ */
+char *utf8_encode(int u)
+{
+	static char c[5]; /* 4 bytes + string termination */
+
+	/* 0x0000 - 0x007F: 0xxxxxxx */
+	if (u < 0x80) {
+		*(c + 1) = '\0';
+		*c = u;
+	/* 0x0080 - 0x07FF: 110xxxxx 10xxxxxx */
+	} else if (u < 0x800) {
+		*(c + 2) = '\0';
+		*(c + 1) = (u       & 0x3F) | 0x80;
+		*c       = (u >> 6)         | 0xC0;
+	/* 0x0800 - 0xFFFF: 1110xxxx 10xxxxxx 10xxxxxx */
+	} else if (u < 0x10000) {
+		*(c + 3) = '\0';
+		*(c + 2) = (u       & 0x3F) | 0x80;
+		*(c + 1) = (u >> 6  & 0x3F) | 0x80;
+		*c       = (u >> 12)        | 0xE0;
+	} else if (u < 0x110000) {
+	/* 0x10000 - 0x10FFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
+		*(c + 4) = '\0';
+		*(c + 3) = (u       & 0x3F) | 0x80;
+		*(c + 2) = (u >> 6  & 0x3F) | 0x80;
+		*(c + 1) = (u >> 12 & 0x3F) | 0x80;
+		*c       = (u >> 18)        | 0xF0;
+	} else {
+		return NULL;
+	}
+
+	return c;
+}
+
+/* Get the display width of a UTF-8 character. */
 int utf8_width(char *s)
 {
 	int val, low, high, cur;
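
The effect of the new routine on key names can be sketched outside calcurse as follows. This is an illustration under stated assumptions, not code from the patch: sketch_utf8_encode() repeats the bit manipulation of utf8_encode() above under a different name, and sketch_name_codepoint() and sketch_name_char() are hypothetical helpers that only mimic the old and new key name styles described in the commit message (the "U+%04X" format is an assumption about how the old name was built).

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Stand-alone copy of the encoder logic from the patch (renamed).
 * Returns a pointer to a static buffer, or NULL for values above
 * 0x10FFFF. Like the patch, it assumes a non-negative argument.
 */
char *sketch_utf8_encode(int u)
{
	static char c[5]; /* 4 bytes + string termination */

	if (u < 0x80) {			/* 0xxxxxxx */
		c[0] = u;
		c[1] = '\0';
	} else if (u < 0x800) {		/* 110xxxxx 10xxxxxx */
		c[0] = (u >> 6) | 0xC0;
		c[1] = (u & 0x3F) | 0x80;
		c[2] = '\0';
	} else if (u < 0x10000) {	/* 1110xxxx 10xxxxxx 10xxxxxx */
		c[0] = (u >> 12) | 0xE0;
		c[1] = (u >> 6 & 0x3F) | 0x80;
		c[2] = (u & 0x3F) | 0x80;
		c[3] = '\0';
	} else if (u < 0x110000) {	/* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
		c[0] = (u >> 18) | 0xF0;
		c[1] = (u >> 12 & 0x3F) | 0x80;
		c[2] = (u >> 6 & 0x3F) | 0x80;
		c[3] = (u & 0x3F) | 0x80;
		c[4] = '\0';
	} else {
		return NULL;
	}

	return c;
}

/* Hypothetical old-style key name: "U+" and at least four hex digits. */
void sketch_name_codepoint(int u, char *dst, size_t n)
{
	snprintf(dst, n, "U+%04X", (unsigned int)u);
}

/*
 * Hypothetical new-style key name: the encoded character itself.
 * The result is copied out of the static buffer so that it survives
 * the next call to the encoder. Returns 0 on success, -1 if the code
 * point cannot be encoded or dst is too small.
 */
int sketch_name_char(int u, char *dst, size_t n)
{
	const char *s = sketch_utf8_encode(u);

	if (s == NULL || strlen(s) >= n)
		return -1;
	strcpy(dst, s);
	return 0;
}
```

For the code point 0xFC, sketch_name_codepoint() yields the six-character name "U+00FC", while sketch_name_char() yields the two bytes 0xC3 0xBC, which a UTF-8 terminal shows as ü. Because the encoder returns a static buffer, a caller that needs two encoded characters at the same time has to copy the first result before requesting the second. Like the patch, the sketch does not reject the surrogate range 0xD800 - 0xDFFF, which RFC 3629 excludes from well-formed UTF-8.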