Various data encoding schemes.
ASCII
See the ASCII table.
The TerSCII character set is designed to serve in the world of ternary information processing systems.
With 26 letters in the English alphabet and comparable numbers in other Western and Middle Eastern alphabets, the first power of 3 that lends itself to representing a reasonable character set is 3^4, or 81. A 4-trit character allows encoding the Roman alphabet in both upper and lower case, plus 10 digits and a modest (but insufficient) set of control characters and punctuation marks. In this environment, a code extension system comparable to that of Unicode invites a character code built on 81-character blocks.
| Basic Roman Block | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| 00 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 0 | ES | SP | 0 | 9 | I | R | _ | i | r |
| 1 | EL | - | 1 | A | J | S | a | j | s |
| 2 | ET | ' | 2 | B | K | T | b | k | t |
| 3 | LR | , | 3 | C | L | U | c | l | u |
| 4 | OP | ; | 4 | D | M | V | d | m | v |
| 5 | RL | : | 5 | E | N | W | e | n | w |
| 6 | SU | . | 6 | F | O | X | f | o | x |
| 7 | HT | ! | 7 | G | P | Y | g | p | y |
| 8 | SD | ? | 8 | H | Q | Z | h | q | z |
| Code | Meaning |
|---|---|
| ES | End of String, analogous to NULL |
| EL | End of Line, analogous to LF or CR/LF |
| ET | End of Text file |
| LR | Left to Right rendering of following text |
| OP | OverPrint following text on previous char |
| RL | Right to Left rendering of following text |
| SU | Shift Up (superscript) following by 1/3 baseline |
| HT | Horizontal Tab in current rendering direction |
| SD | Shift Down (subscript) following by 1/3 baseline |
| SP | Space |
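As a rough sketch, the Basic Roman Block can be treated as a lookup table that turns a symbol into a 4-trit code. The numbering used here (column index times 9, plus row index) is an assumption read off the table layout, not something the text states, and the names `COLUMNS`, `code_of` and `trits` are made up for illustration.

```python
# Columns 0-8 of the Basic Roman Block, each listed top to bottom (rows 0-8).
COLUMNS = [
    ["ES", "EL", "ET", "LR", "OP", "RL", "SU", "HT", "SD"],
    list(" -',;:.!?"),
    list("012345678"),
    list("9ABCDEFGH"),
    list("IJKLMNOPQ"),
    list("RSTUVWXYZ"),
    list("_abcdefgh"),
    list("ijklmnopq"),
    list("rstuvwxyz"),
]

def code_of(symbol):
    """Return the assumed code (column * 9 + row) of a symbol in the block."""
    for col, column in enumerate(COLUMNS):
        if symbol in column:
            return col * 9 + column.index(symbol)
    raise ValueError("symbol not in the Basic Roman Block: %r" % symbol)

def trits(code, width=4):
    """Write a code as a fixed-width string of base-3 digits."""
    digits = []
    for _ in range(width):
        code, digit = divmod(code, 3)
        digits.append(str(digit))
    return "".join(reversed(digits))

print(trits(code_of("A")))
```

Under this assumed numbering, the 81 cells map exactly onto the 4-trit codes 0000 through 2222.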
| 01 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ‘ | ||||||||
| 1 | * | ||||||||
| 2 | ’ | ||||||||
| 3 | / | ||||||||
| 4 | \| | ||||||||
| 5 | \ | ||||||||
| 6 | ‹ | ||||||||
| 7 | ◊ | ||||||||
| 8 | › |
UTF-8 is capable of encoding all 1,112,064 valid Unicode code points using up to four code bytes.
- starting bytes are 11xx xxxx
- continuation bytes are 10xx xxxx
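The bit patterns above can be sketched as a small encoder. This is a minimal illustration, not a validating implementation: it does not reject surrogates or out-of-range values, and the function name `utf8_encode` is hypothetical.

```python
def utf8_encode(cp):
    """Encode one Unicode code point into UTF-8 bytes (no validation)."""
    if cp < 0x80:
        # Single byte: 0xxx xxxx
        return bytes([cp])
    if cp < 0x800:
        # Two bytes: 110x xxxx, 10xx xxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # Three bytes: 1110 xxxx, then two continuation bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Four bytes: 1111 0xxx, then three continuation bytes
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

print(utf8_encode(ord("¢")).hex(" "))
```

Every byte after the first carries six payload bits behind the 10 prefix, which is why a decoder can resynchronize from any position in a stream.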
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
| 1x | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
| 2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
| 8x | +0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 | +8 | +9 | +A | +B | +C | +D | +E | +F |
| 9x | +10 | +11 | +12 | +13 | +14 | +15 | +16 | +17 | +18 | +19 | +1A | +1B | +1C | +1D | +1E | +1F |
| Ax | +20 | +21 | +22 | +23 | +24 | +25 | +26 | +27 | +28 | +29 | +2A | +2B | +2C | +2D | +2E | +2F |
| Bx | +30 | +31 | +32 | +33 | +34 | +35 | +36 | +37 | +38 | +39 | +3A | +3B | +3C | +3D | +3E | +3F |
| Cx | [2] | [2] | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| Dx | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| Ex | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| Fx | 4 | 4 | 4 | 4 | 4 | [4] | [4] | [4] | [5] | [5] | [5] | [5] | [6] | [6] | | |
This block ranges from U+0080 to U+00FF, contains 128 characters and includes the C1 controls, Latin-1 punctuation and symbols, 30 pairs of majuscule and minuscule accented Latin characters and 2 mathematical operators.
¢ = c2 a2
| 0xC2 Controls and Latin-1 Supplement | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
| U+008x | XXX | XXX | BPH | NBH | IND | NEL | SSA | ESA | HTS | HTJ | VTS | PLD | PLU | RI | SS2 | SS3 |
| U+009x | DCS | PU1 | PU2 | STS | CCH | MW | SPA | EPA | SOS | XXX | SCI | CSI | ST | OSC | PM | APC |
| U+00Ax | NBSP | ¡ | ¢ | £ | ¤ | ¥ | ¦ | § | ¨ | © | ª | « | ¬ | SHY | ® | ¯ |
| U+00Bx | ° | ± | ² | ³ | ´ | µ | ¶ | · | ¸ | ¹ | º | » | ¼ | ½ | ¾ | ¿ |
| U+00Cx | À | Á | Â | Ã | Ä | Å | Æ | Ç | È | É | Ê | Ë | Ì | Í | Î | Ï |
| U+00Dx | Ð | Ñ | Ò | Ó | Ô | Õ | Ö | × | Ø | Ù | Ú | Û | Ü | Ý | Þ | ß |
| U+00Ex | à | á | â | ã | ä | å | æ | ç | è | é | ê | ë | ì | í | î | ï |
| U+00Fx | ð | ñ | ò | ó | ô | õ | ö | ÷ | ø | ù | ú | û | ü | ý | þ | ÿ |
λ : ce bb
| Greek and Coptic | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
| U+037x | Ͱ | ͱ | Ͳ | ͳ | ʹ | ͵ | Ͷ | ͷ | | | ͺ | ͻ | ͼ | ͽ | ; | Ϳ |
| U+038x | | | | | ΄ | ΅ | Ά | · | Έ | Ή | Ί | | Ό | | Ύ | Ώ |
| U+039x | ΐ | Α | Β | Γ | Δ | Ε | Ζ | Η | Θ | Ι | Κ | Λ | Μ | Ν | Ξ | Ο |
| U+03Ax | Π | Ρ | | Σ | Τ | Υ | Φ | Χ | Ψ | Ω | Ϊ | Ϋ | ά | έ | ή | ί |
| U+03Bx | ΰ | α | β | γ | δ | ε | ζ | η | θ | ι | κ | λ | μ | ν | ξ | ο |
| U+03Cx | π | ρ | ς | σ | τ | υ | φ | χ | ψ | ω | ϊ | ϋ | ό | ύ | ώ | Ϗ |
| U+03Dx | ϐ | ϑ | ϒ | ϓ | ϔ | ϕ | ϖ | ϗ | Ϙ | ϙ | Ϛ | ϛ | Ϝ | ϝ | Ϟ | ϟ |
| U+03Ex | Ϡ | ϡ | Ϣ | ϣ | Ϥ | ϥ | Ϧ | ϧ | Ϩ | ϩ | Ϫ | ϫ | Ϭ | ϭ | Ϯ | ϯ |
| U+03Fx | ϰ | ϱ | ϲ | ϳ | ϴ | ϵ | ϶ | Ϸ | ϸ | Ϲ | Ϻ | ϻ | ϼ | Ͻ | Ͼ | Ͽ |
三個和尚沒水喝
(Chinese Proverb)
Base64 is designed to carry data stored in binary formats across channels that only support text content.
Base64 is a binary-to-text encoding scheme that represents arbitrary binary data as 6-bit digits.
| Index | Binary | Char | Index | Binary | Char | Index | Binary | Char | Index | Binary | Char |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 000000 | A | 16 | 010000 | Q | 32 | 100000 | g | 48 | 110000 | w |
| 1 | 000001 | B | 17 | 010001 | R | 33 | 100001 | h | 49 | 110001 | x |
| 2 | 000010 | C | 18 | 010010 | S | 34 | 100010 | i | 50 | 110010 | y |
| 3 | 000011 | D | 19 | 010011 | T | 35 | 100011 | j | 51 | 110011 | z |
| 4 | 000100 | E | 20 | 010100 | U | 36 | 100100 | k | 52 | 110100 | 0 |
| 5 | 000101 | F | 21 | 010101 | V | 37 | 100101 | l | 53 | 110101 | 1 |
| 6 | 000110 | G | 22 | 010110 | W | 38 | 100110 | m | 54 | 110110 | 2 |
| 7 | 000111 | H | 23 | 010111 | X | 39 | 100111 | n | 55 | 110111 | 3 |
| 8 | 001000 | I | 24 | 011000 | Y | 40 | 101000 | o | 56 | 111000 | 4 |
| 9 | 001001 | J | 25 | 011001 | Z | 41 | 101001 | p | 57 | 111001 | 5 |
| 10 | 001010 | K | 26 | 011010 | a | 42 | 101010 | q | 58 | 111010 | 6 |
| 11 | 001011 | L | 27 | 011011 | b | 43 | 101011 | r | 59 | 111011 | 7 |
| 12 | 001100 | M | 28 | 011100 | c | 44 | 101100 | s | 60 | 111100 | 8 |
| 13 | 001101 | N | 29 | 011101 | d | 45 | 101101 | t | 61 | 111101 | 9 |
| 14 | 001110 | O | 30 | 011110 | e | 46 | 101110 | u | 62 | 111110 | + |
| 15 | 001111 | P | 31 | 011111 | f | 47 | 101111 | v | 63 | 111111 | / |
| Padding | = | | | | | | | | | | |
Example
Many hands make light work.
When the quote is encoded with MIME's Base64 scheme, its 8-bit ASCII bytes are represented as follows:
TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu
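The encoding above can be reproduced with a short sketch that regroups each 3-byte chunk into four 6-bit indices. The function name `b64_encode` is made up for illustration, and the result is checked against Python's standard `base64` module rather than taken on faith.

```python
import base64
import string

# Index-to-character table: A-Z, a-z, 0-9, then + and /.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_encode(data):
    """Encode bytes as Base64 text, padding the last group with '='."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to three bytes into a 24-bit number, zero-filled on the right.
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A chunk of k bytes yields k + 1 meaningful characters.
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)

quote = b"Many hands make light work."
print(b64_encode(quote))
print(b64_encode(quote) == base64.b64encode(quote).decode("ascii"))
```

The quote is 27 bytes long, a multiple of three, so its encoding needs no padding characters.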
- Base64 Encoder, Uxntal
Proquints are identifiers that are readable and pronounceable.
In Pronounceable Identifiers, Daniel S. Wilkerson proposes encoding a 16-bit string as a pronounceable quintuplet of alternating consonants and vowels, using four bits for each consonant and two bits for each vowel:
0 1 2 3 4 5 6 7 8 9 A B C D E F | 0 1 2 3
b d f g h j k l m n p r s t v z | a i o u
Separate proquints using dashes, which can go un-pronounced or be pronounced "eh".
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
| con | vow | con | vow | con | |||||||||||
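The bit layout above can be sketched as follows. The helper names `proquint` and `ip_to_proquints` are hypothetical, and the dotted-quad conversion simply assumes an IPv4 address is split into two 16-bit words.

```python
CONSONANTS = "bdfghjklmnprstvz"  # 4 bits each
VOWELS = "aiou"                  # 2 bits each

def proquint(word):
    """Encode one 16-bit value as consonant-vowel-consonant-vowel-consonant."""
    return (CONSONANTS[(word >> 12) & 0xF]
            + VOWELS[(word >> 10) & 0x3]
            + CONSONANTS[(word >> 6) & 0xF]
            + VOWELS[(word >> 4) & 0x3]
            + CONSONANTS[word & 0xF])

def ip_to_proquints(ip):
    """Encode a dotted-quad IPv4 address as two dash-separated proquints."""
    a, b, c, d = (int(part) for part in ip.split("."))
    return proquint((a << 8) | b) + "-" + proquint((c << 8) | d)

print(ip_to_proquints("127.0.0.1"))
```

The shifts 12, 10, 6 and 4 are the same ones that appear in the Uxntal encoder further down the page.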
Examples
Here are some IP dotted-quads and their corresponding proquints.
127.0.0.1 lusab-babad
147.67.119.2 natag-lisaf
63.84.220.193 gutih-tugad
212.58.253.68 tibup-zujah
63.118.7.35 gutuk-bisog
216.35.68.215 tobog-higil
140.98.193.141 mudof-sakat
216.68.232.21 todah-vobij
64.255.6.200 haguz-biram
198.81.129.136 sinid-makam
128.30.52.45 mabiv-gibot
12.110.110.204 budov-kuras
An ASCII string can also be encoded as proquints. The sentence Many hands make light work. is represented as follows:
hujod kunun fadom kajov kidug fadot kajor kihob kudon kitom libob litoz lanor funor
It's also possible to transmit low-resolution graphical assets by using one's voice.
Other Implementations
A minimal encoder implementation of proquints in Uxntal:
@emit-proquint ( short* -- )
	( c1 ) DUP2 #0c emit-con
	( v1 ) DUP2 #0a emit-vow
	( c2 ) DUP2 #06 emit-con
	( v2 ) DUP2 #04 emit-vow
	( c3 ) #00
	( >> )

@emit-con ( val* sft -- )
	SFT2 #000f AND2 ;&lut ADD2 LDA !emit-char
	&lut "bdfghjklmnprstvz

@emit-vow ( val* sft -- )
	SFT2 #0003 AND2 ;&lut ADD2 LDA !emit-char
	&lut "aiou
Another implementation, this time in Thue.
c0000::=~b
c0001::=~d
c0010::=~f
c0011::=~g
c0100::=~h
c0101::=~j
c0110::=~k
c0111::=~l
c1000::=~m
c1001::=~n
c1010::=~p
c1011::=~r
c1100::=~s
c1101::=~t
c1110::=~v
c1111::=~z
v00::=~a
v01::=~i
v10::=~o
v11::=~u
*-::=~-
::=
cvcvc*cvcvc0000110001101110-0110111011001100
Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English.
The Soundex code for a word consists of a letter followed by three digits: the letter is the first letter of the name, and the digits encode the consonants. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.
| 1 | B F P V |
| 2 | C G J K Q S X Z |
| 3 | D T |
| 4 | L |
| 5 | M N |
| 6 | R |
Rules
- Keep and capitalize the first letter.
- Skip vowels.
- Convert consonants into numbers (see table).
- Skip duplicates.
- Stop at 3 numbers, pad with zeroes if needed.
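The rules above can be sketched in a few lines. This is one reading of them, not a reference implementation: it assumes that H, W and Y are skipped like vowels, that any skipped letter separates duplicates, and that a consonant sharing the first letter's code directly after it is dropped. The name `soundex` and the `CODES` table are illustrative.

```python
# Letter-to-digit table, built from the six consonant groups.
CODES = {}
for digit, letters in enumerate(
        ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for letter in letters:
        CODES[letter] = str(digit)

def soundex(name):
    """Return the first letter followed by three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    # Start from the first letter's code so an immediate repeat is skipped.
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        code = CODES.get(letter, "")  # vowels, H, W and Y have no code
        if code and code != previous:
            digits.append(code)
        previous = code
    # Keep three digits, padding with zeroes when there are fewer.
    return (letters[0] + "".join(digits) + "000")[:4]

print(soundex("Tymczak"), soundex("Ashcraft"), soundex("Pfister"))
```

Other Soundex variants treat H and W differently, so a name like Ashcraft can come out with a different code elsewhere.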
Examples
Tymczak -> T522
Soundex -> S532
Example -> E251
Sownteks -> S532
Ekzampul -> E251
Hilbert -> H416
Knuth -> K530
Ellery -> E460
Heilbronn -> H416
Kant -> K530
Ladd -> L300
Wheaton -> W350
Ashcraft -> A226
Burroughs -> B622
Burrows -> B620
Honeyman -> H555
Euler -> E460
Lukasiewicz -> L222
Lissajous -> L222
Robert -> R163
O'Hara -> O600
Jackson -> J250
Gauss -> G200
Ghosh -> G200
PFISTER -> P236
Lloyd -> L300
- Soundex Encoder, Uxntal
MIDI.
See the MIDI table.
midi.c
Play note G with velocity of 64.
cc -std=c89 -Wall midi.c -o midi
#include <linux/soundcard.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

/* Print an error message and return a non-zero exit status. */
int
error(char* msg, const char* err)
{
	printf("Error %s: %s\n", msg, err);
	return 1;
}

int
main(void)
{
	char* device = "/dev/midi2";
	/* Note On (0x90) and Note Off (0x80), key 0x43 (G), velocity 0x40 (64) */
	unsigned char g_on[3] = {0x90, 0x43, 0x40};
	unsigned char g_off[3] = {0x80, 0x43, 0x00};
	int f = open(device, O_WRONLY, 0);
	if(f < 0)
		return error("Unknown", device);
	printf("Note ON\n");
	/* write() returns -1 on failure, so test for a negative result */
	if(write(f, g_on, sizeof(g_on)) < 0)
		return error("Note", "ON");
	sleep(2);
	printf("Note OFF\n");
	if(write(f, g_off, sizeof(g_off)) < 0)
		return error("Note", "OFF");
	close(f);
	return 0;
}
| Note / Octave (Hz) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| C | 16 | 33 | 65 | 131 | 262 | 523 | 1047 | 2093 | 4186 |
| C♯ | 17 | 35 | 69 | 139 | 277 | 554 | 1109 | 2217 | 4435 |
| D | 18 | 37 | 73 | 147 | 294 | 587 | 1175 | 2349 | 4699 |
| D♯ | 19 | 39 | 78 | 156 | 311 | 622 | 1245 | 2489 | 4978 |
| E | 21 | 41 | 82 | 165 | 330 | 659 | 1319 | 2637 | 5274 |
| F | 22 | 44 | 87 | 175 | 349 | 698 | 1397 | 2794 | 5588 |
| F♯ | 23 | 46 | 93 | 185 | 370 | 740 | 1480 | 2960 | 5920 |
| G | 25 | 49 | 98 | 196 | 392 | 784 | 1568 | 3136 | 6272 |
| G♯ | 26 | 52 | 104 | 208 | 415 | 831 | 1661 | 3322 | 6645 |
| A | 28 | 55 | 110 | 220 | 440 | 880 | 1760 | 3520 | 7040 |
| A♯ | 29 | 58 | 117 | 233 | 466 | 932 | 1865 | 3729 | 7459 |
| B | 31 | 62 | 123 | 247 | 494 | 988 | 1976 | 3951 | 7902 |
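The frequencies in the table can be related to MIDI note numbers with a short sketch. It assumes twelve-tone equal temperament with A4 = 440 Hz at note 69 and middle C (note 60) labelled C4. Those conventions are common, but the page does not state them. The names `note_frequency` and `note_name` are illustrative.

```python
NAMES = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]

def note_frequency(note):
    """Frequency in Hz of a MIDI note number, assuming A4 (69) = 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)

def note_name(note):
    """Name and octave of a MIDI note number, assuming note 60 is C4."""
    return NAMES[note % 12] + str(note // 12 - 1)

# 0x43 is the key number sent by midi.c above.
print(note_name(0x43), round(note_frequency(0x43)))
```

Each step of 12 in the note number doubles the frequency, which matches the doubling from one octave column to the next.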
MIDI could only describe the tile mosaic world of the keyboardist, not the watercolor world of the violin.
Sixel is a graphics format made of 64 patterns six pixels high and one wide.
An image is encoded by breaking up the bitmap into a series of 6-pixel-high strips. Each one-pixel-wide column of a strip is then converted into a single ASCII character, offset by 0x3f so that the first pattern (no pixels set) is encoded as ?. This ensures that the sixels remain within the printable character range of the ASCII character set.
| Mode | Character | Code | Meaning |
|---|---|---|---|
| Enter Sixels Mode | DCS | 0x90 | Start sequence |
| | q | 0x71 | End optional parameters |
| Sixels Body | ! | 0x21 | RLE Encoding |
| | $ | 0x24 | Beginning of current line |
| | - | 0x2d | Beginning of next line |
| | ?~ | 0x3f-0x7e | Sixels Tiles |
| Leave Sixels Mode | ST | 0x9c | Terminate sequence |
RLE Encoding
The ! character, followed by a string of decimal digit characters and then any valid sixel-data character, causes that sixel to be repeated the number of times given by the decimal string. RLE Encoding shouldn't be used for fewer than 4 repetitions. For example, seven repetitions of the sixel represented by the letter "A" could be transmitted either as AAAAAAA or !7A.
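A minimal sketch of this run-length scheme, assuming the input holds only sixel-data characters (? through ~), so that digits and ! never appear as data. The names `rle_encode`, `rle_decode` and the `min_run` parameter are illustrative.

```python
from itertools import groupby

def rle_encode(sixels, min_run=4):
    """Compress runs of at least min_run identical sixel characters."""
    out = []
    for char, group in groupby(sixels):
        count = len(list(group))
        out.append("!%d%s" % (count, char) if count >= min_run else char * count)
    return "".join(out)

def rle_decode(data):
    """Expand every !<count><char> sequence back into repeated characters."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == "!":
            j = i + 1
            while j < len(data) and data[j].isdigit():
                j += 1
            out.append(data[j] * int(data[i + 1:j]))
            i = j + 1
        else:
            out.append(data[i])
            i += 1
    return "".join(out)

print(rle_encode("AAAAAAA"))
```

A run of three costs as many characters encoded (!3A) as written out (AAA), which is why shorter runs are left alone.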
Tiles
The pixels can be read as a binary number, with the top pixel being the least significant bit. Add the value of the pixels together, which gives a number between 0 and 63 inclusive. This is converted to a character code by adding 63, which is the code of the question mark character, ?. The correspondence between each possible combination of six pixels and its sixel character is illustrated below.
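The mapping between a column of six pixels and its character can be sketched as below. Pixels are given top to bottom as 0 or 1. The function names `sixel_char` and `sixel_pixels` are made up for illustration.

```python
def sixel_char(pixels):
    """Turn six pixels (top first, top = least significant bit) into a character."""
    value = sum(bit << i for i, bit in enumerate(pixels))
    return chr(value + 63)

def sixel_pixels(char):
    """Turn a sixel character back into its six pixels, top first."""
    value = ord(char) - 63
    return [(value >> i) & 1 for i in range(6)]

print(sixel_char([1, 0, 0, 0, 0, 0]))
```

An empty column gives ? and a full column gives ~, the two ends of the tile range.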
Uxntal Implementation
@draw-sixels ( str* -- )
[ LIT2 02 -Screen/auto ] DEO
.Screen/x DEI2 ,&anchor STR2
&w ( -- )
LDAk [ LIT "- ] NEQ ?{
[ LIT2 &anchor $2 ] .Screen/x DEO2
.Screen/y DEI2k #0006 ADD2 ROT DEO2
!& }
LDAk [ LIT "? ] SUB ,&t STR
#0600
&l ( -- )
[ LIT &t $1 ] OVR SFT #01 AND .Screen/pixel DEO
INC GTHk ?&l
POP2
( | advance )
.Screen/x DEI2k INC2 ROT DEO2
.Screen/y DEI2k #0006 SUB2 ROT DEO2
& INC2 LDAk ?&w
POP2 JMP2r
@sample [ "???owYn||~ywo??-?IRJaVNn^NVbJRI $1 ]
- Sixels Viewer
- Sixels Converter, convert icn to sixel
- VT320 Soft Character Sets
Hershey is a textual vector format.
Originally created in 1967, the Hershey Fonts are among the earliest digital representations of type. A Hershey vector font file (.jhf) is a text file in which each line represents a glyph encoded in five parts:
- id[5]: The id of the glyph, in decimal.
- length[3]: The number of points, in decimal, counting the left/right pair and pen lifts.
- left[1]: The left position of the boundary box.
- right[1]: The right position of the boundary box.
- points[?]: A list of points, ending with a linebreak.
A letter is drawn by painting lines between points. Each point is made of two ASCII characters, each representing a signed value (x, y), where capital R is 0, Q is -1, S is 1, and so on. For example, NW is equal to -4,5. Here is an example file containing 12 glyphs:
    1  9MWRMNV RRMVV RPSTS
    2 16MWOMOV ROMSMUNUPSQ ROQSQURUUSVOV
    3 11MXVNTMRMPNOPOSPURVTVVU
    4 12MWOMOV ROMRMTNUPUSTURVOV
    5 12MWOMOV ROMUM ROQSQ ROVUV
    6  9MVOMOV ROMUM ROQSQ
    7 15MXVNTMRMPNOPOSPURVTVVUVR RSRVR
    8  9MWOMOV RUMUV ROQUQ
    9  3PTRMRV
   10  7NUSMSTRVPVOTOS
   11  9MWOMOV RUMOS RQQUV
   12  6MVOMOV ROVUV
The position " R" (a space followed by a capital R) is special: it means lifting the pen, so the position that follows will not draw a stroke. There are about 89 points of definition on each axis.
| & | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | ~ |
| -44 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 | 0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 | +8 | +44 |
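A glyph line can be decoded with a short sketch like the one below. It assumes the fixed-width layout described above (a 5-character id, then a 3-character length), so the sample line is padded accordingly. The names `decode` and `parse_glyph` are illustrative.

```python
def decode(char):
    """Signed coordinate of a character, relative to capital R."""
    return ord(char) - ord("R")

def parse_glyph(line):
    """Split a glyph line into id, length, bounds and a list of strokes."""
    glyph_id = int(line[0:5])
    length = int(line[5:8])
    left, right = decode(line[8]), decode(line[9])
    data = line[10:].rstrip("\n")
    strokes, current = [], []
    for i in range(0, len(data) - 1, 2):
        pair = data[i:i + 2]
        if pair == " R":
            # Pen up: close the current stroke and start a new one.
            if current:
                strokes.append(current)
            current = []
        else:
            current.append((decode(pair[0]), decode(pair[1])))
    if current:
        strokes.append(current)
    return glyph_id, length, left, right, strokes

sample = "    1  9MWRMNV RRMVV RPSTS"
print(parse_glyph(sample))
```

For this first glyph the result is three strokes: two slanted sides and a crossbar, forming a capital A.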
- Hershey Renderer, Uxntal
- Hershey Vector Font, Paul Bourke
- Hershey Fonts