XXIIVV

Various data encoding schemes.

ASCII

See the ASCII table.

The TerSCII character set is designed to serve in the world of ternary information processing systems.

With 26 letters in the English alphabet and comparable numbers in other western and middle-eastern alphabets, the first power of 3 that lends itself to representing a reasonable character set is 3⁴, or 81. A 4-trit character allows encoding the Roman alphabet in both upper and lower case, plus 10 digits and a modest (but insufficient) set of control characters and punctuation marks. In this environment, a code extension system comparable to that of Unicode invites a character code built on 81-character blocks.

Basic Roman Block
00 0  1  2  3  4  5  6  7  8
0  ES SP 0  9  I  R  _  i  r
1  EL -  1  A  J  S  a  j  s
2  ET '  2  B  K  T  b  k  t
3  LR ,  3  C  L  U  c  l  u
4  OP ;  4  D  M  V  d  m  v
5  RL :  5  E  N  W  e  n  w
6  SU .  6  F  O  X  f  o  x
7  HT !  7  G  P  Y  g  p  y
8  SD ?  8  H  Q  Z  h  q  z
Code Meaning
ES   End of String, analogous to NULL
EL   End of Line, analogous to LF or CR/LF
ET   End of Text file
LR   Left to Right rendering of following text
OP   OverPrint following text on previous character
RL   Right to Left rendering of following text
SU   Shift Up (superscript) following text by 1/3 baseline
HT   Horizontal Tab in current rendering direction
SD   Shift Down (subscript) following text by 1/3 baseline
SP   Space
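The Basic Roman Block can be turned into a lookup table. The C sketch below is only one reading of the table: it assumes codes are numbered down each column (code = 9 × column + row), which is what the order of the digits and letters suggests, so the first two trits name the column and the last two the row. The names `ROMAN_PRINTABLE`, `terscii_code` and `terscii_trits` are hypothetical, and of the control codes only ASCII NUL is mapped (to ES).

```c
#include <string.h>

/* Printable characters of the Basic Roman Block in code order, 9 to 80.
   Codes 0 to 8 are the controls ES, EL, ET, LR, OP, RL, SU, HT, SD.
   Assumption: codes run down each column (code = 9 * column + row). */
static const char ROMAN_PRINTABLE[] =
    " -',;:.!?"
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "_"
    "abcdefghijklmnopqrstuvwxyz";

/* Return the TerSCII code (0..80) of an ASCII character,
   or -1 if it has no place in the Basic Roman Block. */
int terscii_code(char c)
{
    const char *p;
    if (c == '\0')
        return 0; /* ES, End of String, plays the role of NUL */
    p = strchr(ROMAN_PRINTABLE, c);
    return p ? 9 + (int)(p - ROMAN_PRINTABLE) : -1;
}

/* Write the 4-trit form of code (0..80), most significant trit first,
   into out: trits 1-2 give the column, trits 3-4 the row. */
void terscii_trits(int code, char out[5])
{
    int i;
    for (i = 3; i >= 0; i--) {
        out[i] = (char)('0' + code % 3);
        code /= 3;
    }
    out[4] = '\0';
}
```

Under this assumption, A has code 28, written 1001 in ternary: column 10 (3) and row 01 (1).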
01 012 345 678
0
1 *
2
3 /
4 |
5 \
6
7
8

UTF-8 is capable of encoding all 1,112,064 valid Unicode code points using one to four bytes.

   0 1 2 3 4 5 6 7 8 9 A B C D E F
0x NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
1x DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US
2x SP ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~ DEL
8x +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +A +B +C +D +E +F
9x +10 +11 +12 +13 +14 +15 +16 +17 +18 +19 +1A +1B +1C +1D +1E +1F
Ax +20 +21 +22 +23 +24 +25 +26 +27 +28 +29 +2A +2B +2C +2D +2E +2F
Bx +30 +31 +32 +33 +34 +35 +36 +37 +38 +39 +3A +3B +3C +3D +3E +3F
Cx [2] [2] 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Dx 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Ex 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
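Fx 4 4 4 4 4 [4] [4] [4] [5] [5] [5] [5] [6] [6] - -

Cells +0 to +3F are continuation bytes, each carrying six bits of the code point; 2, 3 and 4 mark the lead byte of a 2-, 3- or 4-byte sequence. Bracketed and dashed bytes never appear in valid UTF-8: C0 and C1 could only begin overlong encodings, and F5 and above would encode values beyond U+10FFFF.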
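A minimal C sketch of the encoding side, following the byte layout above. It always picks the shortest form, so the overlong lead bytes C0 and C1 and the bytes F5 and above are never produced. It also refuses the 2,048 surrogates U+D800 to U+DFFF, which is why the 0x110000 code points leave 1,112,064 valid ones. The function name `utf8_encode` is hypothetical.

```c
#include <stddef.h>

/* Encode one Unicode code point as UTF-8 into out (at least 4 bytes).
   Returns the number of bytes written, or 0 for surrogates
   (U+D800 to U+DFFF) and values above U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) { /* 0xxxxxxx: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) { /* 110xxxxx 10xxxxxx: lead byte C2 to DF */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0; /* surrogates are not encodable */
    if (cp < 0x10000) { /* 1110xxxx + 2 continuation bytes: lead E0 to EF */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) { /* 11110xxx + 3 continuation bytes: lead F0 to F4 */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0; /* beyond U+10FFFF */
}
```

For example, U+00A2 (¢) comes out as C2 A2 and U+03BB (λ) as CE BB.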

This block ranges from U+0080 to U+00FF, contains 128 characters and includes the C1 controls, Latin-1 punctuation and symbols, 30 pairs of majuscule and minuscule accented Latin characters and 2 mathematical operators.

¢ = c2 a2
0xC2 to 0xC3: C1 Controls and Latin-1 Supplement
  0 1 2 3 4 5 6 7 8 9 A B C D E F
U+008x XXX XXX BPH NBH IND NEL SSA ESA HTS HTJ VTS PLD PLU RI SS2 SS3
U+009x DCS PU1 PU2 STS CCH MW SPA EPA SOS XXX SCI CSI ST OSC PM APC
U+00Ax NBSP ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ SHY ® ¯
U+00Bx ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
U+00Cx À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
U+00Dx Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
U+00Ex à á â ã ä å æ ç è é ê ë ì í î ï
U+00Fx ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
λ = ce bb
Greek and Coptic
0 1 2 3 4 5 6 7 8 9 A B C D E F
U+037x Ͱ ͱ Ͳ ͳ ʹ ͵ Ͷ ͷ ͺ ͻ ͼ ͽ ; Ϳ
U+038x ΄ ΅ Ά · Έ Ή Ί Ό Ύ Ώ
U+039x ΐ Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο
U+03Ax Π Ρ Σ Τ Υ Φ Χ Ψ Ω Ϊ Ϋ ά έ ή ί
U+03Bx ΰ α β γ δ ε ζ η θ ι κ λ μ ν ξ ο
U+03Cx π ρ ς σ τ υ φ χ ψ ω ϊ ϋ ό ύ ώ Ϗ
U+03Dx ϐ ϑ ϒ ϓ ϔ ϕ ϖ ϗ Ϙ ϙ Ϛ ϛ Ϝ ϝ Ϟ ϟ
U+03Ex Ϡ ϡ Ϣ ϣ Ϥ ϥ Ϧ ϧ Ϩ ϩ Ϫ ϫ Ϭ ϭ Ϯ ϯ
U+03Fx ϰ ϱ ϲ ϳ ϴ ϵ ϶ Ϸ ϸ Ϲ Ϻ ϻ ϼ Ͻ Ͼ Ͽ

三個和尚沒水喝
(Chinese proverb: "Three monks have no water to drink")

Base64 is designed to carry data stored in binary formats across channels that only support text content.

Base64 is a binary-to-text encoding scheme that represents arbitrary binary data as a sequence of 6-bit digits, each written as one of the 64 characters below.

Index Binary Char Index Binary Char Index Binary Char Index Binary Char
0 000000 A 16 010000 Q 32 100000 g 48 110000 w
1 000001 B 17 010001 R 33 100001 h 49 110001 x
2 000010 C 18 010010 S 34 100010 i 50 110010 y
3 000011 D 19 010011 T 35 100011 j 51 110011 z
4 000100 E 20 010100 U 36 100100 k 52 110100 0
5 000101 F 21 010101 V 37 100101 l 53 110101 1
6 000110 G 22 010110 W 38 100110 m 54 110110 2
7 000111 H 23 010111 X 39 100111 n 55 110111 3
8 001000 I 24 011000 Y 40 101000 o 56 111000 4
9 001001 J 25 011001 Z 41 101001 p 57 111001 5
10 001010 K 26 011010 a 42 101010 q 58 111010 6
11 001011 L 27 011011 b 43 101011 r 59 111011 7
12 001100 M 28 011100 c 44 101100 s 60 111100 8
13 001101 N 29 011101 d 45 101101 t 61 111101 9
14 001110 O 30 011110 e 46 101110 u 62 111110 +
15 001111 P 31 011111 f 47 101111 v 63 111111 /
Padding =

Example

Many hands make light work. 

Encoded with MIME's Base64 alphabet, the 27 ASCII bytes of the quote become the following 36 characters; since 27 is a multiple of 3, no padding is needed:

TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu
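The encoding works on groups of three bytes (24 bits), which become four 6-bit indices into the table above; a final group of one or two bytes is zero-filled and padded with =. A minimal C sketch with the standard alphabet; the name `base64_encode` is hypothetical.

```c
#include <stddef.h>

static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len bytes from in as Base64 into out, which must hold at least
   4 * ((len + 2) / 3) + 1 characters. Returns the length of the result. */
size_t base64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t i, o = 0;
    for (i = 0; i + 2 < len; i += 3) {
        /* Pack three bytes into 24 bits, then emit four 6-bit digits. */
        unsigned long v = ((unsigned long)in[i] << 16)
                        | ((unsigned long)in[i + 1] << 8)
                        | (unsigned long)in[i + 2];
        out[o++] = B64_ALPHABET[(v >> 18) & 0x3F];
        out[o++] = B64_ALPHABET[(v >> 12) & 0x3F];
        out[o++] = B64_ALPHABET[(v >> 6) & 0x3F];
        out[o++] = B64_ALPHABET[v & 0x3F];
    }
    if (i < len) {
        /* One or two bytes left: zero-fill the missing bits, pad with '='. */
        unsigned long v = (unsigned long)in[i] << 16;
        if (i + 1 < len)
            v |= (unsigned long)in[i + 1] << 8;
        out[o++] = B64_ALPHABET[(v >> 18) & 0x3F];
        out[o++] = B64_ALPHABET[(v >> 12) & 0x3F];
        out[o++] = (i + 1 < len) ? B64_ALPHABET[(v >> 6) & 0x3F] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}
```

The 27 bytes of the quote produce the 36 characters shown above, while a single byte such as M produces TQ==.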

Proquints are identifiers that are readable and pronounceable.

In Pronounceable Identifiers, Daniel S. Wilkerson proposes encoding a 16-bit string as a pronounceable quintuplet of alternating consonants and vowels, using four bits per consonant and two bits per vowel:

0 1 2 3 4 5 6 7 8 9 A B C D E F | 0 1 2 3
b d f g h j k l m n p r s t v z | a i o u

Separate proquints using dashes, which can go un-pronounced or be pronounced "eh".

0123 45  6789 ab  cdef
con  vow con  vow con

Examples

Here are some IP dotted-quads and their corresponding proquints.

127.0.0.1       lusab-babad | 147.67.119.2    natag-lisaf
63.84.220.193   gutih-tugad | 212.58.253.68   tibup-zujah
63.118.7.35     gutuk-bisog | 216.35.68.215   tobog-higil
140.98.193.141  mudof-sakat | 216.68.232.21   todah-vobij
64.255.6.200    haguz-biram | 198.81.129.136  sinid-makam
128.30.52.45    mabiv-gibot | 12.110.110.204  budov-kuras
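Each 16-bit word maps to five letters, most significant bits first, so an IPv4 address is simply its two 16-bit halves joined by a dash. A minimal C sketch; the names `proquint16` and `proquint_ipv4` are hypothetical.

```c
/* Consonants carry 4 bits, vowels 2 bits. */
static const char PQ_CONSONANTS[] = "bdfghjklmnprstvz";
static const char PQ_VOWELS[] = "aiou";

/* Write the proquint of a 16-bit value into out (6 bytes):
   con(4) vow(2) con(4) vow(2) con(4), most significant bits first. */
void proquint16(unsigned int v, char out[6])
{
    out[0] = PQ_CONSONANTS[(v >> 12) & 0xF];
    out[1] = PQ_VOWELS[(v >> 10) & 0x3];
    out[2] = PQ_CONSONANTS[(v >> 6) & 0xF];
    out[3] = PQ_VOWELS[(v >> 4) & 0x3];
    out[4] = PQ_CONSONANTS[v & 0xF];
    out[5] = '\0';
}

/* Encode an IPv4 address, given as four octets, as two dash-separated
   proquints such as "lusab-babad"; out must hold 12 bytes. */
void proquint_ipv4(const unsigned char ip[4], char out[12])
{
    proquint16(((unsigned int)ip[0] << 8) | ip[1], out);
    out[5] = '-'; /* replace the first terminator with the separator */
    proquint16(((unsigned int)ip[2] << 8) | ip[3], out + 6);
}
```

127.0.0.1 is the pair of words 0x7F00 and 0x0001, giving lusab-babad. The same function covers ASCII text taken two bytes at a time: "Ma" is 0x4D61, giving hujod. This sketch does not decide how an odd trailing byte should be padded.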

An ASCII string can also be encoded in proquints, two characters per proquint. The sentence "Many hands make light work." is represented as follows:

hujod kunun fadom kajov kidug fadot kajor kihob
kudon kitom libob litoz lanor funor

It's also possible to transmit low-resolution graphical assets by using one's voice.

Other Implementations

A minimal encoder implementation of proquints in Uxntal:

@emit-proquint ( short* -- )
	( c1 ) DUP2 #0c emit-con
	( v1 ) DUP2 #0a emit-vow
	( c2 ) DUP2 #06 emit-con
	( v2 ) DUP2 #04 emit-vow
	( c3 ) #00 ( >> )

@emit-con ( val* sft -- )
	SFT2 #000f AND2 ;&lut ADD2 LDA !emit-char
	&lut "bdfghjklmnprstvz

@emit-vow ( val* sft -- )
	SFT2 #0003 AND2 ;&lut ADD2 LDA !emit-char
	&lut "aiou

Another implementation, this time in Thue.

c0000::=~b
c0001::=~d
c0010::=~f
c0011::=~g
c0100::=~h
c0101::=~j
c0110::=~k
c0111::=~l
c1000::=~m
c1001::=~n
c1010::=~p
c1011::=~r
c1100::=~s
c1101::=~t
c1110::=~v
c1111::=~z
v00::=~a
v01::=~i
v10::=~o
v11::=~u
*-::=~-
::=
cvcvc*cvcvc0000110001101110-0110111011001100

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English.

The Soundex code for a word consists of a letter followed by three digits: the letter is the first letter of the name, and the digits encode the consonants. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.

1  B F P V
2  C G J K Q S X Z
3  D T
4  L
5  M N
6  R

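Rules

The first letter of the name is kept. Each following letter is replaced by its digit from the table above; A, E, I, O, U, Y, H and W are not coded, and characters other than letters, such as apostrophes, are ignored. When letters that are adjacent in the original name share a digit, only the first of them is coded, and this includes the first letter itself, so Pfister gives P236. Letters that share a digit but are separated by any other letter, even H or W, are coded again, so Ashcraft gives A226. The result is padded with zeros, or truncated, to exactly three digits.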

Examples

Tymczak     -> T522   Soundex     -> S532
Example     -> E251   Sownteks    -> S532
Ekzampul    -> E251   Hilbert     -> H416
Knuth       -> K530   Ellery      -> E460
Heilbronn   -> H416   Kant        -> K530
Ladd        -> L300   Wheaton     -> W350
Ashcraft    -> A226   Burroughs   -> B622
Burrows     -> B620   Honeyman    -> H555
Euler       -> E460   Lukasiewicz -> L222
Lissajous   -> L222   Robert      -> R163
O'Hara      -> O600   Jackson     -> J250
Gauss       -> G200   Ghosh       -> G200
PFISTER     -> P236   Lloyd       -> L300
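A C sketch of Soundex as the examples above apply it: only letters adjacent in the original name are merged, so Ashcraft gives A226 and Burroughs gives B622 (variants that also merge letters separated by H or W give A261 for Ashcraft). The function name `soundex` and its return convention are assumptions.

```c
#include <ctype.h>

/* Digit for each letter A..Z; '0' marks letters that are not coded
   (A, E, I, O, U, Y, H, W). */
static const char SOUNDEX_DIGITS[] = "01230120022455012623010202";

/* Write the Soundex code of name into out (5 bytes). Returns 1 on
   success, 0 if the name contains no letters. */
int soundex(const char *name, char out[5])
{
    char prev = 0; /* digit of the previous letter ('0' if uncoded) */
    int n = 0;
    for (; *name; name++) {
        int c = toupper((unsigned char)*name);
        char d;
        if (c < 'A' || c > 'Z')
            continue; /* ignore apostrophes, spaces and other non-letters */
        d = SOUNDEX_DIGITS[c - 'A'];
        if (n == 0)
            out[n++] = (char)c; /* keep the first letter as-is */
        else if (d != '0' && d != prev && n < 4)
            out[n++] = d; /* code it unless it repeats the previous letter's digit */
        prev = d;
    }
    if (n == 0)
        return 0;
    while (n < 4)
        out[n++] = '0'; /* pad to three digits */
    out[4] = '\0';
    return 1;
}
```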

MIDI.

See the MIDI table.

midi.c

Play note G (MIDI note 0x43, or 67) with a velocity of 64 (0x40), hold it for two seconds, then release it.

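cc -std=c89 -Wall midi.c -o midi
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Print an error to stderr and return a non-zero exit status. */
int
error(const char* msg, const char* err)
{
  fprintf(stderr, "Error %s: %s\n", msg, err);
  return 1;
}

int
main(void)
{
  const char* device = "/dev/midi2";
  /* Note On, channel 1 (0x90): note 0x43 (G), velocity 0x40 (64) */
  unsigned char g_on[3] = {0x90, 0x43, 0x40};
  /* Note Off, channel 1 (0x80): note 0x43, release velocity 0 */
  unsigned char g_off[3] = {0x80, 0x43, 0x00};
  int f = open(device, O_WRONLY);
  if(f < 0)
    return error("Cannot open", device);
  printf("Note ON\n");
  /* write() returns -1 on failure, so compare against the full length */
  if(write(f, g_on, sizeof(g_on)) != (ssize_t)sizeof(g_on))
    return error("Note", "ON");
  sleep(2); /* hold the note for two seconds */
  printf("Note OFF\n");
  if(write(f, g_off, sizeof(g_off)) != (ssize_t)sizeof(g_off))
    return error("Note", "OFF");
  close(f);
  return 0;
}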

Frequency (Hz) of each note, by octave
Note / Octave 0 1 2 3 4 5 6 7 8
C 16 33 65 131 262 523 1047 2093 4186
C♯ 17 35 69 139 277 554 1109 2217 4435
D 18 37 73 147 294 587 1175 2349 4699
D♯ 19 39 78 156 311 622 1245 2489 4978
E 21 41 82 165 330 659 1319 2637 5274
F 22 44 87 175 349 698 1397 2794 5588
F♯ 23 46 93 185 370 740 1480 2960 5920
G 25 49 98 196 392 784 1568 3136 6272
G♯ 26 52 104 208 415 831 1661 3322 6645
A 28 55 110 220 440 880 1760 3520 7040
A♯ 29 58 117 233 466 932 1865 3729 7459
B 31 62 123 247 494 988 1976 3951 7902
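The frequencies in the table follow twelve-tone equal temperament with A4 at 440 Hz: MIDI note n sounds at about 440 × 2^((n - 69) / 12) Hz. Assuming the common numbering in which note 60 is C4, the note 0x43 (67) sent by midi.c is G4, listed above as 392 Hz. A C sketch with the hypothetical names `midi_to_hz` and `midi_note`; it steps by octaves and semitones instead of calling pow(), so it needs no math library.

```c
/* 2^(1/12): the frequency ratio between two adjacent semitones. */
#define SEMITONE 1.0594630943592953

/* Approximate frequency in Hz of a MIDI note number, with note 69 (A4)
   at 440 Hz. Whole octaves double or halve the frequency; the remaining
   semitones multiply it by SEMITONE. */
double midi_to_hz(int note)
{
    double hz = 440.0;
    int k = note - 69; /* distance from A4 in semitones */
    while (k >= 12) {
        hz *= 2.0;
        k -= 12;
    }
    while (k < 0) {
        hz /= 2.0;
        k += 12;
    }
    while (k-- > 0)
        hz *= SEMITONE;
    return hz;
}

/* MIDI note number of a table entry: semitone 0 (C) to 11 (B) and
   octave 0 to 8, assuming note 60 is C4. */
int midi_note(int semitone, int octave)
{
    return 12 * (octave + 1) + semitone;
}
```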
MIDI could only describe the tile mosaic world of the keyboardist, not the watercolor world of the violin.

Sixel is a graphics format made of 64 patterns six pixels high and one wide.

An image is encoded by breaking up the bitmap into a series of 6-pixel-high strips. Each 1-pixel-wide column of a strip is then converted into a single ASCII character, offset by 0x3f so that the first sixel (no pixels set) is encoded as ?. This ensures that the sixels remain within the printable range of the ASCII character set.

Section            Char    Code        Meaning
Enter Sixels Mode  DCS     0x90        Start sequence (7-bit form: ESC P)
                   q       0x71        End optional parameters
Sixels Body        !       0x21        RLE Encoding
                   $       0x24        Beginning of current line
                   -       0x2d        Beginning of next line
                   ? to ~  0x3f-0x7e   Sixels Tiles
Leave Sixels Mode  ST      0x9c        Terminate sequence (7-bit form: ESC \)

RLE Encoding

The ! character, followed by a string of decimal digit characters, preceding any valid sixel-data character, causes that sixel to be repeated the number of times represented by the decimal string. RLE encoding shouldn't be used for fewer than 4 repetitions, since !3A is no shorter than AAA. For example, seven repetitions of the sixel represented by the letter "A" could be transmitted either as AAAAAAA or !7A.
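A minimal C sketch of an encoder applying this rule. The name `sixel_rle` is hypothetical, and it only compresses sixel-data characters (? to ~), copying the $ and - controls unchanged.

```c
#include <stdio.h>

/* Run-length encode sixel data: a run of 4 or more identical sixel
   characters becomes "!<count><char>"; everything else is copied.
   The output is never longer than the input. */
void sixel_rle(const char *in, char *out)
{
    while (*in) {
        const char *run = in;
        size_t n;
        while (*in == *run) /* find the end of the run */
            in++;
        n = (size_t)(in - run);
        if (n >= 4 && *run >= '?' && *run <= '~') {
            out += sprintf(out, "!%lu%c", (unsigned long)n, *run);
        } else {
            while (n--)
                *out++ = *run;
        }
    }
    *out = '\0';
}
```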

Tiles

The pixels can be read as a binary number, with the top pixel being the least significant bit. Adding the values of the set pixels (1, 2, 4, 8, 16 and 32 from top to bottom) gives a number between 0 and 63 inclusive. This is converted to a character code by adding 63, which is the code of the question mark character, ?. The correspondence between each possible combination of six pixels and its sixel character is illustrated below.
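Packing one column into its character, as a C sketch; `sixel_char` is a hypothetical name, and `pixels[0]` is assumed to be the top pixel.

```c
/* Return the sixel character for one 1x6 column; pixels[0] is the top
   pixel (least significant bit), and a non-zero entry means "on". */
char sixel_char(const int pixels[6])
{
    int i, value = 0;
    for (i = 0; i < 6; i++)
        if (pixels[i])
            value |= 1 << i;
    return (char)('?' + value); /* '?' is 63, so 0..63 maps to '?'..'~' */
}
```

An empty column gives ?, a full column gives ~, and a column with only the top pixel set gives @.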

Uxntal Implementation

@draw-sixels ( str* -- )
	[ LIT2 02 -Screen/auto ] DEO
	.Screen/x DEI2 ,&anchor STR2
	&w ( -- )
		LDAk [ LIT "- ] NEQ ?{
			[ LIT2 &anchor $2 ] .Screen/x DEO2
			.Screen/y DEI2k #0006 ADD2 ROT DEO2
			!& }
		LDAk [ LIT "? ] SUB ,&t STR
		#0600
	&l ( -- )
		[ LIT &t $1 ] OVR SFT #01 AND .Screen/pixel DEO
		INC GTHk ?&l
	POP2
	( | advance )
	.Screen/x DEI2k INC2 ROT DEO2
	.Screen/y DEI2k #0006 SUB2 ROT DEO2
	& INC2 LDAk ?&w
	POP2 JMP2r

@sample [ "???owYn||~ywo??-?IRJaVNn^NVbJRI $1 ]

Hershey is a textual vector format.

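Originally created in 1967, the Hershey Fonts are among the earliest digital representations of type. A Hershey vector font file (.jhf) is a text file in which each line represents a glyph encoded in five parts: a glyph number (columns 0 to 4), the number of vertices (columns 5 to 7, counting the left/right pair), the left-hand position (column 8), the right-hand position (column 9), and the vertex coordinates as pairs of characters.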

A glyph is drawn by painting lines between points. A point is made of two ASCII characters, each representing a signed value (x, y) relative to capital R: R is 0, Q is -1, S is 1, and so on. For example, NW is equal to (-4, 5). Here is an example file containing 12 glyphs:

    1  9MWRMNV RRMVV RPSTS
    2 16MWOMOV ROMSMUNUPSQ ROQSQURUUSVOV
    3 11MXVNTMRMPNOPOSPURVTVVU
    4 12MWOMOV ROMRMTNUPUSTURVOV
    5 12MWOMOV ROMUM ROQSQ ROVUV
    6  9MVOMOV ROMUM ROQSQ
    7 15MXVNTMRMPNOPOSPURVTVVUVR RSRVR
    8  9MWOMOV RUMUV ROQUQ
    9  3PTRMRV
   10  7NUSMSTRVPVOTOS
   11  9MWOMOV RUMOS RQQUV
   12  6MVOMOV ROVUV

The pair " R" (a space followed by capital R) is special: it lifts the pen, so the point that follows starts a new stroke instead of being joined to the previous one. Coordinates range from & (-44) to ~ (+44), giving 89 points of definition on each axis.

&   J   K   L   M   N   O   P   Q   R   S   T   U   V   W   X   Y   Z   ~
-44 -8  -7  -6  -5  -4  -3  -2  -1  0   +1  +2  +3  +4  +5  +6  +7  +8  +44
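A C sketch of a decoder for one line of a .jhf file, assuming the usual fixed-column layout given in its comments. The `HersheyPoint` type and `hershey_decode` function are hypothetical; each point records whether the pen is down, that is, whether a line should be drawn to it from the previous point.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    int x, y;     /* offsets from 'R' */
    int pen_down; /* 1: draw a line from the previous point */
} HersheyPoint;

/* Layout assumed: glyph number in columns 0-4, vertex count in 5-7
   (including the left/right pair), left and right positions in 8 and 9,
   then coordinate pairs, where " R" lifts the pen.
   Returns the number of points stored in pts, or -1 on a malformed line
   or if more than max points are needed. */
int hershey_decode(const char *line, int *left, int *right,
                   HersheyPoint *pts, int max)
{
    char num[4];
    int count, i, n = 0, pen = 0;
    if (strlen(line) < 10)
        return -1;
    memcpy(num, line + 5, 3);
    num[3] = '\0';
    count = atoi(num); /* atoi skips the leading spaces */
    if (count < 1 || strlen(line) < 8 + 2 * (size_t)count)
        return -1;
    *left = line[8] - 'R';
    *right = line[9] - 'R';
    for (i = 1; i < count; i++) {
        const char *p = line + 8 + 2 * i;
        if (p[0] == ' ' && p[1] == 'R') {
            pen = 0; /* pen up: the next point starts a new stroke */
            continue;
        }
        if (n == max)
            return -1;
        pts[n].x = p[0] - 'R';
        pts[n].y = p[1] - 'R';
        pts[n].pen_down = pen;
        pen = 1;
        n++;
    }
    return n;
}
```

Decoding the first glyph of the example file gives left and right positions of -5 and 5 and three strokes, (0,-5) to (-4,4), (0,-5) to (4,4) and (-2,1) to (2,1), which draw the letter A.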