Supplementary: Common HTML entities used for typography

By Ben Henick

11th October 2012: Material moved to webplatform.org

The Opera web standards curriculum has now been moved to the docs section of the W3C webplatform.org site. Go there to find updated versions of these docs, and much more besides!

12th April 2012: This article is obsolete

The web standards curriculum has been donated to the W3C web education community group, to become part of a much bigger educational resource. It is constantly being updated so that it remains current with modern web design practices and technologies. To find the most up-to-date web standards curriculum, visit the web education community group Wiki. Please make changes to this Wiki yourself, or suggest changes to Chris Mills, who is also the chair of the web education community group.

Introduction

There are a number of HTML entities that come in handy when there’s a need for first-rate typesetting. Many of those listed in Table 1 are useful only when used in foreign language copy (and copy written in specific dialects of English), so context should be taken into account before the choice is made to use them.

For the sake of portability, Unicode (numeric) entity references should be reserved for use in documents certain to be written in the UTF-8 or UTF-16 encodings. In all other cases, the alphanumeric (named) references should be used.
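
As a minimal sketch of the two reference styles described above, the fragment below writes the same text twice: once with alphanumeric (named) references and once with decimal Unicode references. The page title, company name, and prices are placeholders invented for illustration, and the UTF-8 declaration is an assumption made so that the numeric form follows the advice given here.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Assumption: the document is known to be served as UTF-8 -->
  <meta charset="utf-8">
  <title>Entity reference sketch</title>
</head>
<body>
  <!-- Alphanumeric (named) references: the form suggested above
       whenever the encoding of the document is uncertain -->
  <p>&copy; 2012 Example Ltd. Price: &pound;5 &plusmn; 50&cent;</p>

  <!-- The same text written with decimal Unicode references -->
  <p>&#169; 2012 Example Ltd. Price: &#163;5 &#177; 50&#162;</p>
</body>
</html>
```

Both paragraphs should render identically in a conforming browser; the only difference is how the source is written.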

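| Character(s) | Literal(s) | Alphanumeric value(s) | Unicode value(s) | Prefer to |
|---|---|---|---|---|
| Cent (currency) | ¢ | &cent; | &#162; | |
| Pound (currency) | £ | &pound; | &#163; | |
| Section [1] | § | &sect; | &#167; | |
| Copyright | © | &copy; | &#169; | (c) |
| Guillemets [2] | « » | &laquo; &raquo; | &#171; &#187; | " |
| Registered trademark | ® | &reg; | &#174; | (R) |
| Degree(s) | ° | &deg; | &#176; | |
| Plus/minus | ± | &plusmn; | &#177; | +/- |
| Pilcrow (paragraph) [3] | ¶ | &para; | &#182; | |
| Middle dot [4] | · | &middot; | &#183; | |
| Fractional half [5] | ½ | &frac12; | &#189; | 1/2 |
| En dash [6, 7] | – | &ndash; | &#8211; | - for ranges |
| Em (long) dash [7, 8] | — | &mdash; | &#8212; | - enclosed by spaces, or -- |
| Single quotes [9, 10] | ‘ ’ | &lsquo; &rsquo; | &#8216; &#8217; | ' or ' |
| Single low quote [11] | ‚ | &sbquo; | &#8218; | ' or comma |
| Double quotes [9] | “ ” | &ldquo; &rdquo; | &#8220; &#8221; | ", ", '', or `` |
| Double low quote [11] | „ | &bdquo; | &#8222; | " or ,, |
| Single & double daggers | † ‡ | &dagger; &Dagger; | &#8224; &#8225; | * and ** |
| Bullet | • | &bull; | &#8226; | * |
| Ellipsis [12] | … | &hellip; | &#8230; | ... |
| Prime & double prime [13] | ′ ″ | &prime; &Prime; | &#8242; &#8243; | ', '', ', ", minutes:seconds elapsed |
| Euro sign | € | &euro; | &#8364; | |
| Trademark | ™ | &trade; | &#8482; | (tm) |
| Almost equal to | ≈ | &asymp; | &#8776; | ~ |
| Not equal to | ≠ | &ne; | &#8800; | != |
| Less/greater than or equal to | ≤ ≥ | &le; &ge; | &#8804; &#8805; | <= or >= |
| Less/greater than | < > | &lt; &gt; | &#060; &#062; | |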

Table 1: HTML entities useful for proper typesetting, listed in order by decimal Unicode position.

Note that guillemets are used for quotes in certain European languages (such as French and Norwegian); in these situations, you should always use q elements instead.
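
The fragment below is one way the q element could be used for a French quotation, as suggested above. The sentence is a made-up example, and the style rule is an assumption: browsers normally choose quotation marks from the lang attribute on their own, so the rule only makes the intended marks explicit.

```html
<style>
  /* Assumption: set the French quotation marks explicitly in case
     the browser defaults differ. \00AB and \00BB are the guillemets;
     \2039 and \203A are the single marks used for nested quotes. */
  q:lang(fr) { quotes: "\00AB" "\00BB" "\2039" "\203A"; }
</style>

<!-- No literal guillemets in the markup: the q element supplies them -->
<p lang="fr">Il a dit <q>&agrave; demain</q> avant de partir.</p>
```

Because the marks are generated rather than typed, the same markup can be restyled for another language without editing the text.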

HTML entity usage notes

  1. Citations of statute law, eg, “29 USC § 794 (d),” are the matter most likely to reference this character.
  2. Guillemets often enclose the names of stories, songs, films, public accommodations (eg, «Rick’s Café Américain»), and popular toponyms in European languages, particularly those of the Romance sub-family.
  3. The pilcrow, used to mark the beginning of paragraphs that might otherwise be ambiguous, is useful when setting teaser copy. The print distribution of Rolling Stone magazine has often used such an approach. In technical writing, it might also be useful for marking an orphaned first line of a paragraph. ¶ Paragraphs marked with this symbol will most often be assigned a display value of inline, which will be explained in the introduction to the CSS layout model.
  4. The middle dot is an anachronistic analogue to the decimal point, still used by some designers to enumerate amounts of decimalized currency.
  5. HTML also provides references to the code positions for one-quarter and three-quarters fractions.
  6. The en dash is used between two quantities or dates to suggest a range. In many typefaces it is visually indistinguishable from a proper minus sign (&minus;/&#8722;), although the two are separate characters. It should always be distinguished from a hyphen (&#45;), which is used to separate the parts of an ad hoc compound word.
  7. Browsers create soft linebreaks after hyphens (see above), but not after en dashes or em dashes.
  8. The exclusive use of the em dash in English is to mark one or both ends of a dependent clause in lieu of parentheses, and to indicate that if spoken aloud the clause should be preceded and followed by uninflected pauses. In several other languages — particularly those of the Slavic sub-family — em dashes indicate dialogue from the beginning of a paragraph. Tradition dictates that this character not be enclosed itself by spaces, but the thoughtful user of markup may wish to do just that in order to avoid an especially ragged line.
  9. These are the members of the automated “Smart Quotes” set of characters incorporated into most popular word processing platforms. They are often encoded at vendor-specific code positions rather than Unicode or ISO Latin code positions, which can cause problems when they are copied into a Web document.
  10. The single close quote character is also used in English as the apostrophe.
  11. Low quotes are used in several Central and Eastern European languages in preference to the analogous English opening quote characters.
  12. Since the ellipsis is a single character, the tracking of its constituent glyphs will not be affected by any value set for the letter-spacing or text-align properties.
  13. Primes are used to denote minutes (of both time elapsed and arc) and feet as units of measurement; the double prime in its turn denotes seconds and inches. The use of these characters in relation to units of time elapsed has decreased in popularity in recent years, a decrease that correlates strongly with the increased availability of word processing systems (and their common use by non-specialist operators). Many fonts use prime and double prime characters indistinguishable from single and double close quotes, but for reasons of portability these entities should still be used when called for, notwithstanding the characteristics of the intended display face.
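
Several of the notes above can be sketched in a short fragment. All of the sample text, the numbers, and the teaser class name are placeholders invented for illustration; only the entity references and the display value come from the notes themselves.

```html
<style>
  /* Note 3: teaser paragraphs run together on one line of flow.
     "teaser" is a hypothetical class name. */
  p.teaser { display: inline; }
</style>

<!-- Note 3: the pilcrow marks where the second paragraph begins -->
<p class="teaser">First teaser paragraph.</p>
<p class="teaser">&para; Second teaser paragraph.</p>

<!-- Note 6: en dash for ranges, minus sign for a negative quantity,
     plain hyphen inside a compound word -->
<p>Open 9&ndash;5; see pages 12&ndash;18. Low of &minus;4&deg;. A hard-won result.</p>

<!-- Notes 8 and 10: unspaced em dashes around a clause, and the
     single close quote used as an apostrophe -->
<p>The margin&mdash;to nobody&rsquo;s surprise&mdash;came out ragged.</p>

<!-- Note 12: the ellipsis as a single character -->
<p>To be continued&hellip;</p>

<!-- Note 13: primes for arc minutes and seconds, and for feet and inches -->
<p>Latitude 38&deg; 58&prime; 18&Prime; N. The board is 5&prime; 10&Prime; long.</p>
```

Writing the references rather than the literal characters keeps the source readable in any editor, whatever font the page is eventually displayed in.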

About the author

Ben Henick has been building Web sites in one capacity or another since September 1995, when he took on his first Web project as an academic volunteer. Since then, most of his work has been done on a freelance basis.

Ben is a generalist; his skillset touches on nearly every aspect of site design and development, from CSS and HTML, to design and copywriting, to PHP/MySQL and JavaScript/Ajax.

He lives in Lawrence, Kansas, with three computers and zero television sets. You can read more about him and his work at henick.net.

This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.
