Rich HTML editing in the browser: part 1

By Olav Junker Kjær

Introduction

In the very first browser, created by Tim Berners-Lee in 1990, web pages could be edited directly in the browser in WYSIWYG mode. The web was conceived as a read-write medium. Later browsers, however, were basically read-only: only plain text could be entered through form controls.

WYSIWYG editing in the browser returned to the mainstream with Internet Explorer 5: The new designMode property allowed a whole document to become editable by the user. At first the feature was somewhat overlooked, possibly because it arrived amid a flurry of equally underspecified, Windows-specific, proprietary extensions to IE.

In recent years the competing browsers (Mozilla, Safari and Opera) have followed IE's lead and implemented similar editing features. The WHATWG is working on standardizing the editing system: the designMode and contentEditable DOM properties introduced in HTML 5. It seems in-browser WYSIWYG editing is at last about to become an integral part of the Web.

This article looks at the basic concepts and challenges involved in utilizing the HTML 5 editing features in recent browsers. The subjects covered are:

The different ways of enabling editing
The editing commands
The HTML generated by editing
The interaction with the DOM

The article is the first part of two. The second part will cover a detailed example of how to implement an editor.

Note: I am only considering the editing features in the latest major browser versions: Opera 9.5, Firefox 2+ and Safari 3, since previous versions are simply too buggy and inconsistent. The implementation in IE hasn’t changed significantly since IE 5.5.

Overview of the editing system

The editing system allows a page or part of a page to become editable. This has several aspects:

A caret indicates the current insertion point. The user can type, delete etc. using the keyboard, and move the caret or selection using keyboard or mouse.
Some browsers provide UI widgets to allow the user to resize and reposition pictures, tables and positioned elements.
A number of standard editing commands are built in: Bold, Italic, CreateLink, Paste, Undo and so on. These can be invoked by shortcut keys, or by script using the command API. It is quite easy to implement an editing toolbar using the command API.
Using the Range and Selection API, you can script any modification of the HTML you want. This can be used to implement custom editing commands.
The editing system allows you to change the HTML. It does not make any assumptions about what you want to do with the modified HTML once you’ve created it. If, for example, you want to post it back to the server, you have to script that yourself.
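As a minimal sketch of that last point, the edited markup can be copied into a form field just before the form is submitted. The names editDocument, field, form and hiddenField are hypothetical; the sketch assumes the editable content lives in the body of a frame document.

```javascript
// Copy the edited markup into a form field so it is posted with the form.
// `editDocument` is assumed to be the document of the editable frame and
// `field` a hidden input or textarea in the surrounding form (hypothetical names).
function syncEditorToField(editDocument, field) {
  field.value = editDocument.body.innerHTML;
  return field.value;
}

// Typical wiring in a page (hypothetical names):
// form.onsubmit = function () { syncEditorToField(editDocument, hiddenField); };
```

The server then receives whatever markup the browser generated, so it still has to be cleaned and validated there.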

There are a couple of caveats with the editing system:

The commands and editing behavior are generally underspecified, and the resulting HTML varies widely between browsers.
The implementation in IE has remained largely unchanged since IE 5.5 in the year 2000. The HTML generated by editing may frighten sensible persons. If you thought you had seen your last <font> tag, you may be in for a surprise!

Enabling editing

There are two ways to create an editable section on a web page—the designMode and contentEditable properties.

A window or frame is made editable by setting the designMode property of its document object to "on". (Caveat: in IE this invalidates the document reference; you have to retrieve a new one from the window object.) Typically an edit box is made using an iframe in designMode.

Any element containing text can be made editable by setting its contentEditable property to true. (contentEditable is not supported in Firefox 2, but it is supported in Firefox 3, IE, Opera and Safari.)
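The two approaches could be sketched as follows. The function names are hypothetical; the sketch assumes frame is an iframe element that has already loaded. Note that designMode takes the string "on", and that the document is re-read from the window afterwards because of the IE caveat.

```javascript
// Make a whole frame editable and return a fresh reference to its document.
// `frame` is assumed to be an iframe element that has finished loading.
function enableDesignMode(frame) {
  var win = frame.contentWindow;
  win.document.designMode = "on";
  // Re-read the document from the window: in IE the old reference is stale.
  return win.document;
}

// Make a single element (for example a div) editable in place.
function makeEditable(element) {
  element.contentEditable = "true";
  return element;
}
```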

Keyboard editing

Editing using the keyboard and mouse works more or less as you would expect from a simple editor. A caret is shown when the document has focus, and it can be moved around. Typing and deleting characters works predictably. Text selections can be moved, deleted or overwritten.

A very pleasant feature is that all keyboard editing is automatically recorded and undoable. (See later how to invoke the Undo command.)

Complex issues arise however when pressing the Enter/Return key. It is not immediately obvious what HTML should be generated by this, and indeed the generated HTML varies wildly between browsers, and depending on context. If the caret is positioned inside a (non-empty) p element, all browsers will close the current p element, insert a new one (with the same attributes) and position the caret inside it. (Mozilla will additionally insert a (superfluous) br element after the caret.) Example (in these examples the pipe symbol represents the caret):

<p>bla bla|</p>

After pressing Enter/Return in IE or Safari:

<p>bla bla</p>
<p>|</p>

If the caret is positioned at the end of a (non-empty) h1 element, all browsers will close the h1, but IE and Opera will insert a new p element and position the caret inside it. Safari will insert a new h1 element and position the caret inside it. Mozilla will not create any new elements, but will insert two br elements after the caret. For example:

<h1>bla bla|</h1>

After pressing Enter/Return in IE or Opera:

<h1>bla bla</h1>
<p>|</p>

But in Mozilla:

<h1>bla bla</h1>
|<br><br>

And in Safari:

<h1>bla bla</h1>
<h1>|</h1>

If you write text directly in the body element (without other containing elements), and then press Enter/Return, Mozilla will insert a br element. IE and Opera will transform the previous text into a p and insert a new p. Safari will insert a div.

When pressing Enter/Return inside a div, Safari, Opera and IE will close the current div and insert a new div. Mozilla will insert a br but stay inside the current div.

If there are nested block level elements around the current caret position, all browsers will only close (and replicate) the innermost one. The caret will stay inside the outer blocks.

The bottom line: This is really crappy! Surprisingly IE has the most sensible approach by always guaranteeing sensible block-level elements. Mozilla is particularly bad by using br instead of block-elements, which makes it impossible to style textual content in a sensible way.

Caret positioning

The caret moves in the spaces between characters. It is not visible how the caret is positioned relative to tags. The logic seems to be consistent among browsers though. Relative to block-level elements, the caret is always positioned inside the innermost block-level element. There is no way to position the caret between two paragraphs, for example.

For example, look at the following; the pipe symbols indicate possible caret positions:

<p>|P|1|</p><p>|P|2|</p>
<div><p>|P|3|</p><div><p>|P|4|</p></div></div>

Relative to inline-elements, the caret is positioned outside all element boundaries if it is on the left side of the text; if it is on the right side, it is positioned inside element boundaries. For example:

<p>|A|<strong><em>B|</em></strong>C|</p>

So if you type new characters directly left of a range of bold text, the new text will not be bold. If you type directly right of the range, the new text will be bold.

Deletion

If you delete a paragraph-boundary, the result seems to be consistent: The leftmost block “wins”, and the content of the rightmost block is included in the leftmost:

<h1>Overskrift</h1><p>|Text</p>

If delete is pressed, this is the result:

<h1>Overskrift|Text</h1>

Safari, however, uses a clever (or horrible, depending on your mood) trick to let the rightmost paragraph content retain its formatting:

<h1>Overskrift|<span class="Apple-style-span" style="font-size: 16px; font-weight: normal; ">Text</span></h1>

Object editing

Browsers support some special editing UI features.

IE allows you to resize images, tables, form controls or absolutely positioned elements by dragging the corners (when the object is selected, drag handles appear).

Mozilla also allows you to resize tables and images, and has some additional controls that allow the user to create new columns and rows. Mozilla additionally allows you to reposition absolutely positioned elements. The UI for these special features is completely proprietary and browser-specific, and cannot be customized.

Editing commands

The different browsers support a number of editing commands. The HTML generated by the commands is not standardised and differs between browsers. For example, in IE “Bold” is generated like this:

<strong>Hello!</strong>

While Safari generates this:

<span class="Apple-style-span" style="font-weight: bold;">hello!</span>

The generated code is generally, at least in IE, slightly old-fashioned. The dreaded font element is used for a number of commands, and the generated HTML is not valid XHTML and in some cases not even valid HTML!

Opera’s generated HTML is close (but not identical) to IE’s, using font elements and so on. Safari generates formatting using span elements and inline CSS. The advantage of the Safari approach is that the generated HTML can validate as HTML 4.01 Strict.

Mozilla supports two modes—it can either generate presentational elements like IE/Opera or use style-attributes like Safari.
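The article does not name the switch between the two Mozilla modes. To my knowledge it is the styleWithCSS command in current Gecko builds, so treat the command name below as an assumption and test it in your target versions. The function name setCssStyling is hypothetical.

```javascript
// Ask the browser to generate inline CSS (true) or presentational
// elements (false). Assumes the Gecko "styleWithCSS" command; browsers
// that do not know the command may return false or throw.
function setCssStyling(doc, useCss) {
  try {
    return doc.execCommand("styleWithCSS", false, useCss);
  } catch (e) {
    return false; // command not recognised by this browser
  }
}
```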

If you are concerned about valid HTML you should probably implement a clean-up filter on the server side that transforms the tag soup into valid (X)HTML. (You should probably do this anyway, to prevent XSS attacks.)

Keyboard shortcuts

A number of the editing commands are supported directly through shortcuts, eg Ctrl/Cmd + B for bold, Ctrl/Cmd + Z for undo, etc. However these shortcuts vary among different localizations of the browsers.

The shortcut mappings cannot be reconfigured, but they can be overridden in script by intercepting keyboard-events.
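A minimal sketch of such an override, assuming a keydown-style event object with keyCode, ctrlKey and metaKey properties. The SHORTCUTS table and the handleShortcut name are hypothetical, and which event has to be intercepted (keydown or keypress) varies between browsers.

```javascript
// Key codes for B, I and U mapped to editing commands (illustrative only).
var SHORTCUTS = { 66: "bold", 73: "italic", 85: "underline" };

// Returns true if the event was handled as an editing shortcut.
function handleShortcut(event, doc) {
  var command = SHORTCUTS[event.keyCode];
  if (!command || !(event.ctrlKey || event.metaKey)) {
    return false;
  }
  // Suppress the browser's own handling where the event model allows it.
  if (event.preventDefault) {
    event.preventDefault();
  }
  doc.execCommand(command, false, null);
  return true;
}

// Typical wiring (hypothetical `editDocument`; IE needs attachEvent instead):
// editDocument.addEventListener("keydown", function (e) {
//   handleShortcut(e, editDocument);
// }, false);
```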

The command API

You probably want to implement a toolbar to allow the user to execute the editing commands. This is done using the command API. This API does not look like your typical DOM API, as it is actually a scripting-enabled adaptation of the IOleCommandTarget interface, which is the COM interface used in Microsoft applications for synchronizing toolbars to document editing.

The command API sits on the Document object and consists of a method called execCommand, and a bunch of methods starting with “query” which return info about the command.

All methods take a command ID as the first argument, which is a string with the name of the command. The methods are as follows.

ExecCommand

Executes the command on the current selection. Some commands toggle on and off—for example if you execute a bold command on a selection that is already bold, the selection is reverted to normal. Other commands require a value argument, for example forecolor requires a color code.

Some commands provide standard dialog boxes—the link command for example shows a dialog box that asks for the URL. The dialogs cannot be customized in any way, but it is possible to suppress them. For example:

result = document.execCommand(command, useDialog, value)

The different parts of this are as follows:

command: String; the name of the command.
useDialog: Boolean; shows the built-in dialog (not all commands have dialogs).
value: A value for the command to take. Not all commands take values; if a built-in dialog is shown, the value is taken from the dialog.
result: true if the command was executed, false if it was cancelled by the user (by cancelling the dialog) or if the command was not enabled.
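A toolbar button handler built on this signature might look like the sketch below. The runCommand helper and the editDocument name are hypothetical; the sketch always passes false for useDialog, so any value has to be supplied by the caller.

```javascript
// Execute an editing command without showing the built-in dialog.
// `doc` is assumed to be the document of the editable area.
function runCommand(doc, command, value) {
  // Commands without a value get null as the third argument.
  return doc.execCommand(command, false, value === undefined ? null : value);
}

// Typical calls from toolbar buttons (hypothetical `editDocument`):
// runCommand(editDocument, "bold");
// runCommand(editDocument, "forecolor", "#ff0000");
// runCommand(editDocument, "createLink", "http://example.com/");
```

Supplying the URL yourself is one way to replace the non-customizable link dialog with your own UI.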

If there is no selection (just a caret), text-formatting commands are applied inconsistently across browsers. If the caret is in the middle of a word, IE will apply the formatting to the whole word; other browsers will apply the format to the next character that is typed, unless the caret is moved beforehand.

QueryCommands

The query commands make most sense if you consider how they would be used to query for the state of a toolbar button depending on the document selection.

QueryCommandEnabled

Indicates whether the command may be executed on the current selection. For example, “unlink” is only enabled if the caret or selection is inside a link. If the selection is not in an editable area, all commands are disabled.

QueryCommandState

Indicates if it looks like the command has been executed on the selection, eg if the selection is bold, the state is true for the bold command.

QueryCommandValue

Returns the value for a given command for a selection. This corresponds to the value used in execCommand, eg ForeColor returns the colour code (as a string) for the current selection.

The format is different for different browsers. For example, ForeColor returns a hex colour code in IE (such as #ff0000), while in other browsers it returns an RGB expression, such as rgb(255,0,0).
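If a toolbar needs to compare colours, the two formats mentioned here can be reduced to one. This is a sketch that only handles six-digit hex codes and rgb() expressions; the normalizeColor name is hypothetical, and anything else yields null.

```javascript
// Convert "#RRGGBB" or "rgb(r, g, b)" to a lower-case "#rrggbb" string.
// Returns null for any format it does not recognise.
function normalizeColor(value) {
  var text = String(value).replace(/\s+/g, "").toLowerCase();
  var hex = /^#([0-9a-f]{6})$/.exec(text);
  if (hex) {
    return "#" + hex[1];
  }
  var rgb = /^rgb\((\d{1,3}),(\d{1,3}),(\d{1,3})\)$/.exec(text);
  if (!rgb) {
    return null;
  }
  var out = "#";
  for (var i = 1; i <= 3; i++) {
    var n = Math.min(255, parseInt(rgb[i], 10)); // clamp out-of-range channels
    out += (n < 16 ? "0" : "") + n.toString(16);
  }
  return out;
}
```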

Some values even depend on the browser locale, for example the value for FormatBlock, which in IE returns a name for the paragraph in the language of the browser UI.

Commands like bold that don’t have a value just return false. (The API contains two additional methods, queryCommandSupported and queryCommandIndeterminate, but they are too unreliably implemented to be of any use.)
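Put together, the query methods can drive the state of a toolbar. The sketch below collects an enabled/active flag per command; getToolbarState is a hypothetical name, and the try/catch is a defensive assumption, since some browsers may throw rather than return false when a command cannot be queried.

```javascript
// Build a map of { enabled, active } flags for the given command names.
// `doc` is assumed to be the document of the editable area.
function getToolbarState(doc, commands) {
  var state = {};
  for (var i = 0; i < commands.length; i++) {
    var name = commands[i];
    var enabled = false;
    var active = false;
    try {
      enabled = !!doc.queryCommandEnabled(name);
      // Only ask for the state of commands that can actually be executed.
      active = enabled && !!doc.queryCommandState(name);
    } catch (e) {
      // Treat a throwing query as "disabled".
      enabled = false;
      active = false;
    }
    state[name] = { enabled: enabled, active: active };
  }
  return state;
}
```

A toolbar would typically call this whenever the selection changes, then grey out disabled buttons and highlight active ones.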

Range and Selection API

The built-in commands are useful to a certain extent, but there is no way to modify their behavior or provide custom implementation. Using the Range and Selection API, you can implement arbitrary HTML transformations, which you can use to simulate custom commands.

A caveat is that any transformation of the document using the DOM destroys the undo stack that is used by the Undo/Redo commands. This is not very user friendly, but may be an acceptable trade-off for making custom commands available, depending on how your page is set up.

The range/selection API has two core classes:

Range—a continuous range of characters in a document. Ranges may overlap element boundaries. A range has a start point and an end point. If the start point equals the end point, the range is said to be collapsed.
Selection—represents the current user selection in the document. A selection contains a single range, which is highlighted. If the selection range is collapsed, it is displayed as a caret.

(Range and selection can be used outside of editable areas. You can create a selection in a read-only document. A selection in a read-only document cannot be collapsed though, since read-only text doesn’t show a caret.)

These concepts are similar in all browsers, but the concrete API in IE differs from that of the other browsers. IE uses its own proprietary range and selection API, while the other browsers use the W3C DOM Range API combined with an unstandardized selection API.

A major difference is that in IE the content of a range is accessed in the form of a string with HTML markup. In the W3C DOM Range API, the content is accessed as a DOM node tree.

Range example

To show the different approaches, here is a command that applies the “code” inline element to the current selection.

In IE (editWindow is a reference to the frame that is in designMode):

var rng = editWindow.document.selection.createRange();
rng.pasteHTML("<code>" + rng.htmlText + "</code>");

In Mozilla:

var rng = editWindow.getSelection().getRangeAt(0);
// Create the element in the frame's own document, not the outer page's.
rng.surroundContents(editWindow.document.createElement("code"));
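The two variants can be combined behind feature detection. This is a sketch rather than a complete command: wrapSelection is a hypothetical name, the function gives up on collapsed selections, and it returns false when surroundContents throws, which it does when the range only partially covers an element.

```javascript
// Wrap the current selection of `editWindow` in a new element.
// Returns true if the markup was changed, false otherwise.
function wrapSelection(editWindow, tagName) {
  var doc = editWindow.document;
  if (editWindow.getSelection) {
    // W3C DOM Range path (Mozilla, Opera, Safari).
    var sel = editWindow.getSelection();
    if (sel.rangeCount === 0) {
      return false;
    }
    var range = sel.getRangeAt(0);
    if (range.collapsed) {
      return false;
    }
    try {
      range.surroundContents(doc.createElement(tagName));
    } catch (e) {
      return false; // the range partially selects an element
    }
    return true;
  }
  if (doc.selection && doc.selection.createRange) {
    // Proprietary IE path: the content is handled as an HTML string.
    var rng = doc.selection.createRange();
    if (!rng.htmlText) {
      return false; // empty text range, or not a text range at all
    }
    rng.pasteHTML("<" + tagName + ">" + rng.htmlText + "</" + tagName + ">");
    return true;
  }
  return false;
}
```

Remember that either path modifies the document outside the command API, so it clears the undo stack.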

Control selection

IE supports the concept of control selection, which is different to an ordinary range selection. A control selection happens when you click on an object like an image, a form control, or the border of a table.

It is possible to select more than one control at a time in IE by Ctrl-clicking. Other browsers do not have a concept corresponding to control-selection; in those browsers a selection is always a text range.
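Script that works on the selection therefore has to check what kind of selection it is dealing with. In IE this can be read from the type property of document.selection, which reports None, Text or Control. The sketch below maps both APIs to one lower-case string; getSelectionKind is a hypothetical name.

```javascript
// Report the kind of selection in `editWindow`: "none", "text" or "control".
function getSelectionKind(editWindow) {
  var doc = editWindow.document;
  if (doc.selection && doc.selection.type) {
    // IE: "None" (caret or nothing), "Text" or "Control".
    return doc.selection.type.toLowerCase();
  }
  if (editWindow.getSelection) {
    // Other browsers: a selection is always a text range.
    var sel = editWindow.getSelection();
    return sel.rangeCount > 0 && !sel.isCollapsed ? "text" : "none";
  }
  return "none";
}
```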

Summary

This article has looked into the basic concepts behind browser-based editing. Part two will feature a wealth of examples to show you how to implement web page editing systems using these APIs.

This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.