Architecture notes

Some of this needs to be updated.

            --------------------
 Front-end  |  Display and keyboard
            |---------------
            |  Browser engine (runs terminal.js)
            |---------------
            |  Communication stub
            ---------------
              ^
              | Optional network
              V
            --------------------
  Back-end  |  Communication stub
            |---------------
            |  Application
            ---------------

The DomTerm architecture allows for multiple front-end implementations and multiple back-end implementations. The front-end runs the actual terminal emulator (written in JavaScript) and manages the display. The front-end can be a window or tab in a general-purpose browser like Firefox or Chrome, or it can be a special-purpose browser optimized for DomTerm. The latter would drop the URL bar, add a menu bar and other controls as more suitable to a terminal emulator, and tweak a few minor security restrictions.

The back-end runs the actual application. The application can be a general-purpose shell, or a custom application, such as a programming language run-time or a chat server. The back-end can run the application under a PTY. Alternatively, the application can communicate using pipes, if you prefer to avoid PTYs or they are unavailable (as on Windows).

The front-end and back-end can be combined in the same process, using an embeddable browser. The current sample applications include a single-process terminal emulator that uses the JavaFX WebEngine, a JavaFX pop-up menu, and a PTY class. In this case the “communication stub” is WebEngine’s bridge between JavaScript and Java, plus communicating with the PTY. A C/C++ application could similarly use QtWebEngine and its C++/JavaScript bridge.

If the front-end and back-end are separate processes, they can communicate using a byte-stream protocol. Currently we stick to well-formed UTF-8, because JavaScript’s support for byte arrays is still weak. The protocol is based on the xterm protocol: text and escape sequences are sent from the application to the front-end; keystrokes and encoded events are sent the other way. More complex data can be encoded using JSON. Most of the protocol is handled by terminal.js. The communication stubs may generate or intercept some messages: for example, a PTY stub will want to handle window size change events.
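
As an illustration of a communication stub intercepting messages, here is a minimal sketch. The framing (ESC ] 9999 ; JSON BEL), the event name "WS", and the pty object with resize and write methods are all hypothetical, chosen only for this example; they are not DomTerm's actual wire format or API.

```javascript
// Hypothetical framing: an event travels as ESC ] 9999 ; <JSON> BEL.
// JSON.stringify escapes control characters, so BEL cannot occur inside the payload.
const EVENT_START = "\x1b]9999;";
const EVENT_END = "\x07";

function encodeEvent(name, data) {
    return EVENT_START + JSON.stringify({ name, data }) + EVENT_END;
}

// Split a chunk from the front-end into plain keystroke text and decoded events.
function splitInput(chunk) {
    const keys = [];
    const events = [];
    let pos = 0;
    while (pos < chunk.length) {
        const start = chunk.indexOf(EVENT_START, pos);
        if (start < 0) {
            keys.push(chunk.slice(pos));
            break;
        }
        if (start > pos)
            keys.push(chunk.slice(pos, start));
        const end = chunk.indexOf(EVENT_END, start);
        if (end < 0) {
            // Incomplete event: this sketch simply passes it through as text.
            keys.push(chunk.slice(start));
            break;
        }
        events.push(JSON.parse(chunk.slice(start + EVENT_START.length, end)));
        pos = end + EVENT_END.length;
    }
    return { keys: keys.join(""), events };
}

// A PTY stub intercepts window-size events and forwards everything else.
// `pty` is a hypothetical object with resize(rows, cols) and write(text).
function handleFrontEndInput(chunk, pty) {
    const { keys, events } = splitInput(chunk);
    for (const ev of events) {
        if (ev.name === "WS")
            pty.resize(ev.data.rows, ev.data.cols);
        // Other events are ignored in this sketch.
    }
    if (keys.length > 0)
        pty.write(keys);
}
```

A real stub would also have to buffer an event that is split across two chunks, which this sketch does not attempt.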

These data streams can be layered on top of other protocols, such as telnet, ssh, or WebSocket. Using WebSocket is convenient because it is built in to modern browsers, so the entire front-end (except terminal.js) is readily available.

Line vs character input modes

In line input mode we can end up with double echoing: As you edit the input line, it is displayed. Then when the line is sent, the slave will normally echo the input.

Ideally you’d want to integrate with the kernel terminal sub-system, to suppress echoing. In lieu of that, line editing mode could delete the input line from the DOM before sending it to the inferior. To avoid annoying flashing, this is done lazily: DomTerm waits to remove the input line until it gets some output from the inferior (usually the echo).
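
The lazy removal can be sketched without a DOM by modelling the screen as strings. The class and member names here (LineModeScreen, deferredLine, send) are hypothetical and only illustrate the idea; the real code works on DOM nodes.

```javascript
class LineModeScreen {
    constructor(send) {
        this.send = send;          // hypothetical callback that writes to the inferior
        this.output = "";          // confirmed output received from the inferior
        this.inputLine = "";       // line currently being edited locally
        this.deferredLine = null;  // line already sent, still displayed for now
    }
    type(ch) {
        this.inputLine += ch;
    }
    enter() {
        // Keep showing the line after sending it, to avoid flashing.
        this.deferredLine = this.inputLine;
        this.inputLine = "";
        this.send(this.deferredLine + "\n");
    }
    receive(text) {
        // The deferred line is removed only now, when output
        // (usually the echo) arrives from the inferior.
        this.deferredLine = null;
        this.output += text;
    }
    visible() {
        return this.output + (this.deferredLine ?? "") + this.inputLine;
    }
}
```

Between enter() and receive() the user still sees the line they typed; once the echo arrives it replaces the local copy, so the line never appears twice.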

In addition to "char mode" and "line mode" (like the Emacs term mode) there is an "auto mode" which watches the state of the inferior pty to automatically switch between them. In autoEditing mode, if we’re currently in char mode, then a key event gets sent to the pty layer. If the pty is in non-canonical mode, the key event is sent to the server. If the pty is in canonical mode, then a message is sent back to the front-end, which switches to line mode, and processes the event.
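
The round trip described above can be sketched as follows. The names (ptyLayerHandleKey, AutoModeFrontEnd, the pty object with isCanonical and write) are hypothetical, and the sketch covers only the switch from char mode to line mode, not the switch back.

```javascript
// Back-end side: decide what to do with a key that arrived in char mode.
// `pty` is a hypothetical object with isCanonical() and write(text).
function ptyLayerHandleKey(key, pty) {
    if (pty.isCanonical())
        return { action: "switch-to-line-mode", key };
    pty.write(key);                // non-canonical: pass the key straight through
    return { action: "forwarded" };
}

// Front-end side, running in auto mode.
class AutoModeFrontEnd {
    constructor(pty) {
        this.pty = pty;
        this.mode = "char";
        this.inputLine = "";
    }
    handleKey(key) {
        if (this.mode === "char") {
            const reply = ptyLayerHandleKey(key, this.pty);
            if (reply.action === "forwarded")
                return;
            // The pty is canonical: switch, then process the same key locally.
            this.mode = "line";
        }
        this.editLine(key);
    }
    editLine(key) {
        if (key === "\r") {
            this.pty.write(this.inputLine + "\n");
            this.inputLine = "";
        } else {
            this.inputLine += key;
        }
    }
}
```

In a split front-end/back-end setup the call to ptyLayerHandleKey would be a message exchange rather than a direct function call.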

Line structure

"Line" here refers to a "visual line": a section of the DOM that should be treated as a line for cursor movement. Line breaks may come from the back-end, or be inserted by the line break algorithm.

The lineStarts array maps from a line number to the DOM location of the start of the corresponding line.

The lineEnds array maps from a line number to the end of the corresponding line. Each entry always points to a span node with the line attribute set. Normally lineEnds[i] == lineStarts[i+1]; however, sometimes lineStarts[i] is the start of a <div> or other block element.
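
A small sketch of the two arrays, with plain objects standing in for DOM nodes. The "soft" and "hard" values of the line attribute and the helper name are assumptions made for this example.

```javascript
// Stand-ins for DOM nodes: two block elements and three line-ending spans.
const div0 = { tag: "div" };
const div1 = { tag: "div" };
const end0 = { tag: "span", line: "soft" };  // assumed value: break inserted by wrapping
const end1 = { tag: "span", line: "hard" };  // assumed value: break from the back-end
const end2 = { tag: "span", line: "hard" };

const table = {
    // Line 0 starts at div0; line 1 starts where line 0 ended;
    // line 2 starts at a new block element.
    lineStarts: [div0, end0, div1],
    lineEnds: [end0, end1, end2],
};

// True when line i ends at the same node where line i+1 starts.
function sharesBoundaryWithNext(table, i) {
    return i + 1 < table.lineStarts.length
        && table.lineEnds[i] === table.lineStarts[i + 1];
}
```

Here sharesBoundaryWithNext(table, 0) holds, while line 2 begins at a block element, so sharesBoundaryWithNext(table, 1) does not.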

Colors and high-lighting

This needs updating.

Escape sequences (for example "\e[4m" - "underlined", or "\e[32m" - "set foreground color to green") are translated to <span> elements with "style" attributes (for example ‘<span style="text-decoration:underline">‘ or ‘<span style="color: green">‘). After creating such a ‘<span>‘ the current position is moved inside it.

If we’ve previously processed "set foreground color to green", and we see a request for "underlined" it is easy to create a nested ‘<span>‘ for the latter. But what if we then see "set foreground color to red"? We don’t want to nest ‘<span style="color: red">‘ inside ‘<span style="color: green">‘ - that could lead to some deep and ugly nesting. Instead, we move the cursor outside both existing spans, and then create new spans for red and underlined.

The ‘<span>‘ nodes are created lazily just before characters are inserted, by ‘_adjustStyle‘, which compares the current active styles with the desired ones (set by ‘_pushStyle‘).
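
The following is a simplified sketch of that lazy strategy, building an HTML string instead of DOM nodes. The class StyledWriter and its methods are hypothetical stand-ins loosely modelled on ‘_pushStyle‘ and ‘_adjustStyle‘; they are not the actual implementation.

```javascript
function escapeHtml(text) {
    return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

class StyledWriter {
    constructor() {
        this.out = "";
        this.active = [];          // currently open spans, outermost first
        this.desired = new Map();  // CSS property -> value wanted for the next text
    }
    // Record a desired style; a null value clears the property.
    pushStyle(prop, value) {
        if (value === null)
            this.desired.delete(prop);
        else
            this.desired.set(prop, value);
    }
    // Compare active spans with desired styles just before inserting text.
    adjustStyle() {
        const compatible = this.active.every(
            (s) => this.desired.get(s.prop) === s.value);
        if (!compatible) {
            // Move outside all existing spans instead of nesting deeper.
            this.out += "</span>".repeat(this.active.length);
            this.active = [];
        }
        for (const [prop, value] of this.desired) {
            if (!this.active.some((s) => s.prop === prop)) {
                this.out += `<span style="${prop}: ${value}">`;
                this.active.push({ prop, value });
            }
        }
    }
    insertText(text) {
        this.adjustStyle();
        this.out += escapeHtml(text);
    }
    finish() {
        this.out += "</span>".repeat(this.active.length);
        this.active = [];
        return this.out;
    }
}
```

Writing "a" in green, then "b" with underline added, then "c" after switching to red gives a nested underline span for "b", followed by fresh red and underline spans for "c" that are siblings of the green span rather than children of it.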

A possibly better approach would be to map each highlight style to a class name in the ‘class‘ attribute (for example ‘green-foreground-style‘ and ‘underlined-style‘). A default stylesheet can map each style class to the corresponding CSS rules. This has the advantage that one could override the highlighting appearance with a custom style sheet.
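
A minimal sketch of the class-based alternative. The table, the stylesheet text, the function name, and the class name ‘red-foreground-style‘ are assumptions that extend the two class names mentioned above; only SGR codes 4, 31 and 32 are covered.

```javascript
// SGR parameter -> style class (only a few codes, for illustration).
const SGR_CLASSES = new Map([
    [4, "underlined-style"],
    [31, "red-foreground-style"],   // assumed name, by analogy with green
    [32, "green-foreground-style"],
]);

// A default stylesheet that a user-supplied style sheet could override.
const DEFAULT_STYLESHEET = `
.underlined-style { text-decoration: underline }
.red-foreground-style { color: red }
.green-foreground-style { color: green }
`;

// Build the opening tag for a set of SGR codes; unknown codes are skipped.
function spanForSgr(codes) {
    const classes = codes
        .filter((code) => SGR_CLASSES.has(code))
        .map((code) => SGR_CLASSES.get(code));
    return `<span class="${classes.join(" ")}">`;
}
```

With this scheme, green underlined text needs a single span carrying both classes, and a custom style sheet can redefine what "green" looks like without touching the terminal code.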

Predictive echo

Mosh implements local “tentative echo”, which makes network latency less of a problem. DomTerm implements this by leveraging the “deferred deletion” mechanism (used for line mode echo).

To do this we use a <span> that contains predicted input: an optional text node, the _caretNode, and another optional text node. The span has 3 additional properties: textBefore, textAfter, and pendingEcho. When output arrives from the server, the function _doDeferredDeletion is called before the new output is processed; it replaces the span by the textBefore and textAfter, with the _caretNode in between, which restores the “real” (confirmed) output. We also call _doDeferredDeletion when unable to do echo prediction.

Handling keyboard input is as follows: First, if _deferredForDeletion is null, we set it to a fresh span that wraps the _caretNode. As needed, any text node immediately before or after can be moved into the _deferredForDeletion span, also setting textBefore and textAfter. Then, for a printing character, we insert it before the caret, and append it to pendingEcho. For left or right arrow, delete, or backspace, if possible we adjust the _deferredForDeletion span appropriately, and add a special code to pendingEcho. If not possible, we _doDeferredDeletion, which we also do for other keys.

Calling _doDeferredDeletion just before handling output is correct but suboptimal if the output only contains part of the pending echo. In that case we try to create (after handling output) a new _deferredForDeletion span, whose pendingEcho string is a tail of the previous value. (We only do this if there are no changes to any other (logical) line.)
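
The partial-echo case can be sketched with strings alone. This is a simplified model under stated assumptions: it handles only printing characters typed at the end of the line, ignores textBefore, textAfter and the special codes for arrow keys, and the class name EchoPredictor is hypothetical.

```javascript
class EchoPredictor {
    constructor() {
        this.confirmed = "";    // output actually received from the server
        this.pendingEcho = "";  // characters shown locally but not yet echoed
    }
    // A printing character is displayed at once and remembered as pending.
    typeChar(ch) {
        this.pendingEcho += ch;
    }
    displayed() {
        return this.confirmed + this.pendingEcho;
    }
    receiveOutput(output) {
        const pending = this.pendingEcho;
        // Equivalent of the deferred deletion: drop the prediction,
        // then handle the real output.
        this.pendingEcho = "";
        this.confirmed += output;
        // If the output was only the first part of the expected echo,
        // start a new prediction holding the remaining tail.
        if (pending.startsWith(output) && output.length < pending.length)
            this.pendingEcho = pending.slice(output.length);
    }
}
```

If the user types "ls" and the server has so far echoed only "l", the display keeps showing "ls" with "s" still pending; any output that does not match the prediction simply discards it.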

Line-breaking / pretty-printing

For a terminal emulator we need to preserve (not collapse) whitespace, and (usually) we want to line-break in the middle of a word.

These CSS properties come close:

white-space: pre-wrap; word-break: break-all

This is simple and fast. However:

  • It doesn’t help in inserting a visual indicator, like Emacs’s arrow, to indicate when a line was broken.

  • It doesn’t help managing the line table.

  • It doesn’t help with pretty-printing (for example grouping).

  • Chrome (version 52) seems to have some problems with break-all.

Hence we need to do the job ourselves.
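
As a rough illustration of what doing the job ourselves involves, here is a minimal sketch that breaks one logical line into visual lines and records which breaks were inserted by wrapping. The function names and the marker character are hypothetical, and the sketch assumes every code point occupies exactly one column, which is not true for wide characters.

```javascript
// Break a logical line into visual lines of at most `columns` characters.
// Each entry records whether it ends in a wrap (soft break).
function breakLine(text, columns) {
    if (!Number.isInteger(columns) || columns < 1)
        throw new RangeError("columns must be a positive integer");
    const chars = Array.from(text);  // iterate by code point, not UTF-16 unit
    const lines = [];
    for (let i = 0; i < chars.length; i += columns) {
        lines.push({
            text: chars.slice(i, i + columns).join(""),
            wrapped: i + columns < chars.length,
        });
    }
    if (lines.length === 0)
        lines.push({ text: "", wrapped: false });
    return lines;
}

// Render with a visual indicator appended to every wrapped line.
function renderLines(lines, marker) {
    return lines.map((l) => l.text + (l.wrapped ? marker : ""));
}
```

Because the breaks are computed explicitly, the same pass can update the line table and place the wrap indicator, which the CSS-only approach cannot do.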