Architecture notes

            --------------------
 Front-end  |  Display and keyboard
            |---------------
            |  Browser engine (runs terminal.js)
            |---------------
            |  Communication stub
            ---------------
              ^
              | Optional network
              V
            --------------------
  Back-end  |  Communication stub
            |---------------
            |  Application
            ---------------

The DomTerm architecture allows for multiple front-end implementations and multiple back-end implementations. The front-end runs the actual terminal emulator (written in JavaScript) and manages the display. The front-end can be a window or tab in a general-purpose browser like Firefox or Chrome, or it can be a special-purpose browser optimized for DomTerm. The latter would drop the URL bar, add a menu bar and other controls as appropriate for a terminal emulator, and tweak a few minor security restrictions.

The back-end runs the actual application. The application can be a general-purpose shell, or a custom application, such as a programming language run-time or a chat server. The back-end can run the application under a PTY. Alternatively, the application can communicate using pipes, if you prefer to avoid PTYs or they are unavailable (as on Windows).

The front-end and back-end can be combined in the same process, using an embeddable browser. The current sample applications include a single-process terminal emulator that uses the JavaFX WebEngine, a JavaFX pop-up menu, and a PTY class. In this case the “communication stub” is WebEngine’s bridge between JavaScript and Java, plus the code that communicates with the PTY. A C/C++ application could similarly use QtWebEngine and its C++/JavaScript bridge.

If the front-end and back-end are separate processes, they can communicate using a byte-stream protocol. Currently we stick to well-formed UTF-8, because JavaScript’s support for byte arrays is still weak. The protocol is based on the xterm protocol: text and escape sequences go from the application to the front-end; keystrokes and encoded events are sent the other way. More complex data can be encoded using JSON. Most of the protocol is handled by terminal.js. The communication stubs may generate or intercept some messages: for example, a PTY stub will want to handle window-size change events.
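Since the stream is UTF-8 but arrives in arbitrary chunks, a multi-byte character can be split across two reads, so a receiving stub has to buffer incomplete sequences. A minimal sketch, assuming a JavaScript-style runtime with the standard TextDecoder; `makeUtf8Receiver` is a hypothetical name, not part of terminal.js:

```typescript
// Hypothetical helper: turns a sequence of byte chunks into text, keeping
// any incomplete multi-byte sequence buffered until the next chunk arrives.
function makeUtf8Receiver(onText: (text: string) => void): (chunk: Uint8Array) => void {
  // fatal: true makes malformed UTF-8 throw instead of silently producing
  // U+FFFD, matching the rule that the stream must be well-formed UTF-8.
  const decoder = new TextDecoder("utf-8", { fatal: true });
  return (chunk: Uint8Array): void => {
    // stream: true keeps a trailing partial character for the next call.
    const text = decoder.decode(chunk, { stream: true });
    if (text.length > 0) {
      onText(text);
    }
  };
}

// Usage: "é" is the two bytes 0xC3 0xA9; split them across two chunks.
const received: string[] = [];
const receive = makeUtf8Receiver((t) => received.push(t));
receive(new Uint8Array([0x41, 0xc3])); // "A" plus the first byte of "é"
receive(new Uint8Array([0xa9, 0x42])); // second byte of "é" plus "B"
console.log(received.join("")); // AéB
```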

These data streams can be layered on top of other protocols, such as telnet, ssh, or WebSocket. Using WebSocket is convenient because it is built into modern browsers, so the entire front-end (except terminal.js) is readily available.

A roadmap

This is my vision of how terminals should be done in 2016.

The “terminal emulator” application that a user runs (for example by clicking an icon or from an existing command line) should be a small program that parses command-line arguments, then fires up (if necessary) a front-end process and a back-end process, and connects the two. This application would have an option to connect to an existing back-end session, supporting the functionality of GNU Screen.

The default back-end would be a small WebSocket server that forks off a user process, using a PTY when available and pipes otherwise. It would be helpful if the back-end could also serve HTTP/HTTPS, for the initial HTML page and JavaScript (unless those are built into the front-end). A built-in HTTP server could also serve images and other non-textual data, if you don’t want to include such data directly in the output (possibly as a data URI). The libwebsockets library seems a possible base for a WebSocket+HTTP server.
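To embed a small image directly in the output instead of serving it over HTTP, its bytes can be wrapped in a data URI. A minimal sketch; `toDataUri` is a hypothetical helper, and its result would then be used as the src of an <img> element written to the terminal:

```typescript
// Hypothetical helper: wrap raw bytes (for example a PNG image) in a
// data: URI so they can be embedded in the output without an HTTP request.
function toDataUri(mimeType: string, bytes: Uint8Array): string {
  // btoa expects a "binary string" holding one character per byte.
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

const uri = toDataUri("text/plain", new TextEncoder().encode("hi"));
console.log(uri); // data:text/plain;base64,aGk=
```

Base64 inflates the data by about a third, so this suits small images; larger ones are a better fit for the built-in HTTP server.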

More blue-sky, it would be nice if the terminal subsystem and/or the C library could have a special mode for PTYs running under DomTerm. For example, the terminal driver should not do canonical-mode input-line cooking, but should delegate that to DomTerm. (DomTerm approximates this by monitoring the terminal canon flag, but a more robust protocol would be better.) Type-ahead could also work better. It would also be nice to delimit output to standard error with the appropriate escape sequences.

WebSocket server

The included server uses Java-WebSocket, which is very compact and lightweight. The java_websocket.jar file is checked in for convenience (though that may change).

Each connection to the server creates a new process, but using the same command and args. (Multiple connections using --process will fail for some unknown reason.)

(An older WebSocket server uses libraries from the Tyrus project. These libraries are much bigger, but this implementation could be suitable for a Java EE environment, as it follows JSR 356.)

If using PTYs, which require native code anyway, it may be better to use a server written in C or C++, such as one based on libwebsockets.

Line vs character input modes

In line input mode we can end up with double echoing: as you edit the input line, it is displayed; then, when the line is sent, the slave will normally echo the input again.

Ideally you’d want to integrate with the kernel terminal subsystem to suppress echoing. In lieu of that, line-editing mode could delete the input line from the DOM before sending it to the inferior. To avoid annoying flashing, this is done lazily: DomTerm waits to remove the input line until it gets some output from the inferior (usually the echo).
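The lazy removal can be sketched with a simplified model in which the display is an array of text lines rather than DOM nodes. `LineModeDisplay` and its methods are hypothetical, and the sketch assumes the echo arrives on the same line where the input was typed:

```typescript
// Simplified model: the display is an array of text lines, not DOM nodes.
class LineModeDisplay {
  lines: string[];
  // Where the locally edited (not yet echoed) input starts, if any.
  private pendingInput: { line: number; column: number } | null = null;

  constructor(prompt: string) {
    this.lines = [prompt];
  }

  // Enter pressed in line mode: the edited text is already visible, so
  // remember where it starts, then send it. "\r" is what Enter normally
  // sends; a PTY with ICRNL set turns it into "\n".
  submitInput(input: string, send: (data: string) => void): void {
    const last = this.lines.length - 1;
    this.pendingInput = { line: last, column: this.lines[last].length };
    this.lines[last] += input;
    send(input + "\r");
  }

  // Output from the inferior, which usually begins with the echo.
  receiveOutput(text: string): void {
    if (this.pendingInput !== null) {
      // Lazy removal: only now delete the local copy of the input, so the
      // echo replaces it without a visible flash.
      const { line, column } = this.pendingInput;
      this.lines[line] = this.lines[line].slice(0, column);
      this.pendingInput = null;
    }
    // Append the output, starting a new line at each "\n" (CRs dropped).
    const parts = text.replace(/\r/g, "").split("\n");
    this.lines[this.lines.length - 1] += parts[0];
    for (let i = 1; i < parts.length; i++) {
      this.lines.push(parts[i]);
    }
  }
}

const sent: string[] = [];
const display = new LineModeDisplay("$ ");
display.submitInput("ls", (d) => sent.push(d));
display.receiveOutput("ls\r\nfile1\r\n");
console.log(JSON.stringify(display.lines)); // ["$ ls","file1",""]
```

Without the removal step, the first line would read "$ lsls".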

In addition to "char mode" and "line mode" (as in Emacs term mode) there is an "auto mode", which watches the state of the inferior PTY to switch between them automatically. In autoEditing mode, if we’re currently in char mode, a key event is sent to the PTY layer. If the PTY is in non-canonical mode, the key event is sent to the server. If the PTY is in canonical mode, a message is sent back to the front-end, which switches to line mode and processes the event.
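A sketch of the auto-mode decision, assuming the PTY's canonical flag is known. The names (`routeKey`, `ModeState`, `KeyRoute`) are hypothetical, and the message round trip between the front-end and the PTY layer is collapsed into a single function call:

```typescript
type InputMode = "char" | "line" | "auto";
type KeyRoute = "send-to-inferior" | "edit-locally";

interface ModeState {
  mode: InputMode;
  // In auto mode: whether we have currently switched to line editing.
  autoLineEditing: boolean;
}

// Decide where a key event goes. `ptyCanonical` is the PTY's
// canonical-mode (ICANON) flag as seen by the PTY layer.
function routeKey(state: ModeState, ptyCanonical: boolean): KeyRoute {
  if (state.mode === "char") return "send-to-inferior";
  if (state.mode === "line") return "edit-locally";
  // Auto mode, already line editing: keep editing locally.
  if (state.autoLineEditing) return "edit-locally";
  // Auto mode in its char sub-mode: the key reaches the PTY layer.
  if (!ptyCanonical) return "send-to-inferior";
  // Canonical mode: switch to line mode and handle the key there.
  state.autoLineEditing = true;
  return "edit-locally";
}

const state: ModeState = { mode: "auto", autoLineEditing: false };
console.log(routeKey(state, false)); // send-to-inferior (e.g. a full-screen editor)
console.log(routeKey(state, true)); // edit-locally (e.g. a shell reading a line)
console.log(state.autoLineEditing); // true
```

Switching back from line editing to char mode (for example once the line has been sent) is not modeled here.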

Line structure

"Line" here refers to a "visual line": a section of the DOM that should be treated as a line for cursor movement. Line breaks may come from the back-end, or be inserted by the line-breaking algorithm.

The lineStarts array maps from a line number to the DOM location of the start of the corresponding line.

The lineEnds array maps to the end of each line; each entry always points to a span node with the line attribute set. Normally lineEnds[i] == lineStarts[i+1]; however, sometimes lineStarts[i] is the start of a <div> or other block element.
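The normal case of this invariant can be illustrated with plain objects standing in for DOM nodes. This is a simplified model, not DomTerm's data structures: `TextChunk`, `LineEndSpan`, the `line` values "hard"/"soft", and `buildLineTable` are all hypothetical, and block elements (the exception mentioned above) are left out:

```typescript
// Simplified stand-ins for DOM nodes (hypothetical, not DomTerm's types).
interface TextChunk { kind: "text"; text: string }
interface LineEndSpan { kind: "line-end"; line: "hard" | "soft" }
type ModelNode = TextChunk | LineEndSpan;

interface LineTable { lineStarts: ModelNode[]; lineEnds: LineEndSpan[] }

// Build both arrays from a flat run of nodes in which every line is
// terminated by a line-end span (the span with the line attribute set).
function buildLineTable(nodes: ModelNode[]): LineTable {
  const lineStarts: ModelNode[] = [];
  const lineEnds: LineEndSpan[] = [];
  if (nodes.length > 0) lineStarts.push(nodes[0]);
  nodes.forEach((node, i) => {
    if (node.kind === "line-end") {
      lineEnds.push(node);
      // The next line starts at this line's end marker, which is why
      // normally lineEnds[i] == lineStarts[i + 1].
      if (i + 1 < nodes.length) lineStarts.push(node);
    }
  });
  return { lineStarts, lineEnds };
}

const n1: ModelNode = { kind: "text", text: "abc" };
const e1: LineEndSpan = { kind: "line-end", line: "hard" };
const n2: ModelNode = { kind: "text", text: "def" };
const e2: LineEndSpan = { kind: "line-end", line: "soft" };
const table = buildLineTable([n1, e1, n2, e2]);
console.log(table.lineStarts.length, table.lineEnds.length); // 2 2
console.log(table.lineEnds[0] === table.lineStarts[1]); // true
```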

Colors and highlighting

This needs updating.

Escape sequences (for example "\e[4m" for "underlined", or "\e[32m" for "set foreground color to green") are translated to <span> elements with "style" attributes (for example ‘<span style="text-decoration:underline">’ or ‘<span style="color: green">’). After creating such a ‘<span>’, the current position is moved inside it.

If we’ve previously processed "set foreground color to green" and then see a request for "underlined", it is easy to create a nested ‘<span>’ for the latter. But what if we then see "set foreground color to red"? We don’t want to nest ‘<span style="color: red">’ inside ‘<span style="color: green">’, since that could lead to deep and ugly nesting. Instead, we move the cursor outside both existing spans, and then create new spans for red and underlined.

The ‘<span>’ nodes are created lazily, just before characters are inserted, by ‘_adjustStyle’, which compares the currently active styles with the desired ones (set by ‘_pushStyle’).
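The lazy adjustment can be sketched as a comparison between the stack of open spans and the desired styles: keep the outermost spans that still match, close the rest, and open new spans for whatever remains. This is a hypothetical `adjustStyle`, not DomTerm's `_adjustStyle`, and styles are modeled as CSS property/value pairs:

```typescript
// One open span's style, e.g. { property: "color", value: "green" }.
interface StyleEntry { property: string; value: string }

// `active` is the stack of currently open nested spans (outermost first);
// `desired` maps each requested CSS property to its value. Returns how
// many innermost spans to close and which new spans to open (outermost first).
function adjustStyle(active: StyleEntry[], desired: Map<string, string>): { close: number; open: StyleEntry[] } {
  // Keep the longest prefix of open spans that still matches the request.
  let keep = 0;
  while (keep < active.length && desired.get(active[keep].property) === active[keep].value) {
    keep++;
  }
  const kept = new Set(active.slice(0, keep).map((s) => s.property));
  // Every desired style not provided by a kept span needs a new span.
  const open: StyleEntry[] = [];
  desired.forEach((value, property) => {
    if (!kept.has(property)) open.push({ property, value });
  });
  return { close: active.length - keep, open };
}

const green: StyleEntry = { property: "color", value: "green" };
// Green is active and underline is requested: nest one new span.
const step1 = adjustStyle([green], new Map([["color", "green"], ["text-decoration", "underline"]]));
// Green+underline are active and red is requested: close both, reopen.
const step2 = adjustStyle(
  [green, { property: "text-decoration", value: "underline" }],
  new Map([["color", "red"], ["text-decoration", "underline"]]),
);
console.log(step1.close, step1.open.length, step2.close, step2.open.length); // 0 1 2 2
```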

A possibly better approach would be to map each highlight style to a ‘class’ attribute (for example ‘green-foreground-style’ and ‘underlined-style’). A default stylesheet can map each style class to the corresponding CSS rules. This has the advantage that one could override the highlighting appearance with a custom stylesheet.
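A sketch of this class-based alternative: translate the parameters of an SGR escape sequence (the standard codes 0, 1, 4, and 30-37) into class names following the pattern above, plus a matching default stylesheet. `classesForSgr`, `defaultStylesheet`, and the `bold-style` class are hypothetical:

```typescript
// Foreground colors for SGR codes 30-37, in standard order.
const FOREGROUND_NAMES = ["black", "red", "green", "yellow", "blue", "magenta", "cyan", "white"];

// Translate the parameters of one SGR sequence (the text between "\e["
// and "m") into style class names. Unlisted codes are ignored.
function classesForSgr(params: string): string[] {
  let classes: string[] = [];
  // "\e[m" is the same as "\e[0m"; an empty parameter also means 0.
  const codes = params.split(";").map((p) => (p === "" ? 0 : Number(p)));
  for (const code of codes) {
    if (code === 0) {
      classes = []; // reset all attributes
    } else if (code === 1) {
      classes.push("bold-style");
    } else if (code === 4) {
      classes.push("underlined-style");
    } else if (code >= 30 && code <= 37) {
      // A later foreground color replaces an earlier one.
      classes = classes.filter((c) => !c.endsWith("-foreground-style"));
      classes.push(`${FOREGROUND_NAMES[code - 30]}-foreground-style`);
    }
  }
  return classes;
}

// Default rules for each class; a user stylesheet loaded afterwards can
// override any of them.
const defaultStylesheet = [
  ...FOREGROUND_NAMES.map((name) => `.${name}-foreground-style { color: ${name}; }`),
  ".bold-style { font-weight: bold; }",
  ".underlined-style { text-decoration: underline; }",
].join("\n");

console.log(classesForSgr("4;32").join(" ")); // underlined-style green-foreground-style
```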

Line-breaking / pretty-printing

For a terminal emulator we need to preserve (not collapse) whitespace, and (usually) we want to allow line breaks in the middle of a word.

These CSS properties come close:

white-space: pre-wrap; word-break: break-all

This is simple and fast. However:

  • It doesn’t help in inserting a visual indicator, like Emacs’s arrow, to indicate when a line was broken.

  • It doesn’t help managing the line table.

  • It doesn’t help with pretty-printing (for example grouping).

  • Chrome (version 52) seems to have problems with break-all.

Hence we need to do the job ourselves.
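As a starting point, character-level breaking with whitespace preserved and a visible indicator at each inserted break can be sketched as follows. `breakLine` and `render` are hypothetical names. The sketch assumes every code point takes one column (no wide CJK characters or combining marks) and, for simplicity, appends the indicator to the row, where a real terminal would draw it in a margin:

```typescript
// One visual row produced by breaking a logical line.
interface VisualRow {
  text: string;
  // True if the row ends at a break we inserted ("soft" break),
  // false for the last row of the logical line.
  softBreak: boolean;
}

// Break a logical line every `width` columns, even mid-word, keeping all
// whitespace (like white-space: pre-wrap plus word-break: break-all).
function breakLine(line: string, width: number): VisualRow[] {
  if (width < 1) throw new RangeError("width must be at least 1");
  const chars = Array.from(line); // split by code point, not UTF-16 unit
  const rows: VisualRow[] = [];
  for (let i = 0; i < chars.length; i += width) {
    rows.push({ text: chars.slice(i, i + width).join(""), softBreak: i + width < chars.length });
  }
  if (rows.length === 0) rows.push({ text: "", softBreak: false });
  return rows;
}

// Mark each soft break with a visible indicator, similar to Emacs's arrow.
function render(rows: VisualRow[], indicator = "\u21b5"): string {
  return rows.map((r) => (r.softBreak ? r.text + indicator : r.text)).join("\n");
}

console.log(render(breakLine("abcdefg", 3)));
// abc↵
// def↵
// g
```

The `softBreak` flags are the kind of information the line table needs to tell inserted breaks apart from those sent by the back-end; grouping for pretty-printing would need more structure than this.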