Concurrency and parallelism

Implementation Strategy

Concurrency is the separation of tasks to provide interleaved execution. Parallelism is the simultaneous execution of multiple pieces of work in order to increase speed. Here are some ways that we take advantage of both:

  • Task-based architecture. Major components in the system should be factored into actors with isolated heaps, with clear boundaries for failure and recovery. This will also encourage loose coupling throughout the system, enabling us to replace components for the purposes of experimentation and research.
  • Concurrent rendering. Both rendering and compositing are separate threads, decoupled from layout in order to maintain responsiveness. The compositor thread manages its memory manually to avoid garbage collection pauses.
  • Tiled rendering. We divide the screen into a grid of tiles and render each one in parallel. Tiling is needed for mobile performance regardless of its benefits for parallelism.
  • Layered rendering. We divide the display list into subtrees whose contents can be retained on the GPU and render them in parallel.
  • Selector matching. This is an embarrassingly parallel problem. Unlike Gecko, Servo does selector matching in a separate pass from flow tree construction so that it is more easily parallelized.
  • Parallel layout. We build the flow tree using a parallel traversal of the DOM that respects the sequential dependencies generated by elements such as floats.
  • Text shaping. A crucial part of inline layout, text shaping is fairly costly and has potential for parallelism across text runs. Not implemented.
  • Parsing. We have written a new HTML parser in Rust, focused on both safety and compliance with the specification. We have not yet added speculation or parallelism to the parsing.
  • Image decoding. Decoding multiple images in parallel is straightforward.
  • Decoding of other resources. This is probably less important than image decoding, but anything that needs to be loaded by a page can be done in parallel, e.g. parsing entire style sheets or decoding videos.
  • GC JS concurrent with layout - Under most any design with concurrent JS and layout, JS is going to be waiting to query layout sometimes, perhaps often. This will be the most opportune time to run the GC.

For information on the design of WebXR see the in-tree documentation.

Challenges

  • Parallel-hostile libraries. Some third-party libraries we need don't play well in multi-threaded environments. Fonts in particular have been difficult. Even if libraries are technically thread-safe, often thread safety is achieved through a library-wide mutex lock, harming our opportunities for parallelism.
  • Too many threads. If we throw maximum parallelism and concurrency at everything, we will end up overwhelming the system with too many threads.

JavaScript and DOM bindings

We are currently using SpiderMonkey, although pluggable engines is a long-term, low-priority goal. Each content task gets its own JavaScript runtime. DOM bindings use the native JavaScript engine API instead of XPCOM. Automatic generation of bindings from WebIDL is a priority.

Multi-process architecture

Similar to Chromium and WebKit2, we intend to have a trusted application process and multiple, less trusted engine processes. The high-level API will in fact be IPC-based, likely with non-IPC implementations for testing and single-process use-cases, though it is expected most serious uses would use multiple processes. The engine processes will use the operating system sandboxing facilities to restrict access to system resources.

At this time we do not intend to go to the same extreme sandboxing ends as Chromium does, mostly because locking down a sandbox constitutes a large amount of development work (particularly on low-priority platforms like Windows XP and older Linux) and other aspects of the project are higher priority. Rust's type system also adds a significant layer of defense against memory safety vulnerabilities. This alone does not make a sandbox any less important to defend against unsafe code, bugs in the type system, and third-party/host libraries, but it does reduce the attack surface of Servo significantly relative to other browser engines. Additionally, we have performance-related concerns regarding some sandboxing techniques (for example, proxying all OpenGL calls to a separate process).

I/O and resource management

Web pages depend on a wide variety of external resources, with many mechanisms of retrieval and decoding. These resources are cached at multiple levels—on disk, in memory, and/or in decoded form. In a parallel browser setting, these resources must be distributed among concurrent workers.

Traditionally, browsers have been single-threaded, performing I/O on the "main thread", where most computation also happens. This leads to latency problems. In Servo there is no "main thread" and the loading of all external resources is handled by a single resource manager task.

Browsers have many caches, and Servo's task-based architecture means that it will probably have more than extant browser engines (e.g. we might have both a global task-based cache and a task-local cache that stores results from the global cache to save the round trip through the scheduler). Servo should have a unified caching story, with tunable caches that work well in low-memory environments.