
[Announcement] Google confirms that Chrome will soon support GPU hardware acceleration

The latest post on the Chromium Blog confirms that Chrome will soon support GPU hardware acceleration, letting developers hand the rendering of complex pages over to a powerful graphics card. This is already the trend among modern browsers: Microsoft has confirmed that IE9 will support GPU hardware acceleration, and the latest Firefox 4.0 Beta for Windows already has this feature enabled.



Recent Chromium builds already ship with GPU hardware acceleration, which must be enabled with the --enable-accelerated-compositing command-line flag. You may not notice much of a difference afterwards, apart from improved load times on some complex pages.
If you are interested in how hardware acceleration is implemented in Chrome, take a look at the design document Google has just published (quoted below).


via rww, Chromium Blog, 谷奥

http://www.cnbeta.com/articles/120570.htm

GPU Accelerated Compositing in Chrome

Vangelis Kokkevis

Summary
This document provides background and details on the implementation of hardware accelerated compositing in Chrome.

Introduction

Traditionally, web browsers relied entirely on the CPU to render web page content. With capable GPUs becoming an integral part of even the smallest of devices, and with rich media such as video and 3D graphics playing an increasingly important role in the web experience, attention has turned to finding ways to make more effective use of the underlying hardware to achieve better performance and power savings. There's clear indication that getting the GPU directly involved with compositing the contents of a web page can result in very significant speedups. The largest gains are to be had from eliminating unnecessary (and very slow) copies of large data, especially copies from video memory to system memory. The most obvious candidates for such optimizations are the <video> element and the WebGL canvas, both of which can generate their results in areas of memory that the CPU doesn't have fast access to.

Delegating compositing of the page layers to the GPU provides other benefits as well.  In most cases, the GPU can achieve far better efficiency than the CPU (both in terms of speed and power draw) in drawing and compositing operations that involve large numbers of pixels as the hardware is designed specifically for these types of workloads.

WebKit Rendering Basics

The source code for the WebKit rendering engine is vast and complex (and somewhat sparsely documented!). For the purposes of this document I've extracted some of the important building blocks.
Nodes and the DOM tree
In WebKit, the contents of a web page are internally stored as a tree of Node objects called the DOM tree. Each HTML element on a page, as well as text that occurs between elements, is associated with a Node. The top-level Node of the DOM tree is always a Document Node.
RenderObjects, the Render Tree and the GraphicsContext
Each node in the DOM tree that produces visual output has a corresponding RenderObject. RenderObjects are stored in a parallel tree structure, called the Render Tree. A RenderObject knows how to present (paint) the contents of the Node on a display surface. It does so by issuing the necessary draw calls to the GraphicsContext associated with the page renderer. The GraphicsContext is ultimately responsible for writing the pixels onto the bitmap that gets displayed to the screen.
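To make this concrete, here's a minimal C++ sketch of the arrangement; the types and method bodies are hypothetical stand-ins modeled on the description above, not actual WebKit source, and the GraphicsContext below just logs calls instead of rasterizing via Skia/CG:

    #include <cstdio>
    #include <vector>

    struct IntRect { int x = 0, y = 0, width = 0, height = 0; };

    // Stand-in for the real rasterizer: it just logs each draw call that
    // would otherwise write pixels into the destination bitmap.
    struct GraphicsContext {
        void fillRect(const IntRect& r) {
            std::printf("fillRect %d,%d %dx%d\n", r.x, r.y, r.width, r.height);
        }
    };

    struct RenderObject {
        IntRect bounds;
        std::vector<RenderObject*> children;

        // Paint this object, then recurse into children. (In the real engine,
        // traversal stops at children owned by a different RenderLayer.)
        void paint(GraphicsContext& context) {
            context.fillRect(bounds);
            for (RenderObject* child : children)
                child->paint(context);
        }
    };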
RenderLayers
Each RenderObject is associated with a RenderLayer, either directly or indirectly via an ancestor RenderObject. RenderObjects that share the same coordinate space (e.g. are affected by the same CSS transform) typically belong to the same RenderLayer. RenderLayers exist so that the elements of the page are composited in the correct order to properly display overlapping content, semi-transparent elements, etc. There's a number of conditions that will trigger the creation of a new RenderLayer for a particular RenderObject, as defined in RenderBoxModelObject::requiresLayer() and overridden for some derived classes. In general, a RenderObject warrants the creation of a RenderLayer if (a sketch of this predicate follows the list):
It is the root object for the page
It has explicit CSS position properties (relative, absolute or a transform)
It is transparent
It has overflow, an alpha mask or a reflection
It corresponds to a <canvas> element that has a 3D (WebGL) context
It corresponds to a <video> element
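A minimal sketch of what such a predicate could look like (the real logic lives in RenderBoxModelObject::requiresLayer() and its overrides; the flag names here are purely illustrative):

    // Illustrative inputs only; the real checks are spread across
    // RenderBoxModelObject and its derived classes.
    struct LayerTriggers {
        bool isRootObject = false;
        bool hasPositionOrTransform = false;  // relative, absolute or a transform
        bool isTransparent = false;
        bool hasOverflowMaskOrReflection = false;
        bool isWebGLCanvas = false;           // <canvas> with a 3D context
        bool isVideoElement = false;          // <video>
    };

    bool requiresLayer(const LayerTriggers& t) {
        return t.isRootObject
            || t.hasPositionOrTransform
            || t.isTransparent
            || t.hasOverflowMaskOrReflection
            || t.isWebGLCanvas
            || t.isVideoElement;
    }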

Notice that there isn't a one-to-one correspondence between RenderObjects and RenderLayers. A particular RenderObject is associated either with the RenderLayer that was created for it, if there is one, or with the RenderLayer of the first ancestor that has one.

RenderLayers form a tree hierarchy as well. The root node is the RenderLayer corresponding to the root element in the page, and the descendants of every node are layers visually contained within the parent layer. The children of each RenderLayer are kept in two lists, both sorted in ascending z-order: the negZOrderList, containing child layers with negative z-indices (and hence layers that go below the current layer), and the posZOrderList, containing child layers with positive z-indices (layers that go above the current layer).


The Rendering Paths
WebKit fundamentally renders a web page by traversing the RenderLayer hierarchy starting from the root layer. The WebKit codebase contains two distinct code paths for rendering the contents of a page: the software path and the hardware-accelerated path. As the name suggests, the hardware-accelerated path is there to make use of GPU acceleration for compositing some of the RenderLayer contents, and code for it lives behind the ACCELERATED_COMPOSITING compile-time flag. Currently Chrome uses the software path exclusively. Safari on the Mac (and most likely iOS) follows the hardware-accelerated path, which makes heavy use of Apple's proprietary CoreAnimation API. It's also worth noting that 3D CSS transforms are only available with the hardware-accelerated path, as a pure software implementation would be prohibitively slow.
The Software Path
In the software path, the page is rendered by sequentially painting all the RenderLayers, from back to front, directly into a single destination bitmap. The RenderLayer hierarchy is traversed recursively starting from the root, and the bulk of the work is done in RenderLayer::paintLayer(), which performs the following basic steps (the list is simplified here for clarity; a sketch of the recursion follows it):
Determines whether the layer intersects the damage rect for an early out.
Recursively paints the layers below this one by calling paintLayer() for the layers in the negZOrderList.
Asks RenderObjects associated with this RenderLayer to paint themselves. This is done by recursing down the RenderTree starting with the RenderObject which created the layer.  Traversal stops whenever a RenderObject associated with a different RenderLayer is found.
Recursively paints the layers above this one by calling paintLayer() for the layers in the posZOrderList.
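Here's a minimal, hypothetical sketch of that back-to-front recursion; the types are simplified stand-ins, with the geometry test and RenderTree walk stubbed out:

    #include <vector>

    struct IntRect { int x = 0, y = 0, width = 0, height = 0; };
    struct GraphicsContext;

    struct RenderLayer {
        std::vector<RenderLayer*> negZOrderList;  // children below this layer
        std::vector<RenderLayer*> posZOrderList;  // children above this layer

        // Stubs: real code tests geometry, and walks the RenderTree starting
        // at the RenderObject that created this layer.
        bool intersects(const IntRect&) const { return true; }
        void paintOwnRenderObjects(GraphicsContext&) {}

        void paintLayer(GraphicsContext& context, const IntRect& damageRect) {
            if (!intersects(damageRect))
                return;                                  // step 1: early out
            for (RenderLayer* layer : negZOrderList)
                layer->paintLayer(context, damageRect);  // step 2: layers below
            paintOwnRenderObjects(context);              // step 3: own contents
            for (RenderLayer* layer : posZOrderList)
                layer->paintLayer(context, damageRect);  // step 4: layers above
        }
    };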
RenderObjects paint themselves into the destination bitmap by issuing draw calls into the shared GraphicsContext (implemented in Chrome via Skia on Windows and Linux). Note that the GraphicsContext itself has no concept of layers, with the exception of the case where a layer is semi-transparent. In that case the RenderLayer calls GraphicsContext::beginTransparencyLayer() before asking the RenderObjects to draw. In the Skia implementation, the call to beginTransparencyLayer() causes all subsequent draw calls to render into a separate bitmap, which gets composited with the original one when the layer drawing is complete and a matching call to endTransparencyLayer() is made by the RenderLayer.
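A minimal, self-contained sketch of that bracketing, with print statements standing in for the real Skia work:

    #include <cstdio>

    struct GraphicsContext {
        void beginTransparencyLayer(float opacity) {
            // Subsequent draw calls go to a separate side bitmap.
            std::printf("begin side bitmap at opacity %.2f\n", opacity);
        }
        void fillRect() { std::printf("draw into the current bitmap\n"); }
        void endTransparencyLayer() {
            // The side bitmap is composited back into the original one.
            std::printf("composite side bitmap into destination\n");
        }
    };

    int main() {
        GraphicsContext context;
        context.beginTransparencyLayer(0.5f); // semi-transparent layer starts
        context.fillRect();                   // lands in the side bitmap
        context.endTransparencyLayer();       // blended back at 50% opacity
    }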

The Hardware Accelerated Path
The difference between the hardware-accelerated path and the software path is that, when hardware acceleration is enabled, some (but not all) of the RenderLayers get their own backing surface (compositing layer) into which they paint, instead of drawing directly into the common bitmap for the page. A subsequent compositing pass composites all the backing surfaces onto the destination bitmap. The compositor is responsible for applying the necessary transformations (as specified by the layer's CSS transform properties) to each bitmap before compositing it. Since painting of the layers is decoupled from compositing, invalidating one of these layers only results in repainting the contents of that layer alone and recompositing. In contrast, with the software path, invalidating any layer requires repainting all layers (at least the overlapping portions of them) below and above it, which unnecessarily taxes the CPU.

While in theory every single RenderLayer could paint itself into a separate backing surface to avoid unnecessary repaints, in practice this could be quite wasteful in terms of memory (VRAM especially). In the current WebKit implementation, one of the following conditions must be met for a RenderLayer to get its own compositing layer (see RenderLayerCompositor::requiresCompositingLayer(); a sketch follows the list):
Layer has 3D or perspective transform CSS properties
Layer is used by a <video> element that uses accelerated video decoding
Layer is used by a <canvas> element with a 3D context
Layer uses a CSS animation for its opacity or uses an animated webkit transform
Layer has a descendant that has a compositing layer
Layer has a sibling with a lower z-index which has a compositing layer (in other words the layer is rendered on top of a composited layer)
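A hypothetical sketch of how such a decision might be evaluated over a layer tree (the real logic is RenderLayerCompositor::requiresCompositingLayer(); the names and the sibling test are simplified). Note how the last two rules make compositing propagate upward to ancestors and sideways to higher-z-index siblings:

    #include <vector>

    struct Layer {
        bool has3DTransform = false;                 // 3D or perspective CSS transform
        bool isAcceleratedVideo = false;             // <video> with accelerated decoding
        bool isWebGLCanvas = false;                  // <canvas> with a 3D context
        bool hasOpacityOrTransformAnimation = false;
        std::vector<Layer*> children;                // sorted by ascending z-index
        bool composited = false;
    };

    bool decideCompositing(Layer& layer) {
        bool anyChildComposited = false;
        bool lowerSiblingComposited = false;
        for (Layer* child : layer.children) {
            bool childNeedsIt = decideCompositing(*child);
            // A lower-z-index sibling already composited means this child is
            // rendered on top of a composited layer, so it composites too.
            child->composited = childNeedsIt || lowerSiblingComposited;
            lowerSiblingComposited = lowerSiblingComposited || child->composited;
            anyChildComposited = anyChildComposited || child->composited;
        }
        layer.composited = layer.has3DTransform || layer.isAcceleratedVideo
            || layer.isWebGLCanvas || layer.hasOpacityOrTransformAnimation
            || anyChildComposited;  // a composited descendant forces the ancestor
        return layer.composited;
    }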

Two significant implications of WebKit's implementation of accelerated compositing are:
Even with hardware acceleration enabled, pages that don't contain <video> or WebGL elements and don't make use of 3D CSS transforms/animations use the software path.
Pages with composited RenderLayers will always render via the compositor.

H/W Accelerated Compositing

Code related to the compositor lives inside WebCore, behind the USE(ACCELERATED_COMPOSITING) guards. Part of the code is shared among all platforms and part of it is Chrome-specific. Thankfully, the WebKit code is structured such that implementing the compositor for Chrome required no changes to the core WebKit codebase, and all the Chrome-specific code is provided in platform-specific source files that live in platform/graphics/chromium, much the same way we've done with GraphicsContext and GraphicsContextSkia.
With the addition of the accelerated compositor, in order to eliminate costly memory transfers, the final rendering of the browser's tab area is handled directly by the GPU. This is a significant departure from the current model, in which the Renderer process passes a bitmap with the page's contents over to the Browser process (via IPC and shared memory) for display:



[Figure: Software Rendering Architecture]

With the current un-accelerated implementation, compositing of the RenderLayers takes place in the WebKit code (via Skia or CG) and runs on the CPU. In the h/w accelerated architecture, compositing of the h/w accelerated layers with the rest of the page contents happens on the GPU via calls to the platform-specific 3D APIs (GL / D3D). The code ultimately responsible for making these calls is encapsulated in a library running inside the Renderer process, the Compositor. The Compositor library essentially uses the GPU to composite rectangular areas of the page into a single bitmap.
The GPU Process

Restricted by its sandbox, the Renderer process (where WebKit and the compositor live) cannot directly issue calls to the 3D APIs provided by the OS (GL/D3D). For that reason we use a separate process to do the rendering. We call this process the GPU Process. The GPU process is specifically designed to provide access to the system's 3D APIs from within the Renderer sandbox or the even more restrictive Native Client "jail". It works via a client-server model, with the client being the code running in the restricted environment and the server being the code that actually makes the calls into the graphics APIs, as follows (a sketch of the pattern appears after the list):
The client (code running in the Renderer or within a NaCl module), instead of issuing calls directly to the system APIs, serializes them and puts them in a ring buffer (Command Buffer) residing in memory shared between itself and the server process.
The server (GPU process running in a less restrictive sandbox that allows access to the platform's 3D APIs) picks up the serialized commands from shared memory, parses them and executes the appropriate graphics calls, outputting directly to a window.
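A minimal single-process sketch of that producer/consumer pattern; in Chrome the ring buffer actually lives in memory shared between two processes, and the command set mirrors GL ES 2.0 rather than the two toy commands here:

    #include <cstdint>
    #include <cstdio>

    enum Cmd : uint32_t { kClear = 1, kDrawArrays = 2 };

    constexpr int kRingSize = 64;
    uint32_t ring[kRingSize];  // stands in for the shared-memory Command Buffer
    int put = 0;               // advanced by the client
    int get = 0;               // advanced by the server

    // Client side: serialize a command instead of calling GL directly.
    void clientEmit(Cmd cmd) {
        ring[put] = cmd;
        put = (put + 1) % kRingSize;  // real code also handles a full buffer
    }

    // Server side (GPU process): parse commands and issue the real GL calls.
    void serverDrain() {
        while (get != put) {
            switch (ring[get]) {
            case kClear:      std::printf("glClear(...)\n"); break;
            case kDrawArrays: std::printf("glDrawArrays(...)\n"); break;
            }
            get = (get + 1) % kRingSize;
        }
    }

    int main() {
        clientEmit(kClear);
        clientEmit(kDrawArrays);
        serverDrain();  // in Chrome this runs in the GPU process
    }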


[Figure: The GPU Process]
The commands accepted by the GPU process are patterned closely after the GL ES 2.0 API (for example, there's a command corresponding to glClear, one to glDrawArrays, etc). Since most GL calls don't have return values, the client and server can work mostly asynchronously, which keeps the performance overhead fairly low. Any necessary synchronization between the client and the server, such as the client notifying the server that there's additional work to be done, is handled via an IPC mechanism. It's also worth noting that in addition to providing storage for the command buffer, shared memory is used for passing larger resources, such as bitmaps for textures and vertex arrays, between the client and the server.

From the client's perspective, an application has the option to either write commands directly into the command buffer or use the GL ES 2.0 API via a client-side library that we provide, which handles the serialization behind the scenes. Both the compositor and WebGL currently use the GL ES client-side library for convenience. On the server side, commands received via the command buffer are converted to calls into either desktop GL (on Mac and Linux) or D3D (on Windows) via ANGLE.

Currently Chrome uses a single GPU process per browser instance, serving requests from all the renderer processes and any plugin processes. The GPU process, while single threaded, can multiplex between multiple command buffers, each one of which is associated with its own rendering context.

The GPU process architecture offers several benefits including:
Security: The bulk of the rendering logic remains in the sandboxed Renderer process.
Robustness: A GPU process crash (e.g. due to faulty drivers) doesn't bring down the browser.
Uniformity: Standardizing on OpenGL ES 2.0 as the rendering API for the browser regardless of the platform allows for a single, easier to maintain codebase across all OS ports of Chrome.
The Compositor

The code
The bulk of the Chromium implementation of the compositor lives in WebCore's platform/graphics/chromium directory. The compositing logic is mostly in LayerRendererChromium.cpp, and the implementations of the various composited layer types are in the {Content|Video|Image}LayerChromium.cpp files. The compositor is implemented on top of the GL ES 2.0 client library, which proxies the graphics calls to the GPU process.

When a page renders via the compositor, all its pixels are drawn directly onto the window via the GPU process. The compositor maintains a hierarchy of GraphicsLayers, which is constructed by traversing the RenderLayer tree and updated as the page changes. With the exception of WebGL and video layers, the contents of each layer are first drawn into a system-memory bitmap and then uploaded to a texture. The compositor keeps track of which layers have changed since the last time they were drawn and only updates the textures as needed. Rendering the contents of a page is simply a matter of doing a depth-first traversal of the GraphicsLayer hierarchy and drawing a texture quad for each layer with the appropriate transformation.
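A minimal, hypothetical sketch of that loop; the two helper stubs stand in for painting-plus-texture-upload and for the textured-quad draw call:

    #include <vector>

    struct Matrix4 { float m[16] = {}; };

    struct GraphicsLayer {
        bool contentsDirty = false;   // changed since the last frame?
        unsigned textureId = 0;
        Matrix4 transform;            // from the layer's CSS transform
        std::vector<GraphicsLayer*> children;
    };

    // Stubs: paint the layer to a system-memory bitmap and upload it
    // (glTexImage2D), and draw one quad textured with the layer's contents.
    void uploadContentsToTexture(GraphicsLayer&) {}
    void drawTexturedQuad(unsigned /*textureId*/, const Matrix4& /*transform*/) {}

    void composite(GraphicsLayer& layer) {
        if (layer.contentsDirty) {    // only re-upload layers that changed
            uploadContentsToTexture(layer);
            layer.contentsDirty = false;
        }
        drawTexturedQuad(layer.textureId, layer.transform);
        for (GraphicsLayer* child : layer.children)
            composite(*child);        // depth-first over the layer hierarchy
    }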




[Figure: Compositing with the GPU process]


At a minimum, when doing h/w accelerated compositing, the GPU process handles a single graphics context, the one used by the compositor. However, in the presence of GPU-accelerated content in the page (such as WebGL or Pepper3D plugin instances), the GPU process needs to be able to juggle multiple graphics contexts, each associated with its own Command Buffer, Shared Memory, IPC channel and GL context. GraphicsLayers whose contents are created directly on the GPU don't render straight into the backbuffer; instead, they render into a texture (using a Frame Buffer Object) that the compositor context grabs and uses when rendering the layer. It's important to note that in order for the compositor's GL context to have access to a texture generated by an offscreen GL context, all GL contexts used by the GPU process are created such that they share resources. The resulting architecture looks like this (a sketch of the texture handoff follows the figure):



[Figure: Handling multiple contexts]
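A minimal GL ES 2.0-style sketch of the texture handoff; the helper name is hypothetical, the GL calls are the standard Framebuffer Object setup, and it assumes the contexts were created with resource sharing as described above:

    #include <GLES2/gl2.h>

    GLuint createLayerTexture(int width, int height) {
        GLuint texture = 0, fbo = 0;
        glGenTextures(1, &texture);
        glBindTexture(GL_TEXTURE_2D, texture);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);

        glGenFramebuffers(1, &fbo);
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_2D, texture, 0);

        // The WebGL/plugin context now draws its frame here instead of the
        // backbuffer; because the contexts share resources, the compositor
        // context can later bind `texture` and draw it as a quad.
        return texture;
    }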

The flags
Use the --enable-accelerated-compositing command-line flag to enable the compositor on any of the three platforms and head to a page like this one or this one to see it in action. If you are curious about the structure of the composited layers, use the --show-composited-layer-borders flag.

As mentioned earlier, accelerated compositing in WebKit (and Chromium) kicks in only if certain types of content appear on the page. An easy trick to force a page to switch over to the compositor is to supply a -webkit-transform:translateZ(0) for an element in the page.

Firefox seems to already use this to render bitmaps, or something like that...
Drinking poison to quench thirst.
