WebGL is a new web standard for browsers which aims to bring 3D graphics to any page on the internet. It has recently been enabled by default in Firefox 4 and Google Chrome, and can be turned on in the latest builds of Safari. Context has an ongoing interest in researching new areas affecting the security landscape, especially when it could have a significant impact on our clients.
We found that:
- A number of serious security issues have been identified with the specification and implementations of WebGL (Web Graphics Library).
- These issues can allow an attacker to deliver malicious code via a web browser that attacks the GPU and graphics drivers. Such attacks on the GPU via WebGL can render the entire machine unusable.
- Additionally, there are other dangers with WebGL that put users’ data, privacy and security at risk.
- These issues are inherent to the WebGL specification and would require significant architectural changes to the platform design to remediate. Fundamentally, WebGL now allows full (Turing-complete) programs from the internet to reach the graphics driver and graphics hardware, which operate in what is supposed to be the most protected part of the computer (kernel mode).
- Browsers that enable WebGL by default expose their users to these issues.
WebGL
Throughout the history of the Web there has been a drive to allow greater interactivity and expressiveness in web content. Starting with the initial forays into scripting, moving through extensive plugin capability and ActiveX, and continuing to HTML5 functionality such as the video and canvas tags, more and more complexity has been provided in the browser by default.
At each stage in the evolution of the modern browser, existing security tenets have had to be re-evaluated to ensure new functionality does not open up any serious attack vectors. As an example, before scripting was introduced there was no easy mechanism for a malicious page to gain access to another site's content, so there was no need for a same-origin policy. Security decisions made during the early days of the browser may no longer be appropriate in the face of modern advancements, especially those regarding cross-domain access of content.
While the theft of data is a serious issue the integrity of the browser and the host operating environment should also not be forgotten when introducing new technology. Sometimes the benefits may prove to be more of a curse.
As an example, take binary browser-plugin support (e.g. ActiveX or the Netscape Plugin Application Programming Interface). This support makes it very easy for third parties to extend the functionality of the browser and provide callable interfaces for web pages. It also expands the attack surface of the browser to a potentially much larger corpus of code, some of which is almost certainly badly written. It can then become difficult for a browser vendor to secure the platform, as the problematic code might not even be theirs, leading to such band-aids as IE's killbits and Firefox's plugin checker to block plugins or inform the user that an update is necessary. In the end, the only secure way to handle such wide-ranging native content is probably not to have it in the first place, but the general consensus at the moment is that the benefits outweigh the security risk.
This leads to the topic of this blog post, WebGL. If it is something you have yet to hear of, you almost certainly will soon. WebGL's goal is to introduce an OpenGL-derived 3D API to the browser, accessible through JavaScript from any web page which wants to use it. The recently released Mozilla Firefox 4 browser has enabled support by default, as has Google's Chrome browser since version 9; Safari 5 (WebKit) also includes support, though it must be explicitly enabled.
This in itself should not really be controversial; however, the way in which it is implemented, coupled with the way current PC and graphics-processor architectures are designed, has led some to question the security of the approach. Context has performed some initial investigations into possible security concerns which seem to be inherent in the specification, leading to questions as to whether it should currently be available on supporting platforms, and whether its benefits actually outweigh its risks.
Quick Overview of 3D Graphics Pipeline
First, let's start with a very simplified overview of how 3D graphics are implemented in most modern PC-style architectures.
Figure 1 - Simple Diagram of Graphics Pipeline
At the lowest level is the Graphics Processor (GPU) hardware itself. This does not necessarily implement any specific API (it is almost certainly a proprietary interface developed by the manufacturer); however, it should at least support all the functionality expected at the programming API level. Almost all modern 3D hardware contains individually programmable units (usually referred to as shaders), which can be programmed directly by user-mode processes. The native format of the shader code is generally specific to the hardware vendor; however, common languages exist to permit cross-platform code to be developed.
Above the hardware is a driver, which tends to run in kernel mode on the operating system; its job is to handle the low-level hardware aspects and to provide a standardised interface (e.g. WDDM) through which other components of the operating system can access the GPU.
Next is the scheduler, which could be implemented in a number of different locations: in the kernel driver itself, by the OS, or entirely in user mode. Its responsibility is to share access to the GPU between individual programs running on the same machine. In a more traditional environment this would not be necessary, because only one application (for example, a windowing manager) would need direct access to the GPU at any one time. In a 3D scenario, the requirement to access the shaders directly and to upload texture and geometry data means this must be managed appropriately.
The final piece in the stack is the interface library, which is the main route through which user processes access the graphics library. This is the final level of abstraction, removing where possible any hardware specific functionality. Common interface libraries are Direct3D (which also has some kernel functionality) and the cross-platform OpenGL.
These libraries provide APIs to create the 3D geometry to be displayed, compilers to convert shader programs into a representation more suitable for the GPU, and management of the allocation and uploading of texture information to video-card memory.
The Trouble with WebGL
With this simplified description in place, we can examine what is currently problematic about the way WebGL is specified, designed and implemented. Traditional browser content would not normally have direct access to the hardware in any form: if you drew a bitmap, it would be handled by code in the browser responsible for drawing bitmaps.
This code would then be likely to delegate that responsibility to an OS component, which would perform the drawing itself. While this distinction is blurring somewhat with the introduction of 2D graphics acceleration in all the popular browsers, it is still the case that the actual functionality of the GPU is not directly exposed to a web page. The salient facts are that such content is fairly easy to verify, has a render time measurable relative to the content, and generally contains little programmable functionality (at least, little that would be exposed to the graphics hardware).
WebGL, on the other hand, provides, by virtue of its functional requirements, access to the graphics hardware. Shader code, while not written in the native language of the GPU, is compiled, uploaded and then executed on the graphics hardware. Render times for medium to complex geometry can be difficult to determine ahead of time from the raw data, as it is hard to generate an accurate value without first rendering it: a classic chicken-and-egg problem. Some data can also be hard to verify, and security restrictions can be difficult to enforce once out of the control of the WebGL implementation.
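The compile-and-upload path described above can be sketched as follows. This is a minimal illustration, not code from the post: the function name `compileAndLink` is ours, the GLSL source is assumed to be supplied by the page, and `gl` is assumed to be a `WebGLRenderingContext` obtained from a canvas element.

```javascript
// Sketch of the route shader source takes from a web page towards the GPU:
// page-supplied text goes into the vendor's compiler, and the linked
// program is what ultimately executes on the graphics hardware.
function compileAndLink(gl, vertexSrc, fragmentSrc) {
  function compile(type, src) {
    var shader = gl.createShader(type);
    gl.shaderSource(shader, src);   // source text travels from the page...
    gl.compileShader(shader);       // ...into the driver's shader compiler
    if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
      throw new Error(gl.getShaderInfoLog(shader));
    }
    return shader;
  }
  var program = gl.createProgram();
  gl.attachShader(program, compile(gl.VERTEX_SHADER, vertexSrc));
  gl.attachShader(program, compile(gl.FRAGMENT_SHADER, fragmentSrc));
  gl.linkProgram(program);          // driver generates native GPU code here
  return program;
}
```

The point of the sketch is how short the distance is: two API calls separate untrusted page-supplied text from the vendor's compiler and, after linking, the hardware itself.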
This might not be such an issue were it not for the fact that current hardware and graphics-pipeline implementations are not designed to be pre-emptable or to maintain security boundaries. Once a display list has been placed on the GPU by the scheduler it can be difficult to stop it, at least without causing obvious, system-wide visual corruption and instability.
By carefully crafting content it is possible to seriously impact the OS's ability to draw the user interface, or worse. The difficulty in verifying all content and maintaining security boundaries also has a potential impact on the integrity of the system and of user data.
Up to now the manufacturers of graphics hardware have not really needed to worry about an untrusted use case for their products. Certainly the issues of integrity and denial of service would be considerations even for native programs, but those developers generally have a vested interest in making sure their programs do not cause problems. A malicious actor would need to convince someone to install their bad code, at which point attacking the graphics hardware might be the least of the user's worries. Graphics drivers are generally not written with security as their main focus; performance is likely to be most critical. Security costs a significant amount in both man-hours and monetary terms, and there seems to be little incentive for the manufacturers to harden their products (potentially at the expense of performance) to support WebGL in its current form.
Even if security issues are identified, it is unclear what patch strategy the large GPU manufacturers would employ. Searching Security Focus for either ATI or NVIDIA produces only a few publicly disclosed vulnerabilities (dating back to 2008); a Google search for related security bulletins also does not bring up any information. Considering the complexity of the drivers and hardware interactions, it seems hard to believe that there has never been an exploitable bug in their software which needed immediate remediation.
Of course, the patching situation might not be helped by the typical restrictions on OEM products, especially laptops. In these situations the reference driver provided by the GPU manufacturer is often blocked from installing on the laptop, making any security update considerably more difficult to deploy.
During the development of WebGL it seems that all the browser vendors supporting it have encountered issues with certain drivers being unstable or crashing completely. The current workaround for this seems to be a driver blacklist (or, in Chrome's case, not running WebGL on Windows XP at all); see https://wiki.mozilla.org/Blocklisting/Blocked_Graphics_Drivers. This does not seem to be a tenable approach long term.
Denial of Service
The risk of denial of service is one of the best-known security issues facing WebGL, not least because it is documented in the current standards documentation (see https://www.khronos.org/registry/webgl/specs/1.0/#4.4). Because of the almost direct access the WebGL API has to the graphics hardware, it is possible to create shader programs or sets of complex 3D geometry which cause the hardware to spend a significant proportion of its time rendering. It is easy to trivialise client denial-of-service attacks when the only affected component is the browser process (there are numerous ways of achieving that already); however, in this case the attack can completely prevent a user from being able to access their computer, making it considerably more serious.
In certain circumstances Context has observed the operating system crashing (i.e. a Blue Screen of Death). These crashes range from benign (in an exploitability sense) to ones where the driver code has faulted, causing potentially exploitable conditions. No further details of actual exploitable vulnerabilities, or of the code used to generate them, will be disclosed at this time.
Windows 7 and Vista seem to fare slightly better in this regard: if the GPU locks up for around two seconds, the OS will force it to be reset. This stops all applications from using any 3D graphics during the reset. However, these OSes also limit how many times this can happen within a short window before the kernel forces a bug check (Blue Screen of Death) anyway (see http://msdn.microsoft.com/en-us/windows/hardware/gg487368.aspx).
Of course, as this is a known issue there are efforts to mitigate it; for example, the ANGLE project (http://code.google.com/p/angleproject/), used in Firefox 4 and Chrome, includes a shader validator to eliminate simple infinite-loop cases. This validation cannot possibly block all cases leading to denial of service, especially when it is possible to create large geometry, or shaders which contain no loops at all, that still take substantial amounts of time to execute.
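To make the limits of loop validation concrete, here are two illustrative fragment-shader bodies, stored as JavaScript strings the way a page would hold them before upload. Both are our own sketches, not shaders from the post: the first has a large but bounded loop (so it is not an "infinite loop"), and the second contains no loops at all, yet its per-fragment cost is still non-trivial once multiplied across millions of fragments.

```javascript
// Illustrative GLSL fragment-shader bodies held as JS strings.
// A validator looking only for unbounded loops would reject neither.
var loopedShader = [
  "void main() {",
  "  vec4 c = vec4(0.0);",
  "  for (int i = 0; i < 100000; i++) {",  // bounded, so not an infinite loop
  "    c += vec4(sin(float(i)));",
  "  }",
  "  gl_FragColor = c;",
  "}"
].join("\n");

var loopFreeShader = [
  "void main() {",
  "  float v = gl_FragCoord.x;",
  // no loops at all: just a deep chain of transcendental calls
  "  v = sin(cos(sin(cos(sin(cos(sin(cos(v))))))));",
  "  gl_FragColor = vec4(v);",
  "}"
].join("\n");
```

Running either across a full-screen quad, or alongside deliberately heavy geometry, keeps the GPU busy without tripping a simple static check.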
At this point it would seem reasonable to provide a proof of concept; however, Context did not even need to write one, as Khronos provides one in their WebGL SDK. See https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/tests/extra/lots-of-polys-example.html. This page has been found to completely lock the desktop on OS X, reliably crash Windows XP machines and cause GPU resets on Windows 7.
Cross-Domain Image Theft
One of the fundamental security boundaries in the specification of the Document Object Model and browser handling of JavaScript is the domain boundary. This exists to prevent content served from, say, www.evil.com being able to access authenticated/trusted resources on www.mybanking.com. Whether content is permitted to be accessed across this boundary very much depends on the type of resource being accessed. This is sometimes referred to as "Right to Embed" vs. "Right to Read".
For example, it is perfectly acceptable to embed an image from outside your domain, because the underlying APIs never gave you a mechanism to read its actual content (beyond the image dimensions and an indication of success or failure to load). On the other hand, using the XMLHttpRequest object to pull content from outside your domain (which would give you access to the raw data) is generally not permitted.
Before the introduction of the 'Canvas' element, which is being standardised in HTML5, there were not many options for stealing the raw data of images cross-domain. To combat this, an 'origin-clean' flag was implemented. This flag is initially set to true and is set to false if any cross-domain image or content is drawn on the canvas (see http://www.w3.org/TR/html5/the-canvas-element.html#security-with-canvas-elements). Once the 'origin-clean' flag is false, APIs such as 'toDataURL' can no longer be called to extract the image content.
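The flag's behaviour can be modelled in a few lines. The class and property names below are ours (the spec does not expose the flag to script); the model only illustrates the two rules: drawing cross-origin content flips the flag, and read-back APIs refuse once it is false.

```javascript
// Toy model of the HTML5 canvas 'origin-clean' flag.
class ModelCanvas {
  constructor(pageOrigin) {
    this.pageOrigin = pageOrigin;
    this.originClean = true;           // starts true per the spec
  }
  drawImage(imageOrigin) {
    if (imageOrigin !== this.pageOrigin) {
      this.originClean = false;        // cross-origin content taints the canvas
    }
  }
  toDataURL() {
    if (!this.originClean) {
      throw new Error("SecurityError"); // read-back refused once tainted
    }
    return "data:image/png;base64,AAAA"; // placeholder payload
  }
}
```

Crucially, the flag only guards the JavaScript-visible read-back APIs; as the next section shows, WebGL opens a side channel that bypasses them entirely.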
The WebGL API is built on top of the 'Canvas' element and so extends the concept of the flag to also encompass the use of cross-domain textures (see https://www.khronos.org/registry/webgl/specs/1.0/#4.2). That would be the end of it, except for one slight issue. As already discussed with regard to denial of service, it is possible to cause shading code and geometry drawing to take a non-trivial amount of time. One of the resources which a shader can directly access is the pixel data of textures, and once that data reaches the shading code it no longer has any concept of origin. It is therefore possible to develop a timing attack to extract pixel values even though we cannot read them directly: make the shader's run time depend on the colour or brightness of a pixel, and measure in JavaScript how long the drawing process takes. This is a standard attack technique in the security field, although it is most often used for breaking cryptographic systems. In relation to WebGL it has already been mentioned on a public mailing list that this could be an issue (see http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2011-March/030882.html).
Of course, an attacker might not even need to extract the entire pixel data of the image for this to be of use. For example, it could be used to compare a cross-domain image to another known image, returning a simple true or false value. Imagine a web site which returns a profile picture from a fixed URL, the content determined by the session cookie stored in the browser. An attacker might be able to compare this cross-domain image against a known list of profile pictures to identify when a specific person is using the malicious site.
As part of our investigations into WebGL, therefore, a proof of concept has been developed to demonstrate that the attack is practical (if a little slow). To access the PoC go here. It has been tested in Firefox 4 and Chrome 11, on Windows XP, Windows 7 and Mac OS X. It works best in Firefox. It should be noted that Context does not hold any of the data captured on the page; everything is done on the client. For those without a WebGL-capable machine or browser there is also a short video here.
Figure 2 - Flow Diagram Showing Stages of Image Capture
This is something which we believe can only be fixed by changing the nature of cross-domain image access in the WebGL specification. This could be achieved by blocking all cross-domain images, or by using something like CORS (http://www.w3.org/TR/cors/) to permit only specific image content to be accessed from certain domains.
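A CORS-style fix amounts to a simple decision rule. The function below is our own sketch of that rule (the name and the single-header model are ours, and real CORS has more moving parts): a cross-origin image may be used without tainting only if the server's response explicitly grants access to the requesting origin.

```javascript
// Toy decision function in the spirit of CORS: should fetching this
// image taint the canvas? The server opts in via its
// Access-Control-Allow-Origin response header.
function taintsCanvas(pageOrigin, imageOrigin, allowOriginHeader) {
  if (imageOrigin === pageOrigin) return false;        // same-origin never taints
  if (allowOriginHeader === "*") return false;         // server allows any origin
  if (allowOriginHeader === pageOrigin) return false;  // server allows this origin
  return true;  // cross-origin with no CORS grant: treat as tainted
}
```

Under such a scheme the timing attack loses its value: any image the shader can sample is one the server has already agreed to share with the page's origin.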
Conclusions
Based on this limited research, Context does not believe WebGL is really ready for mass usage; we therefore recommend that users and corporate IT managers consider disabling WebGL in their web browsers.
While there is certainly a demand for high-performance 3D content to be made available over the web, the way in which WebGL has been specified takes insufficient account of the infrastructure required to support it securely. This is evident from the mitigations already being developed for the underlying security issues, such as validation layers and driver blacklists; however, these still push much of the responsibility for securing WebGL onto the hardware manufacturers. Perhaps the best approach would be to design a specification for 3D graphics from the ground up with these issues in mind.