Goal

Currently, the mixed-content restrictions in Firefox and the WebSocket API spec [WS-API] prevent use of insecure ws:// URLs in webapps served over HTTPS. This is a problem, because the protocol used over the WebSocket connection may include a suitable method for ensuring confidentiality and integrity of the data, so the restriction is tighter than that actually required for security.

This document justifies loosening the restriction in browsers, for suitable WebSocket subprotocols.

Use cases

A webapp must be securely served: a user should not ever have to type a password into a page which was served over HTTP insecurely. Arbitrary JavaScript could have been injected to skim the user’s input and activity.

A webapp for SSH or VNC provides a core example: the webapp has to be able to make peer-to-peer connections to other machines, which do not have centrally-issued identities. Most or all of the readers of this document will be users who commonly connect to SSH servers which do not have a globally-registered DNS name (eg named only on a private subnet), and these servers are troublesome to provision with certificates that are recognised by a browser’s CA system. Therefore, using TLS for the connection is not straightforward, let alone the fact that it is redundant because the protocol does its own management of identities and encryption. (A JavaScript client can implement an encrypted protocol over a WebSocket connection through the WebCrypto API.)

Providing a VNC or SSH client in the browser really needs a way to allow access selectively to ws:// connections from the HTTPS-served page.

Is a WebSocket connection ‘passive’ or ‘active’ content?

Browsers have a notion of active and passive content - content which can perform an action to affect multiple elements across a page, or has a purely passive role. A browser executes the contents of resources loaded by <style> or <script> tags, so these are clearly active, but it safely parses and displays a resource specified by an <img> tag. Classifying WebSockets and XHR is difficult, because strictly speaking they’re neither active nor passive, but can be used as either.

For safety, browsers classify XHR as active, because it is overwhelmingly common for a document to be returned and then inserted into the page. Historically, Firefox has seen WebSockets as active, but Chrome has not.

The recommendation of this document is to sidestep this debate by simply refining the notion of ‘mixed-content’ for WebSocket connections to take into account the WebSocket subprotocol. If a connection is deemed not to be mixed, then it doesn’t matter whether it’s active or passive.

The recommendation: don’t treat secure WebSocket subprotocols as mixed content

The WebSocket API currently includes this requirement:

If secure is false but the origin of the entry script has a scheme component that is itself a secure protocol, e.g. HTTPS, then throw a SecurityError exception and abort these steps. HTML, 10.3.2 The WebSocket interface

I suggest altering this to place step 4 above 2 (normalising the protocols argument) and altering step 2 as follows:

If secure is false and protocols is an empty array or includes an insecure protocol, but the origin of the entry script has a scheme component that is itself a secure protocol, e.g. HTTPS, then throw a SecurityError exception and abort these steps.

The notion of ‘insecure protocol’ must be defined. Protocols which include their own encryption and integrity control need not be encapsulated by TLS to match the security of the containing page. Browsers could either:

Extend the WebSocket API to allow a webapp to declare that an unknown subprotocol is ‘secure’. For example, the elements in the protocol array could be taken to be either strings, or an object with string name and bool isSecure attributes.
Simply regard any explicit WebSocket subprotocol as secure. Use of subprotocols is not widespread at all at the moment, and this may be an acceptable starting point. (The mixed-content checks are ultimately advisory and are a second-layer of protection which don’t confer security by themselves, they just break sites that are doing something dodgy. If site admins could be trusted to be virtuous, mixed-content protection wouldn’t be needed.)
Include a configuration parameter with some sensible initial values. Getting support for new protocols would be a big pain though and an annoying barrier to entry.

Finally, the section 10.3.3 "Feedback from the protocol" needs to be updated:

Change the protocol attribute's value to the subprotocol in use, if is not the null value. If it is the null value and secure is false but the origin’s scheme is a secure protocol, then:

Change the readyState attribute’s value to CLOSED (3).

Close the WebSocket connection.

Fire a simple event named error at the WebSocket object.

What about other request types?

The same argument applies to XHR and Events, but these are not as important as getting WebSockets to work.

For example, it’s possible that content served as an XHR response will be properly verified by the requesting page. The webapp’s JavaScript could include a list of content hashes (served securely), then do some non-TLS XHR requests, and check that the result matches one of the hashes before using the response.

These use-cases are unknown (or extremely rare) in the wild. WebSockets is the most general protocol, in that any application of XHR requests or Server-Sent Events could also be done with WebSockets, so it’s not high priority to do something too general which includes XHR and Events as well.

Other possibilities

I was initially looking for some sort of generic solution that works for XHR (and SS-Events), and I wrote a few Firefox patches along different lines. I didn’t post them though because I was pretty unconfident they’d be accepted. The idea of a generic solution is to provide a way for a resource to say that it isn't actually mixed: suppose a non-TLS WebSocket or XHR connection is made. The browser could send an additional header (Access-Control-Request-Protocol) and get back an additional header (Access-Control-Protocol: encrypted). If the response has the right header, then the browser knows that the JavaScript is doing the encryption. The browser can't guarantee it’s encrypted, but we've made enough hurdles here to get the mixed content heuristic working well enough.

(Remember: mixed content detection is a heuristic! It’s there to make it hard for site admins to write bad sites by breaking sites when they try and do silly things, like mixed content. The blocker doesn’t make a site secure though — that's achieved through actually using TLS everywhere, and the admin can do that without any mixed-content blocking from the browser. It’s just there to create a secure ecosystem.)

I had a draft of a spec and an implementation too that looked a bit like CSP (per-origin protection) rather than CORS (per-resource).

Ultimately, those proposals just look too ugly when speced out to be convincing. I think it’s OK to ditch XHR and Events just because they’re a subcase of WebSockets really, so using the WebSockets subprotocols to get the right behaviour is sufficient.