We're accumulating a significant number of bugs related to cross-linking between the content and the CGI not being as relative as we would like. This is an attempt to design a solution for them all in a unified way, rather than solving one bug at the cost of exacerbating another. --smcv

Terminology

  • Absolute: starts with a scheme, like http://example.com/ikiwiki.cgi, https://www.example.org/

  • Protocol-relative: starts with // like //example.com/ikiwiki.cgi

  • Host-relative: starts with / like /ikiwiki.cgi

  • Relative: starts with neither / nor a scheme, like ../ikiwiki.cgi

What we need

  • Static content must be able to link to other static content

  • Static content must be able to link to the CGI

  • CGI-generated content must be able to link to arbitrary static content (it is sufficient for it to be able to link to the "root" of the destdir)

  • CGI-generated content must be able to link to the CGI

Constraints

  • URIs in RSS feeds must be absolute, because feed readers do not have any consistent semantics for the base of relative links

  • If we have a <base href> then HTML 4.01 says it must be absolute, although HTML 5 does relax this by defining semantics for a relative <base href> - it is interpreted relative to the "fallback base URL" which is the URL of the page being viewed (trouble with base in search, preview base url should be absolute)

  • It is currently possible for the static content and the CGI to be on different domains, e.g. www.example.com vs. cgi.example.com; this should be preserved

  • It is currently possible to serve static content "mostly over HTTP" (i.e. advertise a http URI to readers, and use a http URI in RSS feeds etc.) but use HTTPS for the CGI

  • If the static content is served over HTTPS, it must refer to other static content and the CGI via HTTPS (to avoid mixed content, which is a vulnerability); this may be either absolute, protocol-relative, host-relative or relative

  • If the CGI is served over HTTPS, it must refer to static content and the CGI via HTTPS; again, this may be either either absolute, protocol-relative, host-relative or relative (Protocol relative urls for stylesheet linking)

  • Because reverse proxies and w3mmode exist, it must be possible to configure ikiwiki to not believe the HTTPS, etc., CGI variables, and force a particular scheme or host (W3MMode still uses http://localhost?, Using reverse proxy; base URL is http instead of https, Dot CGI pointing to localhost. What happened?)

  • For relative links in page-previews to work correctly without having to have global state or thread state through every use of htmllink etc., cgitemplate needs to make links in the page body work as if we were on the page being previewed.

"Would be nice"

  • In general, the more relative the better

  • schmonz wants to direct all CGI pageviews to https even if the visitor comes from http (but this can be done at the webserver level by making http://example.com/ikiwiki.cgi a redirect to https://example.com/ikiwiki.cgi, so is not necessarily mandatory)

  • smcv has some sites that have non-CA-cartel-approved certificates, with a limited number of editors who can be taught to add SSL policy exceptions and log in via https; anonymous/read-only actions like do=goto should not go via HTTPS, since random readers would get scary SSL warnings (want to avoid ikiwiki using http or https in urls to allow serving both, CGI script and HTTPS)

  • It would be nice if the CGI did not need to use a <base> so that we could use host-relative URI references (/sandbox/) or scheme-relative URI references (//static.example.com/sandbox/) (see trouble with base in search)

As a consequence of the "no mixed content" constraint, I think we can make some assumptions:

  • if the cgiurl is http but the CGI discovers at runtime that it has been reached via https, we can assume that the https equivalent, or a host- or protocol-relative URI reference to itself, would work;

  • if the url is http but the CGI discovers at runtime that it has been reached via https, we can assume that the https equivalent of the url would work

In other words, best-practice would be to list your url and cgiurl in the setup file as http if you intend that they will most commonly be accessed via http (e.g. the "my cert is not CA-cartel approved" use-case), or as https if you intend to force accesses into being via https (the "my wiki is secret" use-case).

Regression test

I've added a regression test in t/relativity.t. We might want to consider dropping some of it or skipping it unless a special environment variable is set once this is all working, since it's a bit slow. --smcv

Remaining bugs

Arguable

  • Configure the url and cgiurl to both be https, then access the CGI via a non-https address. The stylesheet is loaded from the http version of the static site, but maybe it should be forced to https?

  • Configure url = "http://static.example.com/", cgiurl = "http://cgi.example.com/ikiwiki.cgi" and access the CGI via staging.example.net. Self-referential links to the CGI point to cgi.example.com, but maybe they should point to staging.example.net?

  • (possibly incomplete, look for TODO and ??? in relativity.t)