URL Rewriting

An essential step in the content processing of FIT is URL Rewriting.

FIT’s opaque URL model differs from the transparent URLs that Web proxies and CDNs use in that it distinguishes between frontend URLs and backend URLs.

The frontend URL (also called FIT URL) is the URL the user types or sees in the browser’s URL bar. Its hostname/domain resolves to the FIT Server. The optional site prefix is used to determine the appropriate Project and Site, if the FIT Server hosts multiple sites on one domain.

http://fitserver.example.com/site-prefix/;urlmarks/rest-of-URL?params

The rest of the URL carries the site path, which in turn is passed to the URL map. The URL map translates the site path to an absolute source URL (also called backend URL). The constructed main URL is loaded as the main content of the FIT request.

To avoid ambiguity in URL normalization and cookie exchange, the site path and optional URL Marks must be terminated with a slash character. If this character is missing, a redirect to an unambiguous URL containing the desired slash is carried out with status 301 Moved Permanently.

Given the simple URL map

<urlmap>
  <map path="/wiki/" source="http://en.wikipedia.org/"/>
  <map path="/" source="http://example.org/"/>
</urlmap>

the site path /rest-of-URL matches the fallback rule /. The URL map does not handle query strings. Thus, the URL is expanded to http://example.org/rest-of-URL?params (the query string is appended unchanged).
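
The expansion above can be sketched in a few lines of Python. This is only an illustration, not FIT's implementation: the list URL_MAP, the function expand and the "first matching prefix wins" strategy are assumptions made for the example.

```python
from urllib.parse import urlsplit

# Hypothetical in-memory form of the URL map above: (path prefix, source) pairs.
# Assumption: entries are tried in document order and the first prefix match wins.
URL_MAP = [
    ("/wiki/", "http://en.wikipedia.org/"),
    ("/", "http://example.org/"),
]


def expand(site_path_and_query: str) -> str:
    """Translate a site path (plus optional query string) into a source URL."""
    parts = urlsplit(site_path_and_query)
    for prefix, source in URL_MAP:
        if parts.path.startswith(prefix):
            # Replace the matched prefix with the configured source.
            url = source + parts.path[len(prefix):]
            # The query string takes no part in matching; it is appended as-is.
            return url + ("?" + parts.query if parts.query else "")
    raise LookupError("no URL map entry matches " + parts.path)


print(expand("/rest-of-URL?params"))  # http://example.org/rest-of-URL?params
print(expand("/wiki/Main_Page"))      # http://en.wikipedia.org/Main_Page
```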

Now, when e.g. an HTML document is loaded from the backend system example.org, the URLs in that document should point back to example.org. (They may look relative, but RFC 3986 defines that their normalized representation should start with http://example.org/).

Backend authors should write URLs that reference resources directly rather than pointing to FIT. The modus operandi of FIT is to rewrite URLs referencing the configured sources in its response to the client so that they point to the FIT site. This allows browsing the backend directly as well as through FIT.

Rewriting behaviour

URL Rewriting is performed on any content that is parsed into a DOM (i.e. HTML/XML documents) as well as CSS. FIT uses HTML (e.g. <a href) and CSS (e.g. url()) semantics to find URLs and assign the following roles to them:

Role      Meaning
content   URLs that point to other documents as main requests, e.g. other HTML pages in <a href
css       External CSS documents
js        External JavaScript documents
media     URLs that point to other assets or subresources, primarily images. The request will go unchanged through pass mode.
image     URLs that point to images. Use this role to mark custom attributes or elements as images. This makes them go through image scaling if it is enabled. If image scaling is not enabled, the URL is treated as role media. This does not enable features like delayed image loading or inlining of images.
is        This role is only used when automatic image optimization is activated. The request will go through image scaling.
external  Link points away from FIT
srcset    Treat arbitrary attributes as if they were srcset attributes

Currently FIT recognizes the following HTML attributes on elements and assigns roles:

Attribute   Element                         Role
action      form                            content
cite        blockquote, del, ins, q         content
data        object                          media
formaction  button, input                   content
href        a, area, base                   content
href        link rel="... stylesheet ..."   css
href        link rel="... icon ..."         media
href        link rel="alternate"            external
href        link (other)                    content
longdesc    img                             content
manifest    html                            content
poster      video                           media or is
src         audio, embed, source, track     external
src         frame, iframe                   content
src         img, input type="image"         media or is
src         script                          js
src         video                           media

Rewriting is carried out in two steps. After the DOM content is parsed, the normalizing pass resolves all URLs according to RFC 3986 to their absolute form. This makes it easy to aggregate content from different sources into a single document, where all URLs point to the appropriate resource.
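
As an illustration of what the normalizing pass computes, the following sketch resolves a few relative references with Python's standard urllib.parse.urljoin, which follows the RFC 3986 resolution rules. The base URL and the sample references are made up for the example; FIT's own resolver is not shown here.

```python
from urllib.parse import urljoin

# Assumed URL of the document the references were found in.
base = "http://example.org/docs/page.html"

# Relative references as they might appear in href/src attributes.
references = [
    "img/logo.png",            # relative to the document's directory
    "/about",                  # absolute path on the same host
    "../index.html",           # parent directory
    "//cdn.example.net/a.js",  # scheme-relative, inherits http
]

for ref in references:
    print(ref, "->", urljoin(base, ref))

# img/logo.png -> http://example.org/docs/img/logo.png
# /about -> http://example.org/about
# ../index.html -> http://example.org/index.html
# //cdn.example.net/a.js -> http://cdn.example.net/a.js
```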

After all user manipulations are done, the second pass rewrites every URL that should lead back to FIT so that it points to the FIT site. URLs already pointing to FIT are left untouched.

FIT URLs start with the protocol, domain/port and, if necessary, the site prefix. The protocol is determined by the protocol of the source URL (http stays http, https stays https). FIT URLs that do not change the protocol are written as relative URLs, unless ai-url-absolute="true" is set.

The absolute source URL must be an allowed source (see ACL), otherwise it will be left pointing away from FIT.
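
The rules of the rewriting pass can be summarized in a rough Python sketch. It is a simplification under stated assumptions: SOURCES, SITE_PREFIX, FIT_HOST and rewrite are hypothetical names, the allowed sources are taken to be exactly the sources of the URL map, and URL Marks and the global options are ignored.

```python
from urllib.parse import urlsplit

# Hypothetical reverse view of a URL map: (source prefix, site path) pairs.
# Assumption: these sources are also the only allowed sources.
SOURCES = [
    ("http://en.wikipedia.org/", "/wiki/"),
    ("http://example.org/", "/"),
]
SITE_PREFIX = "/site-prefix"        # assumed site prefix
FIT_HOST = "fitserver.example.com"  # assumed FIT hostname


def rewrite(source_url: str, request_scheme: str = "http", absolute: bool = False) -> str:
    """Return the FIT URL for a normalized source URL, or the URL unchanged."""
    for source, site_path in SOURCES:
        if source_url.startswith(source):
            fit_path = SITE_PREFIX + site_path + source_url[len(source):]
            # The protocol follows the source URL: http stays http, https stays https.
            scheme = urlsplit(source_url).scheme
            if absolute or scheme != request_scheme:
                return f"{scheme}://{FIT_HOST}{fit_path}"
            # Same protocol as the current request: write a relative URL.
            return fit_path
    # Not an allowed source: the URL keeps pointing away from FIT.
    return source_url


print(rewrite("http://example.org/john"))                 # /site-prefix/john
print(rewrite("http://example.org/john", absolute=True))  # http://fitserver.example.com/site-prefix/john
print(rewrite("http://other.example.net/x"))              # http://other.example.net/x
```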

Normalizing and rewriting behaviour can be controlled with the following AI attributes. They affect URLs in any attribute of the current element that is known to carry URLs.

Attribute                           Meaning
ai-rewrite="false"                  Disables normalizing and rewriting for the current element
ai-url-attributes="att1 att2"       Space-separated list of attributes that contain URLs for rewriting
ai-url-role="content"               Role for rewriting URLs
ai-url-base="http://…"              Base URL for normalizing. CAUTION: it behaves like xml:base and affects all descendant elements
ai-url-https="true/false"           Force FIT URL protocol to be https or http without regard to the backend protocol
ai-url-skip="true/false"            Normalize URL but skip rewriting pass (will point away from FIT)
ai-url-absolute="true/false"        Force FIT URL to be written as absolute URL, even if the protocol is the same as in the current request
ai-url-resolver="base-url/urlmap"   Default: base-url. If set to urlmap, the URL must be relative and start with /, and it is expanded according to the URL map. If no matching entry in the URL map is found, the behaviour is undefined. CAUTION: this option is not compatible with the text filter

There are also some global options in conf/config.xml that affect the rewriting behaviour:

<config>
  <url-rewriting>
    <force-https />
    <force-absolute />
    <trailing-marks />
  </url-rewriting>
</config>

Note that the corresponding element options ai-url-https and ai-url-absolute take precedence over the global options.

Example for the ai-url-resolver option

Assume that the current site resides at http://m.example.com/, that we are processing a request for http://m.example.com/shop/, and that the following HTML code with two links has been loaded:

<a href="/">Shop</a>
<a href="/" ai-url-resolver="urlmap">Home</a>

The first link (Shop) is resolved relative to the base-url (default) and the second link (Home) according to the URL map.

<urlmap>
  <map path="/shop/" source="//shop.example.com/" />
  <map path="/$" source="//backend.example.com/index.html" />
</urlmap>

With the URL map above, the resulting effective URLs are:

<a href="http://m.example.com/shop/">Shop</a>
<a href="http://m.example.com/">Home</a>

URL Marks

URL Marks can be added to the links of an element using ai-mark- options. To set a URL Mark with name test and value SomeValue, use ai-mark-test="SomeValue". To set the URL Mark example without a value, use ai-mark-example="".

Basic Example

The following example assumes that the document is loaded from http://example.com/ and mapped to /:

<html>
<body>
<a href="/index.html" ai-mark-foo="bar" ai-mark-quux="" ai-url-https="true"><img src="/images/home.png"/></a>
</body>
</html>

After rewriting:

<html>
<body>
<a href="https://fitserver.example.com/site-prefix/;foo=bar;quux/index.html" ai-url-https="true"><img src="/site-prefix/;pass/images/home.png"/></a>
</body>
</html>

Note that the attributes on the a element do not affect the child element img.
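
How the ai-mark- attributes of the a element turn into the ;foo=bar;quux/ part of the rewritten URL can be sketched as follows. The function marks_segment is hypothetical, and the sketch assumes that marks appear in attribute order; FIT may order or encode them differently.

```python
def marks_segment(attributes: dict) -> str:
    """Build the URL Marks path segment from an element's ai-mark-* attributes."""
    prefix = "ai-mark-"
    marks = []
    for name, value in attributes.items():
        if name.startswith(prefix):
            mark = name[len(prefix):]
            # A mark with an empty value is written as the bare name.
            marks.append(f"{mark}={value}" if value else mark)
    # Each mark is introduced by ";" and the segment is terminated with a slash.
    return (";" + ";".join(marks) + "/") if marks else ""


attrs = {
    "href": "/index.html",
    "ai-mark-foo": "bar",
    "ai-mark-quux": "",
    "ai-url-https": "true",
}
print(marks_segment(attrs))  # ;foo=bar;quux/
```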

Custom XML Example

Sometimes you may want to rewrite URLs in non-HTML documents. Use ai-url-attributes and ai-url-role to configure the URL rewriter.

The following example assumes that the document is loaded from http://example.com/ and mapped to /.

<person>
  <name value="John Doe"/>
  <homepage value="http://example.com/john" ai-url-attributes="value" ai-url-role="content" />
</person>

After rewriting:

<person>
  <name value="John Doe"/>
  <homepage value="/site-prefix/john" />
</person>

Custom srcset Example

To rewrite a custom attribute like a srcset attribute, the srcset role must be used.

<img data-srcset="img1x.jpg 1x, img2x.jpg 2x, img4x.jpg 4x" ai-url-attributes="data-srcset" ai-url-role-data-srcset="srcset">