IIS reverse proxy for Workflow

Let me share a very important secret with you: Workflow is not a web server. While it can actually serve web pages and resources using the HTTP Server Input task, it should not be used as a public-facing web server, especially not to serve your whole website.

In this How-To we’ll explore the #1 best practice when it comes to providing end users with output generated by Workflow: use a public IIS instance with a reverse proxy pointing to a separate, private Workflow server.

This how-to is best served with a steaming hot cup o’ joe to an IIS user of at least intermediate level.

Before we start

Here are a few pre-requisites before we start working on the actual configuration:

  • IIS should be installed and have a public-facing website already configured. In our example we’ll be using IIS 8.5, but IIS 7 should work equally well. We already have a website with the www.example.com binding, and this is what we’ll use.
  • The Application Request Routing extension should be installed on IIS. There are many existing tutorials and how-tos on installing ARR so we’ll assume this is already done.
    • Also, for the URL Rewrite feature to proxy requests, we need to enable the proxy in ARR. In the IIS Manager window, under Connections (left-hand side), select the server you are using. In the main section (middle), under IIS, double-click the Application Request Routing Cache icon. In the Actions section (right-hand side), under Proxy, click Server Proxy Settings…, tick the Enable proxy checkbox, and click Apply (Actions section).
  • A PlanetPress Workflow 8 or PReS Workflow 8 installation, either on the same server or on a different one. Either will work, but for security reasons we suggest keeping IIS and Workflow on separate servers, especially in production. We left the default port 8080, so make sure the firewall on the Workflow machine is letting through requests on port 8080!
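
If you'd like a quick way to confirm that last prerequisite, here is a minimal sketch in Python (not something Workflow or IIS provides) that simply tries to open a TCP connection. The function name port_is_open is made up for this example, and myworkflowserver is the placeholder host name used throughout this how-to.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, DNS failure...)
        # when the port cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from the IIS machine, port_is_open("myworkflowserver", 8080) returning False would suggest a firewall or DNS problem between the two servers. It only proves the port accepts connections, not that Workflow's HTTP service is answering correctly.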

Creating the URL Rewrite Rule

Once IIS and ARR are installed and ready, we’re go for launch on our URL Rewrites.

To access the URL Rewrite feature:

  • Open the Internet Information Services (IIS) Manager on the system
  • On the left pane, expand the server name, then sites, then your website (such as example.com or default web site)
  • In the center pane, double-click on the URL Rewrite icon.
  • To add a rule, click on Add Rule(s)… in the Actions pane on the right. I suggest just clicking Blank Rule and then Ok since I’ll be covering each option.

In the Edit Inbound Rule screen, we’ll establish a catch-all rule first. This is the simplest way to do it: send any call to a folder (say, anything in /workflow/) on to the Workflow server. This means going to http://example.com/workflow/myprocess would trigger the Workflow process containing an HTTP Server Input task with myprocess as an Action.

  • Name: Enter anything, I like Workflow Global Catch-All
  • Requested URL: should be Matches the Pattern (inclusive matching) and Using should be Regular Expressions.
  • The Pattern is a regular expression that determines what we’ll be catching. For our example, the following will work fine: ^workflow/(.*)$ . Let’s break it down:
    • ^ means “begins with”, so that http://example.com/workflow/myprocess will be accepted but http://example.com/something/workflow/myprocess will not be matched by our rule.
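    • workflow/ matches the literal text workflow/ at the start of the path (IIS tests the pattern against the path without its leading slash).
    • (.*)$ is a capture group meaning anything until the end of the URI. So this will catch myprocess and store it in the {R:1} back-reference.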
  • In the Actions section, we’ll use the following options:
    • Action Type is Rewrite. This means the redirection is completely invisible to the user as it happens in the background.
    • Rewrite URL will be the actual URI where Workflow is located. We’ll also add the process name variable in there: http://myworkflowserver:8080/{R:1}
    • You should definitely check Append query string if you’re using them (adding ?myvar=something&foo=bar in your URLs and using those in your processes).
    • Click Apply in the Actions Pane, then Back to Rules.

Ok so now, when we call http://example.com/workflow/getInvoice?id=abc12345 in a browser, the reverse proxy will call http://myworkflowserver:8080/getInvoice?id=abc12345 and return the result of that call to the browser – without the browser ever knowing there’s a proxy in there!
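
To make the matching logic concrete, here is a rough simulation of the catch-all rule in Python. It is only a sketch: it uses Python's re module rather than the IIS rewrite engine, the function name rewrite is made up, and it assumes (as IIS does for inbound rules) that the pattern is tested against the path without its leading slash.

```python
import re

# Pattern and Rewrite URL taken from the rule described above.
PATTERN = re.compile(r"^workflow/(.*)$")
REWRITE_URL = "http://myworkflowserver:8080/{R:1}"


def rewrite(path, query=""):
    """Return the backend URL for a request, or None if the rule does not match."""
    match = PATTERN.match(path.lstrip("/"))
    if match is None:
        return None
    # {R:1} is replaced with whatever the first capture group caught.
    target = REWRITE_URL.replace("{R:1}", match.group(1))
    # Mimics the "Append query string" checkbox being ticked.
    return f"{target}?{query}" if query else target


print(rewrite("/workflow/getInvoice", "id=abc12345"))
# http://myworkflowserver:8080/getInvoice?id=abc12345
print(rewrite("/something/workflow/myprocess"))
# None
```

The second call returns None because of the ^ anchor: the path does not begin with workflow/, so the rule leaves the request alone.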

Addendum 1: Individual Rules

You might not like the idea of the catch-all address. If you’re thinking “yeah, but won’t this mean someone who knows how Workflow functions could trigger anything on it?”, you’d be right. Of course, no one should even know that Workflow is there, but security through obscurity only gets you so far.

So what do we do? Add individual rules for each process, so that only these processes are accessible and nothing else. The instructions are the same as above, except that:

  • Pattern would be something like: ^my_process_1/(.*)$ to accept only calls to http://example.com/my_process_1/something
  • Rewrite URL would be more static, such as http://myworkflowserver:8080/my_process_1 (this ignores any further structure in the URI but will not ignore the query string unless you uncheck that option!)

You’ll have to add one rule for each of the processes you want to have accessible through IIS.

Hey by the way, while every example uses the same process name for the Pattern and Rewrite URL, you don’t actually need to. You can call /user_information/ and have the background process actually be /get_user_info/ if you want! As long as that matches the HTTP Server Input’s Action.
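
Here is a small Python sketch of how a set of individual rules behaves as an allow-list. The rule table and the function name route are hypothetical, and the sketch assumes each static Rewrite URL ends with the Action name of the target process. The second entry illustrates a public name that differs from the Action name.

```python
import re

# Hypothetical rule table: (pattern, static Rewrite URL).
RULES = [
    (re.compile(r"^my_process_1/(.*)$"), "http://myworkflowserver:8080/my_process_1"),
    (re.compile(r"^user_information/(.*)$"), "http://myworkflowserver:8080/get_user_info"),
]


def route(path, query=""):
    """Return the backend URL of the first matching rule, or None."""
    trimmed = path.lstrip("/")
    for pattern, target in RULES:
        if pattern.match(trimmed):
            # The rest of the path is dropped; the query string is kept.
            return f"{target}?{query}" if query else target
    # No rule matched: the request never reaches Workflow.
    return None


print(route("/my_process_1/something", "id=1"))
# http://myworkflowserver:8080/my_process_1?id=1
print(route("/some_other_process/x"))
# None
```

Anything not listed in the table falls through to None, which is the whole point: only the processes you chose to expose can be triggered from outside.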

Addendum 2: Outbound Rules for non-embedded resources (HTML output only)

If you want easy mode on, completely ignore this addendum and just make sure that the Embed all resources option is checked in every single Create Web Contents task in your processes. If you’re worried about caching and bandwidth, continue reading, this is getting interesting!

If you’re doing Print or Email output, this part is completely irrelevant to you. Only web content is affected.

Let me get technical for just one moment here and explain a little bit about the innards and guts of serving HTML contents from Connect: aka what happens when you run Create Web Contents in a Workflow process.

  • The template is merged with an optional record, generating an HTML file
  • Its resources, however (CSS, images, JavaScript), are stored on the Connect server’s database.
  • This HTML file has a <base href="http://myworkflowserver:8080/_voyager/localhost/9340/XX/X/"> in the header, containing the URL where these resources are located (basically, the resources are extracted from the template on the server).
  • The HTML is returned to the browser (presumably), and when it opens the file it will start requesting the rest of the resources from the above URL…

Now if you’ve been following, you realize the following: the browser doesn’t have access to that URL if it’s accessed from outside your network (which is the whole point of this exercise). The actual server name and port are kind of hard-coded and are based on how you called the process by HTTP. Meaning, if your reverse proxy used an IP instead, the base href might look like http://192.168.101.123:8080/ . This can be a problem when accessing from a device in a remote network. We would need to change the base href to the actual domain name like http://example.com/workflow .
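
To see why this matters, here is a tiny Python sketch of how a browser resolves a relative resource reference against the base href. It uses the standard library's urljoin; css/style.css is a made-up resource name, and the function name resolve is just for illustration.

```python
from urllib.parse import urljoin

# The base href written into the HTML by Workflow, as shown above.
BASE_HREF = "http://myworkflowserver:8080/_voyager/localhost/9340/XX/X/"


def resolve(base_href, resource):
    """Return the absolute URL a browser would request for a relative resource."""
    return urljoin(base_href, resource)


print(resolve(BASE_HREF, "css/style.css"))
# http://myworkflowserver:8080/_voyager/localhost/9340/XX/X/css/style.css
```

The resulting URL points straight at the private Workflow server, which a browser outside your network cannot reach.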

So how do we fix that? More URL Rewrite magic! In this case, an Outbound Rule. This will modify the contents of the outgoing HTML file sent from Workflow back to the end user’s browser. Ok so, how do we do it?

  • To add a rule, click on Add Rule(s)… in the Actions pane on the right. This time choose the Blank rule option under Outbound Rules.
  • Name: Enter anything, I like Workflow Base HREF Fix
  • Under Preconditions, click Create new Precondition and use the following options:
    • Name: ResponseIsHTML
    • Using: Regular Expressions
    • Logical Grouping: Match All
    • Click Add
      • Condition input: {RESPONSE_CONTENT_TYPE} , Matches the pattern , ^text/html
      • Click OK
    • Click OK to save. This makes sure only HTML pages are influenced by this rule!
  • Under the Match section, use the following options:
    • Matching scope should be Response
    • Match the content within, use the drop-down and select only the Base (href attribute) option.
    • Content should be Matching the pattern, Using Regular Expressions.
    • And the Pattern should be the one produced in the base href… according to our examples above, including the port, it would be this: ^http://myworkflowserver:8080/_voyager/(.*)
  • In the Action section, we’ll have to give the browser a public URL where it can request the resources, which in turn will use an inbound rewrite rule to fetch them in the appropriate private location (yes that’s a lot of rewrites, fortunately IIS is pretty good at that):
    • Action Type is obviously Rewrite.
    • The Value under Action Property will be (if we use the catch-all rewrite from above), something like http://example.com/workflow/_voyager/{R:1}.
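
As a rough illustration of what this outbound rule does to the response, here is a Python sketch. It is a simplified stand-in for the IIS engine: the function name fix_base_href is made up, the tag matching is a plain regular expression rather than a real HTML parser, and the pattern includes the :8080 port so that it matches the base href shown earlier.

```python
import re

# Finds <base href="..."> and captures the attribute value separately.
BASE_TAG = re.compile(r'(<base\s+href=")([^"]*)(")', re.IGNORECASE)
PATTERN = re.compile(r"^http://myworkflowserver:8080/_voyager/(.*)")
VALUE = "http://example.com/workflow/_voyager/{R:1}"


def fix_base_href(body, content_type):
    """Rewrite the base href of an HTML response; leave other responses alone."""
    # Equivalent of the ResponseIsHTML precondition.
    if not re.match(r"^text/html", content_type):
        return body

    def replace(tag):
        match = PATTERN.match(tag.group(2))
        if match is None:
            return tag.group(0)  # href does not point at Workflow: unchanged
        return tag.group(1) + VALUE.replace("{R:1}", match.group(1)) + tag.group(3)

    return BASE_TAG.sub(replace, body)


html = '<head><base href="http://myworkflowserver:8080/_voyager/localhost/9340/XX/X/"></head>'
print(fix_base_href(html, "text/html; charset=utf-8"))
# <head><base href="http://example.com/workflow/_voyager/localhost/9340/XX/X/"></head>
```

With the base href rewritten, the browser requests resources under http://example.com/workflow/_voyager/…, and the inbound catch-all rule then forwards those requests to the private Workflow server.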