Control HTTP access with IIS as a reverse proxy

Manuel Polling
OL Learn Blog Automation

When implementing OL Connect-based products, you often have to deal with inbound http requests to your server. To do this well, you need to think about secure connections, limiting access to expose only what’s needed, and possibly deal with other routing issues such as load balancing.

On a Windows Server, Microsoft’s IIS can be a simple solution for dealing with much of this, and this article explains how to set it up, some of the things it can be used for, and how that benefits your implementation.

Why use a reverse proxy

There are several reasons why it is a good idea to use a webserver in front of your services that handle inbound http requests. These considerations not only apply to OL Connect products, but to any service, and they can be addressed by any webserver, not just IIS.

Security

Putting IIS in front of Connect Workflow lets you control which processes can be triggered from the outside. A solution may receive http requests from external clients, while it also has processes that are only meant to be triggered from the server side, some of which may have an http input as well.

Load balancing

When there’s a need to run multiple instances of a service, for resilience, scale, or both, IIS can be used as a load balancer.

Manageability

When clients are not connecting directly to services, they don’t depend on deployment details such as what ports are used, nor are they affected when these details change.

It’s easy to switch to a different instance of a service, either temporarily or permanently.

Separation of concerns

When things like https are handled in IIS, they can be taken care of by regular IT instead of application specialists. This makes life easier for both.

Why use IIS

I am not an expert on webservers, nor am I a particular fan of any webserver, and IIS is usually not considered the best one out there by people who are experts. The reasons I chose to use IIS when I needed a gateway, were that (a) it comes with Windows Server, so it’s readily available, and (b) it has a user interface that makes it easier to find your way for a non-expert like me.

In addition, many OL Connect deployments are done at small and medium-sized enterprises with a limited IT staff who focus on Microsoft Windows. These may not have their own preference for webservers (they typically run their websites at an ISP who takes care of that for them), and introducing a particular one such as NGINX or Apache introduces something they may find hard to manage. IIS can then be easier for the IT department to pick up. Since any webserver requires active maintenance to stay secure, it is important that a hand-over to the regular IT staff is possible.

If you are running an environment where there is a preferred webserver with the know-how on how to set it up and maintain it, feel free to use that instead. It will have the same benefits to you, as IIS had for me.

Setting up IIS

Start the Add Roles and Features Wizard, which can be found in different ways on Windows Server 2019. Server Manager is the most straightforward.

In the wizard, navigate to the Select Server Roles page, and select Web Server (IIS). This probably triggers a dialog asking to Add features that are required for Web Server (IIS). Choose Add Features. This will give you the user interface that made my life easier. Press Next.

When you get to the Role Services page, you may want to review the options. I used the defaults.

Press Next and then Install on the Confirmation page.

Once installation is complete, you have a webserver. To check if it’s running, just browse to localhost, and you should find a generic default home page.

Now you can have a look at its configuration through the Internet Information Services (IIS) Manager. It lives in the Start menu under Windows Administrative Tools, but I can never remember its location and full name, so I just type IIS in the search box to launch it.

Add Routing and Rewriting

To allow IIS to sit in front of other applications, it needs to be able to route requests to those applications. This typically involves rewriting the requests. To add this capability to IIS, it needs the Application Request Routing extension, which relies on the URL Rewrite extension.

Download and install both. Make sure to first install the URL Rewrite extension.

Note: to download and install extensions from iis.net, you are typically referred to the Web Platform Installer. However, that WPI is scheduled to be retired on 1 July 2022. So, it may be better to manually download and install these extensions. Look for the very small additional downloads link just below the big green button.

These extensions will add icons to the views to configure your server and individual websites in the IIS Manager. Also note an additional Server Farms item in the left-hand tree view.

Configure ARR

In the IIS Manager window, under Connections (left-hand side), select the server you are using. In the main menu section (middle), under IIS, double click on the Application Request Routing Cache icon. In the Actions section (right-hand side), under Proxy, click on Server Proxy Settings… and tick the Enable proxy checkbox and click Apply (Actions section).

You are now ready to create rewrite rules for routing http requests.

Routing to Connect Workflow

Now that we have our IIS running, the first thing we want to do, is route traffic to Connect Workflow. Instead of just opening up the port that Workflow uses for http input in the server’s firewall, we will route requests on the regular http and https ports (80 and 443) to the port that was configured in Workflow.

To test whether things work, first create one or more Workflow processes that handle an HTTP request. Make sure to use the Node.js Server Input (the HTTP Server Input was put into the Legacy category with version 2022.1 for a reason, don’t use that anymore please).

I just created a single process with two actions, workflow-foo and workflow-bar, that returns a bit of JSON that includes the action.

With that deployed, you can take a minute to see if you can reach these endpoints. When you try http://localhost:9090/workflow-foo on the VM you should get a response (if not, go fix that first). Try the same on your desktop, e.g., http://iis-sample:9090/workflow-foo, and you’ll get something like a “Connection timed out” error, because port 9090 is blocked. Try http://iis-sample/workflow-foo, and you should get a “404 Not Found” from IIS.

Note: in case you wonder where the iis-sample host name is coming from, I added that to the hosts file of my system. Moving forward, this will also allow me to play with https. So it may be a good idea for you to do that as well. It’s located here: C:\Windows\System32\drivers\etc\hosts (changes to that file immediately take effect).

So we can reach our VM on the regular http port, but we can’t reach Workflow from outside of the VM. Let’s go fix that. The simplest thing to do, is route every incoming request to Workflow.

In IIS Manager, open the URL Rewrite feature for your server.

  • Choose Add Rule(s)…
  • Choose Blank Rule
  • Enter a Name, e.g., “Workflow”
  • Enter a Pattern: .*
  • Enter a Rewrite URL: http://localhost:9090/{R:0}
  • Press Apply

Now go back to your desktop system and retry the http://iis-sample/workflow-foo URL. This initially gave me a JSON parse error in Firefox 🙂 If you are better at typing commas in the right spot, then you don’t have to go back to Workflow to fix your JSON. But even that error is already an indication that things are now working.

Congratulations, you now have your first rewrite rule in place. However, we have not achieved much yet: IIS will gladly route any request to Workflow (even the ones we don’t want), and we don’t even have https working yet.

What we have achieved though, is that clients don’t have to know port 9090, and that we can even change that port on the server-side without affecting clients; we just need to remember to change the rule in IIS as well. And what if we want to temporarily test with a different server? Just change the rewrite rule in IIS. You may have noticed that these changes are instantaneous.

Patterns and rewrite URLs

Before we move further, let’s back up a bit to what we did when configuring that rule. It has a pattern to match incoming requests, and it has a rewrite URL that contains something special: {R:0}.

The pattern for our rewrite rule is the regular expression “.*” (excluding the quotes). The . means match any character, and * matches the previous token zero or more times, so together they match any string. If this is new to you, then I recommend learning about regular expressions. Luckily for anyone who is not that familiar with regular expressions, there’s a Test pattern… button in the Rewrite rule dialog that lets you test if your pattern makes sense. There are also some very good resources out there to learn and play with regular expressions. regex101.com is a recommended one.

IIS also lets you use wildcards, which means * matches any string, but these quickly become too limiting, and the documentation I found on them even seems outdated.

Then what about that {R:0} in the rewrite URL? This is a match group, also referred to as a back-reference. In a regular expression, anything in parentheses () creates such a match group, which can then be used as part of the rewrite URL. They are simply numbered from 1 in the order they occur, and match group 0 always refers to the entire match. The “R-with-braces” notation is specific to IIS, and not related to regular expressions.

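Note that when we match any string, {R:0} contains the entire path of the URL (without the leading slash), but not the protocol, hostname, or port. The query string is not part of what the pattern is matched against either; it still reaches Workflow because the Append query string option of the rewrite action is checked by default.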
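To make match groups a bit more tangible, here is a minimal sketch in Python. It only imitates what IIS does: Python's re module is not the regex engine IIS uses, and the rewrite helper with its {R:n} substitution is a stand-in written for this illustration, not an IIS API. For simple patterns like the ones in this article, the two should behave the same.

```python
import re

def rewrite(pattern, rewrite_url, path):
    """Imitate a rewrite rule: return the rewritten URL, or None if there is no match."""
    match = re.search(pattern, path)
    if match is None:
        return None
    # Replace each {R:n} back-reference with the text of match group n.
    return re.sub(
        r"\{R:(\d+)\}",
        lambda ref: match.group(int(ref.group(1))) or "",
        rewrite_url,
    )

# The match-all rule from this article: {R:0} is the entire match.
print(rewrite(".*", "http://localhost:9090/{R:0}", "workflow-foo"))

# Parentheses create numbered groups: {R:1} is "workflow", {R:2} is "foo".
print(rewrite("^(workflow)-(.*)$", "{R:2}/{R:1}", "workflow-foo"))
```

The path is passed in without its leading slash, which is how the Test pattern… dialog expects it as well.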

When you look around in the rewrite rule edit view, you’ll notice that there are a lot more options. Going over all of these in detail would take too much time. The following are worth mentioning: in addition to match groups, there are also server variables for standard things such as all parts of a URL, rules can have conditions, and there are other action types besides Rewrite.

Limiting access

Now let’s move on and limit access to Workflow a bit more. Instead of forwarding any request to Workflow, it would be better to only allow requests that we actually expect to get from non-local clients. That way, we can also have processes with HTTP input for server-side use.

Suppose we are using Workflow to run a COTG solution. We want the COTG app to be able to download form data and submit a result. In Workflow, we can add a process for both, and let one listen to form1-submit, and the other to form1-data.

Once these processes exist, and are deployed, get your Postman out (or your favorite API client), and verify that you can reach these endpoints using both POST, and GET.
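If you prefer a script over Postman, the same check can be sketched with Python's standard library. The base URL assumes Workflow listens on port 9090 as in the rest of this article, and the JSON payload in the usage comment is made up; replace both with whatever your processes expect.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9090"  # assumption: Workflow's http port on this server

def build_request(endpoint, method, payload=None):
    """Build (but do not send) a request for a Workflow endpoint."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}", data=data, headers=headers, method=method
    )

def send(request, timeout=5):
    """Send the request and return (status code, body text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")

# Example use, run on the server itself (the payload is hypothetical):
#   print(send(build_request("form1-data", "GET")))
#   print(send(build_request("form1-submit", "POST", {"name": "test"})))
```

Note that urlopen raises an HTTPError for responses such as 404, so a failing endpoint shows up as an exception rather than as a status code.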

Now we want to change the rewrite rules in IIS to only route requests for these two endpoints, and not to our existing workflow-foo and workflow-bar endpoints. First, disable the generic “Workflow” rule we created earlier. Next, create a new Blank inbound rule, and name it “COTG”.

The pattern for this new rule has to specifically match traffic for our new endpoints. Both of these start with form1-, so we can try this pattern: form1-.*

Enter the same rewrite URL as before: http://localhost:9090/{R:0}, and Apply.

When we try http://iis-sample/form1-data, we get a response, and when we try http://iis-sample/workflow-foo, we get a 404 Not Found. So that’s good. But what if we try http://iis-sample/workflow-foo/form1-data? Now we get the same response as the call to just form1-data! We are not reaching an endpoint we are not supposed to reach, but still, that’s not what we want. The regular expression matches any URL that contains the pattern.

We have to make sure that form1- is at the beginning of the request. That’s easy with regular expressions, just add a ^ at the beginning: ^form1-.*, and Apply again. When we now retry http://iis-sample/workflow-foo/form1-data, we get the 404 that we want for that URL.
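The difference between the two patterns can be reproduced outside IIS. The following is a small sketch using Python's re module as a stand-in for the IIS regex engine (an assumption that holds for simple patterns like these); matching_paths is a helper made up for this example.

```python
import re

PATHS = ["form1-data", "workflow-foo", "workflow-foo/form1-data"]

def matching_paths(pattern, paths=PATHS):
    """Return the paths in which the pattern is found anywhere."""
    return [path for path in paths if re.search(pattern, path)]

for pattern in ("form1-.*", "^form1-.*"):
    print(pattern, "->", matching_paths(pattern))

# Without the ^ anchor, the match starts in the middle of the path, so the
# entire match (what {R:0} refers to) is only the "form1-data" part.
print(re.search("form1-.*", "workflow-foo/form1-data").group(0))
```

That last line would explain why the unanchored rule returned the same response as a plain call to form1-data: only the matched part ended up in the rewrite URL.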

How to go from here, depends on how restrictive you want to be, and what you expect to need. Here are a few examples.

Pattern                   Condition                        Description
^form1-data$              {REQUEST_METHOD} matches GET     Match the single endpoint only for GET requests.
^form1-submit$            {REQUEST_METHOD} matches POST    Same as above for the submit endpoint; combine these two rules for maximum control.
^form1-(data|submit)$                                      Match both endpoints in one rule.
^form[1-9][0-9]{0,2}-                                      Match form1- to form999-
^form[0-9]+-                                               Match form«any number»-
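Before entering patterns like these in IIS, it can help to run them against a list of paths you do and do not want to let through. The sketch below does that with Python's re module, which is only a stand-in for the IIS regex engine. The sample paths and the helper names matches and is_routed are made up for this illustration, and the method check is a simplified model of a {REQUEST_METHOD} condition.

```python
import re

PATTERNS = [
    r"^form1-data$",
    r"^form1-submit$",
    r"^form1-(data|submit)$",
    r"^form[1-9][0-9]{0,2}-",
    r"^form[0-9]+-",
]
SAMPLES = [
    "form1-data", "form1-submit", "form1-data/extra",
    "form0-data", "form999-submit", "form1000-data",
]

def matches(pattern):
    """Return the sample paths that the pattern lets through."""
    return [path for path in SAMPLES if re.search(pattern, path)]

for pattern in PATTERNS:
    print(f"{pattern:28} {matches(pattern)}")

# Simplified model of the first two rows: a path pattern plus a method condition.
METHOD_RULES = [(r"^form1-data$", "GET"), (r"^form1-submit$", "POST")]

def is_routed(path, method):
    """True if some rule matches both the path and the request method."""
    return any(
        re.search(pattern, path) and re.search(condition, method)
        for pattern, condition in METHOD_RULES
    )

print(is_routed("form1-data", "GET"), is_routed("form1-data", "POST"))
```

Note how the last two patterns have no $ at the end, so they also let through longer paths such as form1-data/extra.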

Routing to different services

When your solution uses multiple services, IIS can sit in front of them all, so clients only see a single server that communicates on the regular http(s) port. This also prevents CORS errors on the client side, when a web page served by one service uses a URL from another.

An easy way to distinguish the traffic for the different services, is to look at the first part of the URL path. For instance, all URLs that start with /cotg/ can be routed to Workflow, while other traffic gets routed to a WordPress website.

Workflow of course uses the first part of the path to decide what process a request is for, so to still allow more than one process to listen for http requests, the /cotg/ part will have to be removed from the rewrite URL.

Let’s change our “COTG” rule as follows:

  • Pattern: ^cotg/(form1-(data|submit))$
  • Rewrite URL: http://localhost:9090/{R:1}
    note the 1 instead of the 0.

When you now press Test Pattern… and try the path cotg/form1-data, you’ll notice not just {R:0}, but also {R:1}, and {R:2}. Those are the match groups created by the () in our regex. Dropping the base /cotg/ part is done by only using {R:1} in the rewrite URL instead of the entire {R:0}.

Now, check the Stop processing of subsequent rules box, and press Apply.

To route all other traffic to a different port on the same server, add a new rule below the previous one (let’s call that “Web”), with a match-all pattern (i.e., .*) and a rewrite URL similar to the first one (e.g., http://localhost:8888/{R:0}).

You can also leave out that last rule, and handle all other traffic with IIS itself, or the rewrite URL could of course just as easily point to a different server.
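The combined effect of the “COTG” and “Web” rules can be sketched as follows. This is a simplified model, not how IIS is implemented: rules are tried in order and the first match wins, which corresponds to having Stop processing of subsequent rules checked. The route function and the RULES list are names invented for this example, and Python's re module stands in for the IIS regex engine.

```python
import re

# (name, pattern, rewrite URL), in the order the rules appear in IIS Manager.
RULES = [
    ("COTG", r"^cotg/(form1-(data|submit))$", "http://localhost:9090/{R:1}"),
    ("Web", r".*", "http://localhost:8888/{R:0}"),
]

def route(path):
    """Return (rule name, rewritten URL) for the first rule that matches."""
    for name, pattern, rewrite_url in RULES:
        match = re.search(pattern, path)
        if match is None:
            continue
        # Replace each {R:n} back-reference with the text of match group n.
        url = re.sub(
            r"\{R:(\d+)\}",
            lambda ref: match.group(int(ref.group(1))) or "",
            rewrite_url,
        )
        return name, url
    return None  # no rule matched; IIS would handle the request itself

for path in ("cotg/form1-data", "cotg/workflow-foo", "index.html"):
    print(path, "->", route(path))
```

Only the two expected COTG endpoints reach port 9090, with the cotg/ prefix dropped; everything else, including cotg/workflow-foo, goes to the service on port 8888.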

Load balancing

Rewrite URLs can be used to perform load balancing of a service across multiple servers.

  • In IIS Manager, navigate to Server Farms and select Create Server Farm… (right-click context menu or right-hand pane).
  • Enter a name (“WF” for instance) and press Next. In Add Server, enter the address of each of your servers (hostname or IP address) and press Add.
    • You’ll need at least two (obviously), or there will be no load balancing.
    • Tip: if you’re doing this on your own VMware, add the names of these servers to your host system’s hosts file, and your VMs will be able to use those names.
    • If the service on your target server is not using port 80, you can configure that port while adding the server: click the Advanced settings… link, scroll down to httpPort, and change it to what you need. You can of course also have an IIS on that server to redirect traffic from port 80 to your custom port.
  • Press Finish.

IIS now conveniently asks if it should create a URL rewrite rule to route all incoming requests to this server farm. Press Yes. (Or No if you are on a live server with active users, unless you are absolutely sure.)

When you now go back to your server’s URL Rewrite rules, there’ll be a new rule with a match any pattern that uses the name of your server farm as the host name in the rewrite URL. You can now adapt this as needed.

Let’s look at that server farm we just created. When you click the server farm itself, you have several options for controlling it.

Health Test lets you define a URL that can be used to check if the servers are okay. It’s quite easy to create a simple process for that in Workflow.

Load Balance lets you control how requests are distributed across servers, so you can have one server take more load than the other.

When you click Servers (left-hand pane) under your server farm, you get a list of servers.

The most interesting thing here is the ability to take a server offline and bring it back online. This takes effect instantly. It allows you to make changes to one server while the other one does the work.

In combination with the Load Balance functionality, you can use this for all kinds of maintenance scenarios. For instance, take a server offline, apply updates, change the load balancing so the changed server only gets 10% of the requests, and bring it back online. Now only a fraction of users will be impacted if anything goes wrong.
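To get a feel for what a 90/10 split means for individual requests, here is a small sketch of a weighted round robin. It is not ARR's actual implementation; it uses a well-known “smooth” weighted round robin scheme purely as an illustration, and the server names wf1 and wf2 are hypothetical.

```python
from collections import Counter

def weighted_round_robin(weights, count):
    """Return `count` server picks, spread in proportion to the given weights."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(count):
        # Every server earns its weight each round...
        for name, weight in weights.items():
            current[name] += weight
        # ...and the server with the highest balance handles the request.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks

# wf2 was just updated and only gets 10% of the requests.
picks = weighted_round_robin({"wf1": 90, "wf2": 10}, 100)
print(Counter(picks))
print(picks[:10])
```

In this model, the updated server handles one out of every ten requests, spread evenly rather than in a burst, so a problem on that server only affects a small share of the traffic.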

You could also provision a new server and add it to the farm and then take an old server offline, and repeat until all servers are replaced.

Note that setting up load balancing on the IIS side is only part of the equation here: the work needs to be easily handled by multiple servers as well, or the solution needs to be designed for multiple servers. OL Connect, for instance, is not inherently designed for load balancing, so you’ll need to look at this. For example, an application that simply returns a PDF when you send it some data, can easily work with multiple servers. But when you send data with one request and expect that data to be used for another request that may go to a completely different server, then you must find a way to let these servers share that data. If those requests are all from the same client session, you can also look into client affinity.

Working with different software versions

When you are often switching software versions, something our own QA department is particularly familiar with, it can also help to use request routing: let the client talk to IIS and create a rule for each instance of the service. Just make sure to enable only one of those rules at a time. Switching instances is now a matter of disabling one rule and enabling another.

HTTPS

Until now, we have been dramatically unsafe: we have only been handling http so far. We can use IIS to set up https client communication.

IIS can also be the termination point for https and use regular http for routing requests locally. This has the advantage that managing certificates in IIS can be done by IT administrators with little or no experience in configuring the backend application, which gives a good separation of concerns and responsibilities. An application specialist (e.g., OL professional services) now does not need access to the private key of the certificates or the password that protects them.

To create a certificate in IIS Manager, click the server, and go to Server Certificates.

A self-signed certificate can be created straight from here, just select Create Self-Signed Certificate… in the right-hand pane and specify a friendly name that you will recognize.

If your server is part of a proper domain, then you may already have certificates for that. Alternatively, using Let’s Encrypt can be an option, although I have not yet had a chance to try this myself. There are several Let’s Encrypt ACME clients that support IIS.

Once you have a certificate, you need to create a site binding for https. Navigate to the “Default Web Site” under Sites. Select Bindings… on the right-hand panel.

In the Site Bindings dialog, choose Add…

Choose Type https, select your certificate from the drop-down at the bottom, and press OK.

You should now be able to use https from the client side.

Note that your rewrite rules typically use http. If the rewrite URL references a different server, as it typically would for load balancing, then you have to consider whether this traffic between the IIS load balancer and these servers should be https as well. If so, then you can choose https in the rewrite rule. To get https on each server in a server farm, you can use IIS on those servers as well.

Making Connect web pages work

If you are creating web pages from Connect without embedding all resources, then the generated html will reference the external resources with URLs that are directly referencing the Connect server. The whole point of introducing IIS is to not let clients access those backend services directly, so we want that html to reference our IIS server.

This can be done by creating an outbound rule.

  • In IIS Manager, go to URL Rewrite.
  • Choose Add Rule(s)…
  • Under Outbound rules, choose Blank Rule
  • Enter a Name, e.g., “Web base href”
  • To make sure this rule only affects html responses, add a precondition:
    • Under Preconditions, click Create new Precondition
    • Enter a Name for the precondition, e.g., “Is HTML Response”
    • Choose Add… to add a condition
    • with Condition input {RESPONSE_CONTENT_TYPE},
    • and Pattern ^text/html
    • Press OK twice (for the condition and the precondition).
  • Under Match, the Matching scope should be Response
    • In the Match the content within drop-down, select Base (href attribute).
    • Enter the pattern: ^\/(.*\.OL-template_files\/public\/document\/)$
      (all forward slashes / are escaped by a backslash, which may not be required, but in this case it’s better to play safe because a / can indicate the end of a regex in certain syntaxes)
  • Under Action
    • Enter a Rewrite URL: https://iis-sample/olcwebcontent/{R:1}
    • We are adding the olcwebcontent prefix, which is to be used by an inbound rule to route the resource requests.
  • Press Apply

This outbound rule will fix the base element embedded in the web page that’s served. To actually serve those resources, you need an additional inbound rule that will rewrite the URLs of referenced resources to direct them at the backend service (Workflow in this case).

For this, add an inbound rule with pattern ^olcwebcontent/(.*) and rewrite URL http://localhost:9090/{R:1}.
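The round trip through both rules can be traced with a short sketch. Python's re and urllib.parse modules stand in for IIS and the browser here. The apply_rule helper, the template name mytemplate, and the images/logo.png resource are all made up for this illustration.

```python
import re
from urllib.parse import urljoin, urlsplit

def apply_rule(pattern, rewrite_url, value):
    """Return the rewritten value, or the value unchanged if there is no match."""
    match = re.search(pattern, value)
    if match is None:
        return value
    # Replace each {R:n} back-reference with the text of match group n.
    return re.sub(
        r"\{R:(\d+)\}",
        lambda ref: match.group(int(ref.group(1))) or "",
        rewrite_url,
    )

# 1. Outbound rule: fix the base href in the html response.
base_href = "/mytemplate.OL-template_files/public/document/"  # hypothetical
new_base = apply_rule(
    r"^\/(.*\.OL-template_files\/public\/document\/)$",
    "https://iis-sample/olcwebcontent/{R:1}",
    base_href,
)
print(new_base)

# 2. The browser resolves a relative resource against the new base.
resource_url = urljoin(new_base, "images/logo.png")
print(resource_url)

# 3. Inbound rule: IIS matches the path (without leading slash) and routes it.
path = urlsplit(resource_url).path.lstrip("/")
backend_url = apply_rule(r"^olcwebcontent/(.*)", "http://localhost:9090/{R:1}", path)
print(backend_url)
```

The olcwebcontent prefix only exists between the browser and IIS: the outbound rule adds it, and the inbound rule strips it again before the request reaches Workflow.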

IIS in production environments

Although this article covers a lot of topics that can be useful for production environments, it is not an exhaustive guide on how to deploy IIS for production purposes. Production environments, or exposure to untrusted clients and/or environments, may require additional hardening that is not covered here.

Make sure to do your own due diligence in this regard, so you are comfortable that your setup is fit for purpose. A quick search can get you started, and there is also detailed information on hardening. Unfortunately, security is hard and takes a constant effort.

References

The starting point for this article was an older article on OL Learn. So kudos to former colleague Évelyne Lachance for writing that.

ARR is a Microsoft supported extension, and they also have documentation on it. For example about load balancing. Many of those pages mention ARR version 1 or version 2, so it’s not completely clear how up-to-date they are. There may be differences with ARR version 3.

The ARR 3 download page also refers to documentation (many or all links point to the Microsoft pages mentioned above) and a forum.



