Reverse proxying a sub-domain via Apache

Suppose you have a domain that hosts your website: domain.com, and the website is served with the venerable Apache HTTP server. Suppose, also, that you want to run some backend application on the same domain, perhaps using a sub-domain like sub.domain.com. Running an application on a non-standard port (not 80 or 443) is not a problem, but what if you need it to run on port 80? Apache occupies port 80 in order to serve domain.com, so at least on the surface this seems like a problem.

This post talks about how to make it work using the reverse-proxying capabilities of Apache. It assumes you control a virtual machine that has a domain like domain.com mapped to it, that sub.domain.com resolves to the same machine, and that the machine runs Linux.

Setting up Apache as a proxy with mod_proxy

If you need to brush up on proxy concepts, consider reading this series of posts first.

Assuming Apache is already installed and running on the server, you'll first have to enable the proxy module and restart the service:

$ sudo a2enmod proxy proxy_http
$ sudo systemctl restart apache2

Sub-domains typically have their own configuration file in /etc/apache2/sites-available. Create a new configuration file in that directory, named sub.domain.com.conf or some such; here's what should be in it (adjust as needed):

<VirtualHost *:80>
        ProxyPreserveHost On
        ProxyPass / http://127.0.0.1:5000/
        ProxyPassReverse / http://127.0.0.1:5000/

        ServerName sub.domain.com
        ServerAdmin your@email.com

        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

This tells Apache that requests for sub.domain.com should be proxied to a service running locally on port 5000; naturally, the service address can have a different port or point to a different host altogether.
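To make the two directives a bit more concrete, here is a rough Go sketch of the URL mapping they describe. It is only an illustration of the idea, not how Apache implements it: ProxyPass maps an incoming path to a backend URL, and ProxyPassReverse maps backend URLs in redirect headers (such as Location) back to the public name. The function names forward and reverse, and the publicBase constant, are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// Values taken from the configuration above; publicBase is an assumption
// about how clients address the proxy.
const (
	prefix     = "/"
	target     = "http://127.0.0.1:5000/"
	publicBase = "http://sub.domain.com"
)

// forward mimics ProxyPass: a request path under prefix is mapped to the
// corresponding URL on the backend.
func forward(path string) string {
	if !strings.HasPrefix(path, prefix) {
		return "" // not covered by this mapping
	}
	return target + strings.TrimPrefix(path, prefix)
}

// reverse mimics ProxyPassReverse: a backend URL found in a redirect
// header is rewritten to the public name; anything else is left alone.
func reverse(location string) string {
	if !strings.HasPrefix(location, target) {
		return location
	}
	return publicBase + prefix + strings.TrimPrefix(location, target)
}

func main() {
	fmt.Println(forward("/headers"))
	fmt.Println(reverse("http://127.0.0.1:5000/login"))
}
```

Without the reverse mapping, a redirect issued by the backend could leak the internal 127.0.0.1:5000 address to clients, who cannot reach it.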

Next you'll want to register that configuration with Apache and restart it again:

$ sudo a2ensite sub.domain.com.conf
$ sudo systemctl restart apache2

Running the backend service

Now that Apache is all set up, it's time to run the actual backend service on port 5000. As an example, you can run this simple header debugging server:

$ go run http-server-debug-request-headers.go -addr 127.0.0.1:5000
2023/01/17 01:01:20 Starting server on 127.0.0.1:5000

To test that it runs properly, in a separate terminal (on the same machine!) let's run curl:

$ curl 127.0.0.1:5000/headers
hello /headers

And looking at the terminal where the server is running, you should see some useful logging:

2023/01/17 01:02:50 127.0.0.1:42406   GET     /headers        Host: 127.0.0.1:5000
User-Agent: curl/7.81.0
Accept: */*
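The source of the debugging server isn't reproduced in this post, so here is a minimal sketch of what such a program could look like, assuming only the behavior visible above: an -addr flag, a "hello <path>" response, and a log line with the request details and headers. The helper names greeting and formatHeaders are mine, and this sketch sorts headers by name, so the order may differ from the log shown above.

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"net/http"
	"sort"
	"strings"
)

// greeting is the response body sent for a given request path.
func greeting(path string) string {
	return "hello " + path + "\n"
}

// formatHeaders renders headers one per line, sorted by name so the
// output is stable (Go map iteration order is random).
func formatHeaders(h map[string][]string) string {
	names := make([]string, 0, len(h))
	for name := range h {
		names = append(names, name)
	}
	sort.Strings(names)
	var sb strings.Builder
	for _, name := range names {
		sb.WriteString(name + ": " + strings.Join(h[name], ", ") + "\n")
	}
	return sb.String()
}

func handler(w http.ResponseWriter, r *http.Request) {
	// Go keeps Host in r.Host rather than in r.Header, so log it separately.
	log.Printf("%s\t%s\t%s\tHost: %s\n%s",
		r.RemoteAddr, r.Method, r.URL.Path, r.Host, formatHeaders(r.Header))
	fmt.Fprint(w, greeting(r.URL.Path))
}

func main() {
	addr := flag.String("addr", "127.0.0.1:5000", "address to listen on")
	flag.Parse()

	http.HandleFunc("/", handler)
	log.Printf("Starting server on %s", *addr)
	log.Fatal(http.ListenAndServe(*addr, nil))
}
```

Listening on 127.0.0.1 rather than on all interfaces keeps the server reachable only from the machine itself, which is what we want for a backend that sits behind a proxy.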

If you've followed all the steps in this and the previous section, it should work via the sub-domain now (from any machine):

$ curl http://sub.domain.com/headers
hello /headers

Apache listens on port 80 for domain.com, and when it sees requests to sub.domain.com, it proxies them to the server running on port 5000 on the same machine.
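The dispatch decision can be pictured as a lookup on the request's Host header. The Go sketch below is a simplified model, not Apache's actual matching logic (which also considers ServerAlias and falls back to a default virtual host); the backends table and the pickBackend function are hypothetical names used for illustration.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// backends maps a host name to the address its requests are proxied to.
// Hosts that are absent are served by the regular website.
var backends = map[string]string{
	"sub.domain.com": "http://127.0.0.1:5000",
}

// pickBackend returns the backend for a Host header value, or "" when the
// request should be handled by the main site instead of being proxied.
func pickBackend(hostHeader string) string {
	host := hostHeader
	// The Host header may carry a port ("sub.domain.com:80"); strip it.
	if h, _, err := net.SplitHostPort(hostHeader); err == nil {
		host = h
	}
	// Host names are case-insensitive.
	return backends[strings.ToLower(host)]
}

func main() {
	for _, h := range []string{"sub.domain.com", "SUB.domain.com:80", "domain.com"} {
		fmt.Printf("%-20s -> %q\n", h, pickBackend(h))
	}
}
```

Both names resolve to the same IP address and arrive on the same port; the Host header is the only thing that tells the two sites apart.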

If this doesn't work for you, take a careful look at the Apache logs - both the error log and the access log may be useful.

Bonus: TLS with Let's Encrypt

If your server is set up to serve domain.com via TLS using Let's Encrypt, I have good news for you -- it will just work for sub.domain.com as well!

Presumably you've set up Let's Encrypt certificates using certbot. Since we've now added an additional Apache configuration (sub.domain.com.conf), we should run certbot again:

$ sudo certbot --apache

And carefully follow the on-screen instructions. certbot should detect there's a new sub-domain to get a certificate for; if everything goes as expected, it succeeds and from that point on you should be able to access the backend server via HTTPS:

$ curl https://sub.domain.com/headers
hello /headers

Note that the backend Go server serves plain HTTP; the reverse proxy (Apache) terminates the TLS connection and passes HTTP to the backend server. This is a fairly common way to structure backends. While the backend server serves unencrypted traffic, it's not actually accessible from outside the machine: in this example it listens on 127.0.0.1, and port 5000 is unlikely to be exposed anyway. The only way to access it is via the reverse proxy on sub.domain.com, which can use TLS if needed.

I was wondering how this works. certbot uses the HTTP-01 challenge with Let's Encrypt, wherein it's asked to serve a special file on a special path (under /.well-known/acme-challenge/) to prove to Let's Encrypt that it controls the domain. But here all requests get forwarded to the backend server...

After scratching my head for a minute I found the answer in certbot's logs, where it honestly explains its tricky ways. It turns out it adds a RewriteRule to our sub.domain.com.conf file for the duration of the Let's Encrypt handshake, sending any requests starting with .well-known/acme-challenge to a known disk location it controls. After all is done, it quietly removes these rules from the configuration file.
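The effect of that temporary rule can be modeled as a check that runs before the proxying step. The Go sketch below is my own simplified model of the routing decision, not certbot's or Apache's code; challengeToken and route are made-up names, and the "file:" and "proxy:" labels only indicate where a request would end up.

```go
package main

import (
	"fmt"
	"strings"
)

// acmePrefix is the well-known path prefix used by the ACME HTTP-01 challenge.
const acmePrefix = "/.well-known/acme-challenge/"

// challengeToken returns the token when urlPath is an HTTP-01 challenge path.
func challengeToken(urlPath string) (string, bool) {
	if !strings.HasPrefix(urlPath, acmePrefix) {
		return "", false
	}
	token := strings.TrimPrefix(urlPath, acmePrefix)
	if token == "" || strings.Contains(token, "/") {
		return "", false
	}
	return token, true
}

// route reports where a request goes while the challenge is pending:
// challenge paths are answered from a file on disk, the rest is proxied.
func route(urlPath string) string {
	if token, ok := challengeToken(urlPath); ok {
		return "file:" + token
	}
	return "proxy:http://127.0.0.1:5000" + urlPath
}

func main() {
	for _, p := range []string{"/headers", "/.well-known/acme-challenge/some-token"} {
		fmt.Println(p, "->", route(p))
	}
}
```

Once the temporary rule is removed, every path, including the challenge prefix, goes to the backend again.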