A Simple Apache2 Live Proxy

I helped my friends at the corner bar set up a website using Google Sites. It works great: it's super easy to update, and they have a lot of fun with it. I set up a frame-based redirect to map their domain name onto their Google Sites URL.

It worked fine for many months, but it stopped working a short while ago. Why? Because Google started forcing all Google Sites pages to load over SSL, that is, over https instead of http. Requesting a page over plain http now just redirects you to its https version.

Overall, a good thing. HTTPS Everywhere. I approve.

But browsers nowadays try to keep you from being tricked into loading a site other than the one you visited. So if you set up a frame-based redirect where part of the page is https and part of it is not, the https part won't load. It just sits there, blank and empty.

That broke the bar's website: the frame redirect wouldn't load the Google Site, leaving visitors staring at a big blank screen.

The fix? A quick-and-dirty live proxy on my Linux box that pulls the Google Sites pages through and serves them directly (without SSL), as though they were actually hosted on my server. It turned out to be pretty easy.

I configured the Apache virtualhost to treat *.cgi files as executable CGI scripts. I also configured it to use index.cgi as the 404 error handler. Since none of the subpages exist as files on my server, every request for one of them ends up at index.cgi. That means one simple little CGI script handles requests for both the top-level page and any subpage.

Here's what that index.cgi looks like:


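#!/bin/bash
# index.cgi: fetch the matching Google Sites page and print it as if it were ours.
LYNX="/usr/bin/lynx"

# CGI response header, followed by the blank line that ends the headers.
cat << EOF
Content-Type: text/html; charset=utf-8

EOF

# Any request path containing a letter (a subpage such as
# /site/rogersparkoasis/menu) is fetched from the same path on Google Sites.
# REQUEST_URI already starts with "/", so no extra slash is added.
if echo "$REQUEST_URI" | grep -qi "[a-z]"
then
    "$LYNX" -source "https://sites.google.com${REQUEST_URI}"
    exit 0
fi

# Otherwise (e.g. a bare "/"), serve the site's home page.
"$LYNX" -source "https://sites.google.com/site/rogersparkoasis/"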

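And here's what the Apache configuration options look like for that virtualhost:

<Directory "/var/www/oasis">
    # Allow CGI scripts here, and run files ending in .cgi (or .x) as CGI.
    Options +ExecCGI
    AddHandler cgi-script .cgi .x
</Directory>

# Hand every request for a file that doesn't exist to the proxy script.
ErrorDocument 404 /index.cgi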
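Because all of the routing lives in the script's one if test, you can check it without a web server or any network access. The sketch below pulls that decision out into a function. upstream_url is a hypothetical name, not part of the script above, and the fallback URL is the bar's home page hard-coded in index.cgi.

```shell
#!/bin/bash
# Hypothetical helper: print the upstream URL that index.cgi would fetch
# for a given request path. It mirrors the script's routing rule:
# any path containing a letter is passed through to Google Sites;
# anything else (such as a bare "/") falls back to the home page.
upstream_url() {
    local uri="$1"
    if printf '%s\n' "$uri" | grep -qi '[a-z]'; then
        # The request path already begins with "/", so append it directly.
        printf 'https://sites.google.com%s\n' "$uri"
    else
        printf 'https://sites.google.com/site/rogersparkoasis/\n'
    fi
}

# Try a few request paths and show where each one would be fetched from.
for path in / /site/rogersparkoasis/menu /123; do
    printf '%s -> %s\n' "$path" "$(upstream_url "$path")"
done
```

Note the quirk this makes visible: a path made only of digits and punctuation, like /123, has no letters, so it gets the home page rather than a pass-through.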

The proxy could be developed further to pull down and host images, or to rewrite URLs in the HTML output to link to the custom website URL instead of the Google Sites URL, but I think this is good enough for government work.
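If you do want to go one step further, the URL-rewriting idea could start as a small sed filter. This is only a sketch: rewrite_links is a hypothetical helper, and it assumes the pages link to Google Sites with absolute http://sites.google.com/ or https://sites.google.com/ URLs. Links to any other host pass through unchanged.

```shell
#!/bin/bash
# Hypothetical filter: turn absolute Google Sites links in proxied HTML
# into site-relative links, so visitors stay on the bar's own domain.
# Only links to sites.google.com (http or https) are rewritten.
rewrite_links() {
    sed -E 's#https?://sites\.google\.com/#/#g'
}

# Example: run a sample anchor tag through the filter.
printf '%s\n' '<a href="https://sites.google.com/site/rogersparkoasis/menu">Menu</a>' | rewrite_links
# prints <a href="/site/rogersparkoasis/menu">Menu</a>
```

Each lynx -source call in index.cgi would then pipe its output through rewrite_links, with the function defined near the top of the script. A rewritten link like /site/rogersparkoasis/menu points at the bar's own domain, doesn't exist there as a file, and so lands back at index.cgi through the 404 handler.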
