pywiki: a Python web app without frameworks

A couple of weeks ago I went through Writing Web Applications in Go and it blew my mind: I used to think that you need a web framework (at least a very minimal one like Flask) to make web applications, possibly because of my Python background. So I set out to do something similar in Python 3 to figure out why its standard library is not enough.

gowiki

Gowiki is an example of a tiny web application that is already useful: it can show text files from ./pages/ (GET /view/<PAGE>), edit them using a <textarea> (GET /edit/<PAGE>) and save them back (POST /save/<PAGE>).

http.server: the basics

First of all, we need a web server. I use python3 -m http.server pretty often to quickly serve a directory over HTTP, so it looks like a promising thing to extend.

The documentation page for http.server greets us with a warning:

http.server is meant for demo purposes and does not implement the stringent security checks needed of real HTTP server. We do not recommend using this module directly in production.

Fine, that must be one of the reasons why nobody uses it. Let’s proceed at our own risk.

Let’s run the initial example:

import http.server as http

address = ('', 8000)
with http.HTTPServer(address, http.SimpleHTTPRequestHandler) as httpd:
    print('Listening on %s:%d' % address)
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass

Extending the http.server

The documentation looked pretty massive and intimidating at first, and after some initial reading I had a lot of questions:

should I extend HTTPServer, BaseHTTPRequestHandler, or one of its subclasses? It turns out that HTTPServer is the “transport” part of the application (a subclass of TCPServer); it takes a request handler class (subclassed from BaseHTTPRequestHandler) and instantiates it for each incoming connection to actually handle the requests. SimpleHTTPRequestHandler is a subclass of BaseHTTPRequestHandler that maps requests to files under the current directory; CGIHTTPRequestHandler extends SimpleHTTPRequestHandler with the ability to run CGI scripts from ./cgi-bin (in the very primitive CGI way: running a process for each request).
is there any request routing mechanism in the standard library? Nope, none: you need to handle routing manually, unlike in Go;
where do I get the requested path (does do_GET get any parameters)? I had to read the source code and step through it in a debugger to understand that do_GET takes no arguments and relies on the state of its object, such as self.path and self.headers;
what are send_response(), send_response_only(), send_error() and end_headers(), and in what order should they be called? Which of those are internal methods and which can I call? I needed to read the source code of http.server to understand what exactly they do;
how do I write the response? I wrote a Python FastCGI script some time ago, so I knew about the wfile file objects that can be used for this: indeed, there are self.rfile and self.wfile;

import shutil
from pathlib import Path
from http import HTTPStatus as status

pages_dir = Path('./pages/')

class RequestHandlers(http.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith('/view/'):
            return self.get_view()
        return self.send_error(status.NOT_FOUND)

    def respond_ok(self, body):
        self.send_response(status.OK)
        self.end_headers()

        shutil.copyfileobj(body, self.wfile)

    def get_view(self):
        title = self.path[len('/view/'):]

        p = pages_dir / (title + '.txt')
        if not p.exists():
            return self.send_error(status.NOT_FOUND)

        with p.open('rb') as f:
            self.respond_ok(f)

Plug it into http.HTTPServer(address, RequestHandlers) and we are ready to go.
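To poke at a handler without opening a browser, the server can be run in a background thread and queried from the same script. This is only a sketch: HelloHandler and fetch_once are hypothetical names that are not part of pywiki, and the handler is a tiny stand-in for RequestHandlers.

```python
import http.client
import http.server
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for RequestHandlers: echoes the requested path."""

    def do_GET(self):
        body = ('you asked for ' + self.path).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once(path):
    """Start a server on a free port, fetch one path, shut the server down."""
    # Port 0 asks the OS for any free port.
    with http.server.HTTPServer(('127.0.0.1', 0), HelloHandler) as httpd:
        port = httpd.server_address[1]
        loop = threading.Thread(target=httpd.serve_forever)
        loop.start()
        conn = http.client.HTTPConnection('127.0.0.1', port, timeout=5)
        try:
            conn.request('GET', path)
            resp = conn.getresponse()
            return resp.status, resp.read().decode('utf-8')
        finally:
            conn.close()
            httpd.shutdown()  # stops serve_forever() in the other thread
            loop.join()
```

Calling fetch_once('/view/Home') should return the status code together with the echoed path, which also shows that the handler sees the path through self.path rather than through an argument.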

Serving HTML

Now let’s serve HTML instead of plain text. At first I hard-coded HTML snippets in the source code as multi-line string literals and inserted values with string formatting, but soon I needed to move the templates into separate files and switch to slightly more advanced templating with string.Template:

...
import io
import string

templ_dir = Path('./tmpl/')

class RequestHandlers(http.BaseHTTPRequestHandler):
#   ...
    def respond_ok(self, body, mime='plain'):
        # accept str, bytes or a binary file object; normalize to a file object
        if isinstance(body, str):
            body = body.encode('utf8')
        if isinstance(body, bytes):
            body = io.BytesIO(body)

        self.send_response(status.OK)
        self.send_header('Content-Type', 'text/%s; charset=utf-8' % mime)
        self.end_headers()

        # stream the body to the client in chunks
        shutil.copyfileobj(body, self.wfile)

    def format_tmpl(self, page, **values):
        # `page` is a template file name inside templ_dir, e.g. 'view.html'
        template = (templ_dir / page).read_text(encoding='utf8')
        t = string.Template(template)
        return t.substitute(**values)

    def get_view(self):
        title = self.path[len('/view/'):]

        p = pages_dir / (title + '.txt')
        if not p.exists():
            return self.send_error(status.NOT_FOUND)

        body = p.read_text(encoding='utf8')
        page = self.format_tmpl('view.html', title=title, body=body)

        # the keyword must match respond_ok()'s `mime` parameter
        self.respond_ok(page, mime='html')

and now we have a new file, tmpl/view.html:

<h1>${title}</h1>
<ul id="menu">
<li>[<a href="/">main</a>]</li>
<li>[<a href="/edit/${title}">edit</a>]</li>
</ul>
<hr>
<div><pre>${body}</pre></div>
<link rel="stylesheet" href="/file/style.css">

That is rather low-level trickery for respond_ok(), but streaming the body with shutil.copyfileobj() will pay off later, when we need to serve files.
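One thing string.Template does not do is escape HTML, so a page containing < or & would end up as raw markup inside the template. A minimal sketch of escaping values by hand with html.escape() follows; render_view and the inline VIEW_TEMPLATE are hypothetical and only stand in for format_tmpl() and tmpl/view.html.

```python
import html
import string

# Hypothetical inline copy of a template; the post keeps templates in ./tmpl/.
VIEW_TEMPLATE = string.Template('<h1>${title}</h1>\n<pre>${body}</pre>')


def render_view(title, body):
    """Fill the template, escaping values so page text cannot inject markup."""
    return VIEW_TEMPLATE.substitute(
        title=html.escape(title),
        body=html.escape(body),
    )


# substitute() raises KeyError for a missing value, while
# safe_substitute() leaves the placeholder in place.
partial = string.Template('${a} ${b}').safe_substitute(a='x')
```

Whether to escape inside the rendering helper or at every call site is a design choice; doing it in one place makes it harder to forget.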

Editing…

The frontend part is a matter of a simple html page, tmpl/edit.html:

<h1>${title}</h1>
<ul id="menu"></ul>
<hr>
<div>
<form action="/save/${title}" method="POST">
<textarea cols="80" rows="20" autofocus name="body">${body}</textarea><br>
<input type="submit" value="Save">
</form>
</div>

and serving it is easy:

class RequestHandlers(http.BaseHTTPRequestHandler):
#   ...
    def do_GET(self):
        if self.path.startswith('/view/'):
            return self.get_view()
        if self.path.startswith('/edit/'):
            return self.get_edit()
        return self.send_error(status.NOT_FOUND)

    def get_edit(self):
        title = self.path[len("/edit/"):]

        body = ''
        p = pages_dir / (title + '.txt')
        if p.exists():
            # read_text() opens, reads and closes the file
            body = p.read_text(encoding='utf8')

        page = self.format_tmpl('edit.html', title=title, body=body)

        self.respond_ok(page, mime='html')

…and POSTing

Now, how do I handle POST requests? The shocking answer is that http.server only dispatches the request to do_POST (if you define it) and leaves you on your own. You need to read the submitted form yourself, parse it (thankfully, the standard library has urllib.parse.parse_qs) and save it.

I spent some time debugging why self.rfile.read() just hangs the app. It turns out that you need to read the Content-Length header and read only that many bytes from self.rfile: the client keeps the connection open while it waits for the response (and HTTP/1.1 can reuse it for new requests), so a read() without a size never sees the end of the stream.

from urllib.parse import parse_qs

class RequestHandlers(http.BaseHTTPRequestHandler):
#   ...
    def _is_form(self):
        """ checks if the request is a form sent from a page """
        formmime = 'application/x-www-form-urlencoded'
        return self.headers.get('content-type') == formmime

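    def _parse_form(self):
        """ parse and save form into `self.form` """
        # read exactly Content-Length bytes: a bare read() would block
        size = int(self.headers['content-length'])
        form = self.rfile.read(size)
        form = form.decode('ascii')
        # keep_blank_values: an emptied <textarea> must still yield body=''
        form = parse_qs(form, keep_blank_values=True)
        self.form = form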

    def form_value(self, key):
        """ get a form parameter by key or None """
        return self.form.get(key, [None])[0]

– and now we are ready to read form parameters!

    def do_POST(self):
        if self._is_form():
            self._parse_form()

        if self.path.startswith('/save/'):
            return self.post_save()

        return self.send_error(status.NOT_FOUND)

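    def post_save(self):
        title = self.path[len("/save/"):]
        body = self.form_value('body')
        if body is None:
            # no `body` field in the form: nothing to save
            return self.send_error(status.BAD_REQUEST)

        p = pages_dir / (title + '.txt')
        # write as UTF-8 to match how get_view() decodes the file
        with p.open('w', encoding='utf8') as f:
            f.write(body)

        self.respond_redirect('/view/' + title)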

Redirecting requires reinventing the wheel again:

    def respond_redirect(self, url):
        self.send_response(status.FOUND)
        self.send_header('Location', url)
        self.end_headers()

Now we can view, create and edit pages.
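The form-reading logic can be tried in isolation, without a running server. In this sketch an in-memory stream stands in for self.rfile and a plain dict stands in for self.headers (the real object looks headers up case-insensitively); read_form is a hypothetical helper, not part of pywiki.

```python
import io
from urllib.parse import parse_qs, urlencode


def read_form(headers, rfile):
    """Read exactly Content-Length bytes from rfile and parse them as a form."""
    size = int(headers.get('Content-Length', 0))
    raw = rfile.read(size)  # never read() without a size on a live socket
    # keep_blank_values=True keeps fields that were submitted empty
    return parse_qs(raw.decode('ascii'), keep_blank_values=True)


# Simulate a request body followed by bytes that belong to the next request.
payload = urlencode({'body': 'Hello, wiki!\nSecond line'}).encode('ascii')
stream = io.BytesIO(payload + b'GET /next HTTP/1.1\r\n')
form = read_form({'Content-Length': str(len(payload))}, stream)
leftover = stream.read()  # untouched: it was never part of the form
```

Note that parse_qs() drops blank values by default, so without keep_blank_values=True a page saved with an empty textarea would have no body field at all.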

Main page

Adding the index page is very straightforward now. The only small problem is that there are no loops in string.Template, so the html representation for the list of pages must be built in Python:

    def get_main(self):
        pages = []
        for p in pages_dir.iterdir():
            if p.suffix == '.txt':
                pages.append(p.stem)

        pagelist = []
        for page in pages:
            pagelist.append(f'<li><a href="/view/{page}">{page}</a></li>')
        pagelist = '\n'.join(pagelist)

        body = self.format_tmpl('main.html', pagelist=pagelist)
        self.respond_ok(body, mime='html')

    def post_goto(self):
        page = self.form_value('page')
        self.respond_redirect("/edit/" + page)

– and don’t forget to add get_main to do_GET and post_goto to do_POST! This definitely could be automated using getattr and dynamic method calls, but I am a little wary of calling code dynamically based on external requests.
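A middle ground between hand-written if chains and fully dynamic dispatch is an explicit routing table: getattr() is then only ever called with names that are listed in the table, never with anything taken from the request. The sketch below is hypothetical (EXACT, PREFIX and resolve are made-up names); the paths follow the ones used in this post.

```python
# Hypothetical routing tables: only the names listed here can ever be called.
EXACT = {
    ('GET', '/'): 'get_main',
    ('POST', '/goto'): 'post_goto',
}
PREFIX = [
    ('GET', '/view/', 'get_view'),
    ('GET', '/edit/', 'get_edit'),
    ('GET', '/file/', 'get_file'),
    ('POST', '/save/', 'post_save'),
]


def resolve(method, path):
    """Return (handler_name, rest_of_path), or None when nothing matches."""
    name = EXACT.get((method, path))
    if name is not None:
        return name, ''
    for verb, prefix, name in PREFIX:
        if verb == method and path.startswith(prefix):
            return name, path[len(prefix):]
    return None


# Inside a handler this could be used roughly as:
#     match = resolve('GET', self.path)
#     if match is None:
#         return self.send_error(status.NOT_FOUND)
#     return getattr(self, match[0])()
```

The second element of the result is the part of the path after the prefix, which is what the get_* methods currently slice out of self.path by hand.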

main.html is:

<h1>pywiki</h1>
<hr>
<form action="/goto" method="POST">
<input type="text" name="page" required autofocus>
<input type="submit" value="Go">
</form>
<ul>
${pagelist}
</ul>

Serving files

What about serving files, e.g. adding a bit of CSS through /file/style.css? I could just fall back onto SimpleHTTPRequestHandler and use its do_GET, but let’s reinvent this too:

files_dir = Path('./file/')

class RequestHandlers(http.BaseHTTPRequestHandler):
#   ...

    def get_file(self):
        """ just serve files from ./file/ """
        fname = self.path[len("/file/"):]
        if '/' in fname:
            return self.send_error(status.BAD_REQUEST)

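        fpath = files_dir / fname
        if not fpath.is_file():
            return self.send_error(status.NOT_FOUND)

        contenttype = 'application/octet-stream'
        if fname.endswith('.css'):
            contenttype = 'text/css'

        # respond_ok() as shown earlier only builds text/* content types,
        # so the headers are sent directly here
        with fpath.open('rb') as f:
            self.send_response(status.OK)
            self.send_header('Content-Type', contenttype)
            self.end_headers()
            shutil.copyfileobj(f, self.wfile)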

A nice side effect here is that it’s not necessary to read the whole file into memory: shutil.copyfileobj() copies it to the socket in fixed-size chunks.
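The hard-coded check for .css could also be delegated to the standard library's mimetypes module. A small sketch, where content_type_for is a hypothetical helper and the fallback type is the same one used above:

```python
import mimetypes


def content_type_for(fname):
    """Guess a Content-Type from the file name, with a binary fallback."""
    guessed, _encoding = mimetypes.guess_type(fname)
    return guessed or 'application/octet-stream'
```

The guess is based purely on the file extension, so it is only as reliable as the file names in ./file/.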

General ergonomics compared to Go

absence of any complex examples in the documentation. This might be intentional: to discourage production usage, which is assumed by default in Go.
massive documentation with several big classes with directly accessible state; you need to read through it and consult the source code to understand what is going on: which methods to call, in what sequence, and which methods are low-level or high-level. It was not immediately clear how to read the request body and write the response (which is obvious in Go: a handler receives http.ResponseWriter and *http.Request as parameters);
a class hierarchy that is not very complex but still non-trivial, used to access complex state and run custom methods (instead of clean, isolated callbacks with clear inputs and outputs, as in Go);
a very bare HTTP implementation, without essentials like parsing form fields (Go’s http.Request.FormValue()) and redirects (just http.Redirect() in Go);
no HTML templating with loops and no automatic HTML escaping in templates (html.escape() has to be called by hand), compared to html/template in Go;

Conclusion

The source code of pywiki may be found here: https://git.dmytrish.net/lang-learn/pywiki.

I feel like I have written my own buggy and incomplete micro web framework at this point: low-level HTTP manipulation, absence of good URL routing, manual parsing, etc. http.server is clearly not meant to be extended this far; it is better seen as a finished demo of what can be done using other parts of the standard library.

There is no concurrency here: everything is blocking (although e.g. Flask works the same way); even though Python has asyncio now, the standard library does not ship an asyncio http server, which is a shame. Go gives you production-grade concurrency for free.
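For what it is worth, since Python 3.7 http.server also ships ThreadingHTTPServer, which handles each request in its own thread; it is still blocking code, just one thread per request. The sketch below makes the difference visible by having a hypothetical handler report which thread ran it (ThreadNameHandler and handler_thread_name are made-up names).

```python
import http.client
import http.server
import threading


class ThreadNameHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that replies with the name of the thread running it."""

    def do_GET(self):
        body = threading.current_thread().name.encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example


def handler_thread_name(server_class):
    """Serve one request with server_class; return the handling thread's name."""
    with server_class(('127.0.0.1', 0), ThreadNameHandler) as httpd:
        loop = threading.Thread(target=httpd.serve_forever, name='server-loop')
        loop.start()
        conn = http.client.HTTPConnection(*httpd.server_address, timeout=5)
        try:
            conn.request('GET', '/')
            return conn.getresponse().read().decode('utf-8')
        finally:
            conn.close()
            httpd.shutdown()
            loop.join()
```

With http.server.HTTPServer the request is handled on the thread that runs serve_forever(); with http.server.ThreadingHTTPServer it is handled on a separate worker thread.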

I haven’t built any security checks into my code, which might be disastrous.
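The most obvious hole is that the page title goes straight from the URL into a file path, so a title like ../../something can escape ./pages/. One possible mitigation, sketched under the assumption that titles may be restricted to letters, digits, underscores and dashes (TITLE_RE and page_path are hypothetical names, not part of pywiki):

```python
import re
from pathlib import Path

pages_dir = Path('./pages/')

# Assumption: titles are limited to a conservative character set.
TITLE_RE = re.compile(r'[A-Za-z0-9_-]+')


def page_path(title):
    """Return the file for a title, or None if the title looks unsafe."""
    if not TITLE_RE.fullmatch(title):
        return None
    return pages_dir / (title + '.txt')
```

A handler could answer with 400 or 404 whenever page_path() returns None. This covers only path traversal; it is not a complete security review.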

So: you can write primitive web-applications using only the Python standard library. Should you do it? Judge for yourself.