@p7g world

Patrick Gingras


A static site generator in 100 lines of Python

WARNING: Opinions ahead, prepare to be offended.

Lately I’ve come to realize that the web (among other things) is being overtaken by bloat.

Every new UI framework is “blazing fast” and “modern” and it’s driving me crazy. We use JavaScript for everything, even static sites that just display information and have no user interaction whatsoever (you’re damn right I’m talking about Gatsby).

I have reached the point where I feel I must put my foot down and proclaim:

Enough is enough!

This is why, dear reader, I am ranting at you today.

What am I gonna do about it?

Not much really.

Today we’re going to go over how I built a static site generator using Python and a minimum of dependencies in 100 lines (including blanks).

You can see the result in front of your very eyeballs, and the source can be found on GitHub.

Getting Started

Let’s get an idea of what we want from a static site generator (by we I mean me; I don’t really care that you want a blazing fast, modern website).

  1. No JavaScript. Already done. Wasn’t that easy?
  2. Fast build.
  3. Tiny build artifacts.
  4. Static host compatible.

To be honest, all of this is really easy. Just don’t do unnecessary things.

On that note, let’s actually get started.

Actually Getting Started

The project structure is going to be something like this:

.
├── Makefile
├── README.md
├── build.py
├── posts
│   └── ...
└── requirements.in

Can you tell that I just dumped the output of tree into a code block?

Respectively, these files are:

  • A Makefile, because make is great, no matter what you say.
  • A readme, because this project is on GitHub.
  • build.py, where all the magic happens (spoiler: it’s not magic).
  • A directory for posts.
  • A dependency list file.

Let’s go over the results we want from this program.

Build Output

The resulting output should be something like this:

build
├── index.html
└── posts
    └── a-static-site-generator-in-python
        └── index.html

Notice that we’re doing some CRAZY STUFF here.

There’s a directory there called a-static-site-generator-in-python (the slug of this post). Notice how, in your address bar, there’s no /index.html after it. This is the MAGIC of webservers. When you access a path like /posts/whatever, your webserver will first look for a file called whatever. If it can’t find that, it’ll look for a directory called whatever with an index.html file in it. Having found that, it sends the contents of that file back to the client.

Boom. Clean URLs, 90s style.
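
The lookup described above can be sketched in a few lines. This is only an illustration of the idea, not how any particular webserver implements it: resolve is a made-up helper, the site is built in a temporary directory, and real servers also guard against things like ../ in the path, which this sketch ignores.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve(root: Path, url_path: str) -> Optional[Path]:
    """Map a URL path to a file under root (hypothetical helper)."""
    candidate = root / url_path.lstrip("/")
    if candidate.is_file():
        # 1. A file with exactly that name exists.
        return candidate
    index = candidate / "index.html"
    if index.is_file():
        # 2. Otherwise, fall back to <directory>/index.html.
        return index
    return None  # A real server would answer with a 404 here.


# Build a throwaway site shaped like the build output above.
root = Path(tempfile.mkdtemp())
post_dir = root / "posts" / "a-static-site-generator-in-python"
post_dir.mkdir(parents=True)
(root / "index.html").write_text("home")
(post_dir / "index.html").write_text("post")

print(resolve(root, "/"))
print(resolve(root, "/posts/a-static-site-generator-in-python"))
print(resolve(root, "/posts/nope"))
```

The first two calls land on an index.html file, and the last one returns None because neither a file nor a directory with an index.html matches.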

Site Structure

For now, we’re going to keep this simple. There will be two kinds of pages to generate:

  1. A directory that lists all posts in reverse chronological order; and
  2. A page per post displaying the content and metadata of that post

In a later instalment I’ll introduce tags.

So now we know what we (I) want. Let’s get started for real.

Getting Started for Real

To represent a post in our build tool, we’re going to use a class like this:

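import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    title: str
    description: Optional[str]  # None when the frontmatter has no description
    date: datetime.date
    html: str  # the rendered body of the post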

This includes all the information we’ll need to render an entry on the home page as well as the post itself on its own page.
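
To make that concrete, here is a small sketch of creating a Post by hand. The values are made up for illustration. One thing worth noticing: Optional[str] only says that None is allowed, it does not give the field a default, so description still has to be passed explicitly.

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    title: str
    description: Optional[str]
    date: datetime.date
    html: str


# Made-up values, purely for illustration.
post = Post(
    title="My Cool Blog Post",
    description=None,  # no default, so None has to be passed explicitly
    date=datetime.date(2020, 2, 28),
    html="<p>This is some post content.</p>",
)

print(post.title)
print(post.date.isoformat())
```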

Now that we’ve defined what data we need, we’ll need to find some way to gather it. First we’ll need to read all of the files from the posts directory.

That’s going to look something like this:

import os

with os.scandir("posts") as it:
    for post_file in it:
        if not post_file.is_file() or post_file.name.startswith("."):
            continue

        with open(post_file.path, "r") as f:
            text = f.read()

        print(text)

Here’s what we’re doing:

  1. Get an iterator over the files in the posts directory.
  2. Iterate over that iterator.
  3. Skip any entries that aren’t regular files or whose names start with a dot.
  4. Read the contents of the file.
  5. Print those contents to the screen.

Clearly we’re not doing anything useful yet, but I hope you can see where we’re going with this.
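
If you want to convince yourself that the filter in step 3 does what it says, here is a self-contained sketch. It wraps the same loop in a generator (read_posts is a made-up name) and points it at a temporary directory containing one post, one dotfile, and one subdirectory, all invented for the demo.

```python
import os
import tempfile


def read_posts(directory):
    """Yield (file name, contents) for each visible regular file in directory."""
    with os.scandir(directory) as it:
        for entry in it:
            # Skip subdirectories and hidden files such as .DS_Store.
            if not entry.is_file() or entry.name.startswith("."):
                continue
            with open(entry.path, "r") as f:
                yield entry.name, f.read()


# Throwaway directory with one post, one dotfile, and one subdirectory.
posts_dir = tempfile.mkdtemp()
with open(os.path.join(posts_dir, "hello.md"), "w") as f:
    f.write("# Hello")
with open(os.path.join(posts_dir, ".DS_Store"), "w") as f:
    f.write("junk")
os.mkdir(os.path.join(posts_dir, "drafts"))

found = dict(read_posts(posts_dir))
print(found)
```

Only hello.md makes it through; the dotfile and the drafts directory are skipped.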

A List of Posts

Now that we can get access to all the posts, we need some way to get a hold of the metadata for each post. The way I decided to do this is with frontmatter. This is just a bit of YAML metadata that you can stick at the front of a file.

To make our lives easier, we’re just going to use a library for this. The one I went with is python-frontmatter.
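
For a rough idea of what such a library has to do, here is a stripped-down sketch using only the standard library. It is a stand-in and not how python-frontmatter works internally: split_frontmatter is a made-up function, it only understands flat key: value lines, and every value stays a string. A real YAML parser does much more, such as turning a date like 2020-02-28 into a date object, which the rendering code later in this post relies on.

```python
def split_frontmatter(text):
    """Split '---' delimited frontmatter from the body of a post.

    Simplified stand-in: only flat `key: value` lines are understood,
    and every value is kept as a plain string.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all

    metadata = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the post body.
            body = "\n".join(lines[i + 1:])
            return metadata, body
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()

    return {}, text  # the opening '---' was never closed


sample = """---
title: My Cool Blog Post
date: 2020-02-28
---
This is some post content."""

metadata, body = split_frontmatter(sample)
print(metadata)
print(body)
```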

Let’s modify the code from earlier to build up a list of Post objects.

import frontmatter

posts = []

with os.scandir("posts") as it:
    for post_file in it:
        if not post_file.is_file() or post_file.name.startswith("."):
            continue

        with open(post_file.path, "r") as f:
            text = f.read()

        post_raw = frontmatter.loads(text)

        posts.append(
            Post(
                title=post_raw["title"],
                description=post_raw.get("description"),
                date=post_raw["date"],
                html=post_raw.content,
            )
        )

Now, we’re loading a dictionary of frontmatter from the file and making a Post object with it. This is cool and all, but it assumes that the content of the file (the part after the frontmatter) is HTML.

This would mean we’d have to write stuff like this:

---
title: My Cool Blog Post
description: This is just such a cool blog post.
date: 2020-02-28
---
<h1>This is the post title</h1>
<p>This is some post content.</p>

I don’t know about you, but I think that sucks. Let’s use markdown instead.

Another library. This time, I used mistletoe, which is a cool CommonMark implementation in pure Python.

Getting started with it is really simple. Where we create the Post object, just change it to look like this:

Post(
    title=post_raw["title"],
    description=post_raw.get("description"),
    date=post_raw["date"],
    html=mistletoe.markdown(post_raw.content),  # THIS IS WHAT CHANGED
)

The final thing we’re gonna do here is order the posts by the date they were published. I just added a line like this under the block above:

posts = sorted(posts, key=attrgetter("date"), reverse=True)  # from operator import attrgetter

All this does is sort the posts by their date attribute in reverse order.
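
Here is the same sort on a few made-up entries, so you can see the order that comes out. Entry is a hypothetical stand-in for Post that carries just the two fields needed for the demo.

```python
import datetime
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Entry:  # hypothetical stand-in for Post
    title: str
    date: datetime.date


entries = [
    Entry("Middle", datetime.date(2020, 2, 1)),
    Entry("Newest", datetime.date(2020, 3, 1)),
    Entry("Oldest", datetime.date(2020, 1, 1)),
]

# attrgetter("date") pulls the date off each entry; reverse=True puts newest first.
entries = sorted(entries, key=attrgetter("date"), reverse=True)
print([e.title for e in entries])  # ['Newest', 'Middle', 'Oldest']
```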

Generating the Home Page

Now that we have a list of ordered posts, we can pretty easily make the home page.

Doing so will require a bit of HTML wrangling, so I decided to install yet another dependency: yattag.

Yattag is a really cool library for generating HTML in plain old Python. No template files or string interpolation here! Yay!
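
One reason to avoid plain string interpolation is escaping. The sketch below uses only the standard library to show the problem; element is a made-up helper and not part of yattag. As far as I can tell from its documentation, yattag escapes the text you hand to text and line in a similar way, while doc.asis inserts a string untouched.

```python
from html import escape


def element(name, content):
    """Hypothetical helper: wrap text in a tag, escaping the text first."""
    return f"<{name}>{escape(content)}</{name}>"


title = "Why 1 < 2 & other <b>bold</b> claims"

naive = f"<h2>{title}</h2>"  # the <b> in the title becomes real markup
safe = element("h2", title)  # the <b> in the title is shown as literal text

print(naive)
print(safe)
```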

Let’s use that list of posts to build up a home page:

doc, tag, text, line = yattag.Doc().ttl()

with tag("html"):
    with tag("body"):
        line("h1", "My cool blog")

        with tag("ul"):
            for post in posts:
                with tag("li"):
                    line("h2", post.title)
                    line(
                        "time",
                        post.date.strftime("%B %d, %Y"),
                        datetime=post.date.isoformat()
                    )
                    if post.description is not None:
                        line("p", post.description)

print(doc.getvalue())

Go ahead and add some posts to the posts folder and run what you’ve got now. You should see a bunch of HTML dumped to your terminal. Almost there!

Rather than print it all to the console, we should probably save it. Replace the print call with this:

shutil.rmtree("build", ignore_errors=True)  # don't fail on the first run, when build doesn't exist yet
os.makedirs("build", exist_ok=True)

with open("build/index.html", "w") as f:
    f.write(doc.getvalue())

If you run it now, you should get a build folder with an index.html in it. You can now open the build folder in your browser and see what we’ve got so far.

Generating the Post Pages

If you recall the discussion about site structure before, you’ll know that we’re going to place each post in its own directory. This is a technique that is nowadays referred to as filesystem routing. It is sometimes also known as the way things have worked since the dawn of time.

Let’s implement it.

for post in posts:
    post_dir = os.path.join("build/posts", post.slug)
    os.makedirs(post_dir, exist_ok=True)

    with open(os.path.join(post_dir, "index.html"), "w") as f:
        f.write(post.html)

Now, hold your horses. There are some other things we’ll need to add before that will work.

Where did that slug attribute on Post come from? Glad you asked. It’s time for a new dependency: python-slugify. This one takes a string and makes it into something URL friendly.

Let’s make use of this by adding a new property to the Post class:

from slugify import slugify  # provided by python-slugify


class Post:
    ...  # whatever you already have

    @property
    def slug(self):
        return slugify(self.title)
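
If you’re wondering what a slug function does to a title, here is a simplified stand-in built from the standard library. It is not python-slugify’s algorithm, and simple_slugify is a made-up name; the real library handles far more cases.

```python
import re
import unicodedata


def simple_slugify(text):
    """Simplified stand-in for a slug function (not python-slugify's algorithm)."""
    # Fold accented characters to their closest ASCII equivalents.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")


print(simple_slugify("A Static Site Generator in Python"))
print(simple_slugify("Café: open 24/7!"))
```

The first call produces the same directory name shown in the build output earlier.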

With that out of the way, you can run the code and get a nice tree of files and folders that make up your website (probably; haven’t run this code myself).

We’ll probably want to put more valid HTML into those files though. The output of mistletoe.markdown isn’t going to have html or body tags. We can do something like what we did earlier for the home page:

for post in posts:
    post_dir = os.path.join("build/posts", post.slug)
    os.makedirs(post_dir, exist_ok=True)

    doc, tag, text, line = yattag.Doc().ttl()

    with tag("html"):
        with tag("body"):
            line("a", "Home", href="/")
            with tag("article"):
                with tag("header"):
                    line("h1", post.title)
                    line(
                        "time",
                        post.date.strftime("%B %d, %Y"),
                        datetime=post.date.isoformat(),
                    )
                with tag("main"):
                    doc.asis(post.html)

    with open(os.path.join(post_dir, "index.html"), "w") as f:
        f.write(doc.getvalue())

With this, we’re almost done.

Adding Post Links

This last step is really easy.

First, we need a way to get the path to a post. Let’s be Clean Coders™ and encapsulate this functionality within the Post class.

class Post:
    ...  # all the crap from before

    @property
    def url(self):
        return f"/posts/{self.slug}"

Wow, this Clean Code thing is pretty easy after all.

All we need to do now is put links on the home page. Change the home page rendering code to look like this:

doc, tag, text, line = yattag.Doc().ttl()

with tag("html"):
    with tag("body"):
        line("h1", "My cool blog")

        with tag("ul"):
            for post in posts:
                with tag("li"):
                    with tag("a", href=post.url):  # THIS IS WHAT CHANGED
                        line("h2", post.title)
                    line(
                        "time",
                        post.date.strftime("%B %d, %Y"),
                        datetime=post.date.isoformat()
                    )
                    if post.description is not None:
                        line("p", post.description)

Done.

In Conclusion

That wasn’t so hard, was it? By now you probably have a working static site generator, and you haven’t even written a line of JavaScript.

Now to figure out how to get rid of all this Python…

Anyway, for some ideas on how you can build on this concept, check out my repo here.

Now go on and write a blog post about how you wrote a static site generator, and put it on your blog generated by that very static site generator. That’s what I’m gonna do.