A Static Site Generator in Python
WARNING: Opinions ahead, prepare to be offended.
Lately I’ve come to realize that the web (among other things) is being overtaken by bloat.
Every new UI framework is “blazing fast” and “modern” and it’s driving me crazy. We use JavaScript for everything, even static sites that just display information and have no user interaction whatsoever (you’re damn right I’m talking about Gatsby).
I have reached the point where I feel I must put my foot down and proclaim:
Enough is enough!
This is why, dear reader, I am ranting at you today.
What am I gonna do about it?
Not much really.
Today we’re going to go over how I built a static site generator using Python and a minimum of dependencies in 100 lines (including blanks).
You can see the result in front of your very eyeballs, and the source can be found on GitHub.
Getting Started
Let’s get an idea of what we want from a static site generator (by we I mean me; I don’t really care that you want a blazing fast, modern website).
- No JavaScript. Already done. Wasn’t that easy?
- Fast build.
- Tiny build artifacts.
- Static host compatible.
To be honest, all of this is really easy. Just don’t do unnecessary things.
On that note, let’s actually get started.
Actually Getting Started
The project structure is going to be something like this:
.
├── Makefile
├── README.md
├── build.py
├── posts
│   └── ...
└── requirements.in
Can you tell that I just dumped the output of tree into a code block?
Respectively, these files are:
- A Makefile, because make is great, no matter what you say.
- A readme, because this project is on GitHub.
- build.py, where all the magic happens (spoiler: it’s not magic).
- A directory for posts.
- A dependency list file.
Let’s go over the results we want from this program.
Build Output
The resulting output should be something like this:
build
├── index.html
└── posts
    └── a-static-site-generator-in-python
        └── index.html
Notice that we’re doing some CRAZY STUFF here.
There’s a directory there called a-static-site-generator-in-python (the slug of this post). Notice how, in your address bar, there’s no /index.html after it. This is the MAGIC of webservers. When you access a path like /posts/whatever, your webserver will first look for a file called whatever. If it can’t find that, it’ll look for a directory called whatever with an index.html file in it. Having found that, it sends the contents of that file back to the client.
Boom. Clean URLs, 90s style.
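If you want to see that lookup order spelled out, here is a rough sketch in plain Python. The resolve function is a made-up name for illustration, not part of any webserver, and real servers handle plenty of cases this ignores (path traversal, redirects, other index file names).

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve(root: Path, url_path: str) -> Optional[Path]:
    """Map a URL path to a file under root (hypothetical helper, simplified)."""
    candidate = root / url_path.strip("/")
    # First choice: a file with exactly that name.
    if candidate.is_file():
        return candidate
    # Second choice: a directory with that name containing index.html.
    index = candidate / "index.html"
    if index.is_file():
        return index
    # Nothing matched; a real server would answer with a 404 here.
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    post_dir = root / "posts" / "whatever"
    post_dir.mkdir(parents=True)
    (post_dir / "index.html").write_text("<p>hello</p>")
    found = resolve(root, "/posts/whatever")
    print(found.relative_to(root).as_posix())
```

The request for /posts/whatever ends up at posts/whatever/index.html, which is why the generator writes one directory per post.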
Site Structure
For now, we’re going to keep this simple. There will be two pages to generate:
- A directory that lists all posts in reverse chronological order; and
- A page per post displaying the content and metadata of that post
In a later instalment I’ll introduce tags.
So now we know what we (I) want. Let’s get started for real.
Getting Started for Real
To represent a post in our build tool, we’re going to use a class like this:
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    title: str
    description: Optional[str]
    date: datetime.date
    html: str
This includes all the information we’ll need to render an entry on the home page as well as the post itself on its own page.
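In case dataclasses are new to you, here is a small self-contained sketch of what the decorator buys us. The field values are invented for illustration.

```python
import datetime
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Post:
    title: str
    description: Optional[str]
    date: datetime.date
    html: str


# The decorator generates __init__, __repr__ and __eq__ from the annotations.
example = Post(
    title="My Cool Blog Post",
    description=None,  # description is optional, so None is allowed
    date=datetime.date(2020, 2, 28),
    html="<p>This is some post content.</p>",
)
print(example.title)
print(sorted(asdict(example)))
```

No hand-written constructor needed, which suits a build script that is trying to stay short.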
Now that we’ve defined what data we need, we’ll need to find some way to gather it. First we’ll need to read all of the files from the posts directory.
That’s going to look something like this:
import os

with os.scandir("posts") as it:
    for post_file in it:
        # Skip directories and dotfiles.
        if not post_file.is_file() or post_file.name.startswith("."):
            continue
        with open(post_file.path, "r") as f:
            text = f.read()
        print(text)
Here’s what we’re doing:
- Get an iterator over the files in the posts directory.
- Iterate over that iterator.
- Skip any entries that aren’t files or whose names start with a dot.
- Read the contents of the file.
- Print those contents to the screen.
Clearly we’re not doing anything useful yet, but I hope you can see where we’re going with this.
A List of Posts
Now that we can get access to all the posts, we need some way to get a hold of the metadata for each post. The way I decided to do this is with frontmatter. This is just a bit of YAML that you can stick at the front of a file.
To make our lives easier, we’re just going to use a library for this. The one I went with is python-frontmatter.
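To give a feel for what a frontmatter library does for us, here is a deliberately naive sketch using only the standard library. The split_frontmatter function is hypothetical and is not how python-frontmatter works: it only understands flat key: value lines and leaves every value as a string, whereas a real YAML parser gives you typed values.

```python
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split '---' delimited frontmatter from the body (flat 'key: value' only)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing '---' delimiter after the opening one.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text
    metadata = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return metadata, body


sample = """---
title: My Cool Blog Post
date: 2020-02-28
---
This is some post content.
"""
metadata, body = split_frontmatter(sample)
print(metadata)
print(body)
```

Nested values, lists, quoting and dates are exactly the parts I don’t want to write myself, hence the library.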
Let’s modify the code from earlier to build up a list of Post objects.
import os

import frontmatter

posts = []
with os.scandir("posts") as it:
    for post_file in it:
        if not post_file.is_file() or post_file.name.startswith("."):
            continue
        with open(post_file.path, "r") as f:
            text = f.read()
        # Split the file into frontmatter metadata and body content.
        post_raw = frontmatter.loads(text)
        posts.append(
            Post(
                title=post_raw["title"],
                description=post_raw.get("description"),
                date=post_raw["date"],
                html=post_raw.content,
            )
        )
Now, we’re loading a dictionary of frontmatter from the file and making a Post object with it. This is cool and all, but it assumes that the content of the file (the part after the frontmatter) is HTML.
This would mean we’d have to write stuff like this:
---
title: My Cool Blog Post
description: This is just such a cool blog post.
date: 2020-02-28
---
<h1>This is the post title</h1>
<p>This is some post content.</p>
I don’t know about you, but I think that sucks. Let’s use markdown instead.
Another library. This time, I used mistletoe, which is a cool CommonMark implementation in pure Python.
Getting started with it is really simple. Where we create the Post object, just change it to look like this:
Post(
    title=post_raw["title"],
    description=post_raw.get("description"),
    date=post_raw["date"],
    html=mistletoe.markdown(post_raw.content),  # THIS IS WHAT CHANGED
)
The final thing we’re gonna do here is order the posts by the date they were published. I just added a line like this under the block above:
posts = sorted(posts, key=attrgetter("date"), reverse=True)  # attrgetter comes from the operator module
All this does is sort the posts by their date attribute in reverse order.
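Here is the same sort in isolation, as a small sketch you can run without any posts on disk. The Entry class and its values are stand-ins invented for the example.

```python
import datetime
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Entry:
    title: str
    date: datetime.date


entries = [
    Entry("Middle", datetime.date(2020, 2, 28)),
    Entry("Oldest", datetime.date(2019, 12, 1)),
    Entry("Newest", datetime.date(2020, 3, 15)),
]
# sorted returns a new list; reverse=True puts the latest date first.
newest_first = sorted(entries, key=attrgetter("date"), reverse=True)
print([entry.title for entry in newest_first])
```

Since sorted already returns a list, there is no need to wrap it in anything else.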
Generating the Home Page
Now that we have a list of ordered posts, we can pretty easily make the home page.
Doing so will require a bit of HTML wrangling, so I decided to install yet another dependency: yattag.
Yattag is a really cool library for generating HTML in plain old Python. No template files or string interpolation here! Yay!
Let’s use that list of posts to build up a home page:
import yattag

doc, tag, text, line = yattag.Doc().ttl()
with tag("html"):
    with tag("body"):
        line("h1", "My cool blog")
        with tag("ul"):
            for post in posts:
                with tag("li"):
                    line("h2", post.title)
                    line(
                        "time",
                        post.date.strftime("%B %d, %Y"),
                        datetime=post.date.isoformat(),
                    )
                    if post.description is not None:
                        line("p", post.description)
print(doc.getvalue())
Go ahead and add some posts to the posts folder and run what you’ve got now.
You should see a bunch of HTML dumped to your terminal. Almost there!
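The time element in there carries the date twice: once for humans and once for machines. A quick sketch of the two formats, assuming the default (English) locale, since %B is locale dependent:

```python
import datetime

published = datetime.date(2020, 2, 28)
# Human-readable form, shown as the element's text.
human_readable = published.strftime("%B %d, %Y")
# ISO 8601 form, used for the datetime attribute.
machine_readable = published.isoformat()
print(f'<time datetime="{machine_readable}">{human_readable}</time>')

# Note that %d zero-pads the day of the month.
print(datetime.date(2020, 3, 5).strftime("%B %d, %Y"))
```

If the zero-padded day bothers you, formatting the day separately with an f-string is one way around it.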
Rather than print it all to the console, we should probably save it. Replace the print call with this:
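import shutil

# Start from a clean slate; ignore_errors covers the first run, when build doesn't exist yet.
shutil.rmtree("build", ignore_errors=True)
os.makedirs("build", exist_ok=True)
with open("build/index.html", "w") as f:
    f.write(doc.getvalue())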
If you run it now, you should get a build folder with an index.html in it. You can now open the build folder in your browser and see what we’ve got so far.
Generating the Post Pages
If you recall the discussion about site structure before, you’ll know that we’re going to place each post in its own directory. This is a technique that is nowadays referred to as filesystem routing. It is sometimes also known as the way things have worked since the dawn of time.
Let’s implement it.
for post in posts:
    post_dir = os.path.join("build/posts", post.slug)
    os.makedirs(post_dir, exist_ok=True)
    with open(os.path.join(post_dir, "index.html"), "w") as f:
        f.write(post.html)
Now, hold your horses. There are some other things we’ll need to add before that will work.
Where did that slug attribute on Post come from? Glad you asked. It’s time for a new dependency: python-slugify. This one takes a string and makes it into something URL friendly.
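To show roughly what “URL friendly” means, here is a bare-bones sketch using only the standard library. The simple_slug function is hypothetical and much cruder than python-slugify, which handles far more languages and edge cases.

```python
import re
import unicodedata


def simple_slug(title: str) -> str:
    """Lowercase, strip accents, and join alphanumeric runs with hyphens."""
    # Decompose accented characters, then drop anything that isn't ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Keep runs of letters and digits; everything else becomes a separator.
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    return "-".join(words)


print(simple_slug("A Static Site Generator in Python"))
```

That prints the same directory name you saw in the build output earlier.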
Let’s make use of this by adding a new property to the Post class:
from slugify import slugify

class Post:
    ...  # whatever you already have

    @property
    def slug(self):
        return slugify(self.title)
With that out of the way, you can run the code and get a nice tree of files and folders that make up your website (probably; haven’t run this code myself).
We’ll probably want to put more valid HTML into those files though. The output of mistletoe.markdown isn’t going to have html or body tags. We can do something like what we did earlier for the home page:
for post in posts:
    post_dir = os.path.join("build/posts", post.slug)
    os.makedirs(post_dir, exist_ok=True)
    doc, tag, text, line = yattag.Doc().ttl()
    with tag("html"):
        with tag("body"):
            line("a", "Home", href="/")
            with tag("article"):
                with tag("header"):
                    line("h1", post.title)
                    line(
                        "time",
                        post.date.strftime("%B %d, %Y"),
                        datetime=post.date.isoformat(),
                    )
                with tag("main"):
                    # asis inserts the already-rendered HTML without escaping it.
                    doc.asis(post.html)
    with open(os.path.join(post_dir, "index.html"), "w") as f:
        f.write(doc.getvalue())
With this, we’re almost done.
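One detail worth spelling out is why the post body goes through doc.asis rather than text or line. Yattag escapes anything passed as text, which is what you want for titles but not for markup that is already HTML. The sketch below uses the standard library’s html.escape as a stand-in to show the difference; it is not yattag’s own code.

```python
import html

rendered_markdown = "<p>Some <em>post</em> content.</p>"

# Escaped: what the browser would show if the markup were treated as plain text.
escaped = html.escape(rendered_markdown)
# Raw: what we actually want for content that is already HTML.
raw = rendered_markdown

print(escaped)
print(raw)
```

The flip side is that raw insertion trusts the content completely, which is fine for posts you wrote yourself.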
Adding Post Links
This last step is really easy.
First, we need a way to get the path to a post. Let’s be Clean Coders™ and encapsulate this functionality within the Post class.
class Post:
    ...  # all the crap from before

    @property
    def url(self):
        return f"/posts/{self.slug}"
Wow, this Clean Code thing is pretty easy after all.
All we need to do now is put links on the home page. Change the home page rendering code to look like this:
doc, tag, text, line = yattag.Doc().ttl()
with tag("html"):
    with tag("body"):
        line("h1", "My cool blog")
        with tag("ul"):
            for post in posts:
                with tag("li"):
                    with tag("a", href=post.url):  # THIS IS WHAT CHANGED
                        line("h2", post.title)
                    line(
                        "time",
                        post.date.strftime("%B %d, %Y"),
                        datetime=post.date.isoformat(),
                    )
                    if post.description is not None:
                        line("p", post.description)
Done.
In Conclusion
That wasn’t so hard, was it? By now you probably have a working static site generator, and you haven’t even written a line of JavaScript.
Now to figure out how to get rid of all this Python…
Anyway, for some ideas on how you can build on this concept, check out my repo here.
Now go on and write a blog post about how you wrote a static site generator, and put it on your blog generated by that very static site generator. That’s what I’m gonna do.