Syntax Highlighting
Parsing, Pain, and Regex
While writing the new site generator, I also wanted to redo syntax highlighting. The previous method was to manually define a syntax, somewhat inspired by how the Ace editor defines languages. There's a big problem with this method: I can't find a good place to get syntax definitions from, as the format isn't actually compatible with Ace.
Enter syntect, a good rust syntax highlighter that supports a lot of languages and can output html. The downside?
.z-comment,
.z-string.z-quoted.z-double.z-block.z-python {
    color: $comment;
}
.z-keyword.z-operator.z-class,
.z-constant.z-other,
.z-source.z-php.z-embedded.z-line {
    color: $orange;
}
.z-variable,
.z-support.z-other.z-variable,
.z-string.z-other.z-link,
.z-string.z-regexp,
.z-entity.z-name.z-tag,
.z-entity.z-other.z-attribute-name,
.z-meta.z-tag,
.z-declaration.z-tag {
    color: $red;
}
/* ...
 * this goes on for a bit longer, 110 lines total
 */
That's a lot of selectors to maintain if I want to style the output with CSS, so let's find something better.
The nice thing about the previous method is that there was only a small collection of highlight types to deal with, so the CSS needed for styling stayed small. But there was no easy way to get syntax definitions. There are the syntax files for nano, like the project here, which provide a good set of alternatives. Sadly, these assign colors directly instead of naming what is being highlighted, so styling again becomes a bit hard.
So, are there better alternatives than having to deal with syntect and tmLanguage files?
I think so: the micro editor seems to adapt its syntax files from the nano project linked earlier, but extends them a bit to provide proper names for certain syntax items. Can we use that?
The actual syntax implementation is a bit different from what I had in the previous iteration of SLSG, but should be reasonably easy to implement. I don't really like yaml, however, so I'll probably end up converting the files to some other format, like lua, instead.
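To make the appeal of named syntax items concrete, here is a rough sketch in rust of how a handful of named groups could map straight to CSS classes. It is not the SLSG implementation: the `Rule` struct, the group names, and the `highlight` function are all made up for illustration, and plain word lists stand in for the regex patterns that real syntax files use.

```rust
/// A hypothetical highlight rule: the group name doubles as the CSS class.
/// Real syntax files use regexes; word lists keep this sketch dependency-free.
struct Rule {
    group: &'static str,
    words: &'static [&'static str],
}

/// Escape the characters that are special in HTML text.
fn escape_html(text: &str) -> String {
    text.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

/// Emit the pending word, wrapped in a span if any rule lists it.
fn flush_word(word: &mut String, rules: &[Rule], out: &mut String) {
    if word.is_empty() {
        return;
    }
    let rule = rules
        .iter()
        .find(|r| r.words.iter().any(|w| *w == word.as_str()));
    match rule {
        Some(rule) => {
            out.push_str("<span class=\"");
            out.push_str(rule.group);
            out.push_str("\">");
            out.push_str(word.as_str());
            out.push_str("</span>");
        }
        None => out.push_str(word.as_str()),
    }
    word.clear();
}

/// Split the source into words and wrap the ones a rule knows about.
fn highlight(source: &str, rules: &[Rule]) -> String {
    let mut out = String::new();
    let mut word = String::new();
    for ch in source.chars() {
        if ch.is_alphanumeric() || ch == '_' {
            word.push(ch);
        } else {
            flush_word(&mut word, rules, &mut out);
            out.push_str(&escape_html(&ch.to_string()));
        }
    }
    flush_word(&mut word, rules, &mut out);
    out
}

fn main() {
    // Two illustrative groups; the names are assumptions, not a fixed list.
    let rules = [
        Rule { group: "statement", words: &["if", "then", "end"] },
        Rule { group: "constant", words: &["nil", "true", "false"] },
    ];
    println!("{}", highlight("if x == nil then", &rules));
}
```

With output like this, the stylesheet only needs one selector per group, such as `.statement` and `.constant`, rather than a long list of scope combinations.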
I'd also like to add the ability to switch between languages while highlighting, as highlighted code inside markdown and html is pretty cool.
SLSG update
Yes, I've still been busy with this. In the previous post I laid out a new design for the site generator, to make it work more like a traditional site generator such as zola or Hugo. I've done that now, and have gone back to using markdown instead of a custom markdown format.
Instead of shortcodes, or some template language, there is lua, and fennel too because why not.
To avoid having to deal with a template language, I just search for processing instructions, run the lua/fennel inside, and paste the result back in. It looks like this:
<p>
Welcome to the site,
where we do templates with
<?lua stylize("lua!") ?>
</p>
The same goes for markdown, as it has a dedicated syntax for processing instructions that I pick up on when parsing:
# Hello world
This is some markdown!
cool huh!
Welcome to <?lua stylize("lua!") ?> too!
However, if the code in a processing instruction returns a function, that function is called at the end, with the full file as its argument, in order to do post processing. It is also possible to set a lua or fennel script to run in advance, before the files are parsed. Similarly, if this script returns a function, that function is called after all files are parsed.
I've also added font subsetting, as that's a useful feature. For now, I still have to finish the new syntax highlighter, create the project site and template sites, write documentation, and deal with the last bits of the standard library, mainly emitting files and dealing with file paths in order to link and read files relative to the one currently being templated.
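The basic substitution step, leaving out the post-processing hook, could be sketched roughly like this in rust. The function name `expand_instructions` and the `eval` callback are assumptions made for this example. The callback stands in for actually running the lua, and the scan is a plain string search rather than a real HTML or markdown parser.

```rust
/// Replace every `<?lua ... ?>` instruction in `input` with whatever the
/// `eval` callback returns for the code inside it. `eval` is a stand-in
/// for a real lua interpreter.
fn expand_instructions(input: &str, eval: impl Fn(&str) -> String) -> String {
    const OPEN: &str = "<?lua";
    const CLOSE: &str = "?>";

    let mut out = String::with_capacity(input.len());
    let mut rest = input;

    while let Some(start) = rest.find(OPEN) {
        // Copy the text before the instruction unchanged.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + OPEN.len()..];

        match after_open.find(CLOSE) {
            Some(end) => {
                // Run the code and paste the result back in.
                out.push_str(&eval(after_open[..end].trim()));
                rest = &after_open[end + CLOSE.len()..];
            }
            None => {
                // Unterminated instruction: keep the remaining text as-is.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }

    out.push_str(rest);
    out
}

fn main() {
    let page = "Welcome to <?lua stylize(\"lua!\") ?> too!";
    // A fake evaluator that only shows which code it was handed.
    let result = expand_instructions(page, |code: &str| format!("[ran: {}]", code));
    println!("{}", result);
}
```

One limitation of searching for the first `?>` is that a `?>` inside a lua string would end the instruction early. A real implementation would probably lean on the markup parser to find the instruction boundaries.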