How my web journal is built

Journal Topic of Robin Gruenke

For the purpose of starting my blog (I call it journal, because I will write in small chapters and it also serves as documentation), I want to generate static html without a server. I want a blend of: Plain text, less boilerplate, purity, approachability, content validation, modularity and freedom for customizing and extension.

TL;DR I created my own document format
MAR 23, 2020 - Robin Gruenke

Writing my articles outside the scope of html and css rendering was important to me. I want to write plain text and decorate it with properties, which resemble certain recurring html components and meta information. So I created my own format for this. You can find a link to the document for this article in the appendix.

Preface: What about Elm ?
MAR 7, 2020 - Robin Gruenke

Because I am a big fan of Elm, my first thought was whether it would be a good choice for my purpose. I have a little experience with elm-static, which is an opinionated tool for creating static html for your website. It supports markdown and elm-markup. However, in my humble opinion it is structurally very complex, has a lot of boilerplate, and the modularity of the markdown files is somewhat hard to see through.

What about elm-markup ?

I saw the introduction video for it, and I think it is a great idea. It is so modular and extensible that you can create very rich custom formats for your static html. I looked for it on GitHub, but soon realized that it is somewhat hard to get started with, since there is no real introduction or documentation for it. I would spend a lot of time understanding it and getting things done, so I dropped it from the list. However, maybe I will have a look later this year !

Finally, I was thinking: What would be the elm way of doing it ?

Elm could render static html in a very simple way, by just creating a main function and calling the html functions you need. It would bring all the power and modularity of Elm with it. However, I was quite sure that the compiler couldn't generate static html at all.

This is how I would write it:

module Test exposing (..)
import Html exposing (Html, div, h1, text)

main : Html Never
main =
    div [] [ h1 [] [ text "Hello World !" ]  ]

Cool, no boilerplate at all in the first place !

When you compile it, you get an html document. However, there is no static html from your main function in it. It would seem so natural to me for it to generate static html, since the annotation line 'main : Html Never' tells us: I will render html, guaranteed without any Javascript events. Of course, in the end, I was expecting that. As of today, Elm can render html only via its Javascript runtime (it is a compile-to-js language).

Maybe the addition of a built-in custom type like 'Static' could tell the compiler to just render html ? It would be just so simple and straightforward :)

Summary:

It seems that Elm is not the appropriate tool (for now). I could generate pages just the way I mentioned earlier. However, the whole page would be rendered with Javascript, which is not what I want (think of SEO).

Preface: And what about PHP ?
MAR 7, 2020 - Robin Gruenke

Yes, good old php. It comes with modular html rendering 'included'. That is what it was made for initially. I have had my experience with it. It is great when you start a small project from scratch and want to proceed fast. By now it has grown to have compelling OOP features. Also, making existing html files dynamic is very easy. However, mixing plain html and php code scales badly, since maintaining readability is clearly not a 'built-in feature', in my opinion. It can quickly turn into a complete mess.

In order to get a clean and fitting environment with php, I would need to research appropriate tools. But my pool of concerns is too unsettled. Do I need some sort of smart classes which handle rendering ? How could I avoid writing mixed php and html ? Do I need some sort of framework ? Where can I find well written documentation ? Can I have PHP with a nice syntax ? (Python, you ?)

I am sorry, PHP.

Indeed Python !
MAR 7, 2020 - Robin Gruenke

I have been learning Python for the last few weeks. Somehow it caught my attention after all these years, and I thought: why did I never give it a try ? Sorry Python, that I never considered you before ! You are clearly awesome !

Python just runs. It is very approachable and easy to learn. It has clever and unique idioms, straightforward data structures, outstanding libraries for a lot of use cases and, on top, a very clean syntax.

It is versatile.

- Do you want to compute complex math with less code ? NumPy has your back.

- Do you want to create a science application ? SciPy has your back.

- Do you want to create interactive code documents ? IPython and Jupyter Notebooks have your back.

- Want to compute math for neural networks on your GPU ? PyTorch has your back, just to mention one.

- Data Science ? Python loves it.

- Web Frameworks ? Django, Flask and more.

Now: What about my need for html modularity and a simple and clean approach ? I would prefer not to write html documents but still be declarative. I also would prefer not to use some sort of dynamic template language.

Luckily, Python has my back and I found a really interesting library for my purpose: Yattag.

Yattag code is just plain Python which utilizes the 'with' statement:

from yattag import Doc, indent

doc, tag, text = Doc().tagtext()

with tag('html'):
    with tag('body', id = 'hello'):
        with tag('h1'):
            text('Hello world!')

print(indent(doc.getvalue()))

Result:

<html>
  <body id="hello">
    <h1>Hello world!</h1>
  </body>
</html>

This way I could write my whole site in pure Python. Clean, modular and declarative html generation, there you go !

Still, I want to write my articles in plain text !
MAR 8, 2020 - Robin Gruenke

Okay, okay. Yes, I would still need to build rather complex and repetitive stacks of 'pythonic html' for each page, reusability or not. Of course I will build functions which resemble html components; that would be the modular part. But that is not sufficient.

To recap:

I want a blend of: Plain text, less boilerplate, purity, approachability, content validation, modularity and freedom for customizing and extension.

So what would basically be missing is content validation and plain text. I could extend my pythonic html with validator functions to get the validation I need, or create a validation layer between the parsing and the rendering. But before all that, I want to write my articles in plain text.

So I finally decided to create my own document format, which resembles a Journal Page with all its specific layout and styling.

I want to write a parser that parses the document, validates it against SEO best practices and enforces semantics in the content (like a required introduction text, a maximum length for content elements, required metadata or keyword occurrence). I want to combine parsing and content validation so that I can reuse both for another project. Content semantics would be a feature of the document format.

Each Page will have the exact same structure and basic elements. Creating an abstract format, while letting python handle the parsing, while letting the parsing handle the content validation, while letting yattag assemble the html from the parsing results, would be a nice separation of concerns. Phew !

As an aside, you can see (maybe you were thinking about it already) that the well-known markdown format would not be enough for this.

What does the format look like ?
MAR 9, 2020 - Robin Gruenke

First I describe the metadata for the document:

  /meta
  author: Robin Gruenke
  website: https://www.robingruenke.com
  year: 2020
  title: Journal - Tools | robingruenke.com
  description: Generate static html flexible, approachable, consistent and with a custom format
  keywords: journal generate html python elm
  topic: How my Journal is build

- Semantic blocks are annotated with a slash like /meta

- The meta properties are expected in this exact order, in those exact lines, to enforce consistency across multiple pages.

- author: Requires exactly two words of latin characters, separated by a space. This author is the owner of the journal topic. Any following author properties exist so that multiple authors can write in a single document, for example for guest chapters.

- website: The journal topic owner's website

- year: Requires a four digit year.

- title: Requires three words, joined by the special characters and spaces shown in the example above (' - ' and ' | ')

- description: Requires between 50 and 160 characters

- keywords: Requires exactly 5 latin words

- topic: Can be any characters up to a length of 50 characters
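
The rules above could be checked roughly like this. It is only a sketch: the function name parse_meta and the exact regular expressions are my assumptions, derived from the list, and not the real parser.

```python
import re

# Property name -> pattern the value must fully match.
# The patterns are assumptions derived from the rules listed above.
META_RULES = [
    ("author", r"[A-Za-z]+ [A-Za-z]+"),
    ("website", r"\S+"),
    ("year", r"\d{4}"),
    ("title", r"\S+ - \S+ \| \S+"),
    ("description", r".{50,160}"),
    ("keywords", r"[A-Za-z]+( [A-Za-z]+){4}"),
    ("topic", r".{1,50}"),
]


def parse_meta(lines):
    """Validate the lines of a /meta block and return them as a dictionary."""
    if not lines or lines[0].strip() != "/meta":
        raise ValueError("document must start with /meta")
    body = [line.strip() for line in lines[1:]]
    if len(body) < len(META_RULES):
        raise ValueError("meta block is incomplete")
    meta = {}
    # zip pairs each rule with the line at the same position,
    # which enforces the fixed order of the properties.
    for (key, pattern), line in zip(META_RULES, body):
        name, sep, value = line.partition(": ")
        if name != key or not sep:
            raise ValueError(f"expected '{key}' but found: {line!r}")
        if not re.fullmatch(pattern, value):
            raise ValueError(f"invalid value for '{key}': {value!r}")
        meta[key] = value
    return meta
```

Because every property has a fixed position, a missing or swapped line fails immediately with a message naming the property that was expected.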

Then there is the /introduction block. It is required in the document, followed by plain text which must be between 50 and 300 characters long, not counting spaces, and must be surrounded by two line breaks.

  /introduction

  For the purpose of starting my blog (I call it journal, because I will write in small chapters),
  I want to generate static html without a server.
  I want a blend of: Clean approach, less boilerplate, simplicity, approachability,content validation,
  plain text, modularity and freedom for customizing.
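
A minimal sketch of the length rule for the introduction, assuming that "not counting spaces" also covers the line breaks inside the block. The name validate_introtext is made up for this example.

```python
def validate_introtext(text, minimum=50, maximum=300):
    """Join the introduction lines and check the length without whitespace."""
    # split() without arguments drops line breaks and repeated spaces.
    joined = " ".join(text.split())
    length = len(joined.replace(" ", ""))
    if not minimum <= length <= maximum:
        raise ValueError(
            f"introduction has {length} characters, expected {minimum} to {maximum}"
        )
    return joined
```

The function returns the text joined into a single line, which is the form in which it would end up in the parser output.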

For every chapter there is a /chapter block. It requires at least a topic, an author and a date. The date is in German format (dd.mm.yyyy). Optionally you can set a picture, which consists of two space separated values: a (css compatible) height value and the link itself. Then comes the plain text for the chapter. It is just free text, with unlimited paragraphs. Just separate them with two new lines. They will be wrapped in p tags later on.

  /chapter
  topic: Indeed Python !
  author: Robin Gruenke
  date: 07.03.2020
  picture: 1000px https://imgs.xkcd.com/comics/python.png

  I am a paragraph

  Another paragraph

And that's basically it.
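
Reading the property lines of a /chapter block could look roughly like this. The function name parse_chapter_header is an assumption for this sketch, and the date check simply relies on datetime.strptime from the standard library.

```python
from datetime import datetime

REQUIRED = ("topic", "author", "date")


def parse_chapter_header(lines):
    """Read the 'key: value' lines that follow /chapter into a dictionary."""
    chapter = {}
    for line in lines:
        key, sep, value = line.strip().partition(": ")
        if not sep:
            raise ValueError(f"not a property line: {line!r}")
        chapter[key] = value
    missing = [key for key in REQUIRED if key not in chapter]
    if missing:
        raise ValueError(f"missing chapter properties: {missing}")
    # strptime raises ValueError if this is not a valid day.month.year date.
    datetime.strptime(chapter["date"], "%d.%m.%Y")
    if "picture" in chapter:
        # Two space separated values: the height first, then the link.
        height, src = chapter["picture"].split(" ", 1)
        chapter["picture"] = {"src": src, "height": height}
    return chapter
```

The picture is stored as a dictionary with the keys 'src' and 'height', so the html components do not have to split the value again.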

Ah yes, not to forget: there are the code blocks on this page, which look like this (I had to escape them here; a code block actually starts with code: and ends with :code):

  |code
  <html>
    <body id="hello">
      <h1>Hello world!</h1>
    </body>
  </html>
  code|

Code blocks may only occur in chapter blocks in between paragraphs.

Finally, a single line of at least three dashes (---) stops the parsing. Everything after it is an area of the document which is just for happily and freely drafting around :)
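
Cutting off the drafting area could be as small as this. It is a sketch, and strip_draft_area is a name I made up for it.

```python
import re


def strip_draft_area(document):
    """Return the document text up to the first line of three or more dashes."""
    kept = []
    for line in document.splitlines():
        if re.fullmatch(r"-{3,}", line.strip()):
            break  # everything from here on is the drafting area
        kept.append(line)
    return "\n".join(kept)
```

A line with only two dashes, or with dashes mixed into other text, does not stop the parsing.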

What I want to add further:

- A chapter appendix with one hyperlink (use 'appendix: [description] hyperlink')

- One markdown style inline hyperlink at the end of an introtext. I think this should be the only place in the document for an inline hyperlink, because I think they distract from the text content. This is not covered by the journal parser, however. It is part of the html components.

- Gallery support (use 'gallery: 75px (path) (path) (path)', picture attribute is required)

- Quotes for chapters (use 'quote: [description] [quotetext] [hyperlink]')

- Checkbox support for paragraphs beginning with '- [ ]' or '- [x]'. This is also covered by the html components.

- Maybe some emphasizing stuff. I am not too much of a fan of bold or underlined words in texts, however. I think they distract from the content. Maybe some italic stuff. I like Italy !

- Also, some small non-critical parser bugs have to be fixed.

- Make the parser module extensible
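
As a sketch of the first item, an appendix line in the form 'appendix: [description] hyperlink' could be turned into a dictionary with the keys 'href' and 'description'. The regular expression and the name parse_appendix are assumptions, and the URL in the usage line is a placeholder.

```python
import re

# 'appendix: [description] hyperlink' as given in the list above.
APPENDIX = re.compile(r"appendix: \[(?P<description>[^\]]+)\] (?P<href>\S+)")


def parse_appendix(line):
    """Turn an appendix property line into an href / description dictionary."""
    match = APPENDIX.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a valid appendix line: {line!r}")
    return {"href": match["href"], "description": match["description"]}


# Hypothetical usage, the URL is only a placeholder:
appendix = parse_appendix("appendix: [Document of this article] https://example.com/journal.txt")
```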

What does the parser output look like ?
MAR 10, 2020 - Robin Gruenke

The parser output is a Python dictionary with the following keys and their value types (Key : Type):

-- document root structure
author : String
owner-website : String
year : String
title : String
description : String
keywords : String
topic : String
introtext : String
chapters : List Dictionary

The 'chapters' value is a List of Dictionaries, where each Dictionary resembles a /chapter block in the document. A chapter Dictionary has the following keys:

-- Chapter structure
topic : String
author : String
date : String
appendix : Dictionary { 'href' : String, 'description' : String }
picture : Dictionary { 'src' : String, 'height' : String }
paragraphs : List Dictionary

The 'paragraphs' value is a List of Dictionaries, where each Dictionary resembles a line break separated text block in the chapter, be it just a text block or a code block. Therefore a paragraph Dictionary has two properties: type and content. The type is either text or code, and the content is just the actual text from the document.

-- Paragraph structure
type : String
content : String
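
Splitting a chapter body into such paragraph dictionaries could look like this. It is a sketch under two assumptions: code blocks are delimited by lines that contain only code: and :code, and the lines of a text paragraph are joined with spaces. The name parse_paragraphs is made up.

```python
def parse_paragraphs(body):
    """Split a chapter body into text and code paragraph dictionaries."""
    paragraphs = []
    buffer = []
    in_code = False

    def flush(kind):
        # Code keeps its line breaks, text is joined into one line.
        if buffer:
            separator = "\n" if kind == "code" else " "
            paragraphs.append({"type": kind, "content": separator.join(buffer)})
            buffer.clear()

    for line in body.splitlines():
        marker = line.strip()
        if not in_code and marker == "code:":
            flush("text")
            in_code = True
        elif in_code and marker == ":code":
            flush("code")
            in_code = False
        elif in_code:
            buffer.append(line)  # keep the indentation inside code blocks
        elif marker == "":
            flush("text")  # a blank line ends a text paragraph
        else:
            buffer.append(marker)
    if in_code:
        raise ValueError("code block was never closed with ':code'")
    flush("text")
    return paragraphs
```

Blank lines inside a code block stay part of the code, because the blank line rule only applies outside of code blocks.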

Here is an overview of the output converted to JSON:

{
    "author": "Robin Gruenke",
    "owner-website": "https:\/\/www.robingruenke.com",
    "year": "2020",
    "title": "Journal - Tools | robingruenke.com",
    "description": "Generate static html flexible, approachable, consistent and with a custom format",
    "keywords": "journal generate html python elm",
    "topic": "How my Journal is build",
    "introtext": "For the purpose of starting my blog (I call it journal, because I will write in small chapters), I want to generate static html without a server. I want a blend of: Clean approach, less boilerplate, simplicity, approachability, content validation, plain text, modularity and freedom for customizing. ",
    "chapters": [
        {
            "topic": "How does the parser output look like ?",
            "author": "Robin Gruenke",
            "date": "10.03.2020",
            "paragraphs": [
                {
                    "type": "text",
                    "content": "I am a Paragraph"
                },
                {
                    "type": "code",
                    "content": "I am a Paragraph"
                }
            ]
        }
    ]
}
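
In Python, a dictionary like this can be converted with the json module from the standard library. The dictionary below is a shortened stand-in for the real parser output.

```python
import json

# A shortened stand-in for the parser output described above.
document = {
    "author": "Robin Gruenke",
    "owner-website": "https://www.robingruenke.com",
    "year": "2020",
    "chapters": [
        {
            "topic": "Indeed Python !",
            "author": "Robin Gruenke",
            "date": "07.03.2020",
            "paragraphs": [{"type": "text", "content": "I am a Paragraph"}],
        }
    ],
}

as_json = json.dumps(document, indent=4)
print(as_json)
```

Note that json.dumps writes the slashes in the URL unescaped. The escaped form \/ shown in the overview is also valid JSON and decodes to the same string.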
Copyright 2020-2023 Robin T. Gruenke