In this post, I’m going to explain how I wrote a blog generator from scratch using Elixir and the standard Erlang library. My motivation? To learn more. What’s the problem with building an HTTP server and a Markdown to HTML transpiler from scratch?

This series is divided into two parts. In this first part, we’ll explore the HTTP server.

What we’ll see in this series of posts

  • Create our HTTP server
  • Read requests and display results
  • Respond concurrently
  • Transpile Markdown to HTML
  • Create templates to include CSS and other elements

What is a static site generator?

Writing blogs directly in HTML and CSS can be very costly, especially when the site structure is complex. If you inspect this very website, you’ll see it’s written in extremely simple HTML (partly my decision), but imagine we have a much more complex blog with links, tags, multi-level menus, etc. Then things start to get complicated.

However, it’s much easier to write posts in Markdown and transpile them to HTML. This way, we can write in a much simpler markup language than HTML.

The tools that do this work of transpiling and making the HTML look nice with CSS are called static site generators. The resulting site only serves content: it doesn’t process forms or any other input from the client. Clients simply request a route, and our HTTP server takes care of delivering it.

What options does the open-source world offer?

  • Serum
  • Franklin
  • Gatsby
  • Hexo
  • Eleventy
  • Pelican
  • Zola
  • Hugo (this website)
  • Jekyll

There are many of them, in different languages. Mine is called Personal.

Personal

Personal is a project that took one week, motivated by boot.dev and ThePrimeagen, who create very interesting projects to push you out of your comfort zone. In my case, I also decided to do it with zero dependencies to add a bit of excitement. Moreover, by doing it from scratch, one appreciates the tremendous work done by open-source contributors because, as we’ll see, everything is relatively simple but very costly to implement.

HTTP 1.1 Server

As we’ve said, we’re going to program this from scratch. So the first step is to open sockets. We’ll use the gen_tcp module, which will give us everything we need to set up our server.

I won’t go into much detail about how an HTTP server works, but the first thing we need to do is bind to a port and obtain our “Listen Socket”:

  defp accept(port) do
    {:ok, listen_socket} = :gen_tcp.listen(
      port,
      [
        :binary,
        packet: :line,
        active: false,
        reuseaddr: true,
        nodelay: true,
        backlog: 1024
      ]
    )

    Logger.info("Listening port: #{port}")

    loop(listen_socket)
  end

This asks the TCP/IP stack to keep a queue of up to 1024 pending connections (the backlog). Every time a client connects to our port, the connection waits in this queue until we accept it. Reducing or increasing this number can noticeably affect throughput under load. (You can run benchmark tests and you’ll see the difference :3)
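To make the backlog idea concrete, here is a minimal sketch that is not taken from Personal. It assumes port 0 (so the OS picks a free port) and a small backlog of 16, both chosen only for illustration. It shows that a client can connect before we ever call accept, because the connection simply waits in the queue.

```elixir
# Sketch: open a listen socket on an OS-assigned port and inspect it.
# Port 0 and the backlog of 16 are illustrative values, not the ones Personal uses.
opts = [:binary, packet: :line, active: false, reuseaddr: true, backlog: 16]

{:ok, listen_socket} = :gen_tcp.listen(0, opts)

# :inet.port/1 returns the port the OS actually bound.
{:ok, port} = :inet.port(listen_socket)

# The client connects before we call accept: the connection waits in the backlog.
{:ok, client} = :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, active: false])

# accept/2 (with a timeout in milliseconds) takes the pending connection out of the queue.
{:ok, accepted} = :gen_tcp.accept(listen_socket, 1_000)

:ok = :gen_tcp.close(accepted)
:ok = :gen_tcp.close(client)
:ok = :gen_tcp.close(listen_socket)
```

Note that the operating system may cap the backlog value we request, so the number passed to listen is an upper bound rather than a guarantee.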

As I just explained, we have a buffer, but now we need to accept these connections one by one, which will be removed from this buffer. To do this, we need to execute the following code in an infinite loop.

  defp loop(listen_socket) do
    case :gen_tcp.accept(listen_socket) do
      {:ok, socket} ->
        Personal.Worker.work(socket)

      {:error, reason} ->
        Logger.error("Failed to accept connection #{inspect(reason)}")
    end

    loop(listen_socket)
  end

It’s very important to consider concurrency at this point. We have a function running in an infinite loop that calls :gen_tcp.accept/1 and then Personal.Worker.work(socket), so if either call is slow, accepting new connections becomes a massive bottleneck. Ideally, we want to accept connections and serve them in parallel without one blocking the other.

So at this point, we need to consider the architecture of our server. In my case, it’s something similar to this:

[Figure: process architecture]

My implementation is not exactly like this: I don’t create a pool, I simply launch processes. The important thing to highlight is that we have a process called Acceptor that accepts connections, and the Worker is responsible for creating one process per connection to serve the request.

Here we see the main code of the worker:

  def work(socket) do
    fun = fn ->
      case :gen_tcp.recv(socket, 0) do
        {:ok, data} ->
          {code, body} = handle_request(data)
          # The header section must end with an empty line terminated by CRLF (\r\n\r\n).
          response = "#{@http_ver} #{code}\r\n#{@server}\r\nContent-Type: text/html\r\n\r\n#{body}\r\n"
          :gen_tcp.send(socket, response)
          :gen_tcp.close(socket)

        {:error, reason} ->
          Logger.error("Failed to read socket #{inspect(reason)}")
          :gen_tcp.close(socket)
      end
    end

    pid = spawn(fun)
    :gen_tcp.controlling_process(socket, pid)
  end

Our Acceptor only defines the fun, creates another process with spawn/1 to run it, and uses :gen_tcp.controlling_process(socket, pid) to tell the runtime that the socket we obtained from :gen_tcp.accept now belongs to the process we just created.

So our Acceptor can continue its loop accepting connections, and the new process will continue serving the request.
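The handoff can be tried end to end with a small sketch. This is a variation, not Personal’s code: the module HandoffSketch, the :socket_ready message, and the echo reply are all made up for illustration. The extra message makes the new process wait until ownership has been transferred before it touches the socket.

```elixir
defmodule HandoffSketch do
  # Hypothetical module: accepts one connection and hands it to a new process.
  def serve_once(listen_socket) do
    {:ok, socket} = :gen_tcp.accept(listen_socket)

    pid =
      spawn(fn ->
        # Wait until the acceptor has transferred ownership of the socket.
        receive do
          :socket_ready -> :ok
        end

        {:ok, line} = :gen_tcp.recv(socket, 0)
        :gen_tcp.send(socket, "echo: " <> line)
        :gen_tcp.close(socket)
      end)

    # Only the current owner (the process that accepted) can give the socket away.
    :ok = :gen_tcp.controlling_process(socket, pid)
    send(pid, :socket_ready)
    pid
  end
end

opts = [:binary, packet: :line, active: false, reuseaddr: true]
{:ok, listen_socket} = :gen_tcp.listen(0, opts)
{:ok, port} = :inet.port(listen_socket)

# Run the acceptor in its own process so this script can play the client.
spawn(fn -> HandoffSketch.serve_once(listen_socket) end)

{:ok, client} = :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, packet: :line, active: false])
:ok = :gen_tcp.send(client, "hello\n")
{:ok, reply} = :gen_tcp.recv(client, 0, 1_000)
IO.inspect(reply)
```

The acceptor process exits right after the handoff, yet the connection stays open because the socket now belongs to the spawned process.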

As a note, I have to add that this is not the best way to handle this problem: we spawn processes without any kind of control (no limit and no supervision), and a new fun is created at runtime on every call, which can cause problems depending on the use case.
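One way to add some control, sketched here under the assumption that we are willing to use Task.Supervisor from Elixir’s standard library, is to cap the number of concurrent workers. This is not what Personal does; the limit of 2 and the names start_worker and slow_job are illustrative only.

```elixir
# Hypothetical sketch: cap the number of concurrent workers with a Task.Supervisor.
{:ok, sup} = Task.Supervisor.start_link(max_children: 2)

start_worker = fn job ->
  case Task.Supervisor.start_child(sup, job) do
    {:ok, pid} -> {:ok, pid}
    # The supervisor refuses to start more than max_children tasks at once.
    {:error, :max_children} -> {:error, :busy}
  end
end

# Stand-in for serving a request; a real worker would read from its socket.
slow_job = fn -> Process.sleep(500) end

# Try to start three workers while the limit is two.
results = for _ <- 1..3, do: start_worker.(slow_job)
IO.inspect(results)
```

In a server, the acceptor could pass the returned pid to :gen_tcp.controlling_process/2 on success, and close the socket when the supervisor reports that it is busy.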

The real world is much more complex, and the reality is that we end up using third-party libraries. In the case of Elixir, these libraries are usually:

Moreover, we now have newer libraries like Bandit, which also has its own dependency tree.

These libraries are examples of how to work in production, but our small project serves to help you understand why these libraries exist and why people maintain them for so many years.

HTTP 1.1 Request

Great, now we have the client’s socket where we can read and write data. Now we can work in the world of Requests.

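HTTP requests are quite simple: they arrive as plain text, as one or more lines separated by line breaks (\r\n). Because we opened the socket with packet: :line, each call to :gen_tcp.recv returns a single line, so the first read gives us the request line.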

An example of a request without headers would be:

GET /styles/style.css HTTP/1.1\r\n

It’s now our problem to read the line, parse it correctly, and know what to do with it. In this case, we can see it’s a GET request wanting to obtain the stylesheet at the indicated path.
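As a rough sketch of what parsing that line involves, the hypothetical helper below (RequestLineSketch is not part of Personal) splits the request line into method, target, and version, and rejects anything that does not have that shape.

```elixir
defmodule RequestLineSketch do
  # Hypothetical helper: splits "METHOD target HTTP/x.y\r\n" into its three parts.
  def parse(line) do
    case line |> String.trim_trailing() |> String.split(" ") do
      [method, target, "HTTP/" <> _ = version] ->
        {:ok, {method, target, version}}

      _ ->
        {:error, :bad_request}
    end
  end
end

IO.inspect(RequestLineSketch.parse("GET /styles/style.css HTTP/1.1\r\n"))
IO.inspect(RequestLineSketch.parse("nonsense\r\n"))
```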

In our case, I’ve only implemented GET:

  def handle_request("GET " <> rest) do
    path =
      rest
      |> String.split(" ")
      |> List.first()

    body = FileReader.get_file(path)

    if body == nil do
      {"404 Not Found", ""}
    else
      # The status line is "<code> <reason>", with no comma.
      {"200 OK", body}
    end
  end

  def handle_request(_) do
    {"405 Method Not Allowed", ""}
  end

As we can see, we extract the path, look it up in FileReader, and return an appropriate response. NOTE: it’s very important how we read the data because this can generate very serious security flaws. In the case of Personal, we’ll see that it pre-caches files in memory.

To send data, we just need to follow the Response format similar to this:

response = "#{@http_ver} #{code}\r\n#{@server}\r\nContent-Type: text/html\r\n\r\n#{body}\r\n"

And finally execute

:gen_tcp.send(socket, response)
:gen_tcp.close(socket)

These last two lines send the response and close the socket, thus ending the execution of the socket’s controlling process.
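For comparison, here is a slightly more complete sketch of a response builder. It is an assumption about how one might extend the format, not Personal’s code: ResponseSketch is a hypothetical module, and the Content-Length and Connection headers are additions. It returns iodata, which :gen_tcp.send/2 accepts directly.

```elixir
defmodule ResponseSketch do
  # Hypothetical helper: builds an HTTP/1.1 response as iodata.
  def build(status, content_type, body) do
    [
      "HTTP/1.1 ", status, "\r\n",
      "Content-Type: ", content_type, "\r\n",
      # Content-Length counts bytes, not characters.
      "Content-Length: ", Integer.to_string(byte_size(body)), "\r\n",
      "Connection: close\r\n",
      # An empty line separates the headers from the body.
      "\r\n",
      body
    ]
  end
end

response = ResponseSketch.build("200 OK", "text/css", "body { margin: 0 }")
IO.puts(IO.iodata_to_binary(response))
```

With Content-Length present, the client knows where the body ends without waiting for the connection to close.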

As we can see, even in this simple case there are thousands of things missing, such as header handling, other HTTP methods, etc. There’s a world of specifications to discover! See RFC 9110 and have fun!

Getting the body!

The objective of any HTTP server is for our client to obtain any type of data we want to make visible, but this must be done securely. Imagine someone could do something like GET /etc/passwd and our server says, “Sure, no problem, here you go…”.

To avoid this, and knowing this is a small blog, I make sure GET requests never read from the file system. Instead, when the server starts up, it reads all the data from a specific folder and builds a data structure to look it up.

In our case, FileReader defines a folder I’ve called static that stores all the final information our blog can offer. At the same time, the folder structure will be the same as the HTTP request structure.

For example, if our structure looks like this:

static/
├─ images/
├─ styles/
│  ├─ style.css
├─ blog/
├─ index.html

To obtain style.css or index.html, the requests should be:

GET /styles/style.css HTTP/1.1\r\n
GET / HTTP/1.1\r\n

In the case of FileReader, it builds a Map where each key is a folder and the files inside this folder have the filename as the key and the file content as the value.

%{
  "static" => %{
    "images" => %{
      "sample.webp" => <<82, 73, 70, 70, 214, 10, 0, 0, 87, 69, 66, 80, ...>>
    },
    "index.html" => "<html>...</html>",
    "styles" => %{
      "style.css" => "/* css content */"
    }
  }
}

So if in the data variable we have the previous map, to obtain sample.webp we would execute:

data["static"]["images"]["sample.webp"]
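Turning a request path into that kind of lookup could look like the sketch below. LookupSketch is a hypothetical module, not Personal’s FileReader, and mapping / to index.html is an assumption based on the example requests above. Since only keys that exist in the map can match, a path such as /../etc/passwd simply finds nothing.

```elixir
defmodule LookupSketch do
  # Hypothetical helper: resolves a request path against the in-memory map.
  def fetch(data, request_path) do
    segments =
      case String.split(request_path, "/", trim: true) do
        # Assumption: the root path serves index.html.
        [] -> ["index.html"]
        parts -> parts
      end

    walk(data, ["static" | segments])
  end

  # All segments consumed and we landed on file content.
  defp walk(content, []) when is_binary(content), do: {:ok, content}

  # Descend one folder level if the key exists.
  defp walk(dir, [key | rest]) when is_map(dir) do
    case Map.fetch(dir, key) do
      {:ok, next} -> walk(next, rest)
      :error -> :error
    end
  end

  # Anything else (a folder requested as a file, a file used as a folder) fails.
  defp walk(_, _), do: :error
end

data = %{
  "static" => %{
    "index.html" => "<html></html>",
    "styles" => %{"style.css" => "/* css */"}
  }
}

IO.inspect(LookupSketch.fetch(data, "/styles/style.css"))
IO.inspect(LookupSketch.fetch(data, "/../etc/passwd"))
```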

In my opinion, this method is quite simple. Obviously, if the script that builds this map accesses resources outside of static, we would have a problem. Apart from that, once the structure is built, it is not updated until the server is restarted (although it could be changed at runtime).
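A minimal sketch of such a builder, assuming a plain recursive walk over the folder, is shown below. LoaderSketch is a hypothetical module rather than Personal’s FileReader, and the temporary directory exists only so the example can run on its own.

```elixir
defmodule LoaderSketch do
  # Hypothetical loader: reads a directory tree into a nested map.
  def load(dir) do
    dir
    |> File.ls!()
    |> Map.new(fn name ->
      path = Path.join(dir, name)

      if File.dir?(path) do
        # Folders become nested maps.
        {name, load(path)}
      else
        # Files become binaries holding their content.
        {name, File.read!(path)}
      end
    end)
  end
end

# Build a throwaway tree in the system temp directory to try it out.
root = Path.join(System.tmp_dir!(), "static_sketch_#{System.unique_integer([:positive])}")
File.mkdir_p!(Path.join(root, "styles"))
File.write!(Path.join(root, "index.html"), "<html></html>")
File.write!(Path.join([root, "styles", "style.css"]), "/* css */")

data = %{"static" => LoaderSketch.load(root)}
File.rm_rf!(root)

IO.inspect(data)
```

File.dir?/1 follows symbolic links, so a link inside the folder that points elsewhere is one way such a script could end up reading outside of static.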

In my case, the structure is stored in a persistent_term, which offers some of the fastest reads on the BEAM, second only to values written directly in the code.
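Storing and reading a term this way takes only a couple of calls. In this sketch the key {:sketch, :static_files} and the tiny map are made up for illustration; Personal’s actual key is not shown in this post.

```elixir
# The key below is a made-up name for illustration.
key = {:sketch, :static_files}
files = %{"static" => %{"index.html" => "<html></html>"}}

# Store the map once, typically at startup.
:ok = :persistent_term.put(key, files)

# Reads do not copy the term into the calling process's heap.
cached = :persistent_term.get(key)
index = cached["static"]["index.html"]

# get/2 returns a default when the key was never stored.
missing = :persistent_term.get({:sketch, :unknown}, :not_loaded)

IO.inspect({index, missing})
```

Updating or erasing a persistent term is expensive because it triggers a global garbage-collection pass, which is why this storage suits data that is written once and read many times.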

With this, the HTTP server part ends. In the next post, we’ll see how we build HTML from Markdown and how we do the “bundle” to the “static” folder.

This has been translated to English with Claude from the original Spanish post.