Code generation is a good way to automate writing repetitive code, also known as boilerplate. Although programming languages are getting better at reducing how much repetitive code you need to write, it still comes up fairly often.
Another use case is generating the same code in several different languages, for instance when writing a software development kit (SDK) for an API. This is the idea behind Protocol Buffers and OpenAPI, which let you write your models and services in a domain-specific language (DSL).
An interesting use case is writing a cross-language library with idiomatic interfaces in different languages. Normally, if you write a library, you choose the language up front and that is the only language you consider supporting. If you do decide to make the library more widely accessible, the usual route is to write it in C (or at least expose a C interface). The problem with C is that, even though you can call it from practically any other language, the interface is going to be very clunky. With code generation you can provide nicer, idiomatic interfaces for other languages. Watch out for an upcoming post about one such library 😉.
A more advanced use-case is generating code from a higher-level specification. This is what Flow does to generate animation code in Swift and HTML.
In this post I introduce some common approaches to code generation.
How to generate code?
There are a number of ways of generating code. They range from using
How easy is it to add a new output language or format? Values range from 1 where you need to start from scratch for each new output language to 5 where you can add a new output language without having to modify any existing code.
How expressive is the code generation language? Values range from 1 where there is no generation language to 5 where the generation language is a programming language in its own right.
How clear is the generation code? Values range from 1 where everything is a messy mix of languages in the same file to 5 where everything is written in the same language and easily followed by developers new to the codebase.
How unintrusive is the code generation? Values range from 1 where you need a completely new toolset to 5 where the code generation requires no additional dependencies.
Roll your own
This is the simplest form of code generation. Write a program that uses
- Protobuf plugins implement their code generation directly in C++. Have a look at their GitHub repo.
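As a minimal sketch of the roll-your-own approach, here is a plain Python program that builds C source as strings. The `FIELDS` list and the `generate_struct` helper are hypothetical names invented for this illustration; they are not taken from Protobuf or any other tool.

```python
# Hypothetical input: (field name, C type) pairs describing a struct.
FIELDS = [("id", "int"), ("score", "double")]


def generate_struct(name, fields):
    """Return the source of a C struct as a single string."""
    lines = [f"typedef struct {name} {{"]
    for field_name, c_type in fields:
        lines.append(f"    {c_type} {field_name};")
    lines.append(f"}} {name};")
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the generated code; a real generator would write it to a file.
    print(generate_struct("Player", FIELDS))
```

Note how the generator code and the generated code already share the same string literals, and how the doubled braces are needed only to escape the generator's own syntax.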
Model-based code generation tools take a model as input and generate code. The input model can be written in a domain-specific language or in a standard data format like JSON. The most common model-based generators are Protobuf and OpenAPI (Swagger). They are great for generating models and SDKs across multiple languages but are very limited in expressivity. Although the model code is clear and concise, the generated code is often messier than necessary (due in part to that lack of expressiveness).
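To make the idea concrete, here is a toy sketch of a model-based generator. It is not how Protobuf or OpenAPI work internally: the JSON schema (`name`, `fields`, `type`) and the two emitter functions are assumptions made up for this example.

```python
import json

# Hypothetical model; this JSON layout is invented for the sketch.
MODEL_JSON = """
{"name": "User",
 "fields": [{"name": "id", "type": "int"},
            {"name": "email", "type": "string"}]}
"""

# One type mapping per output language.
PYTHON_TYPES = {"int": "int", "string": "str"}
TYPESCRIPT_TYPES = {"int": "number", "string": "string"}


def emit_python(model):
    """Emit a Python dataclass for the model."""
    lines = ["from dataclasses import dataclass", "", "@dataclass",
             f"class {model['name']}:"]
    for field in model["fields"]:
        lines.append(f"    {field['name']}: {PYTHON_TYPES[field['type']]}")
    return "\n".join(lines)


def emit_typescript(model):
    """Emit a TypeScript interface for the same model."""
    lines = [f"interface {model['name']} {{"]
    for field in model["fields"]:
        lines.append(f"  {field['name']}: {TYPESCRIPT_TYPES[field['type']]};")
    lines.append("}")
    return "\n".join(lines)


model = json.loads(MODEL_JSON)
print(emit_python(model))
print(emit_typescript(model))
```

The model stays small and readable, but it can only describe what the schema anticipates (here, named fields with simple types), which is the expressivity limit mentioned above.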
The most common general-purpose generation mechanism is the template engine. Templates are used extensively to generate HTML websites. Some common template engines are Jinja2, Mustache, PHP, eRuby, and Swift GYB. Templates work by interleaving verbatim output with generation instructions, which are marked off by special control sequences. When the template is executed, the templating engine replaces the instructions with the generated code. See for instance the Mustache Demo.
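The interleaving of verbatim output and instructions can be sketched with `string.Template` from the Python standard library. It is used here only as a stand-in for a real engine such as Mustache or Jinja2, and the Java-style getter being generated is a made-up example.

```python
from string import Template

# Verbatim output with $placeholders as the "control sequences".
GETTER_TEMPLATE = Template(
    "public $type get$Name() {\n"
    "    return this.$name;\n"
    "}\n"
)

# Executing the template replaces each placeholder with a value.
generated = GETTER_TEMPLATE.substitute(type="String", name="email", Name="Email")
print(generated)
```

Everything outside the `$` placeholders is copied to the output unchanged, which is why templates read well when most of the output is fixed text.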
Template engines shine when the generated code has a consistent structure, like in HTML. Many template engines deliberately limit expressivity: Mustache, for example, is "logic-less" and allows only a small set of basic constructs, and Jinja2 offers more but still stops short of a full programming language. While this is desirable for simple “fill in the blanks” use cases, it will limit what you can generate.
The more expressive template engines (PHP, eRuby, GYB) are powerful but come at the price of introducing dependencies and separate languages into your codebase (unless, of course, your codebase happens to be in one of these languages). The template code can also end up being a jumbled mess of different languages.
See the Wikipedia page for a full list of template engines and their capabilities.
Some languages, like C, C++, Julia, and Rust, have built-in metaprogramming constructs such as preprocessor macros, templates, and macro systems. These let you mix normal code with meta-code. They are great for simple use cases, but overuse often leads to hard-to-understand code. The main disadvantage is that you have to be using one of these languages to start with, and even then you can only generate more of the same language.
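Python is not on the list above, and its facilities work at runtime rather than at compile time, but it can give a rough analogy of mixing normal code with meta-code. In this sketch the `with_getters` decorator and the `User` class are hypothetical names chosen for illustration.

```python
def with_getters(*field_names):
    """Class decorator that generates a get_<field>() method per field name."""
    def decorate(cls):
        for field_name in field_names:
            # The default argument pins field_name for this particular getter.
            def getter(self, _field=field_name):
                return getattr(self, _field)
            setattr(cls, f"get_{field_name}", getter)
        return cls
    return decorate


# Meta-code (the decorator) sits right next to normal code (the class).
@with_getters("id", "email")
class User:
    def __init__(self, id, email):
        self.id = id
        self.email = email


user = User(7, "a@example.com")
print(user.get_id(), user.get_email())
```

The limitation described above shows up here too: the decorator can only produce more Python, not code in another language.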
Code generation is a powerful tool. It lets you automate writing boring repetitive boilerplate code, write cross-language interfaces and SDKs, and generate code from higher-level specs.
But it's not without issues. One of the main issues with code generation is the need to mix the code that is doing the generation with the code that is being generated. In general these are two different languages. Better IDE support for this use case would be a good step in the right direction.
I will keep exploring options and sharing what I learn.