Representing Transformations

Introduction

Transformations let you map points or vectors from one coordinate system to another. Don’t worry, I am not going to get too deep into the math. Transformations are used extensively in computer graphics. They let you move, scale, rotate and skew things on the screen. You can also do projective transformations but we’ll leave those out for now.

More importantly, transformations let you have views with local coordinate systems. This means you can design a view and its contents independently of where on the screen that view ends up. It also makes nested views possible. This is how UIKit and most other UI frameworks do things.

So how do you apply a transformation? The most basic transformation is a translation. You translate a point by adding an offset to its x and y components. This offset is known as a vector. This can be expressed as p_new = p + v. For instance, if you have a view, you can place it anywhere on the screen by translating its coordinate system by the view’s position (x and y).
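As a minimal sketch of p_new = p + v, the snippet below uses two hypothetical types, Point and Vector, that are not part of any framework (UIKit code would more likely use CGPoint and CGVector). The view position of (100, 40) is an arbitrary example value.

```swift
// Hypothetical minimal types, for illustration only.
struct Point { var x: Double; var y: Double }
struct Vector { var dx: Double; var dy: Double }

// p_new = p + v: add the offset to each component.
func translate(_ p: Point, by v: Vector) -> Point {
    return Point(x: p.x + v.dx, y: p.y + v.dy)
}

// A point at (5, 5) in a view's local coordinates,
// with the view positioned at (100, 40) in its parent.
let local = Point(x: 5, y: 5)
let inParent = translate(local, by: Vector(dx: 100, dy: 40))
print(inParent.x, inParent.y) // 105.0 45.0
```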

For rotations the math gets more interesting and involves sines and cosines. The important thing is that every transformation can be expressed as arithmetic operations on points (or vectors).
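To make the sines and cosines concrete, here is a rough sketch of rotating a point about the origin using the standard formulas x' = x cos θ - y sin θ and y' = x sin θ + y cos θ. The function name and the use of plain Double radians are assumptions made for this example.

```swift
import Foundation

// Rotates the point (x, y) about the origin by `radians`.
// Hypothetical helper, not taken from any framework.
func rotate(x: Double, y: Double, by radians: Double) -> (x: Double, y: Double) {
    let c = cos(radians), s = sin(radians)
    return (x: x * c - y * s, y: x * s + y * c)
}

// Rotating (1, 0) by a quarter turn lands on (0, 1),
// up to a tiny floating-point error in the x component.
let r = rotate(x: 1, y: 0, by: .pi / 2)
```

Note that in a coordinate system where y grows downward, as on most screens, a positive angle in this formula appears as a clockwise rotation.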

The Matrix

The industry-standard way of representing transformations is matrices. Turns out all those arithmetic operations can be concisely represented as a matrix multiplication. Do check out the Wikipedia page on Transformation matrix for more details.
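The idea can be sketched with a small 3x3 matrix type acting on points written as (x, y, 1). Matrix3 and its methods are hypothetical names invented for this example, and the layout (row-major, column vectors) is just one of several common conventions; real frameworks may order things differently.

```swift
import Foundation

// Hypothetical 3x3 matrix in row-major order, acting on column vectors (x, y, 1).
struct Matrix3 {
    var m: [Double] // 9 values

    static func translation(_ tx: Double, _ ty: Double) -> Matrix3 {
        return Matrix3(m: [1, 0, tx,
                           0, 1, ty,
                           0, 0, 1])
    }

    static func rotation(_ radians: Double) -> Matrix3 {
        let c = cos(radians), s = sin(radians)
        return Matrix3(m: [c, -s, 0,
                           s,  c, 0,
                           0,  0, 1])
    }

    // Matrix product self * other: applies `other` first, then `self`.
    func multiplied(by other: Matrix3) -> Matrix3 {
        var out = [Double](repeating: 0, count: 9)
        for row in 0..<3 {
            for col in 0..<3 {
                for k in 0..<3 {
                    out[row * 3 + col] += m[row * 3 + k] * other.m[k * 3 + col]
                }
            }
        }
        return Matrix3(m: out)
    }

    // Transforms a point; assumes an affine matrix whose last row is (0, 0, 1).
    func apply(x: Double, y: Double) -> (x: Double, y: Double) {
        return (x: m[0] * x + m[1] * y + m[2],
                y: m[3] * x + m[4] * y + m[5])
    }
}

// Rotate by a quarter turn first, then translate by (10, 0).
let combined = Matrix3.translation(10, 0).multiplied(by: .rotation(.pi / 2))
let q = combined.apply(x: 1, y: 0)
// q is approximately (10, 1)
```

Once the two steps are multiplied together, the result is a single matrix, which is compact to apply but no longer shows which operations produced it.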

This is actually how most UI frameworks represent transformations, including UIKit. But have you ever seen a transformation matrix? It is basically a list of 16 floating-point numbers (9 for 2D transformations, or just 6 in the case of UIKit’s CGAffineTransform, which leaves out the three constant entries). Not very helpful when you are trying to figure out what went into transforming that view.

It is also often necessary to invert a transformation, for instance when a transformation maps a view’s local coordinates to the screen and you want to map a point on the screen back into the view. This can be done by inverting the transformation matrix. The problem is that matrix inversion is comparatively expensive, introduces round-off error, and is not even defined when the matrix is singular (for example, if you scaled by zero or made a mistake when building the matrix).

Can we do better?

One Piece at a Time

You build a complex transformation one operation at a time. For a view, for instance, you would first translate, then rotate, and then maybe translate again. You rarely start from a pre-built matrix of unknown values.

So why not store the transformation as a list of transformation operations? This has a few advantages over the matrix representation. For one, the parameters are stored exactly as you supplied them, so no round-off error accumulates as you add operations. You also get cheap inversion: simply reverse the order of the operations and invert each one individually. But more importantly, you get a clear representation that you can use when debugging and when saving to text-based formats like JSON.

An operation can be conveniently represented in Swift using an enum with associated values:

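import CoreGraphics  // for CGFloat

// One step of a 2D transformation. Angle is a separate angle type, not shown here.
public enum Operation2D {
    case translate(CGFloat, CGFloat)  // offsets along x and y
    case scale(CGFloat, CGFloat)      // scale factors along x and y
    case rotate(Angle)                // rotation angle
    case skew(Angle, Angle)           // skew angles for the two axes
}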

And a transformation is simply represented as a list of operations:

public struct Transform2D {
    public var steps = [Operation2D]()
}

We also need to add methods that apply the transformation to points and vectors. That is left as an exercise for the reader. An interesting method is calculating the transformation inverse:
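For readers who want a starting point for that exercise, here is one possible sketch of applying a list of operations to a point. It is not the code from the full source: SketchOperation and SketchTransform are hypothetical stand-ins, angles are plain Double radians, and skew is left out to keep the example short.

```swift
import Foundation

// Hypothetical stand-in for the operation type: angles are plain radians
// and skew is omitted to keep the sketch short.
enum SketchOperation {
    case translate(Double, Double)
    case scale(Double, Double)
    case rotate(Double)

    // Applies this single operation to the point (x, y).
    func apply(x: Double, y: Double) -> (x: Double, y: Double) {
        switch self {
        case let .translate(tx, ty):
            return (x: x + tx, y: y + ty)
        case let .scale(sx, sy):
            return (x: x * sx, y: y * sy)
        case let .rotate(radians):
            let c = cos(radians), s = sin(radians)
            return (x: x * c - y * s, y: x * s + y * c)
        }
    }
}

// Hypothetical stand-in for the transformation type.
struct SketchTransform {
    var steps = [SketchOperation]()

    // Applies the steps in order, first to last.
    func apply(x: Double, y: Double) -> (x: Double, y: Double) {
        var p = (x: x, y: y)
        for step in steps {
            p = step.apply(x: p.x, y: p.y)
        }
        return p
    }
}

// Scale by 2, then move 10 units along x.
let t = SketchTransform(steps: [.scale(2, 2), .translate(10, 0)])
let p = t.apply(x: 1, y: 1)
print(p.x, p.y) // 12.0 2.0
```

A version for vectors would look the same except that translation steps would leave the vector unchanged, since a vector is an offset rather than a position.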

extension Transform2D {
    // Undo the steps in reverse order, inverting each one.
    public func inverted() -> Transform2D {
        return Transform2D(steps: steps.reversed().map { $0.inverse })
    }
}
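The per-operation inverse is not shown here, so the following is only a guess at what it might look like. InvertibleStep is a hypothetical stand-in type using plain Double radians. Skew is omitted because, depending on how it is defined, its inverse may not be expressible as a single skew, and the scale case assumes nonzero factors.

```swift
// Hypothetical stand-in for the operation type, with angles as plain radians.
enum InvertibleStep: Equatable {
    case translate(Double, Double)
    case scale(Double, Double)   // assumes nonzero factors
    case rotate(Double)

    // The operation that undoes this one.
    var inverse: InvertibleStep {
        switch self {
        case let .translate(tx, ty): return .translate(-tx, -ty)
        case let .scale(sx, sy):     return .scale(1 / sx, 1 / sy)
        case let .rotate(radians):   return .rotate(-radians)
        }
    }
}

let steps: [InvertibleStep] = [.translate(10, 20), .rotate(0.5), .scale(2, 4)]

// Reverse the order, then invert each step.
let inverted = steps.reversed().map { $0.inverse }
// inverted == [.scale(0.5, 0.25), .rotate(-0.5), .translate(-10, -20)]
```

The order reversal matters: the last step applied has to be the first one undone.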

Cool huh? The full source code is available on GitHub.

Discussion

Matrices are the most general, fool-proof way of representing transformations. But for most practical applications they obfuscate transformations, add unnecessary round-off error, and have performance overhead for common use cases. The piecewise approach addresses these issues and also provides a human-readable representation for debugging or serialization.

Swift’s enums with associated values are a great way of representing the pieces. They lead to a simple, easy-to-follow implementation of a transformation type with no overly complicated math. The implementation I shared is limited to 2-dimensional transformations, but the same ideas can be applied to 3 or more dimensions.