UUID as silver bullet

2019/04

We limit the discussion to UUIDs as defined by RFC 4122: 128-bit values whose textual form consists of 36 characters (32 hexadecimal digits and 4 hyphens), xxxxxxxx-xxxx-Axxx-Bxxx-xxxxxxxxxxxx, where A is the version and B encodes the variant. For general use, version 4 is the common choice: apart from the version and variant bits, its remaining 122 bits are randomly generated, which makes the collision probability close to zero. The Wikipedia article covers the theoretical background if you are interested; here we focus on practical issues. For generated sequences, the comparison assumes 64-bit integers between -2^63 and 2^63 - 1, with the sequence being that of the natural numbers.
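To make the layout tangible, here is a minimal sketch using only java.util.UUID from the JDK. The class name UuidAnatomy is made up for illustration. It reads the version and variant back from a randomly generated UUID; 4 version bits plus 2 variant bits are fixed, which is where the 122 random bits come from.

```java
import java.util.UUID;

// Hypothetical helper class, only for illustration.
final class UuidAnatomy {

    // Describes the version and variant fields encoded in a UUID.
    static String describe(UUID id) {
        return id + " version=" + id.version() + " variant=" + id.variant();
    }

    public static void main(String[] args) {
        UUID random = UUID.randomUUID();           // version 4 (random)
        System.out.println(describe(random));
        // The textual form always has 36 characters: 32 hex digits and 4 hyphens.
        System.out.println(random.toString().length());
        // The version digit sits at index 14 of the textual form (the "A" position).
        System.out.println(random.toString().charAt(14));
    }
}
```

In Java, variant() returns 2 for RFC 4122 UUIDs, which shows up as 8, 9, a or b in the "B" position of the string.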

The main benefit of global uniqueness shows in systems where the IDs are generated by the application rather than the database, and it grows when multiple clients can write to multiple databases.

Take an event-sourced CQRS architecture, for example. In a strict design, commands are not allowed to return a value, whereas in a classical approach methods commonly return a reference. If the consumer creates the ID, it holds not only the reference in memory but also a valid state of the aggregate, and can therefore continue to work.
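A rough sketch of that idea, with made-up names (CreateOrder, OrderCommandHandler) and an in-memory map standing in for the event store: the consumer generates the ID, the handler returns nothing, and the consumer can still refer to the aggregate afterwards.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical command: the caller supplies the aggregate ID up front.
final class CreateOrder {
    final UUID orderId;
    final String product;

    CreateOrder(UUID orderId, String product) {
        this.orderId = orderId;
        this.product = product;
    }
}

// Hypothetical handler in strict CQRS style: handle() returns nothing.
final class OrderCommandHandler {
    // In-memory stand-in for the real write model or event store.
    private final Map<UUID, String> store = new HashMap<>();

    void handle(CreateOrder command) {
        if (store.putIfAbsent(command.orderId, command.product) != null) {
            throw new IllegalStateException("Order already exists: " + command.orderId);
        }
    }

    // Stand-in for a query on the read side, only here to keep the sketch small.
    boolean exists(UUID orderId) {
        return store.containsKey(orderId);
    }
}

final class CqrsDemo {
    public static void main(String[] args) {
        OrderCommandHandler handler = new OrderCommandHandler();
        UUID orderId = UUID.randomUUID();          // generated by the consumer
        handler.handle(new CreateOrder(orderId, "book"));
        // No return value needed: the consumer already knows the ID.
        System.out.println(handler.exists(orderId));
    }
}
```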

There are also some minor benefits which you can mention in an argument for UUIDs:

  1. You want to migrate data from an existing database to another database without worrying about conflicts.
  2. You want to implement high-performance equality checks and hash codes relatively easily, i.e. UUIDs are unique and not nullable.
  3. You want security through obscurity if you give the entity ID to consumers, i.e. enumeration becomes more difficult.

However, I would argue that if you do not have a distributed system where multiple clients can write, you will almost certainly not need a UUID as the primary identifier in your databases. Even then, there are often high-performance ways to create unique integer identifiers, such as Twitter's Snowflake algorithm or a centralized database service.
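For a feeling of what such an integer generator looks like, here is a simplified sketch in the spirit of Snowflake. It assumes the commonly described layout of 41 bits timestamp, 10 bits node ID and 12 bits sequence; the class name, the custom epoch and the error handling are my own choices and not Twitter's implementation.

```java
// Simplified Snowflake-style generator; a sketch, not Twitter's actual code.
final class SnowflakeLikeGenerator {
    // Arbitrary custom epoch (2019-01-01T00:00:00Z), an assumption for this sketch.
    private static final long EPOCH_MILLIS = 1546300800000L;
    private static final int NODE_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long nodeId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeLikeGenerator(long nodeId) {
        if (nodeId < 0 || nodeId >= (1L << NODE_BITS)) {
            throw new IllegalArgumentException("nodeId must fit into " + NODE_BITS + " bits");
        }
        this.nodeId = nodeId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            throw new IllegalStateException("Clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: spin until the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        // Layout: [timestamp since epoch][node ID][sequence]
        return ((now - EPOCH_MILLIS) << (NODE_BITS + SEQUENCE_BITS))
                | (nodeId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Each write node gets its own node ID, so new SnowflakeLikeGenerator(5).nextId() never clashes with IDs from node 6, and the values grow over time, which is friendlier to a B-tree index than random UUIDs.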

You never need a UUID as primary identifier if there is only a single database node for writing.

Often the best solution to the issue at hand is to use a natural identifier which can be hashed and therefore will work in distributed systems with multiple write nodes.
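One way to hash a natural identifier in Java is UUID.nameUUIDFromBytes from the JDK, which produces a name-based (version 3, MD5) UUID. The sketch below is only an illustration: the class name and the "namespace:key" string convention are assumptions of mine, not a standard.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper, only for illustration.
final class NaturalKeyIds {

    // Derives a deterministic ID from a natural key, e.g. ("customer", "jane@example.com").
    // The same input yields the same ID on every node, without coordination.
    static UUID fromNaturalKey(String namespace, String naturalKey) {
        String name = namespace + ":" + naturalKey;
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        UUID first = fromNaturalKey("customer", "jane@example.com");
        UUID second = fromNaturalKey("customer", "jane@example.com");
        System.out.println(first.equals(second));  // same natural key, same ID
    }
}
```

Two write nodes that see the same natural key independently arrive at the same identifier, so duplicates surface as primary key conflicts instead of silent double entries.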

The inherent issue comes from the B-tree data structure that databases use for their indexes: random values are inserted at arbitrary positions instead of at the end, which causes frequent page splits and makes inserts costly. You also need to make sure that the database stores the value in a native or binary form of 16 bytes and not as a CHAR(36) string, which takes 36 bytes. This could easily break your neck, performance-wise, since the index can exceed the threshold for in-memory usage and takes vastly more space once foreign key references add up.
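The size difference is easy to check. The following sketch (class name UuidBytes is made up) converts a UUID to the 16-byte form you would store in a binary column and back again, using only the JDK.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper, only for illustration.
final class UuidBytes {

    // Packs the two 64-bit halves into 16 bytes (big-endian, ByteBuffer's default).
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Restores the UUID from its 16-byte form.
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("Expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long mostSignificant = buffer.getLong();
        long leastSignificant = buffer.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println(toBytes(id).length);      // binary form
        System.out.println(id.toString().length());  // textual form
    }
}
```

The textual form is more than twice as large, and that factor applies to the primary key index and to every foreign key column pointing at it.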

Real human beans

Another big issue which often gets overlooked is that UUIDs on the consumer-side are not human friendly. Think about a dialogue on the support hotline:

Supporter: "Which order are you talking about?"
Customer: "Oh, it’s 2f52b103-82ff-4d17-a58d-3add255d4624."

Some people advocate that this problem can be solved by using a hashing algorithm to reduce the number of characters, but does a6A4Xde2!=4 really sound better?

What you generally can do is to differentiate between external and internal IDs, so you could internally use UUIDs to avoid clashes and externally generate an identifier which is human-friendly if you are concerned about readability. Or you do it the other way around if you are concerned about performance and internally use a custom integer which does not need to be a sequence.
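As a sketch of the first variant, the class below keeps a UUID for internal use and generates a short external reference for humans. Everything about the external format (the ORD- prefix, the length, the alphabet without easily confused characters) is a hypothetical choice of mine, and since the reference is random you would still need a unique constraint and a retry on conflict.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical pair of identifiers for one order, only for illustration.
final class OrderIds {
    // Leaves out 0/O and 1/I/L so the reference is easier to read aloud.
    private static final String ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ";
    private static final SecureRandom RANDOM = new SecureRandom();

    final UUID internalId;      // used for joins, events and references between systems
    final String externalRef;   // printed on invoices and spelled out on the hotline

    OrderIds() {
        this.internalId = UUID.randomUUID();
        this.externalRef = newExternalRef();
    }

    // Builds a reference such as ORD-XXXX-XXXX from the alphabet above.
    static String newExternalRef() {
        StringBuilder ref = new StringBuilder("ORD-");
        for (int i = 0; i < 8; i++) {
            if (i == 4) {
                ref.append('-');
            }
            ref.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return ref.toString();
    }
}
```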

Spring problems

You use Spring Data JPA? Then I have bad news for you. Look at SimpleJpaRepository#save:

if (entityInformation.isNew(entity)) {
  em.persist(entity);
  return entity;
} else {
  return em.merge(entity);
}

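By default, the isNew method returns true only if the ID is null. With an application-generated UUID the ID is never null, so save calls merge, and the persistence provider has to issue a SELECT first instead of just a single INSERT statement. One solution is to add a versioned field, which is still null for new entities, together with a custom equals and, of course, hashCode method: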

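// javax.persistence as of this writing; newer stacks use jakarta.persistence instead.
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;
import java.util.Objects;
import java.util.UUID;

@Entity
public class Data {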
    @Id
    @Column(name = "id", length = 16, unique = true, nullable = false)
    private UUID uuid;

    @Version
    private Long version;

    private String field;

    // The JPA specification asks for a public or protected no-arg constructor.
    protected Data() {}

    public Data(String field) {
        this.uuid = UUID.randomUUID();
        this.field = field;
    }

    public Data(UUID uuid, String field) {
        this.uuid = uuid;
        this.field = field;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        Data data = (Data) o;
        return Objects.equals(uuid, data.uuid);
    }

    @Override
    public int hashCode() {
        return Objects.hash(uuid);
    }
}
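To see why the version field helps, here is a plain-Java model of the persist-or-merge decision. It is a simplified sketch of the behavior as I understand it, not Spring Data's actual code, and the class and method names are made up: without a version attribute only a null ID counts as new, while with a wrapper-typed version attribute a null version does.

```java
import java.util.UUID;

// Simplified model of the "is this entity new?" decision, NOT Spring Data's implementation.
final class IsNewModel {

    // hasVersionAttribute: whether the entity declares a wrapper-typed @Version field.
    static boolean isNew(Object id, boolean hasVersionAttribute, Long version) {
        if (hasVersionAttribute) {
            return version == null;   // never persisted yet, the version was not set
        }
        return id == null;            // default: only a missing ID marks a new entity
    }

    // Maps the decision to the EntityManager call that save() would make.
    static String operationFor(Object id, boolean hasVersionAttribute, Long version) {
        return isNew(id, hasVersionAttribute, version) ? "persist" : "merge";
    }

    public static void main(String[] args) {
        UUID assignedId = UUID.randomUUID();
        // Without a version field, the pre-assigned ID makes the entity look existing.
        System.out.println(operationFor(assignedId, false, null));
        // With a version field that is still null, the entity is treated as new.
        System.out.println(operationFor(assignedId, true, null));
    }
}
```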