Confluent schema evolution in development and production

12 Nov 2023

Schema Evolution: The official word

TLDR

Free-for-all when solo developing
Practice your schema evolution when working as part of a team
Defined process for schema evolution in production

Schema evolution in development

In a development environment you have a lot of flexibility around schema evolution. If you're working on your own and don't care about any of the data, my favorite technique (when I don't feel like recreating the entire Kafka cluster) is to delete all schema versions for a subject using the REST API, e.g.:

curl -X DELETE http://localhost:8081/subjects/mytopic-value

Assumes Schema Registry is running on localhost, e.g. via docker compose; adjust as needed
If using TopicNameStrategy (the default), the schema subject (name) for message values is the name of the topic with -value appended. Source
Deleting all schema versions lets you register changes that would otherwise be rejected as incompatible under the default schema compatibility
Another approach: set the default schema compatibility to NONE
Automatic schema registration is true by default; you may want to set it to false to simulate production and to test deployment procedures and pipelines
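Setting the default compatibility to NONE, the alternative mentioned in the list above, can also be done through the REST API. This is only a minimal sketch, assuming the same localhost Schema Registry as the delete example; the function name set_global_compatibility and the SCHEMA_REGISTRY_URL variable are my own naming, not part of any Confluent tool:

```shell
# Base URL of Schema Registry (assumed; matches the localhost example above).
SCHEMA_REGISTRY_URL="${SCHEMA_REGISTRY_URL:-http://localhost:8081}"

# Hypothetical helper: set the global default compatibility level.
# $1 is the level, e.g. NONE for a throwaway development environment.
set_global_compatibility() {
  curl -s -X PUT \
    -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data "{\"compatibility\": \"$1\"}" \
    "${SCHEMA_REGISTRY_URL}/config"
}
```

Calling set_global_compatibility NONE turns compatibility checking off for every subject without its own setting; calling it again with BACKWARD restores the level Schema Registry ships with.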

Schema evolution in production

In a production environment, you don't normally have the luxury of being able to completely change the data schema. This is where schema evolution becomes necessary. If you experiment with evolution in the development environment, you will gain confidence in how to evolve schemas in environments you care about.

For production environments, Confluent recommends disabling automatic schema registration. The aim is to let ops teams take control of data schemas to safeguard correct app execution.

With this in mind, production schema evolution should look something like this:

Prevent automatic schema registration
Develop a process to register schemas with oversight, either manually or via CI/CD
Decide on an overall corporate strategy around schema compatibility (BACKWARD/BACKWARD_TRANSITIVE/FORWARD/etc.) and configure Schema Registry accordingly
Special-case schema compatibility where required

See below for details.

Prevent automatic schema registration

Automatic schema registration is on by default
On the server side, this is normally achieved through Confluent RBAC and the principle of least privilege
Java clients can be configured with auto.register.schemas=false

Server-wide default schema compatibility

Reference
Config file variable: schema.compatibility.level
Docker environment variable: SCHEMA_REGISTRY_SCHEMA_COMPATIBILITY_LEVEL
Applies to every schema that does not explicitly set its own compatibility at the schema (subject) level; by default, none do
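To confirm which server-wide default is actually in effect after changing the config file or Docker variable, the REST API can be queried. A minimal sketch, assuming a registry on localhost; get_global_compatibility and SCHEMA_REGISTRY_URL are hypothetical names chosen for illustration:

```shell
# Base URL of Schema Registry (assumed; adjust for your environment).
SCHEMA_REGISTRY_URL="${SCHEMA_REGISTRY_URL:-http://localhost:8081}"

# Hypothetical helper: print the global compatibility config as JSON.
get_global_compatibility() {
  curl -s "${SCHEMA_REGISTRY_URL}/config"
}
```

Running get_global_compatibility after a restart is a quick way to check that the setting was picked up.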

How to register schemas

With auto.register.schemas=false set, these are the options to register schemas:
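One option that needs nothing beyond curl is the Schema Registry REST API, which suits both manual registration and a CI/CD step. The following is a sketch rather than a definitive procedure: it assumes a registry on localhost and a payload file that already wraps the schema as {"schema": "..."}; the function names and the SCHEMA_REGISTRY_URL variable are hypothetical.

```shell
# Base URL of Schema Registry (assumed; adjust for your environment).
SCHEMA_REGISTRY_URL="${SCHEMA_REGISTRY_URL:-http://localhost:8081}"

# Hypothetical helper: ask the registry whether a candidate schema is
# compatible with the latest registered version of a subject.
# $1 = subject (e.g. mytopic-value), $2 = payload file: {"schema": "..."}
check_compatibility() {
  curl -s -X POST \
    -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data @"$2" \
    "${SCHEMA_REGISTRY_URL}/compatibility/subjects/$1/versions/latest"
}

# Hypothetical helper: register the schema as a new version of the subject.
# Takes the same arguments as check_compatibility.
register_schema() {
  curl -s -X POST \
    -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data @"$2" \
    "${SCHEMA_REGISTRY_URL}/subjects/$1/versions"
}
```

If jq is installed, jq -Rs '{schema: .}' user.avsc > payload.json is one way to build the payload from an Avro schema file (user.avsc is a made-up file name). Run check_compatibility mytopic-value payload.json first and inspect the is_compatible field in the response before calling register_schema with the same arguments.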

Per-schema compatibility

Sometimes you will need to special-case the schema compatibility mode for a given schema. This can be done using the same techniques used to register schemas, as above.
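With the REST API, for instance, a per-subject override is a PUT to the subject's config endpoint. A minimal sketch, assuming a registry on localhost; set_subject_compatibility and SCHEMA_REGISTRY_URL are hypothetical names:

```shell
# Base URL of Schema Registry (assumed; adjust for your environment).
SCHEMA_REGISTRY_URL="${SCHEMA_REGISTRY_URL:-http://localhost:8081}"

# Hypothetical helper: override compatibility for a single subject.
# $1 = subject (e.g. mytopic-value), $2 = level (e.g. FULL)
set_subject_compatibility() {
  curl -s -X PUT \
    -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data "{\"compatibility\": \"$2\"}" \
    "${SCHEMA_REGISTRY_URL}/config/$1"
}
```

For example, set_subject_compatibility mytopic-value FULL applies FULL to that one subject while every other subject keeps following the server-wide default.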

geoffwilliams@home:~$