Confluent schema evolution in development and production
TLDR
- Free-for-all when solo developing
- Practice your schema evolution when working as part of a team
- Defined process for schema evolution in production
Schema evolution in development
In a development environment you have a lot of flexibility in how you handle schema evolution. If you're working on your own and don't care about any of the data, my favorite technique (when I don't feel like recreating the entire Kafka cluster) is to delete all schema versions using the REST API, e.g.:
curl -X DELETE http://localhost:8081/subjects/mytopic-value
- Assumes Schema Registry is running on localhost, e.g. via docker compose; adjust as needed
- If using TopicNameStrategy, the default schema subject (name) is the topic name with -value appended (Source)
- Deleting all schema versions lets you register changes that would otherwise be rejected as incompatible under the default schema compatibility (BACKWARD)
- Another approach: set the default schema compatibility to NONE
- Automatic schema registration is enabled (true) by default; you may want to set it to false to simulate production and to test your deployment procedures and pipelines
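Setting the compatibility to NONE goes through the same REST API as the delete above. A minimal sketch for a local dev setup, assuming Schema Registry on localhost:8081; the SR_URL variable and the function names are hypothetical, chosen for illustration:

```shell
# Base URL of the Schema Registry REST API (assumption: local dev instance)
SR_URL="${SR_URL:-http://localhost:8081}"

# Build the JSON body accepted by PUT /config
compat_body() {
  printf '{"compatibility": "%s"}' "$1"
}

# Change the global (server-wide) compatibility level at runtime
set_global_compatibility() {
  curl -s -X PUT \
    -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data "$(compat_body "$1")" \
    "$SR_URL/config"
}

# Read the current global level back
get_global_compatibility() {
  curl -s "$SR_URL/config"
}
```

In a dev shell this would be used as set_global_compatibility NONE, and set_global_compatibility BACKWARD to return to the default once you are done experimenting.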
Schema evolution in production
In a production environment, you don't normally have the luxury of completely changing the data schema; this is where schema evolution becomes necessary. If you experiment with evolution in the development environment, you will gain confidence in evolving schemas in the environments you care about.
For production environments, Confluent recommends disabling automatic schema registration. The aim is to give ops teams control over data schemas, safeguarding correct application behavior.
With this in mind, production schema evolution should look something like this:
- Prevent automatic schema registration
- Develop a process to register schemas with oversight, either manually or via CI/CD
- Decide on an overall corporate strategy for schema compatibility (BACKWARD/BACKWARD_TRANSITIVE/FORWARD/etc.) and configure Schema Registry accordingly
- Special-case schema compatibility where required
See below for details.
Prevent automatic schema registration
- Automatic schema registration is on by default
- On the server side, prevention is normally achieved through Confluent RBAC and the principle of least privilege
- Java clients can be configured with auto.register.schemas=false
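For Java clients, the setting usually sits alongside the other producer and serializer properties. A minimal sketch that writes a hypothetical producer.properties file; the file name, the broker and registry addresses, and the choice of Avro value serializer are assumptions for illustration:

```shell
# Hypothetical client config file; adjust addresses and serializers for your setup
cat > producer.properties <<'EOF'
bootstrap.servers=localhost:9092
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
schema.registry.url=http://localhost:8081
# Never register new schemas from the client; the schema must already exist
auto.register.schemas=false
EOF
```

With auto-registration off, a producer sending a record whose schema has not been registered fails at serialization time instead of silently creating a new version.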
Server-wide default schema compatibility
- Reference
- Config file property: schema.compatibility.level
- Docker environment variable: SCHEMA_REGISTRY_SCHEMA_COMPATIBILITY_LEVEL
- Acts as the default for every subject that does not explicitly set its own compatibility level
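A sketch of both forms, assuming a hypothetical properties-file path that does not already set this key, and using backward_transitive purely as an example value:

```shell
# Hypothetical path to the Schema Registry server config; adjust as needed
SR_PROPS="${SR_PROPS:-./schema-registry.properties}"

# Config file form. Valid values: none, backward, backward_transitive,
# forward, forward_transitive, full, full_transitive (backward is the default)
echo "schema.compatibility.level=backward_transitive" >> "$SR_PROPS"

# Docker form, e.g. under "environment:" in a docker compose service:
#   SCHEMA_REGISTRY_SCHEMA_COMPATIBILITY_LEVEL: backward_transitive
```

The server reads this file when it starts, so expect to restart Schema Registry for a changed default to apply.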
How to register schemas
With auto.register.schemas=false set, these are the options for registering schemas:
- REST API
- Maven plugin
- Gradle plugin
- Terraform (Confluent Cloud)
- Julie Ops (in hibernation unfortunately)
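For the REST API option, a CI/CD step with oversight can test a candidate schema against the latest registered version before registering it. A minimal sketch, assuming an Avro schema on a single line with no backslashes (the escaping is deliberately naive); SR_URL and the function names are hypothetical:

```shell
# Base URL of the Schema Registry REST API (assumption: adjust per environment)
SR_URL="${SR_URL:-http://localhost:8081}"

# Wrap a one-line schema as a JSON string for the request body.
# Only double quotes are escaped, so the schema must not contain
# backslashes or newlines.
schema_body() {
  printf '{"schema": "%s"}' "$(printf '%s' "$1" | sed 's/"/\\"/g')"
}

# Succeed (exit 0) only if a compatibility response reports true
is_compatible_response() {
  printf '%s' "$1" | grep -Eq '"is_compatible" *: *true'
}

# Test schema $2 against the latest version registered under subject $1
check_compatible() {
  response=$(curl -s -X POST \
    -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data "$(schema_body "$2")" \
    "$SR_URL/compatibility/subjects/$1/versions/latest")
  is_compatible_response "$response"
}

# Register schema $2 under subject $1; on success the response carries the schema id
register_schema() {
  curl -s -X POST \
    -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data "$(schema_body "$2")" \
    "$SR_URL/subjects/$1/versions"
}
```

A pipeline step might then run check_compatible mytopic-value "$SCHEMA" && register_schema mytopic-value "$SCHEMA". A brand-new subject has no latest version, so the registry may answer the check with an error rather than a result; this sketch treats that as a failure, and a real pipeline would need to handle first registration separately. JSON Schema and Protobuf schemas also need a schemaType field in the request body.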
Per-schema compatibility
Sometimes you will need to special-case the compatibility mode for a given schema (subject). This can be done with the same techniques used to register schemas, listed above.
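With the REST API, a subject-level override is a PUT to that subject's config endpoint. A minimal sketch, assuming the same local registry as earlier; SR_URL and the function names are hypothetical:

```shell
# Base URL of the Schema Registry REST API (assumption: adjust per environment)
SR_URL="${SR_URL:-http://localhost:8081}"

# Build the JSON body accepted by the /config endpoints
compat_body() {
  printf '{"compatibility": "%s"}' "$1"
}

# Override the compatibility level of a single subject ($1) with level $2
set_subject_compatibility() {
  curl -s -X PUT \
    -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data "$(compat_body "$2")" \
    "$SR_URL/config/$1"
}

# Read the subject-level setting back; this may return an error if the
# subject has no override of its own
get_subject_compatibility() {
  curl -s "$SR_URL/config/$1"
}
```

For example, set_subject_compatibility mytopic-value BACKWARD_TRANSITIVE tightens the rules for one subject while every other subject keeps the server-wide default.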