Turn PostgreSQL into an event stream with Kafka Connect & Debezium
We will leverage the CDC (Change Data Capture) pattern to turn your existing database into an event stream 🌊
Debezium is a distributed platform that converts information from your existing databases into event streams, enabling applications to detect and immediately respond to row-level changes in those databases.
In this article, we will use the Debezium PostgreSQL connector to capture row-level changes that insert, update, and delete database content, and stream them to Kafka topics.
Since PostgreSQL 10, there is a natively supported logical replication stream mode called pgoutput. This means that a Debezium PostgreSQL connector can consume that replication stream without the need for any additional plug-ins. This is particularly valuable for environments where installation of plug-ins is not supported or not allowed.
docker-compose.yml
A good way to start is to use the Confluent Cloud docker-compose stack seen in my last blog post.
I just added two environment variables related to Kafka Connect to the control-center service:
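Something along these lines, following Confluent's cp-all-in-one example (the connect host name and port are assumptions and may differ in your stack):

```yaml
control-center:
  environment:
    # ... existing Control Center variables ...
    # tell Control Center where the Connect REST API lives
    CONTROL_CENTER_CONNECT_CONNECT-DEFAULT_CLUSTER: "connect:8083"
    CONTROL_CENTER_CONNECT_HEALTHCHECK_ENDPOINT: "/connectors"
```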
Next, we add a Kafka Connect service with debezium-connector-postgresql:
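Here is a sketch of such a service, based on the confluentinc/cp-kafka-connect image; the broker address, internal topics, and connector version are assumptions, so adapt them to your cluster (in particular if your Kafka brokers live on Confluent Cloud):

```yaml
connect:
  image: confluentinc/cp-kafka-connect:7.3.0
  depends_on:
    - broker
  ports:
    - "8083:8083"
  environment:
    CONNECT_BOOTSTRAP_SERVERS: "broker:29092"   # assumed local broker address
    CONNECT_REST_ADVERTISED_HOST_NAME: "connect"
    CONNECT_GROUP_ID: "connect-cluster"
    CONNECT_CONFIG_STORAGE_TOPIC: "_connect-configs"
    CONNECT_OFFSET_STORAGE_TOPIC: "_connect-offsets"
    CONNECT_STATUS_STORAGE_TOPIC: "_connect-status"
    CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: "1"
    CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: "1"
    CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: "1"
    CONNECT_KEY_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
    CONNECT_VALUE_CONVERTER: "org.apache.kafka.connect.json.JsonConverter"
    CONNECT_PLUGIN_PATH: "/usr/share/java,/usr/share/confluent-hub-components"
  command:
    - bash
    - -c
    - |
      # install the Debezium PostgreSQL connector from Confluent Hub, then start the worker
      confluent-hub install --no-prompt debezium/debezium-connector-postgresql:latest
      /etc/confluent/docker/run
```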
Finally, we add the PostgreSQL service:
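A minimal service is enough; the image version and credentials below are assumptions:

```yaml
postgres:
  image: postgres:14
  ports:
    - "5432:5432"
  environment:
    POSTGRES_USER: "postgres"
    POSTGRES_PASSWORD: "postgres"
    POSTGRES_DB: "postgres"
```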
Starting the services
You can now start all the services with docker-compose up -d:
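From the directory containing the docker-compose.yml:

```bash
docker-compose up -d
docker-compose ps   # check that every container is up
```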
After some startup time, you can access Control Center at http://localhost:9021 and see the PostgresConnector available in the Connect section.
Configuring PostgreSQL wal_level
PostgreSQL supports three Write-Ahead Log levels, configured via wal_level:
replica
minimal
logical
Only logical adds the information necessary to support logical decoding, so let’s enable it!
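One way to do it (a sketch; you could also pass -c wal_level=logical on the postgres command line in docker-compose):

```sql
-- run as a superuser, e.g. docker-compose exec postgres psql -U postgres
ALTER SYSTEM SET wal_level = 'logical';
-- restart PostgreSQL (docker-compose restart postgres), then verify:
SHOW wal_level;
```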
Configuring PostgreSQL client authentication
PostgreSQL needs to accept connections from the Debezium PostgreSQL connector. Client authentication is controlled by a configuration file, traditionally named pg_hba.conf. With the default configuration of the postgres Docker image, password-authenticated connections from other containers are already allowed, so it should work out of the box 🥳
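If you want to double-check, the rules loaded from pg_hba.conf can be listed with SQL (PostgreSQL 10+):

```sql
-- list the client-authentication rules currently in effect
SELECT type, database, user_name, address, auth_method
FROM pg_hba_file_rules;
```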
Register Debezium PostgreSQL connector
Now it’s time to register the Debezium PostgreSQL source connector on our Kafka Connect service.
Let’s use the Kafka Connect REST API instead of the UI 🤓
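A registration request along these lines should do; the connection settings match the postgres service sketched above and the slot name is Debezium's default, so adjust them to your setup (on Debezium 2.x, topic.prefix replaces database.server.name):

```bash
# hostname, credentials and dbname are assumptions matching the compose sketch above
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "cdc-connector",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "plugin.name": "pgoutput",
      "database.hostname": "postgres",
      "database.port": "5432",
      "database.user": "postgres",
      "database.password": "postgres",
      "database.dbname": "postgres",
      "database.server.name": "cdc",
      "slot.name": "debezium"
    }
  }'
```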
You should now be able to see the cdc-connector running in your Control Center 😎
You can also view the replication_slot created by Debezium using SQL:
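For example:

```sql
-- inside the database, e.g. docker-compose exec postgres psql -U postgres
-- (\x toggles psql expanded display, handy for wide rows)
SELECT slot_name, plugin, slot_type, active
FROM pg_replication_slots;
```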
With the formatted output, you should see the replication slot created for the connector, using the pgoutput plugin.
Do some SQL changes
The Debezium PostgreSQL source connector is now up and running, ready to perform Change Data Capture on our database and send the change events to Kafka.
Let’s run some SQL statements!
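For example, on a users table in the public schema (the table and schema names come from the topic cdc.public.users below; the columns and values are just an illustration):

```sql
-- 1. a new table (DDL is not captured by the connector)
CREATE TABLE users (
  id   SERIAL PRIMARY KEY,
  name TEXT NOT NULL
);

-- 2. an insert
INSERT INTO users (name) VALUES ('John Doe');

-- 3. an update
UPDATE users SET name = 'Jane Doe' WHERE id = 1;

-- 4. a delete
DELETE FROM users WHERE id = 1;
```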
Viewing events in Kafka
We just made 4 changes to our database:
CREATE TABLE (not captured)
INSERT
UPDATE
DELETE
Using AKHQ, you should be able to see 4 events in the topic cdc.public.users!
First message on INSERT:
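With the illustrative users table above, the event value looks roughly like this (payload only, the schema part and most source fields are omitted; the values are assumptions): op is c for create, before is null and after carries the new row.

```json
{
  "before": null,
  "after": { "id": 1, "name": "John Doe" },
  "source": { "connector": "postgresql", "db": "postgres", "schema": "public", "table": "users" },
  "op": "c",
  "ts_ms": 1621234567890
}
```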
Second message on UPDATE:
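Same envelope, with op set to u; note that before stays null unless the table's REPLICA IDENTITY is set to FULL (again a sketch with assumed values, source and ts_ms trimmed):

```json
{
  "before": null,
  "after": { "id": 1, "name": "Jane Doe" },
  "op": "u"
}
```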
Third message on DELETE:
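Here op is d and after is null; with the default REPLICA IDENTITY, before only carries the primary key and the other columns show up as null (sketch with assumed values):

```json
{
  "before": { "id": 1, "name": null },
  "after": null,
  "op": "d"
}
```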
There is also a fourth, empty message with the following key:
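This is the tombstone Debezium emits right after a delete event, so that Kafka log compaction can eventually remove every message for that key. The key is simply the primary key of the deleted row; with our illustrative table it would be:

```json
{ "id": 1 }
```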
Source Code
You can find the complete sources on my GitHub 🫡