top of page

Database Replication for Postgres/Oracle using Debezium and Kafka

It is common for other databases outside of the 'source' database to require up-to-date data to other systems. This tutorial will step through setting up this replication from stratch.

There are two sets of data that will be replicated. The first is HR related tables (employees, departments, jobs, locations) which will be replicated from Oracle to Postgres. The second is baseball related tables (game, team, venue) which will be replicated from Postgres to Oracle.


Components Overview


Volumes

Several volumes need to be presented to various containers to persist data. Be sure to review/update the docker-compose.yaml and the volume references.

Container

Target Mount

oracle

/opt/oracle/oradata

postgres

/pgdata


Ports

Several ports are exposed to interact with the deployed containers and software deployments. Note that Postgres is using the non-default port setting of 5433 to avoid any already running Postgres clusters.

Container

Ports

oracle

1521

postgres

5433

zookeeper

2181,2888,3888

kafka

9092,29092,9999

connect

8083,9080,9012

prometheus

9090

grafana

3000


Oracle

Oracle Database Express Edition is used in this tutorial. Prior to using this tutorial be sure that you have read and understand the licenses around Express Edition.

The container used is container-registry.oracle.com/database/express:21.3.0-xe. As this container spins up for the first time it could take several minutes to create the new instance.


Setup

Clone the github repository and navigate to the debezium-poc directory. All of the steps assume the current directory is debezium-poc.


GitHub Repo: https://github.com/cbrianpace/kafka-ora2pg


Oracle Instant Client

Before performing the build, the Kafka Connect container needs the Oracle Instant client. Refer to the Oracle Instant Client download page to download the necessary packages and review the license agreement. Download the Basic Package and extract the contents into the connect/instantclient directory.


Here is an example of the process for Mac OS:

cd connect 

wget https://download.oracle.com/otn_software/mac/instantclient/198000/instantclient-basic-macos.x64-19.8.0.0.0dbru.zip  

unzip instantclient-basic-macos.x64-19.8.0.0.0dbru.zip 

mv instantclient_19_8 instantclient 


Docker Compose

Review the docker compose file and make any necessary adjustments for volumes. Note that if any port modifications are made there may be requirements to modify the connector json files.



Deploy the environment using docker-compose.

docker-compose -f docker-compose.yaml up --build -d 


Oracle Setup

Once the Oracle container is fully up and running (look for the 'Database is Ready' banner in the container log), the database needs to be modified to enable archive log mode, supplemental logging, etc.


Exec into the Oracle container and perform these setups. The database will restart several times.


docker exec -it oracle /bin/bash 
./setup-oracle.sh 
sqlplus sys/welcome1@localhost:1521/xepdb1 as sysdba @sql/hr-ora.sql 
sqlplus sys/welcome1@localhost:1521/xepdb1 as sysdba @sql/mlb-ora.sql  

The container may also report a few ORA-600s which can be ignored as this is related to the container environment itself and does not affect the functionality needed for this tutorial.


Postgres Setup


To setup Postgres database execute the commands below.


docker exec -it postgres /bin/bash 
psql -p 5433 -f /var/lib/pgsql/sql/hr-pg.sql 
psql -p 5433 -f /var/lib/pgsql/sql/mlb-pg.sql 

Register the Connectors

Using curl, register the connectors with Kafka-Connect.

curl -i -X POST -H "Accept:application/json" -H  "Content-Type:application/json" http://localhost:8083/connectors/ -d @connectors/oracle-connector.json 
curl -i -X POST -H "Accept:application/json" -H  "Content-Type:application/json" http://localhost:8083/connectors/ -d @connectors/postgres-connector.json 
curl -i -X POST -H "Accept:application/json" -H  "Content-Type:application/json" http://localhost:8083/connectors/ -d @connectors/postgres-sink.json 
curl -i -X POST -H "Accept:application/json" -H  "Content-Type:application/json" http://localhost:8083/connectors/ -d @connectors/oracle-sink.json 


Check Kafka Connect Logs

To ensure that everything is working from the Kafka-Connect perspective, view the logs from the kafka-connect conatiner.

docker logs kafka-connect -f 

Test Replication


Update Oracle

Exec into Oracle container and start sqlplus.

docker exec -it oracle /bin/bash 
sqlplus sys/welcome1@localhost:1521/xepdb1 as sysdba 

From within the sqlplus session, execute the following SQL statements.

INSERT INTO hr.employees (employee_id, first_name, last_name, hire_date) VALUES (200, 'George', 'Washington', sysdate); 

SELECT SUM(salary) FROM hr.employees; 

UPDATE hr.employees SET salary=salary*1.1; 

COMMIT; 

SELECT SUM(salary) FROM hr.employees; 


Verify Postgres

With the changes made to the employees table in Oracle, verify the updates in Postgres. First, start psql using docker exec.

docker exec -it postgres psql -p 5433 -d hr 

Run the following SQL to verify replication.

SELECT * FROM employees WHERE last_name='Washington'; 
SELECT SUM(salary) FROM employees; 

Update Postgres

Now let us test the reverse replication using the sports tables. Connect to Postgres using psql and the sport database.

docker exec -it postgres psql -p 5433 -d sport 

Run the following SQL statements to update data in Postgres.

INSERT INTO venue (venue_id, venue_name, city, country) VALUES (900,'Crunchy Park', 'Jacksonville, FL', 'USA'); 

UPDATE venue SET venue_name='Pace Park' WHERE venue_id=136; 

Verify Oracle

Exec into Oracle container and start sqlplus.

docker exec -it oracle /bin/bash 
sqlplus sys/welcome1@localhost:1521/xepdb1 as sysdba 

Run the following SQL statement to verify the updates in Oracle.

SELECT * FROM sport.venue WHERE venue_id IN (900,136); 


Conclusion

Debezium helps bridge the data gap by performing change data capture in both Oracle and Postgres and publishing those messages to Kafka. The Oracle capture leverages logminner which does have some scalability challenges. On the Postgres side, Debezium leverages the native logical replication capabilities and scales better.


Last, Prometheus and Grafana is deployed with built in dashboards and mining of metrics published by Debezium. Be sure to check those out by access Grafana at http://localhost:3000. The default user/password for Grafana is admin/admin.

53 views0 comments

Comments


bottom of page