ClickHouse Null Engine and Handling Log Events

dboost.me
2 min read · Feb 10, 2024


We are going to see how the ClickHouse Null engine can be useful and how we can use it to implement a simple logging system. Our goal is a setup where different types of log data are inserted into the same Kafka topic, but within ClickHouse we get a separate table for each data type.

Assuming our topic raw_sensor_logs is filled with the following data (check how to do that here):

sensors_data = [
    {
        'timestamp': '2024-02-10 21:29:24.788366',
        'type': 'wind',
        'speed': 10,
        'direction': 'NV'
    },
    {
        'timestamp': '2024-02-10 21:29:19.248227',
        'type': 'temperature',
        'degrees': 30,
    },
    {
        'timestamp': '2024-02-10 21:28:15.124438',
        'type': 'wind',
        'speed': 12,
        'direction': 'N'
    },
]
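For reference, each list element ends up on the topic as one plain JSON string per message, which is exactly what the `JSONAsString` format used below expects. A minimal sketch of that serialization (actually producing to Kafka is covered in the linked article):

```python
import json

# One Kafka message payload per sensor event; the payload is the raw JSON
# string that ClickHouse will later parse with the JSONExtract* functions.
sensors_data = [
    {'timestamp': '2024-02-10 21:29:24.788366', 'type': 'wind', 'speed': 10, 'direction': 'NV'},
    {'timestamp': '2024-02-10 21:29:19.248227', 'type': 'temperature', 'degrees': 30},
]

messages = [json.dumps(event).encode('utf-8') for event in sensors_data]

for message in messages:
    print(message)
```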

We can proceed with configuring ClickHouse to separate the sensor data by sensor type. Let’s create a table referencing the Kafka topic and prepare a separate table for each log type:

create table kafka_raw_sensor_logs (
    json String
) engine = Kafka settings
    kafka_broker_list = 'kafka:9092',
    kafka_topic_list = 'raw_sensor_logs',
    kafka_group_name = 'group-1',
    kafka_format = 'JSONAsString';


create table sensor_temperature (
    timestamp DateTime,
    temperature Int32
) engine = MergeTree()
order by timestamp;


create table sensor_wind (
    timestamp DateTime,
    speed Int32
) engine = MergeTree()
order by timestamp;

In order to populate those target tables we are going to create a table with the Null engine and a couple of materialized views that populate the target tables from the raw_sensor_logs_null table. The Null engine does not store any data itself, but materialized views attached to it still receive every inserted block. Thus only the target tables sensor_wind and sensor_temperature will store data.

create table raw_sensor_logs_null (
    json String
) engine = Null;


create materialized view wind_consumer to sensor_wind as
select
    parseDateTimeBestEffort(JSONExtractString(json, 'timestamp')) as timestamp,
    JSONExtractInt(json, 'speed') as speed
from raw_sensor_logs_null
where JSONExtractString(json, 'type') = 'wind';


create materialized view temperature_consumer to sensor_temperature as
select
    parseDateTimeBestEffort(JSONExtractString(json, 'timestamp')) as timestamp,
    JSONExtractInt(json, 'degrees') as temperature
from raw_sensor_logs_null
where JSONExtractString(json, 'type') = 'temperature';
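The two materialized views effectively act as a per-type router. As a rough Python illustration of the same logic (the `route` helper is hypothetical, purely for intuition, and not part of the setup):

```python
import json

def route(raw_messages):
    # Mimics the two materialized views: filter rows by 'type' and
    # project each raw JSON log onto its target table's columns.
    sensor_wind, sensor_temperature = [], []
    for raw in raw_messages:
        event = json.loads(raw)
        if event.get('type') == 'wind':
            sensor_wind.append((event['timestamp'], int(event['speed'])))
        elif event.get('type') == 'temperature':
            sensor_temperature.append((event['timestamp'], int(event['degrees'])))
    return sensor_wind, sensor_temperature

wind, temperature = route([
    '{"timestamp": "2024-02-10 21:29:24.788366", "type": "wind", "speed": 10, "direction": "NV"}',
    '{"timestamp": "2024-02-10 21:29:19.248227", "type": "temperature", "degrees": 30}',
])
print(wind)         # [('2024-02-10 21:29:24.788366', 10)]
print(temperature)  # [('2024-02-10 21:29:19.248227', 30)]
```

A row whose `type` matches neither view is simply dropped, which is also how the real setup behaves.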

Everything is prepared; now we just need to start consuming the data from Kafka and forward it to the raw_sensor_logs_null table, where the materialized views will route it into sensor_wind and sensor_temperature. So here it is:

create materialized view kafka_consumer to raw_sensor_logs_null as
select json
from kafka_raw_sensor_logs;

Conclusion

We have shown how a single Kafka topic can carry different types of log data that ClickHouse then processes and stores separately, using the Null engine trick to avoid storing the raw data.
