Calming down Clickhouse

Why is Clickhouse so intensive?!

2020-09-02
3 minutes

Clickhouse is a column oriented database engine, mostly designed for analytics. It was developed and deployed by Yandex in their analytics offering, but Plausible, my analytics provider of choice, uses it to store the analytics data, as it’s more suited to this than PostgreSQL.

Unfortunately, the official docker container is pretty crazy intensive. As soon as Plausible starts up, Clickhouse jumps to around 10% CPU usage, and sustains 30MB/s writes to the database directory. My hourly ZFS snapshots allocate over 1GB each! The server I run Plausible is reasonably powerful, but even this is just crazy!

Initially I thought this was a bug in Plausible, as it only occurred when Plausible was running. However, after raising an issue, and being told Plausible’s production servers don’t appear to suffer this issue, it was time to dig deeper.

Before anything else, let’s take a quick look at the data directory for anything obvious:

$ df -hs clickhouse/
8.2G    clickhouse

#Looking at the data

Looking at the filesystem structure of clickhouse/ (nice simple layout!), most of the usage came from the system.query_log and system.query_thread_log tables. These aptly-named tables are responsible for logging all queries executed by Clickhouse, and for me had over 20 million rows in, each! After scouring through the docs, this is enabled to aid with debugging. For the majority of people, you don’t really need all this data, and it’s definitely not worth the trade-offs of the huge amount of disk usage relative to the actual stored data.

#Log file

Looking at the Clickhouse log file (/var/log/clickhouse/clickhouse-server.log), it was incredibly verbose, and outputting far more than it needs. The default log level is set to trace, which basically means absolutely everything is logged, even more so than regular debug logs. The log level of trace matches the mantra of helping with debugging, but again is unnecessary in the majority of environments.

#Calming down

So, given how Clickhouse is behaving, and logging far too much data into both the system tables and log files, how can we go about toning Clickhouse down? Chances are it’s the additional logging work both to tables and log files which is using up the extra resources.

#Stop logging queries

The first step, and likely the one which makes the most difference, is to flat out disable query logging. Doing this should alleviate CPU cycles spent collecting This is done using the log_queries and log_query_threads variables in users.xml:

users.xml
<clickhouse>
    <profiles>
        <default>
            <log_queries>0</log_queries>
            <log_query_threads>0</log_query_threads>
        </default>
    </profiles>
</clickhouse>
<update>

Older versions of Clickhouse use yandex as the root node rather than clickhouse. The file also needs to go in /etc/clickhouse-server/users.d/*.xml.

</update>

#Reduce logging

Step 2 is to reduce the log level on both the file logger, and any other table logging we can’t disable entirely. This change is done in config.xml (or a file in /etc/clickhouse-server/config.d/*.xml). I set the level to warning, so it’s still obvious to see when something is wrong, but these all go into the shell, so it’s fine. All the table logs are completely disabled.

config.xml
<clickhouse>
    <logger>
        <level>warning</level>
        <console>true</console>
    </logger>
    <query_thread_log remove="remove"/>
    <query_log remove="remove"/>
    <text_log remove="remove"/>
    <trace_log remove="remove"/>
    <metric_log remove="remove"/>
    <asynchronous_metric_log remove="remove"/>

    <!-- Update: Required for newer versions of Clickhouse -->
    <session_log remove="remove"/>
    <part_log remove="remove"/>
</clickhouse>

For easier debugging, I also push logs through to the console with <console>true</console>, so it can be seen using docker-compose logs.

In addition to reducing the log level, I moved the log files to atmpfs mount. I did this initially to reduce writes to disks, but it’s still there anyway just in case it makes a difference.

#Reclaim some disk space

The final step is to reclaim the disk space we lost to the overly verbose logs. Because everything was logged to tables, it’s all still around. This probably won’t impact runtime performance much, but disk usage went from 8GB to 200MB, which is quite nice!

  1. Log in to the Clickhouse shell: docker-compose exec clickhouse bash
  2. Truncate the existing logs:
Bash
clickhouse-client -q "SELECT name FROM system.tables WHERE name LIKE '%log%';" | xargs -I{} clickhouse-client -q "TRUNCATE TABLE system.{};"

#Review

Reading through Clickhouse’s documentation, it’s incredibly configurable for any scale of need - unsurprising given it’s being used by Yandex. With that said, whilst the defaults might make sense at that scale, for the smaller use case they’re not especially appropriate.

I’m still currently convinced there’s a small bug in Plausible, in the healthchecks performing too many queries, but that’s another day’s investigation.

After deploying these changes, not only have resource uses dropped off a cliff, but Plausible even seems a bit snappier!

Share this page

Similar content

View all →

Keeping your Docker containers up to date

2020-07-27
4 minutes

Last year, I switched all of my hosting from arbitrarily installed packages to Docker. This made installing and configuring incredibly simple, but updating a little less defined. Whilst Docker itself is updated through the system package manager (probably), the containers themselves aren’t. Docker container versions are known as “tags”, and…

Freight Containers on a Ship

Backing up and restoring Docker containers

You should back up your data, properly! If you’re not, you’re playing a dangerous game with fate. Computers are pretty reliable, but they also go wrong, often. You should always backup your files, but backing up a containerized application isn’t quite as simple. A container is 3 things:ConfigurationVolumesNetworking The point…

None

Docker in LXC

Docker is a great containerization technology for running applications. It keeps multiple applications completely isolated from each other, only allowing connections exactly when you tell them to. But what if you’re on a hypervisor? You want your host OS to be as lean as possible (else it defeats the point),…