zulip/zulip

create topic ids in the database

Open

#1,191 created on July 7, 2016

(16 comments) (1 reaction) (0 assignees) Python (19,672 stars) (7,339 forks)
batch import, area: db cleanup, area: production, help wanted

Description

Currently stream topics are represented as strings in the Message table. This decision dates back to a time when users were not allowed to retroactively change topics on prior messages, so the original implementation decision was relatively harmless and simplified some code, where we didn't need to join to another table.

As the system has grown, not having a separate table for topics that maps an immutable id to transient attributes (like the topic name) has caused us pain.

One example is that when you change the name of a topic in Zulip, the back end has to basically fan out writes to change all messages that used the original topic name. If we were to create a separate topic table, we could change just one row in the database, and let subsequent queries pick up the new name automatically via joins. (I'm oversimplifying a bit; there would still be lots of things to orchestrate on the back end for topic name changes, even with a new table, but, trust me, it would be simpler.)
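To make the difference in write volume concrete, here is a minimal sketch using an in-memory SQLite database. The table and column names (`message_by_name`, `message_by_id`, `topic`) are made up for illustration and are not Zulip's actual schema; the sketch also ignores everything else a rename has to orchestrate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Today (simplified): every message row carries the topic name as a string.
    CREATE TABLE message_by_name (
        id INTEGER PRIMARY KEY,
        stream_id INTEGER NOT NULL,
        topic_name TEXT NOT NULL,
        content TEXT NOT NULL
    );
    -- Proposed (hypothetical): an immutable topic id maps to the mutable name.
    CREATE TABLE topic (
        id INTEGER PRIMARY KEY,
        stream_id INTEGER NOT NULL,
        name TEXT NOT NULL,
        UNIQUE (stream_id, name)
    );
    CREATE TABLE message_by_id (
        id INTEGER PRIMARY KEY,
        topic_id INTEGER NOT NULL REFERENCES topic (id),
        content TEXT NOT NULL
    );
    """
)

# Seed the same three messages in both layouts.
conn.execute("INSERT INTO topic (id, stream_id, name) VALUES (1, 10, 'lunch')")
for content in ["pizza?", "sushi?", "tacos?"]:
    conn.execute(
        "INSERT INTO message_by_name (stream_id, topic_name, content) "
        "VALUES (10, 'lunch', ?)",
        (content,),
    )
    conn.execute(
        "INSERT INTO message_by_id (topic_id, content) VALUES (1, ?)",
        (content,),
    )

# String-based rename: one write per message that used the old name.
rows_touched_by_name = conn.execute(
    "UPDATE message_by_name SET topic_name = 'dinner' "
    "WHERE stream_id = 10 AND topic_name = 'lunch'"
).rowcount

# Id-based rename: a single row changes, whatever the message count.
rows_touched_by_id = conn.execute(
    "UPDATE topic SET name = 'dinner' WHERE id = 1"
).rowcount

# Readers pick up the new name through the join.
names_seen = conn.execute(
    "SELECT DISTINCT topic.name FROM message_by_id "
    "JOIN topic ON topic.id = message_by_id.topic_id"
).fetchall()
```

With three messages in the topic, the string-based rename rewrites three rows while the id-based rename rewrites one.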

Another consequence of not having topic ids in our database is that tables other than the Message table also refer to topics by their names. An example of this is our topic-muting feature. When we record in the database which topics are muted, we use strings to refer to topics, not ids. So, this interacts with other features, like changing the name of a topic, which leads to even more fan-out writes when you rename a topic.
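As a rough illustration of how ids would decouple muting from renames, the sketch below keys a hypothetical `muted_topic` table by topic id. The schema and the `muted_topic_names` helper are assumptions for this example, not the existing Zulip tables.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE topic (
        id INTEGER PRIMARY KEY,
        stream_id INTEGER NOT NULL,
        name TEXT NOT NULL
    );
    -- Hypothetical muting table keyed by topic id instead of topic name.
    CREATE TABLE muted_topic (
        user_id INTEGER NOT NULL,
        topic_id INTEGER NOT NULL REFERENCES topic (id),
        PRIMARY KEY (user_id, topic_id)
    );
    INSERT INTO topic (id, stream_id, name) VALUES (1, 10, 'lunch');
    INSERT INTO muted_topic (user_id, topic_id) VALUES (42, 1);
    """
)


def muted_topic_names(conn: sqlite3.Connection, user_id: int) -> List[str]:
    """Return the current names of the topics this user has muted."""
    rows = conn.execute(
        "SELECT topic.name FROM muted_topic "
        "JOIN topic ON topic.id = muted_topic.topic_id "
        "WHERE muted_topic.user_id = ? ORDER BY topic.name",
        (user_id,),
    )
    return [name for (name,) in rows]


before = muted_topic_names(conn, 42)

# The rename only touches the topic table; muted_topic is never written.
conn.execute("UPDATE topic SET name = 'dinner' WHERE id = 1")

after = muted_topic_names(conn, 42)
muted_rows = conn.execute("SELECT COUNT(*) FROM muted_topic").fetchone()[0]
```

The mute survives the rename without any extra fan-out write, because the muting row never stored the name in the first place.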

Another benefit of moving to a separate topic table is that we can more easily add features to our system that allow users to configure things about topics. Right now a topic is mostly just a tag on a message, and it can be muted/unmuted, but there's not much else interesting about a topic. This may change over time. An example would be that we might want to have certain topics be read-only or have restricted audiences. We might also want periodically running analytical processes to be able to write data to the topics table.

The main implementation challenge with changing how we represent topics in the database is just the sheer volume of code that would be affected. Also, there would be migration headaches.

There are two basic approaches to getting this done that I can think of.

  1. Do a massive big-bang change to the system, and be done with it.
  2. First fix how we write topics to the database (which is still biggish-bang in terms of changing all possible code that writes topic-related stuff), and then over time fix all the code that consumes topic information to use ids, not strings. (And then, of course, when no more code consumes strings, kill off the deprecated column in the database.)

The advantages and risks of #1 are hopefully obvious.

The advantage of #2 is that it will be a lot easier to distribute the programming work in an open-source environment. One hero does the big-bang work to fix the writes, and then many gnomes can slowly deprecate the string column in Message for particular features. The obvious disadvantage of #2 is that it creates complexity during the transition.
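The write-side half of approach #2 could look roughly like the sketch below: every new write fills both the deprecated string column and the new id column, and a one-off backfill assigns ids to legacy rows. All names here (`topic_id`, `get_or_create_topic_id`, `send_message`, `backfill_topic_ids`) are hypothetical, and the sketch skips details such as how topic names should be compared and how renames behave while both columns exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE topic (
        id INTEGER PRIMARY KEY,
        stream_id INTEGER NOT NULL,
        name TEXT NOT NULL,
        UNIQUE (stream_id, name)
    );
    CREATE TABLE message (
        id INTEGER PRIMARY KEY,
        stream_id INTEGER NOT NULL,
        topic_name TEXT NOT NULL,                -- deprecated string column
        topic_id INTEGER REFERENCES topic (id),  -- new column, NULL on legacy rows
        content TEXT NOT NULL
    );
    -- A legacy row written before the transition started.
    INSERT INTO message (stream_id, topic_name, content)
    VALUES (10, 'lunch', 'old message');
    """
)


def get_or_create_topic_id(conn: sqlite3.Connection, stream_id: int, name: str) -> int:
    """Look up the id for (stream_id, name), creating the topic row if needed."""
    conn.execute(
        "INSERT OR IGNORE INTO topic (stream_id, name) VALUES (?, ?)",
        (stream_id, name),
    )
    (topic_id,) = conn.execute(
        "SELECT id FROM topic WHERE stream_id = ? AND name = ?",
        (stream_id, name),
    ).fetchone()
    return topic_id


def send_message(conn: sqlite3.Connection, stream_id: int, topic_name: str, content: str) -> None:
    """Transition-era write path: fill both the string and the id column."""
    topic_id = get_or_create_topic_id(conn, stream_id, topic_name)
    conn.execute(
        "INSERT INTO message (stream_id, topic_name, topic_id, content) "
        "VALUES (?, ?, ?, ?)",
        (stream_id, topic_name, topic_id, content),
    )


def backfill_topic_ids(conn: sqlite3.Connection) -> int:
    """Assign topic ids to legacy rows; returns the number of rows updated."""
    legacy = conn.execute(
        "SELECT DISTINCT stream_id, topic_name FROM message WHERE topic_id IS NULL"
    ).fetchall()
    updated = 0
    for stream_id, name in legacy:
        topic_id = get_or_create_topic_id(conn, stream_id, name)
        updated += conn.execute(
            "UPDATE message SET topic_id = ? "
            "WHERE topic_id IS NULL AND stream_id = ? AND topic_name = ?",
            (topic_id, stream_id, name),
        ).rowcount
    return updated


send_message(conn, 10, "lunch", "new message")
backfilled = backfill_topic_ids(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM message WHERE topic_id IS NULL"
).fetchone()[0]
distinct_ids = conn.execute(
    "SELECT COUNT(DISTINCT topic_id) FROM message"
).fetchone()[0]
```

Once every row has a `topic_id`, individual features can be switched to read ids one at a time, and the string column can be dropped after the last reader is gone.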
