MaartenGr/BERTopic

Custom labels in dynamic modeling and topic per class

Open

#2154 opened on Sep 20, 2024

(5 comments) (0 reactions) (0 assignees) Python (634 forks)
Labels: bug, good first issue

Repository metrics

Stars: 5,074
PR merge metrics: no merged PRs in the last 30 days

Description

Have you searched existing issues? 🔎

  • I have searched and found no existing issues

Describe the bug

Hi,

I’ve come across a small issue that’s left me a bit puzzled. I’m not entirely sure if it’s a bug, but it does seem confusing. When I run topics_per_class with my topic model:

topics_per_class = topic_model.topics_per_class(docs, classes=classes, global_tuning=False)

The resulting topics_per_class looks like this:

Topic | Words | Frequency | Class
-1 | jaar, mensen, zegt, vrouwen, politie | 1421 | nl

There are no custom labels here. That is not ideal, but OK.

But when I run this:

fig = topic_model.visualize_topics_per_class(topics_per_class, custom_labels=True)

and then re-run the Jupyter cell that displays topics_per_class, I now get this:

Topic | Words | Frequency | Class | Name
-1 | jaar, mensen, zegt, vrouwen, politie | 1421 | nl | -1_mensen_vrouwen_politie_kinderen

Now the "Name" column has been added. It’s unclear to me why it isn’t included from the start, or why there is no parameter like custom_labels=True when creating topics_per_class. I checked the documentation: this parameter doesn’t exist, and passing it to topics_per_class raises an error.
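The behaviour above is consistent with the plotting call adding a Name column to the very DataFrame it receives, rather than to a copy. The sketch below reproduces that effect with plain pandas; plot_like and the label dict are hypothetical stand-ins, not BERTopic's actual code.

```python
import pandas as pd


# Hypothetical stand-in for a plotting helper (NOT BERTopic's code).
# It mimics the observed effect: a "Name" column appears on the caller's frame.
def plot_like(df: pd.DataFrame, labels: dict) -> None:
    # Python passes the DataFrame object itself, not a copy, so adding a
    # column here also adds it to the frame the caller still holds.
    df["Name"] = df["Topic"].map(labels)


table = pd.DataFrame({
    "Topic": [-1],
    "Words": ["jaar, mensen, zegt, vrouwen, politie"],
    "Frequency": [1421],
    "Class": ["nl"],
})
labels = {-1: "-1_mensen_vrouwen_politie_kinderen"}

print(list(table.columns))  # ['Topic', 'Words', 'Frequency', 'Class']
plot_like(table, labels)
print(list(table.columns))  # ['Topic', 'Words', 'Frequency', 'Class', 'Name']
```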

I just wanted to bring this up. As the title indicates, I’ve experienced the same issue with dynamic topic modeling (topics_over_time) as well.

I sometimes find the table version easier to work with than the plot, so I’m curious why it behaves like this. Or am I missing something?
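For anyone who prefers the table, one way to get labels without going through the plot is to attach them yourself. A minimal sketch, assuming topic_info has the shape of topic_model.get_topic_info() (a Topic and a Name column, plus CustomName once custom labels have been set with set_topic_labels); with_topic_names is a hypothetical helper, not part of BERTopic. Because the output of both topics_per_class and topics_over_time has a Topic column, the same helper should apply to either.

```python
import pandas as pd


def with_topic_names(table: pd.DataFrame, topic_info: pd.DataFrame,
                     use_custom: bool = True) -> pd.DataFrame:
    """Return a copy of `table` with a "Name" column looked up by "Topic".

    Hypothetical helper. `topic_info` is assumed to look like the output of
    get_topic_info(): "Topic" and "Name" columns, plus "CustomName" when
    custom labels were set.
    """
    # `in` on a DataFrame checks its column labels.
    column = "CustomName" if use_custom and "CustomName" in topic_info else "Name"
    names = dict(zip(topic_info["Topic"], topic_info[column]))
    out = table.copy()  # leave the caller's frame untouched
    out["Name"] = out["Topic"].map(names)
    return out


# Illustrative inputs; "Outliers" is a made-up custom label.
info = pd.DataFrame({
    "Topic": [-1],
    "Name": ["-1_mensen_vrouwen_politie_kinderen"],
    "CustomName": ["Outliers"],
})
per_class = pd.DataFrame({
    "Topic": [-1],
    "Words": ["jaar, mensen, zegt, vrouwen, politie"],
    "Frequency": [1421],
    "Class": ["nl"],
})

labeled = with_topic_names(per_class, info)
print(labeled.loc[0, "Name"])  # Outliers
```

With a fitted model this would be called as with_topic_names(topics_per_class, topic_model.get_topic_info()); it is worth checking the column names against your BERTopic version first.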

This is only my second time reporting an issue on GitHub, so I hope I’ve explained it clearly. Thanks again to everyone for their work and effort on BERTopic.

Best,

Reproduction

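from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# Example data: any list of documents with one class label per document should do
data = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))
docs = data["data"]
classes = [data["target_names"][i] for i in data["target"]]

topic_model = BERTopic(verbose=True)
topics, probs = topic_model.fit_transform(docs)

topics_per_class = topic_model.topics_per_class(docs, classes=classes)

print(topics_per_class.columns)
# No custom labels (no "Name" column) at this step

topic_model.visualize_topics_per_class(topics_per_class, top_n_topics=10)

print(topics_per_class.columns)
# Now the labels appear as a "Name" column, the last column of the DataFrame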
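A fix could be guarded by a check that the table passed to a visualization method is unchanged after the call. A generic sketch: leaves_input_unchanged, mutating_plot and copying_plot are hypothetical names, and the two plotting functions only stand in for the behaviour before and after a fix (working on df.copy()); they are not BERTopic code.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def leaves_input_unchanged(func, df: pd.DataFrame, *args, **kwargs) -> bool:
    """Call func(df, ...) and report whether df was left unmodified."""
    snapshot = df.copy(deep=True)
    func(df, *args, **kwargs)
    try:
        # Raises AssertionError on any difference, including added columns.
        assert_frame_equal(df, snapshot)
        return True
    except AssertionError:
        return False


# Hypothetical stand-ins: current behaviour vs. a copy-based fix.
def mutating_plot(df, labels):
    df["Name"] = df["Topic"].map(labels)


def copying_plot(df, labels):
    df = df.copy()  # work on a private copy
    df["Name"] = df["Topic"].map(labels)


frame = pd.DataFrame({"Topic": [-1], "Frequency": [1421], "Class": ["nl"]})
labels = {-1: "-1_mensen_vrouwen_politie_kinderen"}

print(leaves_input_unchanged(mutating_plot, frame.copy(), labels))  # False
print(leaves_input_unchanged(copying_plot, frame.copy(), labels))   # True
```

Against a fitted model, the same check would be leaves_input_unchanged(topic_model.visualize_topics_per_class, topics_per_class, top_n_topics=10).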

BERTopic Version

0.16.2
