Custom labels in dynamic modeling and topic per class
#2154 opened on September 20, 2024
Description
Have you searched existing issues?
- I have searched and found no existing issues
Describe the bug
Hi,
I've come across a small issue that's left me a bit puzzled. I'm not entirely sure it's a bug, but it does seem confusing. When I run topics_per_class with my topic model:
topics_per_class = topic_model.topics_per_class(docs, classes=classes, global_tuning=False)
The resulting topics_per_class looks like this:
Topic | Words | Frequency | Class
-1 | jaar, mensen, zegt, vrouwen, politie | 1421 | nl
There is no column with topic names or custom labels here. That is not ideal, but OK.
But when I run this:
fig = topic_model.visualize_topics_per_class(topics_per_class, custom_labels=True)
and then look at topics_per_class again in my Jupyter notebook, I now get this:
Topic | Words | Frequency | Class | Name
-1 | jaar, mensen, zegt, vrouwen, politie | 1421 | nl | -1_mensen_vrouwen_politie_kinderen
Now the "Name" column has been added. It's unclear why it isn't included from the start, or why there's no parameter like custom_labels=True when creating topics_per_class. I checked the documentation and this parameter doesn't exist; attempting to use it raises an error in the first step.
I just wanted to bring this up. As the title indicates, I've experienced the same issue with dynamic topic modeling.
I sometimes find the table version easier to use, so I'm curious why it works like this, or maybe I'm missing something?
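For the table use case, one possible workaround is to add the names to the table by hand. The sketch below uses small stand-in DataFrames shaped like the tables above; `add_topic_names` is a hypothetical helper, not part of BERTopic. With a fitted model, the lookup table would presumably come from `topic_model.get_topic_info()`, assuming it exposes "Topic" and "Name" columns.

```python
import pandas as pd


def add_topic_names(topics_per_class: pd.DataFrame, topic_info: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of topics_per_class with a "Name" column looked up by topic ID.

    Hypothetical helper for illustration; it is not part of BERTopic.
    """
    # Build a Topic -> Name lookup from the topic overview table
    names = topic_info.set_index("Topic")["Name"]
    labeled = topics_per_class.copy()  # leave the input table untouched
    labeled["Name"] = labeled["Topic"].map(names)
    return labeled


# Stand-in data shaped like the tables in this report
topics_per_class_demo = pd.DataFrame(
    {
        "Topic": [-1],
        "Words": ["jaar, mensen, zegt, vrouwen, politie"],
        "Frequency": [1421],
        "Class": ["nl"],
    }
)
# Assumption: with a real model this would be topic_model.get_topic_info()
topic_info_demo = pd.DataFrame(
    {"Topic": [-1], "Name": ["-1_mensen_vrouwen_politie_kinderen"]}
)

labeled = add_topic_names(topics_per_class_demo, topic_info_demo)
print(labeled.columns.tolist())
```

Returning a copy keeps the original table as produced by topics_per_class, so the labeled and unlabeled versions can be compared side by side.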
This is only my second time reporting something (on GitHub in general), so I hope I explained it clearly. Thanks again for everyone's work and effort on BERTopic.
Best,
Reproduction
from bertopic import BERTopic

# docs: list of document strings; classes: one class label per document (both defined elsewhere)
topic_model = BERTopic(verbose=True)
topics, probs = topic_model.fit_transform(docs)

topics_per_class = topic_model.topics_per_class(docs, classes=classes)
print(topics_per_class.columns)
# No custom labels at this step

topic_model.visualize_topics_per_class(topics_per_class, top_n_topics=10)
print(topics_per_class.columns)
# Now custom labels are the last column in the df.
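The output above suggests that the visualization call adds the column to the DataFrame it receives. If that is what happens (an assumption, not confirmed here), passing a copy should leave the original table unchanged. The sketch below shows the idea with a hypothetical stand-in function, `visualize_stand_in`, that only mimics the observed side effect.

```python
import pandas as pd


def visualize_stand_in(df: pd.DataFrame) -> None:
    """Hypothetical stand-in that mimics the observed side effect:
    it adds a "Name" column to the DataFrame it is given."""
    df["Name"] = df["Topic"].astype(str) + "_placeholder"


original = pd.DataFrame({"Topic": [-1], "Frequency": [1421], "Class": ["nl"]})

# Pass a copy so that any column added inside the call stays on the copy
visualize_stand_in(original.copy())
print("Name" in original.columns)  # False: the original table is unchanged

# With a real model, the analogous call would be (assumption):
# topic_model.visualize_topics_per_class(topics_per_class.copy(), top_n_topics=10)
```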
BERTopic Version
0.16.2