reworkd/AgentGPT
✨ Investigate the best similarity score threshold to remove duplicate tasks
Open · #729 opened on Jun 6, 2023
Labels: enhancement, help wanted
Description
When we generate tasks, we filter out any task whose similarity score to an existing task in the vector database is too high:
```python
similar_tasks = memory.get_similar_tasks(
    task, score_threshold=0.95  # TODO: Once we use ReAct, revisit
)
```
The code above performs this filtering. The threshold of 0.95 was chosen arbitrarily, and even at this value a task flagged as similar may not actually be related to the existing one.
Conversely, closely related or duplicate tasks may score below 0.95 and slip through the filter.
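To make the trade-off concrete, here is a minimal sketch of threshold-based duplicate filtering, assuming the score is cosine similarity between task embeddings (higher means more similar). This is not how `memory.get_similar_tasks` is implemented; `cosine_similarity`, `filter_new_tasks`, and the toy three-dimensional embeddings are all hypothetical stand-ins for the real embedding model and vector database.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_new_tasks(
    candidates: Dict[str, List[float]],
    existing: Dict[str, List[float]],
    score_threshold: float = 0.95,
) -> List[str]:
    """Hypothetical helper: keep candidates whose best match is below the threshold.

    `candidates` and `existing` map task text to an (assumed) embedding vector.
    Kept tasks are added to the pool so later candidates in the same batch are
    also compared against them.
    """
    pool = dict(existing)  # copy so the caller's dict is not mutated
    kept: List[str] = []
    for task, vec in candidates.items():
        best = max(
            (cosine_similarity(vec, other) for other in pool.values()),
            default=0.0,
        )
        if best < score_threshold:  # at or above the threshold counts as a duplicate
            kept.append(task)
            pool[task] = vec
    return kept


existing = {"Research competitors": [1.0, 0.0, 0.0]}
candidates = {
    "Look up competitors": [0.99, 0.1, 0.0],  # cosine ~0.995 with the existing task
    "Write a summary": [0.0, 1.0, 0.0],       # cosine 0.0 with the existing task
}
print(filter_new_tasks(candidates, existing))  # ['Write a summary']
```

Raising `score_threshold` to 0.999 in this toy example would let "Look up competitors" through, which is exactly the false-negative risk described above; lowering it too far risks the opposite.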
This ticket covers investigating the best value for this threshold, or alternatively a different way of measuring similarity for this use case.
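One possible way to pick the threshold empirically, sketched under assumptions: collect task pairs, record their similarity scores, label each pair by hand as duplicate or not, then sweep candidate thresholds and keep the one with the best F1 score. The labeled `pairs` below are invented purely for illustration and are not measurements from AgentGPT; `f1_at_threshold` and `best_threshold` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple


def f1_at_threshold(pairs: Sequence[Tuple[float, bool]], threshold: float) -> float:
    """F1 of the rule 'score >= threshold means duplicate' on labeled (score, is_dup) pairs."""
    tp = sum(1 for score, dup in pairs if score >= threshold and dup)
    fp = sum(1 for score, dup in pairs if score >= threshold and not dup)
    fn = sum(1 for score, dup in pairs if score < threshold and dup)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(
    pairs: Sequence[Tuple[float, bool]], candidates: List[float]
) -> Tuple[float, float]:
    """Return (threshold, f1) for the candidate threshold with the highest F1."""
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        f1 = f1_at_threshold(pairs, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Illustrative, hand-labeled pairs: (similarity score, is it really a duplicate?)
pairs = [
    (0.97, True), (0.93, True), (0.91, True),
    (0.96, False), (0.88, False), (0.80, False),
]
t, f1 = best_threshold(pairs, [0.85, 0.90, 0.95])
print(t, round(f1, 3))  # 0.9 0.857
```

F1 weighs wrongly dropped tasks and leftover duplicates equally; if one kind of mistake is worse for the agent loop, a weighted score (for example F-beta) could be swept the same way.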