unslothai/unsloth

Kaggle - GPT OSS Data set

Open

#3400 opened on Oct 2, 2025

batch import
help wanted

Description

Hi, I am trying to run the Kaggle version, but I have a list of .md files that I want to feed to the model for training, after which I want to export the model. Are .md files enough to feed the model? I can see you have this:

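from datasets import Dataset

# `prompt` and `tokenizer` are assumed to be defined earlier in the notebook.
# Build 1000 identical rows from a single chat-style prompt.
dataset = Dataset.from_list([{"prompt" : [{"role": "user", "content": prompt.strip()}], "answer" : 0, "reasoning_effort": "low"}]*1000)
# Token length of the prompt.
maximum_length = len(tokenizer(prompt.strip())["input_ids"])
print(maximum_length)
dataset[0]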

Given that I want this plus all my .md files, how best to merge them? Any tips?
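
To make the question concrete, here is a minimal sketch of one way the merge could look mechanically. It is not taken from the notebook and makes several assumptions: each .md file becomes one row, the file's text is used as the user message, and the `answer` and `reasoning_effort` values are placeholders copied from the row above. The helper names `rows_from_markdown` and `merge_rows` are hypothetical. Whether raw markdown is a useful training signal depends on the training objective, which this sketch does not address.

```python
from pathlib import Path


def rows_from_markdown(folder, reasoning_effort="low"):
    """Turn every non-empty .md file under `folder` into one row
    with the same keys as the notebook's rows (hypothetical helper)."""
    rows = []
    for path in sorted(Path(folder).rglob("*.md")):
        text = path.read_text(encoding="utf-8").strip()
        if not text:  # skip empty files
            continue
        rows.append({
            # Assumption: the whole file is used as the user message.
            "prompt": [{"role": "user", "content": text}],
            # Placeholder copied from the notebook row; adjust to your task.
            "answer": 0,
            "reasoning_effort": reasoning_effort,
        })
    return rows


def merge_rows(base_rows, markdown_rows):
    """Concatenate two row lists that share the same keys."""
    return list(base_rows) + list(markdown_rows)
```

The merged list can then be passed to `Dataset.from_list(...)`, the same call used in the snippet above, so the existing rows and the markdown rows end up in a single dataset with one schema.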
