unslothai/unsloth

Kaggle - GPT OSS Data set

Open

#3400 opened on Oct 2, 2025

(0 comments) (0 reactions) (0 assignees) Python (5,658 forks) batch import
help wanted


Description

Hi, I am trying to run the Kaggle version, but I have a list of .md files that I want to feed to the model for training, and then export the model. Are .md files enough to feed the model? I can see you have this:

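from datasets import Dataset
# `prompt` and `tokenizer` are assumed to be defined earlier in the notebook.
# Build 1000 identical rows, each holding a single-turn chat prompt.
dataset = Dataset.from_list([{"prompt" : [{"role": "user", "content": prompt.strip()}], "answer" : 0, "reasoning_effort": "low"}]*1000)
# Token count of the prompt.
maximum_length = len(tokenizer(prompt.strip())["input_ids"])
print(maximum_length)
dataset[0]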

Given that I want this plus all my .md files, what is the best way to merge them? Any tips?
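
One possible way to approach the merge, as a rough sketch rather than a recommended recipe: read each .md file as plain text and wrap it in a row with the same three columns as the snippet above, then concatenate the two lists. The helper names `md_files_to_rows` and `merge_rows` are hypothetical, and so is the choice to reuse `answer=0` and `reasoning_effort="low"` for the Markdown rows. Whether a raw document makes sense as a user prompt depends on the training objective, which the snippet does not state.

```python
from pathlib import Path


def md_files_to_rows(folder, answer=0, reasoning_effort="low"):
    """Turn every non-empty .md file under `folder` into one row.

    Hypothetical helper: the row layout mirrors the snippet in this issue
    ("prompt", "answer", "reasoning_effort"); the default values are
    assumptions, not something the notebook prescribes.
    """
    rows = []
    for path in sorted(Path(folder).rglob("*.md")):
        text = path.read_text(encoding="utf-8").strip()
        if not text:
            continue  # skip files that contain only whitespace
        rows.append({
            "prompt": [{"role": "user", "content": text}],
            "answer": answer,
            "reasoning_effort": reasoning_effort,
        })
    return rows


def merge_rows(base_rows, md_rows):
    """Concatenate two lists of rows that share the same keys."""
    return list(base_rows) + list(md_rows)
```

The merged list could then be passed to `Dataset.from_list(...)` in the same way as the single-prompt list above. Long files may exceed the model's context length, so checking each row's token count with the tokenizer before training is probably worthwhile.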
