unslothai/unsloth

Kaggle - GPT OSS Data set

Open

#3400 opened on Oct 2, 2025

help wanted


Description

Hi, I am trying to run the Kaggle version, but I have a list of .md files that I want to feed to the model for training, and then export the model. Are .md files enough to feed the model? I can see you have this:

from datasets import Dataset
dataset = Dataset.from_list([{"prompt" : [{"role": "user", "content": prompt.strip()}], "answer" : 0, "reasoning_effort": "low"}]*1000)
maximum_length = len(tokenizer(prompt.strip())["input_ids"])
print(maximum_length)
dataset[0]

Given that I want this plus all my .md files, how is it best to merge them? Any tips?
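To make the question concrete, here is a minimal sketch of one possible merge, assuming each .md file can be treated as a single user prompt in the same row shape as the snippet above. The helper names `load_markdown_rows` and `merge_rows`, the folder argument, and the reuse of `answer = 0` are my own assumptions for illustration, not something taken from the notebook. Whether raw markdown alone is a suitable training signal depends on the training objective, which is part of what I am asking.

```python
from pathlib import Path


def load_markdown_rows(folder, answer=0, reasoning_effort="low"):
    """Read every .md file under `folder` into a row shaped like the notebook's.

    `answer` is only a placeholder copied from the notebook row (assumption);
    real data would need a meaningful target per file.
    """
    rows = []
    for path in sorted(Path(folder).rglob("*.md")):
        text = path.read_text(encoding="utf-8").strip()
        if not text:
            continue  # skip empty or whitespace-only files
        rows.append({
            "prompt": [{"role": "user", "content": text}],
            "answer": answer,
            "reasoning_effort": reasoning_effort,
        })
    return rows


def merge_rows(base_rows, extra_rows):
    """Concatenate two lists of rows that share the same keys."""
    return list(base_rows) + list(extra_rows)
```

The merged list could then be passed to `Dataset.from_list(...)` exactly as in the snippet above, since both lists would share the same keys.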
