unslothai/unsloth

Kaggle - GPT OSS Data set

Open

#3400 opened on Oct 2, 2025

help wanted

Description

Hi, I am trying to run the Kaggle version, but I have a list of .md files that I want to feed to the model for training, after which I want to export the model. Are .md files enough to feed the model? I can see you have this:

from datasets import Dataset
dataset = Dataset.from_list([{"prompt" : [{"role": "user", "content": prompt.strip()}], "answer" : 0, "reasoning_effort": "low"}]*1000)
maximum_length = len(tokenizer(prompt.strip())["input_ids"])
print(maximum_length)
dataset[0]

Given that I want this plus all my .md files, how best to merge them? Any tips?
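
To make the question concrete, here is a rough sketch of the kind of merge I have in mind. It is not taken from the notebook: `load_markdown_records` is a hypothetical helper, and using each file's text as a user prompt with a placeholder `answer` of 0 is only an assumption to keep the columns identical to the snippet above. The demo writes a few temporary .md files so it runs on its own.

```python
from pathlib import Path
import tempfile


def load_markdown_records(folder, reasoning_effort="low"):
    """Hypothetical helper: one record per non-empty .md file under `folder`.

    The keys mirror the notebook row (prompt / answer / reasoning_effort)
    so the two lists can be merged without column mismatches.
    """
    records = []
    for path in sorted(Path(folder).rglob("*.md")):
        text = path.read_text(encoding="utf-8").strip()
        if not text:
            continue  # skip empty files
        records.append({
            "prompt": [{"role": "user", "content": text}],
            "answer": 0,  # placeholder value, an assumption
            "reasoning_effort": reasoning_effort,
        })
    return records


# Demo on temporary files so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.md").write_text("# Title A\n\nSome notes.\n", encoding="utf-8")
    Path(tmp, "b.md").write_text("", encoding="utf-8")  # empty, gets skipped
    Path(tmp, "c.md").write_text("# Title C\n", encoding="utf-8")
    md_records = load_markdown_records(tmp)

# Stand-in for the rows built in the notebook snippet above.
notebook_records = [{
    "prompt": [{"role": "user", "content": "example prompt"}],
    "answer": 0,
    "reasoning_effort": "low",
}]

# Plain list concatenation; with `datasets` installed the result could be
# passed to Dataset.from_list(merged), as in the notebook snippet.
merged = notebook_records + md_records
print(len(merged), "records")
```

Is something along these lines reasonable, or do the .md files need a different format than the prompt rows?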
