Add more popular datasets to graphscope built-in datasets · alibaba/GraphScope#1015

(0 comments) (0 reactions) (1 assignee) HTML (301 forks) batch import

good first issue

Repository metrics

Stars: 2401
PR merge metrics: average merge time 1m, 8 PRs merged in 30 days

Description

GraphScope has several built-in datasets that can be loaded in one line. The data files are located in the dataset directory of the Aliyun OSS bucket graphscope, and the corresponding utility functions that load them are located in python/graphscope/dataset/. We plan to keep enriching the datasets.

The procedure for adding a new dataset is:

Find a popular and appropriate dataset, and adapt its format to a property graph if necessary.
Put all data files inside a folder and give the folder a meaningful name.
Compress the folder, then upload the compressed file together with the original folder to the dataset folder of the OSS bucket. For example, given a folder named foo/ containing two files, foo/nodes.csv and foo/edge.csv, the bucket will have the following file structure after this step:

dataset
|-- foo.tar.gz
|-- foo
    |-- nodes.csv
    |-- edge.csv

Write the loading function load_foo in a new file named python/graphscope/dataset/foo.py.
A corresponding unit test is appreciated!
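
The folder-and-archive part of the procedure can be sketched with the Python standard library. This is a minimal sketch, not the project's tooling: `pack_dataset` is a hypothetical helper name, and uploading the resulting foo.tar.gz and foo/ to the OSS bucket is left out because the issue does not say which upload tool to use.

```python
import tarfile
from pathlib import Path


def pack_dataset(folder):
    """Compress `folder` into `<folder>.tar.gz` beside it and return the archive path."""
    folder = Path(folder).resolve()
    if not folder.is_dir():
        raise NotADirectoryError(str(folder))
    archive = folder.parent / (folder.name + ".tar.gz")
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps the top-level folder name (e.g. foo/) inside the archive,
        # so extracting it recreates foo/nodes.csv and foo/edge.csv.
        tar.add(str(folder), arcname=folder.name)
    return archive
```

Calling `pack_dataset("foo")` on the example folder leaves foo/ untouched and writes foo.tar.gz next to it, which matches the layout shown for the bucket.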

Contributor guide

Research direction: The issue gives a clear step-by-step procedure for adding a new dataset. First, find a popular graph dataset and adapt it to the property graph format if necessary. Then create a folder named after the dataset containing the CSV or other files, compress it as .tar.gz, and upload both the compressed file and the original folder to the dataset folder of the Aliyun OSS bucket. Next, write a Python loading function in a new file under python/graphscope/dataset/, following the pattern of the existing loaders. Finally, add a unit test for the new loader. Use the existing dataset loaders in the repository, such as those in the same directory, as a reference.
Tech stack: Python
Domain: backend, data
Issue type: Feature
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Clear
Prerequisites: Python, Git
Beginner-friendly: 80
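
The loader step described in the guide might look roughly like the sketch below for the example foo dataset. Several things here are assumptions rather than facts from the issue: the signature `load_foo(sess, prefix, directed=True)`, the helper `check_foo_files`, the labels "node" and "edge", and the column layout of the CSV files are all illustrative. The GraphScope calls (`sess.g`, `add_vertices`, `add_edges`, `Loader`) are used as I understand them from the existing loaders, so check them against the files in python/graphscope/dataset/ before relying on this.

```python
import os


def check_foo_files(prefix):
    """Return (nodes_path, edges_path) under `prefix`, raising if one is missing."""
    nodes = os.path.join(prefix, "nodes.csv")
    edges = os.path.join(prefix, "edge.csv")
    for path in (nodes, edges):
        if not os.path.isfile(path):
            raise FileNotFoundError(path)
    return nodes, edges


def load_foo(sess, prefix, directed=True):
    """Load the hypothetical foo dataset into a GraphScope property graph.

    Assumptions: nodes.csv has the vertex id in its first column, edge.csv has
    the source and destination ids in its first two columns, and the Loader's
    default CSV settings fit both files.
    """
    # Imported here so check_foo_files works without GraphScope installed.
    from graphscope.framework.loader import Loader

    nodes, edges = check_foo_files(os.path.expandvars(prefix))
    graph = sess.g(directed=directed)
    # "node" and "edge" are placeholder labels; pick names that fit the dataset.
    graph = graph.add_vertices(Loader(nodes), "node")
    graph = graph.add_edges(
        Loader(edges), "edge", src_label="node", dst_label="node"
    )
    return graph
```

A unit test for such a loader could start by exercising `check_foo_files` against a temporary folder, then call `load_foo` with a real session and assert on the resulting graph's schema.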
