Add more popular datasets to graphscope built-in datasets · alibaba/GraphScope#1015

Repository Metrics

Stars: 2,401
PR merge metrics: (avg. merge 1m) (8 merged PRs in 30 days)

Description

We have several built-in datasets that can be easily loaded in one line. The data is located in the dataset directory of the Aliyun OSS bucket graphscope, and the corresponding utility functions to load it are located in python/graphscope/dataset/. We are planning to enrich the datasets continuously.
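As a rough illustration of where such one-line loaders fetch data from: a public-read OSS object is addressable as https://<bucket>.<endpoint>/<object key>. The sketch below maps a dataset name to the URL of its archive, or of a single file in the uncompressed folder. The endpoint oss-cn-beijing.aliyuncs.com and the helper name dataset_url are assumptions for illustration, not taken from the GraphScope source; check the existing loaders for the real base URL.

```python
from typing import Optional
from urllib.parse import quote

# Assumption: bucket endpoint/region; the real loaders may use a different one.
OSS_ENDPOINT = "oss-cn-beijing.aliyuncs.com"
BUCKET = "graphscope"
DATASET_PREFIX = "dataset"


def dataset_url(name: str, filename: Optional[str] = None) -> str:
    """Hypothetical helper: return the public URL of <name>.tar.gz, or of one
    file inside the uncompressed <name>/ folder when `filename` is given."""
    if filename is None:
        key = f"{DATASET_PREFIX}/{name}.tar.gz"
    else:
        key = f"{DATASET_PREFIX}/{name}/{filename}"
    # Percent-encode the object key while keeping "/" as the path separator.
    return f"https://{BUCKET}.{OSS_ENDPOINT}/{quote(key)}"
```

For example, dataset_url("foo") gives https://graphscope.oss-cn-beijing.aliyuncs.com/dataset/foo.tar.gz, and dataset_url("foo", "nodes.csv") points at the single file in the uploaded folder.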

Here is the procedure for adding a new dataset:

Find a popular and appropriate dataset, and adapt its format to a property graph if necessary.
Put all data files inside a folder and give the folder a meaningful name.
Compress the folder, then upload the compressed file together with the original folder to the dataset folder of the OSS bucket. For example, if you have a folder named foo/ containing two files, foo/nodes.csv and foo/edge.csv, then after this step the bucket will have the following file structure:

dataset
|-- foo.tar.gz
|-- foo
    |-- nodes.csv
    |-- edge.csv

Write the loading function load_foo in a new file named python/graphscope/dataset/foo.py.
A corresponding unit test is appreciated!
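The first three steps above can be scripted with the Python standard library. This is a minimal sketch, assuming the raw data is a plain list of (src, dst) pairs; the function name package_dataset and the CSV column names (id, src_id, dst_id) are hypothetical and should follow whatever the loader expects. It derives a vertex table from the edge endpoints (a trivial property-graph adaptation), writes foo/nodes.csv and foo/edge.csv, and packs the folder into foo.tar.gz so the archive extracts back to foo/, matching the bucket layout above.

```python
import csv
import tarfile
from pathlib import Path
from typing import Iterable, Tuple


def package_dataset(name: str, edges: Iterable[Tuple[int, int]], out_dir: Path) -> Path:
    """Write <name>/nodes.csv and <name>/edge.csv under out_dir, then
    compress the folder into <name>.tar.gz. Returns the archive path."""
    edges = list(edges)
    folder = out_dir / name
    folder.mkdir(parents=True, exist_ok=True)

    # Property-graph adaptation: derive the vertex set from the edge endpoints.
    node_ids = sorted({v for edge in edges for v in edge})
    with open(folder / "nodes.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id"])
        writer.writerows([n] for n in node_ids)

    with open(folder / "edge.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["src_id", "dst_id"])
        writer.writerows(edges)

    # arcname=name stores entries as "<name>/...", so extraction recreates the folder.
    archive = out_dir / f"{name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(folder, arcname=name)
    return archive
```

Calling package_dataset("foo", [(1, 2), (2, 3)], Path(tmp)) produces an archive whose members are foo, foo/edge.csv and foo/nodes.csv; both foo.tar.gz and the foo/ folder can then be uploaded side by side.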

Contributor Guide

Research direction: The issue provides a clear step-by-step procedure for adding a new dataset. First, find a popular graph dataset and, if necessary, adapt it to the property graph format. Then create a folder named after the dataset that contains the CSV or other data files, compress it as .tar.gz, and upload both the compressed file and the original folder to the dataset folder in the Aliyun OSS bucket. Next, write a Python loading function in a new file under python/graphscope/dataset/ that follows the pattern of the existing loaders. Finally, add a unit test for the new loader. Use the existing dataset loaders in the repository, such as those in the same directory, as a reference.
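A loader and its unit test might be shaped roughly as follows. This is only a sketch under stated assumptions: the real loaders in python/graphscope/dataset/ download the archive and build a GraphScope graph, whereas this hypothetical load_foo takes a local archive path, extracts it once into a cache directory, and returns the CSV rows, so that it runs without GraphScope installed. The file names follow the foo/ example; the test builds a tiny fixture archive in a temporary directory instead of touching the OSS bucket.

```python
import csv
import tarfile
import tempfile
import unittest
from pathlib import Path


def load_foo(archive: Path, cache_dir: Path) -> dict:
    """Hypothetical loader: extract foo.tar.gz into cache_dir (only once) and
    return the rows of foo/nodes.csv and foo/edge.csv as lists of dicts."""
    folder = cache_dir / "foo"
    if not folder.is_dir():
        with tarfile.open(archive, "r:gz") as tar:
            # Use the safer "data" extraction filter where this Python supports it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(cache_dir, filter="data")
            else:
                tar.extractall(cache_dir)
    tables = {}
    for table in ("nodes", "edge"):
        with open(folder / f"{table}.csv", newline="") as f:
            tables[table] = list(csv.DictReader(f))
    return tables


class LoadFooTest(unittest.TestCase):
    def setUp(self):
        self.tmp = tempfile.TemporaryDirectory()
        root = Path(self.tmp.name)
        # Build a tiny fixture archive with the same layout as the bucket.
        src = root / "src" / "foo"
        src.mkdir(parents=True)
        (src / "nodes.csv").write_text("id\n1\n2\n")
        (src / "edge.csv").write_text("src_id,dst_id\n1,2\n")
        self.archive = root / "foo.tar.gz"
        with tarfile.open(self.archive, "w:gz") as tar:
            tar.add(src, arcname="foo")
        self.cache = root / "cache"
        self.cache.mkdir()

    def tearDown(self):
        self.tmp.cleanup()

    def test_load_foo(self):
        data = load_foo(self.archive, self.cache)
        self.assertEqual([row["id"] for row in data["nodes"]], ["1", "2"])
        self.assertEqual(data["edge"], [{"src_id": "1", "dst_id": "2"}])

    def test_second_call_reuses_extracted_folder(self):
        load_foo(self.archive, self.cache)
        self.archive.unlink()  # a second extraction would now fail
        self.assertEqual(len(load_foo(self.archive, self.cache)["nodes"]), 2)
```

Both python -m unittest and pytest collect unittest.TestCase classes, so the test can be run with either.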
Tech stack: Python
Domain: backend, data
Issue type: Feature
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Clear
Prerequisites: Python, Git
Beginner-friendliness: 80
