alibaba/GraphScope

Add more popular datasets to graphscope built-in datasets

Open

#1015, created on November 17, 2021

Labels: batch import, good first issue

Description

GraphScope has several built-in datasets that can be loaded with a single line of code. The data files are stored in the dataset directory of the graphscope bucket on Aliyun OSS, and the corresponding loading functions live in python/graphscope/dataset/. We plan to keep enriching this collection.
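To illustrate what such a one-line loader typically does, here is a minimal standard-library sketch: download `<name>.tar.gz` once, extract it into a local cache, and return the path of the extracted `<name>/` folder. It is not GraphScope's actual implementation. `BASE_URL` is a hypothetical placeholder (the real bucket endpoint is not given here), and the helper names `fetch_dataset` and `extract_archive` are illustrative. The sketch also assumes each archive contains a single top-level folder named after the dataset, matching the layout described in step 3 below.

```python
import os
import tarfile
import urllib.request

# Hypothetical placeholder, NOT the real endpoint of the graphscope OSS bucket.
BASE_URL = "https://example.com/dataset"


def extract_archive(archive_path: str, dest_dir: str) -> str:
    """Extract a .tar.gz archive into dest_dir and return dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, "..", and other unsafe members.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return dest_dir


def fetch_dataset(name: str, cache_dir: str, base_url: str = BASE_URL) -> str:
    """Download <name>.tar.gz once, extract it, and return <cache_dir>/<name>."""
    target = os.path.join(cache_dir, name)
    if os.path.isdir(target):
        # Already downloaded and extracted: reuse the cached copy.
        return target
    os.makedirs(cache_dir, exist_ok=True)
    archive = os.path.join(cache_dir, f"{name}.tar.gz")
    if not os.path.exists(archive):
        # Hypothetical URL layout: <base_url>/<name>.tar.gz
        urllib.request.urlretrieve(f"{base_url}/{name}.tar.gz", archive)
    # The archive is assumed to contain a single top-level <name>/ folder.
    extract_archive(archive, cache_dir)
    return target
```

Caching on the extracted folder means repeated calls skip both the download and the extraction.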

The procedure for adding a new dataset is as follows:

  1. Find a popular, suitable dataset and, if necessary, convert it to the property graph format.
  2. Put all the data files in a single folder and give the folder a meaningful name.
  3. Compress the folder, then upload both the compressed archive and the original folder to the dataset directory of the OSS bucket. For example, given a folder foo/ containing two files, foo/nodes.csv and foo/edge.csv, the bucket should have the following structure after this step:
     dataset
     |-- foo.tar.gz
     |-- foo
         |-- nodes.csv
         |-- edge.csv
  4. Write a loading function, load_foo, in a new file, python/graphscope/dataset/foo.py.
  5. Adding a corresponding unit test is appreciated!
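The packaging in steps 2 and 3 can be scripted. The sketch below is a minimal example assuming the data files already sit in a local folder such as foo/. It builds foo.tar.gz with foo/ as the top-level entry, so extracting the archive reproduces the layout shown above. `pack_dataset` and `archive_members` are hypothetical helpers. The upload to OSS is intentionally left out because it depends on your OSS client and credentials.

```python
import os
import tarfile


def pack_dataset(folder: str, out_dir: str) -> str:
    """Compress `folder` into <out_dir>/<name>.tar.gz, keeping <name>/ as the top-level entry."""
    folder = os.path.abspath(folder)  # also drops any trailing separator, e.g. "foo/"
    name = os.path.basename(folder)
    os.makedirs(out_dir, exist_ok=True)
    archive = os.path.join(out_dir, f"{name}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps member paths relative, e.g. foo/nodes.csv rather than an absolute path.
        tar.add(folder, arcname=name)
    return archive


def archive_members(archive: str) -> list:
    """Return the sorted names of the regular files stored in a .tar.gz archive."""
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())
```

For the example folder, `archive_members(pack_dataset("foo", "."))` should list `foo/edge.csv` and `foo/nodes.csv`. A check like this is also a reasonable starting point for the unit test suggested in step 5.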

