alibaba/GraphScope
Add more popular datasets to graphscope built-in datasets
Open
#1,015 created on November 17, 2021
good first issue
Description
We have several built-in datasets that can be loaded in a single line. The data is stored in the `dataset` directory of the Aliyun OSS bucket `graphscope`, and the corresponding utility functions that load it are located in `python/graphscope/dataset/`. We plan to keep adding datasets over time.
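To show how a one-line loader can sit on top of this bucket, here is a minimal, hypothetical sketch using only the Python standard library. It downloads `<name>.tar.gz` once, extracts it into a local cache, and returns the extracted `<name>/` folder. The helper name `fetch_dataset`, the `base_url` parameter (the bucket's public URL is not given in this issue), and the caching behaviour are all assumptions for illustration. This is not the actual code in `python/graphscope/dataset/`.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path


def fetch_dataset(name: str, base_url: str, cache_dir: str) -> Path:
    """Return a local folder for dataset `name`, downloading it on first use.

    Hypothetical helper: assumes `{base_url}/{name}.tar.gz` is an archive whose
    single top-level folder is `{name}/` (the `foo.tar.gz` / `foo/` layout).
    """
    cache = Path(cache_dir)
    target = cache / name
    if target.is_dir():
        # Already downloaded and extracted: reuse the cached copy.
        return target

    cache.mkdir(parents=True, exist_ok=True)
    archive = cache / f"{name}.tar.gz"
    url = f"{base_url.rstrip('/')}/{name}.tar.gz"
    # Stream the compressed archive to disk.
    with urllib.request.urlopen(url) as resp, open(archive, "wb") as out:
        shutil.copyfileobj(resp, out)

    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter when this Python provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(cache, filter="data")
        else:
            tar.extractall(cache)
    archive.unlink()  # keep only the extracted folder

    if not target.is_dir():
        raise FileNotFoundError(f"{name}.tar.gz has no top-level '{name}/' folder")
    return target
```

A loader such as the hypothetical `load_foo` could call a helper like this and then build a graph from the returned folder. Caching on the extracted folder keeps repeated one-line loads from hitting the network.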
Here is the procedure for adding a new dataset:
- Find a popular and appropriate dataset, and adapt its format to a property graph if necessary.
- Put all data files inside a folder and give the folder a meaningful name.
- Compress the folder, then upload the compressed file together with the original folder to the `dataset` folder of the OSS bucket. For example, if you have a folder named `foo/` containing two files, `foo/nodes.csv` and `foo/edge.csv`, then after this step the bucket will have the following file structure:
dataset
|-- foo.tar.gz
|-- foo
    |-- nodes.csv
    |-- edge.csv
- Write the loading function `load_foo` in a new file named `python/graphscope/dataset/foo.py`.
- A corresponding unit test is appreciated!
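The compression step can be sketched with the Python standard library. The key detail is that `foo.tar.gz` should keep `foo/` as its top-level folder, so that extracting it reproduces the bucket layout. The helpers `package_dataset` and `archive_members` are hypothetical and not part of GraphScope. Uploading to the OSS bucket is left out because it depends on your credentials and tooling.

```python
import shutil
import tarfile
from pathlib import Path


def package_dataset(folder: str) -> Path:
    """Compress `folder` into `<name>.tar.gz`, placed next to the folder.

    Hypothetical helper: the archive keeps `<name>/` as its single top-level
    directory, matching the `foo.tar.gz` / `foo/` layout in the bucket.
    """
    src = Path(folder).resolve()
    if not src.is_dir():
        raise NotADirectoryError(str(src))
    archive = shutil.make_archive(
        str(src),             # base name; ".tar.gz" is appended
        "gztar",
        root_dir=src.parent,  # archive paths are relative to the parent...
        base_dir=src.name,    # ...so every member starts with "<name>/"
    )
    return Path(archive)


def archive_members(archive: Path) -> list:
    """Return the sorted names of regular files inside a .tar.gz archive."""
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())
```

After packaging, both `foo.tar.gz` and the original `foo/` folder would be uploaded. An assertion on `archive_members` in a unit test is a cheap way to catch an archive built from inside the folder (members like `nodes.csv` instead of `foo/nodes.csv`), which would not extract into the expected `foo/` directory.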