cfpq_data's Issues

Add graph size information

It would be very useful to add the number of vertices and the number of edges for each graph.

Graph reduce

It would be useful to be able to get a graph restricted to the edges or vertices of interest. For example, the Enzyme graph with only broaderTransitive edges.
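
A minimal sketch of such a reduction, assuming the graph is held as a plain list of (source, label, target) tuples. The helper keep_labels and the toy edges are hypothetical and are not part of the cfpq_data API:

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source vertex, edge label, target vertex)


def keep_labels(edges: Iterable[Edge], labels: Set[str]) -> List[Edge]:
    """Return only the edges whose label is in `labels` (hypothetical helper)."""
    return [edge for edge in edges if edge[1] in labels]


# Toy data for illustration; not taken from the real Enzyme graph.
edges = [
    ("a", "broaderTransitive", "b"),
    ("b", "type", "c"),
    ("c", "broaderTransitive", "d"),
]

reduced = keep_labels(edges, {"broaderTransitive"})
print(reduced)
```

Filtering by vertex would follow the same pattern, testing the source and target fields instead of the label.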

Establish releases

We should establish releases of CFPQ_data as freeze points for graphs and queries.

  1. Tag on GitHub.
  2. I'm not sure how we should freeze the RDFs, which are currently stored in Google Drive. Maybe we should move them to git-lfs. The reason the RDFs are not currently stored in git-lfs is that git-lfs limits upload and download operations, which may be a problem for users. Alternatively, we could freeze the RDFs in a dedicated Google Drive folder and fix the download references. In that case, though, each release would live in a separate branch, or master would be for releases only, with a dev branch for development.
  3. Release automation: tag creation, branch creation (or not), documentation regeneration, site update, artifact freezing.

[FEATURE] Developer docs

Add documentation for developers.
Include:

  • Pre-commit
  • Test pipeline
  • Docs deploy
  • Package deploy
  • Guideline

Prettify numbers

  • Vertex and edge counts are hard to read once they exceed hundreds of thousands
  • It would be very helpful to add a thousands separator
  • Graph sizes should be given in kilobytes or a larger unit
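
A minimal sketch of both formatting requests, using Python's built-in format specifiers. The function names format_count and format_size_kb are hypothetical, and 1 KB is assumed to mean 1024 bytes:

```python
def format_count(n: int) -> str:
    """Format an integer with a comma as the thousands separator."""
    return f"{n:,}"


def format_size_kb(size_bytes: int) -> str:
    """Format a byte count in kilobytes (assuming 1 KB = 1024 bytes)."""
    return f"{size_bytes / 1024:,.1f} KB"


print(format_count(1234567))      # 1,234,567
print(format_size_kb(5_242_880))  # 5,120.0 KB
```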

Update S3 data architecture and integration

Proposal

The current data architecture looks like this:

  • 1.0.0 -- Release version of cfpq_data
    • MemoryAliases
      • graphs -- Separate rdf files, each packed in its own .tar.gz archive
    • RDF
      • graphs -- Separate rdf files, each packed in its own .tar.gz archive

I suggest the following data architecture:

  • Release version of cfpq_data
    • csv file for each graph

With CSV headers [from,label,to]
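
A minimal sketch of reading such a file with Python's standard csv module, assuming a comma delimiter and exactly the header from,label,to. The function read_edges and the sample rows are hypothetical:

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_edges(handle: TextIO) -> List[Tuple[str, str, str]]:
    """Read (from, label, to) edges from a CSV stream with a from,label,to header."""
    reader = csv.DictReader(handle)
    return [(row["from"], row["label"], row["to"]) for row in reader]


# In-memory sample standing in for a per-graph CSV file.
sample = io.StringIO("from,label,to\n0,a,1\n1,b,2\n")
edges = read_edges(sample)

# Vertex and edge counts fall out of the edge list directly.
vertices = {v for source, _, target in edges for v in (source, target)}
print(len(vertices), len(edges))
```

With a real file, the same function can be applied to a handle opened with open(path, newline="").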

Pros

  • The proposed architecture simplifies the manipulation of graphs in S3
  • Loading will be faster
  • CSV is a simpler and more convenient representation of labeled graphs than RDF

Cons

  • It may take more disk space than .tar.gz archives
  • CSV format is not as rich as RDF
  • No other data format is provided

Add Java graphs to CFPQ_Data

The following graphs are taken from the DaCapo benchmark suite.

All required data is uploaded to Google Drive.

  • Graph name: graphs/graph_name
  • Graph triplets: graphs/graph_name/data.csv
  • Edge Statistics: graphs/graph_name/edge_stats.csv
  • Node and Edge counts: graphs/graph_name/graph_stats.csv (columns 1 and 2 respectively)

Canonical grammar:
Alias → PointsTo PointsTo_r
PointsTo → (assign | load_f Alias store_f)* alloc

Improve Readme

  • Add repository structure description
  • Add data set description
    • Graphs
    • Queries
  • Add reference values for checking algorithm correctness
  • Add a list of works that use this data set

Add flags to generate a text representation of graphs and cnf grammars

It would be great to be able to get a text representation of the graphs and the CNF grammars after initialization. This would make it easier to run CFPQ algorithms, with no need to convert formats. All the necessary scripts are already written, so it should be easy to implement.

How to use graph_to_txt?

Hello,

I want to get a graph from a dataset in a txt file. To do this, I wrote a script like this:

import cfpq_data
g = cfpq_data.graph_from_dataset("generations")
path = cfpq_data.graph_to_txt(g, "test.txt")

But in the test.txt file I get something like this:

'N1a0fdccf1e5747358d788032136b2d3a' 'http://www.w3.org/1999/02/22-rdf-syntax-ns#first' 'N5b1227cd9d554e6b8376780df8c0ac4b'
'N1a0fdccf1e5747358d788032136b2d3a' 'http://www.w3.org/1999/02/22-rdf-syntax-ns#rest' 'http://www.w3.org/1999/02/22-rdf-syntax-ns#nil'
'N5b1227cd9d554e6b8376780df8c0ac4b' 'http://www.w3.org/2002/07/owl#someValuesFrom' 'Nf6e96ad96dee4cc380a425a39571412b'
'N5b1227cd9d554e6b8376780df8c0ac4b' 'http://www.w3.org/2002/07/owl#onProperty' 'http://www.owl-ontologies.com/generations.owl#hasChild'
...

Please tell me what I am doing wrong. Thank you.

[FEATURE] Add grammars to CFPQ_Data

  • Add grammar support to the CFPQ_Data package (downloading, other utilities)
  • Add grammar templates to website
  • Add grammar instances for existing graphs to the storage
  • Update the other parts of the website to reflect the new grammars
