
Comments (31)

dgarijo avatar dgarijo commented on June 2, 2024 3

not all of them, just the ones you want to describe with those properties. A research object may contain many files. Some of them may be datasets. Some may be Slides, workflows, SoftwareApplications...

from ro-crate.

dgarijo avatar dgarijo commented on June 2, 2024 1

I don't think you need to adopt all of it, just the parts that cover your use cases (as you point out). In PROV we had like 3 main concepts and 8 relationships among them and people still said it was complicated...

from ro-crate.

eocarragain avatar eocarragain commented on June 2, 2024 1

Ok - made that change in the example above. Fact remains that schema.org doesn't cover a lot of common use cases for describing tabular data, so should we look at providing a simplified subset of CSVW more or less corresponding to table_schema?

from ro-crate.

stain avatar stain commented on June 2, 2024 1

Thanks, @LauraWalters, for re-awakening this discussion - I've added this to the agenda for the RO-Crate Community Call this Thursday.

It would be good to hear more about your project's requirement on this, either in this issue or in the call.

Feel free to join if you have time, see #1 or https://s.apache.org/ro-crate-minutes for call details!

from ro-crate.

jmfernandez avatar jmfernandez commented on June 2, 2024 1

Yes, I agree, if the standard already exists, we should reuse it. And btw, it could be a nice example about using annotations based on third-party ontologies along with RO-Crate. We could even consider the inclusion of a list of useful standards / ontologies, depending on the use case.

from ro-crate.

ptsefton avatar ptsefton commented on June 2, 2024 1

Revisiting this as part of our work on the Text Commons RO-Crate profile.

Here's what we have now (including some new terms that are defined in a custom context)

A CSV file references a schema using the csvw:tableSchema property:


 {
      "@id": "files/427/original_bad0fd7f9c918df1db8b6a5b39faec48.csv",
      "@type": [
        "File",
        "Annotation"
      ],
      "name": "Transcript of interview with Patricia Colless full text transcription (CSV)",
      "encodingFormat": "text/csv",
      "annotationType": [
        {
          "@id": "olac:Transcription"
        },
        {
          "@id": "olac:TimeAligned"
        }
      ],
      "modality": {
        "@id": "olac:Orthography"
      },
      "annotationOf": {
        "@id": "files/503/original_779656ecdb38dfb06cee9440773692a7.mp3"
      },
      "language": {
        "@id": "https://www.ethnologue.com/language/eng"
      },
      "csvw:tableSchema": {
        "@id": "#dialog_schema"
      },
      "size": 54363
    },

{
      "@id": "#dialog_schema",
      "@type": "csvw:Schema",
      "name": "Table schema for dialogue transcript",
      "columns": [
        {
          "@id": "#speaker"
        },
        {
          "@id": "#transcript"
        },
        {
          "@id": "#start_time"
        },
        {
          "@id": "#notes"
        }
      ]
    },
    {
      "@id": "#speaker",
      "@type": "csvw:Column",
      "csvw:datatype": "string",
      "description": "Which of the participants is talking in that particular utterance. ",
      "name": "speaker"
    },
    {
      "@id": "#transcript",
      "@type": "csvw:Column",
      "csvw:datatype": "string",
      "description": "Transcription of speaker turn",
      "name": "text",
      "sameAs": {
        "@id": "olac:Transcription"
      }
    },
    {
      "@id": "#start_time",
      "@type": "csvw:Column",
      "description": "Start time of the utterance.",
      "name": "time",
      "sameAs": {
        "@id": "https://schema.org/startTime"
      }
    },
    {
      "@id": "#notes",
      "@type": "csvw:Column",
      "csvw:datatype": "string",
      "description": "Additional information",
      "name": "notes"
    },

This has some advantages over the schema.org approach suggested above by @eocarragain many moons ago and used by Science on Schema.org.

  • This approach explicitly links to the column names rather than just being a convention that the name of a PropertyValue matches a CSV header.
  • The schema.org approach uses variableMeasured, which is not always going to be a good semantic match with the contents of a column. We're not measuring variables in our example, we're transcribing a conversation.

On the other hand, the csvw spec is very complicated and very strict, and by bringing it into the schema.org world we're not really using it properly - the sameAs reference to schema.org terms feels like a bit of a hack.

Maybe we should aim to bring the best of csvw into schema.org? (And while we're at it we could include worksheet as a level of organization so we can deal with spreadsheets.)
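To make the "explicitly links to the column names" point concrete, here is a minimal Python sketch of how a consumer could follow csvw:tableSchema from a file entity to its column names and compare them with the CSV header. It assumes the flattened layout shown above (a "columns" list of @id references, each column carrying a "name"); the helper names and the tiny sample graph are hypothetical, not part of any RO-Crate library.

```python
import csv
import io


def schema_column_names(graph, file_id):
    """Return the column names declared for file_id via csvw:tableSchema."""
    by_id = {entity["@id"]: entity for entity in graph}
    schema_ref = by_id[file_id]["csvw:tableSchema"]["@id"]
    schema = by_id[schema_ref]
    # Each entry in "columns" is a reference; resolve it to the column entity.
    return [by_id[ref["@id"]]["name"] for ref in schema["columns"]]


def header_matches(csv_text, expected):
    """True if the first CSV row equals the expected column names, in order."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return header == expected


# Hypothetical, cut-down version of the graph above.
graph = [
    {"@id": "t.csv", "@type": "File",
     "csvw:tableSchema": {"@id": "#dialog_schema"}},
    {"@id": "#dialog_schema", "@type": "csvw:Schema",
     "columns": [{"@id": "#speaker"}, {"@id": "#transcript"}]},
    {"@id": "#speaker", "@type": "csvw:Column", "name": "speaker"},
    {"@id": "#transcript", "@type": "csvw:Column", "name": "text"},
]

names = schema_column_names(graph, "t.csv")
print(names)
print(header_matches("speaker,text\nA,hello\n", names))
```

Note that the column @id (#transcript) and the CSV header (text) can differ, which is exactly what the explicit link buys us over the naming convention.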

from ro-crate.

mekline avatar mekline commented on June 2, 2024 1

Hi all - Psych-DS maintainer here, and I found this discussion following a meeting with some ROCrate people including @stain. Couple of points, and we'd be very happy to partner if there's a useful way to do so!

Psych-DS is designed largely to get researchers who probably should be storing CSV files in a well-structured directory, to actually do so. It essentially tries to provide an implementation of some standard best practices in file and directory structures (e.g. http://www2.stat.duke.edu/~rcs46/lectures_2015/01-markdown-git/slides/naming-slides/naming-slides.pdf) so that researchers can check whether they've succeeded in following those recommendations.

The minimal version of Psych-DS should be writable by hand, by a person who does not know anything about JSON files. If that same person opens the JSON file, it should be transparent to them what the information inside means. Alternatively, it should be implementable by a toolmaker who is not knowledgeable about linked data/JSON-LD/etc. but wants to provide users (who also don't know about those things!) with 'well structured' data that will be usable by other, similar tools.

The following really minimal schema.org/Dataset can provide an immediate benefit to the user - it can confirm whether there exists a CSV file, stored in the appropriate place, with a reasonable file name and an expected set of columns. If this is adopted widely, I would expect something like the following to be the extent of metadata for most datasets that exist, at least at first:

{
 "@context" : "http://schema.org/",
 "@type" : "Dataset",
 "name" : "Psych-DS Example Dataset",
 "description" : "This is a minimal example of a dataset for Psych-DS",
 "variableMeasured" : ["study_id", "sub_id", "age_years", "responded", "trial_id", "response"]
}

And over time, we would be nudging researchers toward something more like:

{
 "@context" : "http://schema.org/",
 "@type" : "Dataset",
 "name" : "Psych-DS Example Dataset",
 "description" : "This is a slightly bigger and Dr. Seuss themed example of a Psych-DS dataset",
 "author" : ["Cat Inthehat", "Theodor Geisel"],
 "citation" : "Inthehat, C., & Geisel, T. (2019). Article title about something, 2(1), 45–54. https://doi.org/dostring.",
 "schemaVersion" : "Psych-DS 1.0",
 "license" : "https://creativecommons.org/licenses/by/4.0/",
 "usageInfo" : "This dataset can be freely reused, but here are some limitations on what this data can/can't actually tell us or how it should be interpreted.",
 "variableMeasured" : [
  {"type" : "PropertyValue", "name" : "study_id", "description" : "This is the id code for the specific experiment the data point is from"},
  "more_variables"
 ]
}

I agree that schema.org/Dataset only sort-of meets our needs in terms of describing CSV files more fully, but it does provide a structured format that lets us validate against an externally established pattern while really minimizing the 'extra' that the user sees - there are essentially two lines of "magic", followed by no other explicit special syntax other than for variableMeasured.
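As a rough illustration of the "immediate benefit" check described above, the following Python sketch compares a CSV header against variableMeasured. It accepts both forms shown in the two examples (plain strings and PropertyValue objects with a name). The function names and the sample data are hypothetical; this is not the actual Psych-DS validator.

```python
import csv
import io


def expected_columns(dataset):
    """Collect column names from variableMeasured (strings or PropertyValue dicts)."""
    names = []
    for item in dataset.get("variableMeasured", []):
        names.append(item if isinstance(item, str) else item["name"])
    return names


def check_header(csv_text, dataset):
    """Return (missing, unexpected) column names relative to the metadata."""
    header = next(csv.reader(io.StringIO(csv_text)))
    expected = expected_columns(dataset)
    missing = [name for name in expected if name not in header]
    unexpected = [name for name in header if name not in expected]
    return missing, unexpected


# Hypothetical metadata mixing both styles of variableMeasured entry.
dataset = {
    "@context": "http://schema.org/",
    "@type": "Dataset",
    "variableMeasured": [
        "study_id",
        {"type": "PropertyValue", "name": "sub_id"},
    ],
}

missing, unexpected = check_header("study_id,extra\n1,2\n", dataset)
print("missing:", missing)
print("unexpected:", unexpected)
```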

If there was a subtype schema.org/TabularDataset we'd be all over it, as it might allow us to be more expressive about data type/formats & constraints - but something that's come up a few times is the idea of PDS as a 'handoff' format - getting a researcher 80% of the way toward what a data curator/archivist might want it to be, with enough structure that e.g. the corresponding frictionless or ROCrate metadata can be automatically produced or inferred. One nice property of JSON-LD is the ability to store multiple objects in a single file - I wonder if all the JSON-LD metadata formats for tabular data could start supporting a common pattern along the lines of:

  • Allow metadata.json as an alternate name to the top-level metadata file
  • Allow/expect multiple JSON-LD objects to be present in the file
  • Parse only those with the relevant context or schemaVersion and ignore the rest.
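The pattern in the list above could be sketched in Python roughly as follows. This assumes, purely for illustration, that metadata.json holds either a single JSON-LD object or a top-level JSON array of them, and that each tool recognises its own objects by a schemaVersion prefix; neither assumption comes from an existing spec.

```python
import json


def select_objects(metadata_text, wanted_prefix):
    """Return the JSON-LD objects whose schemaVersion starts with wanted_prefix."""
    loaded = json.loads(metadata_text)
    # Accept a single object or a list of objects.
    objects = loaded if isinstance(loaded, list) else [loaded]
    return [
        obj for obj in objects
        if isinstance(obj, dict)
        and str(obj.get("schemaVersion", "")).startswith(wanted_prefix)
    ]


# Hypothetical metadata.json with two objects aimed at different tools.
metadata_text = json.dumps([
    {"@context": "http://schema.org/", "@type": "Dataset",
     "schemaVersion": "Psych-DS 1.0", "name": "Psych-DS Example Dataset"},
    {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": []},
])

psych_ds_objects = select_objects(metadata_text, "Psych-DS")
print(len(psych_ds_objects), psych_ds_objects[0]["name"])
```

A tool keyed on @context instead of schemaVersion would work the same way, just with a different predicate.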

from ro-crate.

eocarragain avatar eocarragain commented on June 2, 2024

Note: we may need a more general use-case for how to express sub-file/variable level metadata. Some concrete non-tabular examples would be good though

from ro-crate.

eocarragain avatar eocarragain commented on June 2, 2024

Discussed this on Editor's call 2019-08-08, and agreed it would be good to use the schema.org flavour if possible, e.g. Dataspice, Psych-DS

from ro-crate.

eocarragain avatar eocarragain commented on June 2, 2024

The table below compares the Frictionless Data tabular data specs with the schema.org variableMeasured property. It also shows the additional fields that the psych-ds team have added on top of their use of variableMeasured.

| table_schema | schema:variableMeasured | psych-ds |
|---|---|---|
| dialect | | |
| name | name | schema:name |
| description | description | schema:name |
| title | alternateName | |
| type | | type |
| type>rdfType | propertyId | |
| format | | |
| constraints>required | | |
| constraints>unique | | |
| constraints>minLength | minValue | schema:minValue |
| constraints>maxLength | maxValue | schema:maxValue |
| constraints>minimum | ~minValue | |
| constraints>maximum | ~maxValue | |
| constraints>pattern | | |
| constraints>enum | | levels |
| missingValues | | na/naValues |
| primaryKey | | |
| foreignKeys | | |
| ~type>rdfType | unitCode | schema:unitCode |
| ~type>rdfType | unitText | schema:unitText |
| | | derivation |
| | | imputation |

Notes:

  • table schema enumerates types (e.g. string, Boolean, number) and formats (e.g. email address, ISO8601) for fields, see https://frictionlessdata.io/specs/table-schema#types-and-formats. There is no equivalent in schema.org
  • table_schema doesn't have a direct equivalent of unitCode or unitText but type>rdfType could probably be used
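To show how the comparison could be applied mechanically, here is a small Python sketch that converts one Frictionless table_schema field into a schema.org PropertyValue. It only maps name, description, title, rdfType and the minimum/maximum constraints, and it uses the schema.org spelling propertyID. Everything without an equivalent (type, format, required, and so on) is simply dropped, which is the gap discussed in the notes. The function name and the example field (including its example.org URI) are hypothetical.

```python
def field_to_property_value(field):
    """Map a Frictionless field descriptor to a schema.org PropertyValue dict."""
    pv = {"@type": "PropertyValue", "name": field["name"]}
    if "description" in field:
        pv["description"] = field["description"]
    if "title" in field:
        pv["alternateName"] = field["title"]
    if "rdfType" in field:
        pv["propertyID"] = field["rdfType"]
    constraints = field.get("constraints", {})
    # minimum/maximum are only approximate matches for minValue/maxValue.
    if "minimum" in constraints:
        pv["minValue"] = constraints["minimum"]
    if "maximum" in constraints:
        pv["maxValue"] = constraints["maximum"]
    return pv


# Hypothetical field descriptor.
field = {
    "name": "wall_height",
    "title": "Wall height",
    "description": "The height of the wall in centimetres",
    "type": "number",
    "rdfType": "http://example.org/height",
    "constraints": {"required": True, "minimum": 0, "maximum": 500},
}

property_value = field_to_property_value(field)
print(property_value)
```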

from ro-crate.

dgarijo avatar dgarijo commented on June 2, 2024

This sounds a little like: https://www.w3.org/TR/tabular-data-primer/#string-restriction
Why not reuse it?
EDIT: Oh, I see you listed it above, but it covers all the constraints nicely...

from ro-crate.

eocarragain avatar eocarragain commented on June 2, 2024

@dgarijo agreed "csvw" is probably the most complete rdf-friendly way to do this. It also has the benefit that Google seem to be adopting it in the dataset search. However, we received quite strong feedback at Open Repositories that CSVW was 'too complicated' for most researchers & coders to pick up and use easily.

There may be ways around this in terms of how we present it in the RO-Crate spec, i.e. just provide examples of the most common cases, more or less equivalent to table-schema?

EDIT: if we did this, the psych-ds community might be a good test group as they are clearly struggling with the fact that schema.org doesn't quite do what they need

from ro-crate.

eocarragain avatar eocarragain commented on June 2, 2024

Example of what the schema.org approach would look like in an RO-Crate context:

{
  "@context": "https://w3id.org/ro/crate/0.3-DRAFT/context",
  "@graph": [
    {
      "@id": "./",
      "@type": [
        "Dataset"
      ],
      "hasPart": [
        {
          "@id": "./table.csv"
        }
      ]
    },
    {
      "@id": "./table.csv",
      "@type": ["File", "Dataset"],
      "contentSize": "383766",
      "description": "A table capturing all my data",
      "variableMeasured": [
        {
          "@type": "PropertyValue",
          "unitText": "metres",
          "name": "wall_width",
          "description": "The width of the wall in metres"
        },
        {
          "@type": "PropertyValue",
          "unitCode": "CMT",
          "name": "wall_height",
          "description": "The height of the wall in centimetres"
        },
        {
          "@type": "PropertyValue",
          "name": "datetime",
          "description": "The date and time of the measurement"
        }
      ]
    }
  ]
}

Issue: in schema.org variableMeasured is only defined as a property of schema:Dataset, i.e. it cannot be used on an RO-Crate file as this maps to schema:MediaObject

EDIT: made the file a Dataset in the example above following @dgarijo's comments below

from ro-crate.

dgarijo avatar dgarijo commented on June 2, 2024

Are they disjoint (I don't see anything about that in schema.org)? If not, I don't see the problem in using them.

from ro-crate.

eocarragain avatar eocarragain commented on June 2, 2024

Would that mean making all ro-crate "files" be both schema:MediaObject and schema:Dataset?

from ro-crate.

jmfernandez avatar jmfernandez commented on June 2, 2024

I have a naive question: if the tabular format is a standard one, described in some ontology (but not at this granularity level), what should we do?

from ro-crate.

stain avatar stain commented on June 2, 2024

@dgarijo also mentions https://www.w3.org/TR/vocab-data-cube/

from ro-crate.

eocarragain avatar eocarragain commented on June 2, 2024

I have a naive question: if the tabular format is a standard one, described in some ontology (but not at this granularity level), what should we do?

@Stian suggested conformsTo or schema:additionalType (or maybe schema:schemaVersion)

from ro-crate.

eocarragain avatar eocarragain commented on June 2, 2024

isatab is another example

from ro-crate.

LauraWalters avatar LauraWalters commented on June 2, 2024

Hello, we are really interested in using RO-Crate for a project, and this use case would also be really important for us. Is there any news on this in general, or on integrating an existing solution as listed above? Thanks!

from ro-crate.

stain avatar stain commented on June 2, 2024

Also worth looking at the GA4GH Search API specification, which includes a JSON-based table definition.

from ro-crate.

ptsefton avatar ptsefton commented on June 2, 2024

@stain @LauraWalters @jmfernandez - just want to re-awake this discussion.

Has anyone done this for RO-Crate?

I have a simple example I want to code up from here: https://github.com/JTrippas/Spoken-Conversational-Search

How should I turn their text description of columns in a CSV into something in RO-Crate? Or should I just create a text file with the text in it and link it as an encoding format?

from ro-crate.

ptsefton avatar ptsefton commented on June 2, 2024

@stain the link to GA4GH Search API specification above is 404.

from ro-crate.

jmfernandez avatar jmfernandez commented on June 2, 2024

@stain the link to GA4GH Search API specification above is 404.

@ptsefton I have had a look, and the repo and the target file were renamed. Here is a more stable link to the example: https://github.com/ga4gh-discovery/data-connect/blob/3a9be1fab628d0278eedcb5e70bb7d55f7d0a081/SPEC.md#table-discovery-and-browsing-examples

from ro-crate.

jmfernandez avatar jmfernandez commented on June 2, 2024

From the spec pointed out by @stain and from my point of view, a CSV/TSV can be semantically described, on the one hand, by the parameters needed to open it in R, Python or similar (encoding, column separator, comment character, etc.), and on the other hand by enumerating the name, syntactic or semantic type and logical position of the columns.

EDIT: I have just read @ptsefton's answer at #64 (comment), and the W3C tabular metadata spec seems to cover all these points.
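For the first half of that description (the parameters needed to open the file), a minimal Python sketch using only the standard library might look like this. The parameter names (encoding, delimiter, comment_char) are my own labels for the three examples mentioned above, not terms from any of the specs, and the sample bytes are made up.

```python
import csv


def read_table(raw_bytes, encoding="utf-8", delimiter=",", comment_char="#"):
    """Decode raw_bytes, skip comment lines, and parse the rest as a table.

    Returns (column_names, rows) where rows is a list of dicts.
    """
    text = raw_bytes.decode(encoding)
    lines = [
        line for line in text.splitlines()
        if not (comment_char and line.startswith(comment_char))
    ]
    reader = csv.DictReader(lines, delimiter=delimiter)
    rows = list(reader)
    return reader.fieldnames, rows


# Hypothetical TSV with a leading comment line, stored as ISO 8859-1.
raw = "# exported 2020\nname\tage\nAda\t36\n".encode("iso8859-1")
columns, rows = read_table(raw, encoding="iso8859-1", delimiter="\t")
print(columns)
print(rows)
```

The second half (name, type and position of the columns) is what the column descriptions in the metadata would then be checked against.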

from ro-crate.

ptsefton avatar ptsefton commented on June 2, 2024

@jmfernandez

How about we use W3C tabular metadata - but with its prefix, so we don't get confused by different definitions of name, for example.

Here's an example reworked from the example 2:

{
  "@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}],
  "@graph": [
    {
      "@id": "tree-ops.csv",
      "name": "Tree Operations",
      "keyword": ["tree", "street", "maintenance"],
      "publisher": {
      ...
      },
      "license": {"@id": "http://opendefinition.org/licenses/cc-by/"},
      "dateModified": "2010-12-31",
      "csvw:tableSchema": {
        "csvw:columns": [{
          "csvw:name": "GID",
          "csvw:titles": ["GID", "Generic Identifier"],
          "description": "An identifier for the operation on a tree.",
          "csvw:datatype": "string",
          "csvw:required": true
        }, {
          "csvw:name": "on_street",
          "csvw:titles": "On Street",
          "description": "The street that the tree is on.",
          "csvw:datatype": "string"
        }, {
          "csvw:name": "species",
          "csvw:titles": "Species",
          "description": "The species of the tree.",
          "csvw:datatype": "string"
        }, {
          "csvw:name": "trim_cycle",
          "csvw:titles": "Trim Cycle",
          "description": "The operation performed on the tree.",
          "csvw:datatype": "string"
        }, {
          "csvw:name": "inventory_date",
          "csvw:titles": "Inventory Date",
          "description": "The date of the operation that was performed.",
          "csvw:datatype": {"base": "date", "format": "M/d/yyyy"}
        }],
        "csvw:primaryKey": "GID",
        "csvw:aboutUrl": "#gid-{GID}"
      }
    }
  ]
}



from ro-crate.

stain avatar stain commented on June 2, 2024

@ptsefton to have a go at reworking example with explicit @type and flattened JSON-LD. This can become a new page in the spec.
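As a starting point for that rework, here is a hedged Python sketch of the flattening step: it takes a nested table description (with csvw:tableSchema and csvw:columns inline) and splits it into separate entities that reference each other by @id, each with an explicit @type. The @id conventions (#schema-<file> and #<column name>) are assumptions made for the sketch, not something the spec prescribes.

```python
def flatten_table(table):
    """Split a nested table description into flat entities linked by @id."""
    nested = table["csvw:tableSchema"]
    schema_id = f"#schema-{table['@id']}"

    # One entity per column, identified by its csvw:name.
    columns = [
        {"@id": f"#{col['csvw:name']}", "@type": "csvw:Column", **col}
        for col in nested["csvw:columns"]
    ]

    # The schema keeps its other properties but only references the columns.
    schema = {k: v for k, v in nested.items() if k != "csvw:columns"}
    schema.update({
        "@id": schema_id,
        "@type": "csvw:Schema",
        "csvw:columns": [{"@id": col["@id"]} for col in columns],
    })

    # The file entity points at the schema instead of embedding it.
    file_entity = {k: v for k, v in table.items() if k != "csvw:tableSchema"}
    file_entity.setdefault("@type", "File")
    file_entity["csvw:tableSchema"] = {"@id": schema_id}

    return [file_entity, schema] + columns


# Hypothetical, cut-down nested description.
nested_table = {
    "@id": "tree-ops.csv",
    "name": "Tree Operations",
    "csvw:tableSchema": {
        "csvw:columns": [{"csvw:name": "GID", "csvw:datatype": "string"}],
        "csvw:primaryKey": "GID",
    },
}

flat = flatten_table(nested_table)
for entity in flat:
    print(entity)
```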

from ro-crate.

ptsefton avatar ptsefton commented on June 2, 2024

Have tried this out.

A CSV file can have a schema

image


Here we see a column definition referencing one with a similar spelling with sameAs

image

"@graph": [
    {
      "@id": "#Action",
      "@type": "csvw:Column",
      "csvw:datatype": "string",
      "description": "The action the participant takes in that utterance, these actions are described in the code book and allow for reproduction of the results.",
      "name": "Action",
      "sameAs": {
        "@id": "#Code"
      }
    },
    {
      "@id": "#Actor_pair",
      "@type": "csvw:Column",
      "csvw:datatype": "",
      "description": "13 different pairs completed three tasks. This column distinguishes the different pairs for each task (1-13)",
      "name": "Actor_pair"
    },
    {
      "@id": "#Code",
      "@type": "csvw:Column",
      "csvw:datatype": "",
      "description": "The action the participant takes in that utterance, these actions are described in Trippas et al. (2020)",
      "name": "Code",
      "sameAs": {
        "@id": "#Action"
      }
    },
    {
      "@id": "#File.name",
      "@type": "csvw:Column",
      "csvw:datatype": "string",
      "description": "Indicating the group number (2-14) and the date of the experiment.",
      "name": "File.name"
    },
    {
      "@id": "#Notes",
      "@type": "csvw:Column",
      "csvw:datatype": "string",
      "description": "Comments such as the particular search is stopped by the user or researcher or extra notes which relate to the action of the participant regarding the search session. *not included in the \"SCSdataset.csv\"",
      "name": "Notes"
    },
    {
      "@id": "#Query",
      "@type": "csvw:Column",
      "csvw:datatype": "string",
      "description": "The reference to the information need participants are solving.",
      "name": "Query"
    },
    {
      "@id": "#Query.complexity",
      "@type": "csvw:Column",
      "csvw:datatype": "string",
      "description": "One of three levels, referencing the task complexity type (remember, understand, and analyse).",
      "name": "Query.complexity",
      "sameAs": {
        "@id": "#Query_complexity"
      }
    },
    {
      "@id": "#Query.counter",
      "@type": "csvw:Column",
      "csvw:datatype": "string",
      "description": "A counter which keeps track of how many turns there have been between the participants in that conversation. For the initial data release only the first two turns are given. However, the first three turns are presented if the second turn is classified under the Meta-communication Theme (See CHIIR 2017 paper for further information).",
      "name": "Query.counter",
      "sameAs": {
        "@id": "#Query_counter"
      }
    },
    {
      "@id": "#Query_complexity",
      "@type": "csvw:Column",
      "csvw:datatype": "",
      "description": "One of three levels, referencing the task complexity type (remember, understand, and analyse).",
      "name": "Query_complexity"
    },
    {
      "@id": "#Query_counter",
      "@type": "csvw:Column",
      "csvw:datatype": "",
      "description": "A counter which keeps track of how many turns there have been between the participants in that conversation.",
      "name": "Query_counter",
      "sameAs": {
        "@id": "#Query.counter"
      }
    },
    {
      "@id": "#Role",
      "@type": "csvw:Column",
      "csvw:datatype": "string",
      "description": "Which of the participants is talking in that particular utterance. The roles are annotated as A_User (participant who has the information need which needs to be solved) and B_Receiver (person who has access to the computer and search engine).",
      "name": "Role"
    },
    {
      "@id": "#Start.time",
      "@type": "csvw:Column",
      "csvw:datatype": "string",
      "description": "Start time of the utterance.",
      "name": "Start.time",
      "sameAs": {
        "@id": "#Start_time"
      }
    },
    {
      "@id": "#Start_time",
      "@type": "csvw:Column",
      "csvw:datatype": "",
      "description": "Start time of the utterance.",
      "name": "Start_time",
      "sameAs": {
        "@id": "#Start.time"
      }
    },
    {
      "@id": "#Stop.time",
      "@type": "csvw:Column",
      "csvw:datatype": "string",
      "description": "Stop time of the utterance.",
      "name": "Stop.time",
      "sameAs": {
        "@id": "#Stop_time"
      }
    },
    {
      "@id": "#Stop_time",
      "@type": "csvw:Column",
      "csvw:datatype": "",
      "description": "Stop time of the utterance.",
      "name": "Stop_time",
      "sameAs": {
        "@id": "#Stop.time"
      }
    },
    {
      "@id": "#Sub_themes",
      "@type": "csvw:Column",
      "csvw:datatype": "",
      "description": "Subthemes based on codes as described in Trippas et al. (2020)",
      "name": "Sub_themes"
    },
    {
      "@id": "#Transcript",
      "@type": "csvw:Column",
      "csvw:datatype": "string",
      "description": "Transcripts of the utterance of the particular user in that particular timeslot.",
      "name": "Transcript",
      "sameAs": {
        "@id": "#Transcription"
      }
    },
    {
      "@id": "#Transcription",
      "@type": "csvw:Column",
      "csvw:datatype": "",
      "description": "Transcripts of the utterance of the particular user in that particular timeslot.",
      "name": "Transcription",
      "sameAs": {
        "@id": "#Transcript"
      }
    },
    {
      "@id": "#ffaa324f-bdec-4bf5-a260-62cc39580129",
      "@type": "Person",
      "affiliation": "https://ror.org/04ttjf776",
      "name": "Paul Thomas"
    },
    {
      "@id": "#schema-ConversationalSearchDataSet.csv",
      "@type": "csvw:Schema",
      "columns": [
        {
          "@id": "#Start.time"
        },
        {
          "@id": "#Stop.time"
        },
        {
          "@id": "#Query"
        },
        {
          "@id": "#Query.complexity"
        },
        {
          "@id": "#Role"
        },
        {
          "@id": "#Action"
        },
        {
          "@id": "#Transcript"
        },
        {
          "@id": "#Notes"
        },
        {
          "@id": "#Query.counter"
        },
        {
          "@id": "#File.name"
        }
      ],
      "name": "Schema for ConversationalSearchDataSet.csv"
    },
    {
      "@id": "#schema-SCSdata_v1.csv",
      "@type": "csvw:Schema",
      "columns": [
        {
          "@id": "#Start_time"
        },
        {
          "@id": "#Stop_time"
        },
        {
          "@id": "#Query"
        },
        {
          "@id": "#Query_complexity"
        },
        {
          "@id": "#Role"
        },
        {
          "@id": "#Sub_themes"
        },
        {
          "@id": "#Code"
        },
        {
          "@id": "#Query_counter"
        },
        {
          "@id": "#Transcript"
        },
        {
          "@id": "#Actor_pair"
        }
      ],
      "name": "Schema for SCSdata_v1.csv"
    },


from ro-crate.

abigail-miller avatar abigail-miller commented on June 2, 2024

Including file content definitions is an important use case for our project. We've been working with concepts from the frictionless data framework to define file types that include many permutations of manually assembled and machine generated data files. A common scenario is for several different labs to produce assay data files that contain corresponding columns that could be aggregated for analysis, but there is no way to know that from the file headers. Using some of the concepts from frictionless, we define file types containing field descriptors, which can map to an rdf type so a data consumer will know which columns across various file types may be integrated. Though frictionless is geared toward tabular files, the field descriptors could be used to describe non-tabular data file contents as well.

Now that we are moving to RO Crates to package our metadata and files, we'd like to include these file type definitions in the crate metadata. Ideally, we'd like to be able to include a context entity for each file type and link these to the data files. The file type context entity would include the frictionless field descriptors. Following is an example of what this might look like (using "FrictionlessFileType" as a placeholder.) We are pretty new to RO Crates, so any advice is appreciated.

{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "./",
      "@type": "Dataset",
      "datePublished": "2022-05-27T18:45:24+00:00",
      "hasPart": [
        {
          "@id": "study/Food_Intake_9.3.2020.csv"
        }
      ]
    },
    {
      "@id": "study/Food_Intake_9.3.2020.csv",
      "@type": "File",
      "contentSize": "27710",
      "name": "study/Food_Intake_9.3.2020.csv",
      "frictionlessFileType": {
        "@id": "food_intake_phenotype"
      }
    },
    {
      "@id": "food_intake_phenotype",
      "@type": "FrictionlessFileType",
      "encoding": "iso8859-1",
      "format": "csv",
      "hashing": "md5",
      "schema": {
        "fields": [
          {
            "id": "animal_diet",
            "name": "Diet",
            "type": "string",
            "description": "Animal diet",
            "rdfType": "http://www.ebi.ac.uk/efo/EFO_0002755",
            "constraints": {
              "required": "true",
              "enum": [ "Envigo HFHS", "10% fat + fiber", "6% fat" ]
            }
          },
          {
            "id": "animal_weight",
            "name": "Weight",
            "type": "number",
            "description": "Animal weight on day 0",
            "rdfType": "http://www.ebi.ac.uk/efo/EFO_0004338",
            "constraints": {
              "required": "true"
            }
          }
        ]
      }
    }
  ]
}
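The cross-lab integration scenario described above (finding columns that can be aggregated because they share an rdfType) could be sketched in Python like this. It assumes file type entities shaped like the food_intake_phenotype example; the second file type and the function name are hypothetical.

```python
from collections import defaultdict


def columns_by_rdf_type(file_types):
    """Map each rdfType URI to the (file type id, column name) pairs that carry it."""
    index = defaultdict(list)
    for file_type in file_types:
        for field in file_type["schema"]["fields"]:
            rdf_type = field.get("rdfType")
            if rdf_type:
                index[rdf_type].append((file_type["@id"], field["name"]))
    return dict(index)


# Two file types from different labs; the second one is made up.
file_types = [
    {"@id": "food_intake_phenotype", "schema": {"fields": [
        {"id": "animal_weight", "name": "Weight",
         "rdfType": "http://www.ebi.ac.uk/efo/EFO_0004338"},
    ]}},
    {"@id": "lab_b_body_measures", "schema": {"fields": [
        {"id": "bw", "name": "BodyWeight_g",
         "rdfType": "http://www.ebi.ac.uk/efo/EFO_0004338"},
        {"id": "note", "name": "Comment"},
    ]}},
]

index = columns_by_rdf_type(file_types)
print(index)
```

Columns listed under the same URI are candidates for aggregation even though their headers differ.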

from ro-crate.

ptsefton avatar ptsefton commented on June 2, 2024

This is an interesting approach I think Abigail - structurally it has quite a similar topology to the csvw approach but the documentation for Frictionless data is much more approachable.

A couple of comments - for RO-Crate the graph needs to be flattened - so all the fields will have to be separate entities with a @type attribute, FrictionlessField or maybe fd:Field if we used a namespace. Also the IDs should be URIs so, either #animal_weight or an http URI if you want to re-use them.

The constraints part is also problematic as for RO-Crate that would also need to be a separate entity - but in an RO-Crate dialect that could be direct properties of the field.

It could look something like this, maybe:

{
  "@id": "#animal_weight",
  "@type": "fd:Field",
  "name": "Weight",
  "fd:type": "number",
  "description": "Animal weight on day 0",
  "fd:rdfType": "http://www.ebi.ac.uk/efo/EFO_0004338",
  "fd:required": "true"
}

OR another approach would be to put the frictionless schema in a file or at a URL and reference it - that way we don't have to force it into JSON-LD and it should work with FD tools. I think this is probably the way to go.
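For the first option (separate fd:Field entities with the constraints as direct properties), the conversion could be sketched in Python as below. The fd: prefix and the #<id> convention follow the example above; the function name is hypothetical and the fd namespace itself is not defined anywhere yet.

```python
def flatten_fields(file_type):
    """Turn nested Frictionless field descriptors into flat fd:Field entities."""
    entities = []
    for field in file_type["schema"]["fields"]:
        entity = {
            "@id": f"#{field['id']}",
            "@type": "fd:Field",
            "name": field["name"],
        }
        if "description" in field:
            entity["description"] = field["description"]
        for key in ("type", "rdfType"):
            if key in field:
                entity[f"fd:{key}"] = field[key]
        # Constraints become direct, prefixed properties of the field.
        for key, value in field.get("constraints", {}).items():
            entity[f"fd:{key}"] = value
        entities.append(entity)
    return entities


# Cut-down version of the file type entity from the earlier comment.
file_type = {
    "@id": "food_intake_phenotype",
    "schema": {"fields": [{
        "id": "animal_weight",
        "name": "Weight",
        "type": "number",
        "description": "Animal weight on day 0",
        "rdfType": "http://www.ebi.ac.uk/efo/EFO_0004338",
        "constraints": {"required": "true"},
    }]},
}

fields = flatten_fields(file_type)
print(fields[0])
```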

from ro-crate.

ptsefton avatar ptsefton commented on June 2, 2024

At the Language Data Commons of Australia we are taking the second approach I mentioned above, and implementing frictionless table schemas included as a data entity in an RO-Crate - initial documentation is here in the draft profile for language resources.

{
  "@id": "conversation1.csv",
  "@type": ["File"],
  "encodingFormat": "text/csv",
  "name": "Transcript of conversation 1",
  "conformsTo": {"@id": "arcp://name,ausnc.art/csv_schema"}
}

{
  "@id": "arcp://name,ausnc.art/csv_schema", ← REPOSITORY-UNIQUE NAME
  "@type": "CreativeWork",
  "name": "Frictionless Table Schema for CSV transcription files in the ART corpus",
  "sameAs": "art_schema.json", ← Reference to the schema file above TODO: is this the best link?
  "conformsTo": {"@id": "https://specs.frictionlessdata.io/table-schema/"}
}

{
  "@id": "artSchema",
  "@type": ["File"],
  "encodingFormat": "text/csv",
  "name": "Frictionless Table Schema file for CSV transcription files in the ART corpus",
  "conformsTo": {"@id": "https://specs.frictionlessdata.io/table-schema/"}
}

from ro-crate.
