stam-tools's People

Contributors

proycon

stam-tools's Issues

XML (TEI and TEI-like) to STAM conversion

Though I already have a TEI -> FoLiA -> STAM conversion, I'd also like to have something more generic that works for formats like TEI XML and HTML. It would do a fairly straightforward one-to-one mapping/untangling from XML elements to STAM annotations (plus plain text output) and from XML attributes to annotation data. The underlying assumption is that all XML text nodes in a document together constitute the actual text (in order, without redundancy; perhaps with some parametrised format-specific exemptions).
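To make the intended mapping concrete, here is a minimal sketch in Python using only the standard library. It is not the stam-tools implementation and does not use the STAM API: the function name `untangle` and the shape of the returned dictionaries are assumptions made for illustration. It concatenates all text nodes in document order and records, per element, the character offsets it spans and its attributes.

```python
import xml.etree.ElementTree as ET


def untangle(xml_string):
    """Return (text, annotations) for an XML document.

    The text is the concatenation of all text nodes in document order.
    Each annotation is a plain dict (hypothetical shape, not the STAM model).
    Namespaced tags appear in ElementTree's "{namespace}tag" form.
    """
    root = ET.fromstring(xml_string)
    chunks = []        # collected text nodes, in document order
    annotations = []   # one entry per XML element
    position = 0       # current character offset in the output text

    def emit(text):
        nonlocal position
        if text:
            chunks.append(text)
            position += len(text)

    def visit(element):
        entry = {"element": element.tag, "begin": position, "end": None,
                 "data": dict(element.attrib)}
        annotations.append(entry)   # appended before the children: document order
        emit(element.text)          # text before the first child
        for child in element:
            visit(child)
            emit(child.tail)        # text after the child, still inside this element
        entry["end"] = position     # end offset is exclusive

    visit(root)
    return "".join(chunks), annotations


text, annotations = untangle('<p n="1">Hello <hi rend="bold">world</hi>!</p>')
print(text)
for annotation in annotations:
    print(annotation)
```

The format-specific exemptions mentioned above (elements whose text should not count as part of the document text) are not handled here; they would presumably be a filter inside `visit`.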

stam query should output the full annotation data when using --format json, not just IDs

$ stam query -F json  --query "SELECT ANNOTATION ?letter WHERE ID hoof001hwva02_01_0275;"  hoof001hwva.store.stam.json
[{

"letter": {
  "@type": "Annotation",
  "@id": "hoof001hwva02_01_0275",
  "target": {
    "@type": "TextSelector",
    "resource": "hoof001hwva02.txt",
    "offset": {
      "@type": "Offset",
      "begin": {
        "@type": "BeginAlignedCursor",
        "value": 1231033
      },
      "end": {
        "@type": "BeginAlignedCursor",
        "value": 1233344
      }
    }
  },
  "data": [
    {
      "@type": "AnnotationData",
      "@id": "!D16",
      "set": "brieven-van-hooft-categories"
    },
    {
      "@type": "AnnotationData",
      "@id": "!D1",
      "set": "brieven-van-hooft-categories"
    },

Implement batch/shell functionality

Implement a stam batch or stam shell subcommand that loads a given annotation store (or several) and then executes each line on standard input as if it were a new subcommand of the stam tool, but without reloading or saving the data in between. Saving should be deferred to the end.
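The control flow could look roughly like the Python sketch below. Everything in it is a stand-in: `run_batch`, the `store` dictionary and the `annotate` handler are hypothetical and do not correspond to existing stam subcommands or to the STAM library. The point is only that the store is loaded once, every input line is split like a shell command line and dispatched against the in-memory store, and saving happens a single time after the loop.

```python
import io
import shlex


def run_batch(store, commands, lines):
    """Run each input line as a subcommand against one in-memory store.

    `store` is any mutable object and `commands` maps a subcommand name to a
    function taking (store, args); both are hypothetical stand-ins.
    Returns the number of lines that were executed.
    """
    executed = 0
    for line in lines:
        # shlex honours quoting; comments=True drops anything after '#'
        args = shlex.split(line, comments=True)
        if not args:
            continue                      # blank line or comment only
        name, rest = args[0], args[1:]
        handler = commands.get(name)
        if handler is None:
            raise ValueError(f"unknown subcommand: {name}")
        handler(store, rest)              # mutates the store, nothing is saved yet
        executed += 1
    return executed


# Demo with an in-memory "store"; a real tool would iterate over sys.stdin.
store = {"annotations": []}
commands = {"annotate": lambda s, args: s["annotations"].append(args)}
script = io.StringIO('annotate --id a1\n\n# a comment\nannotate --id "a 2"\n')
count = run_batch(store, commands, script)
print(count, store)
# The store would be written to disk once, here, after all lines have run.
```

Whether an error in one line should abort the whole batch (as here) or only skip that line is a design choice the issue leaves open.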

HTML visualisation

Write a CLI tool that outputs a visualisation of a text (a TextSelection) with
its annotations. The visualisation would be a static HTML page, optionally with
SVG and/or some embedded JavaScript. Existing libraries like
https://github.com/recogito/recogito-js/ could provide a solution.

Annotations could either be shown on mouse-over, or already indicated with some form of highlighting.
Challenges arise when annotations overlap, and especially when they overlap across nesting boundaries.
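One common way around the overlap problem, sketched below in Python, is to avoid nesting altogether: cut the text at every annotation boundary and emit one flat span per segment, listing all annotations that cover it. This is a suggestion for illustration, not what recogito-js or stam-tools does. The function name `render_html`, the `(id, begin, end)` tuple shape and the `data-annotations` attribute are assumptions.

```python
import html


def render_html(text, annotations):
    """Render text as HTML with one <span> per run of identically annotated text.

    `annotations` is a list of (id, begin, end) tuples with character offsets,
    end exclusive (a hypothetical shape chosen for this sketch).
    """
    # Every annotation boundary is a potential cut point.
    cuts = {0, len(text)}
    for _, begin, end in annotations:
        cuts.update((begin, end))
    cuts = sorted(c for c in cuts if 0 <= c <= len(text))

    parts = []
    for begin, end in zip(cuts, cuts[1:]):
        # Annotations that fully cover this segment.
        covering = [aid for aid, b, e in annotations if b <= begin and end <= e]
        segment = html.escape(text[begin:end])
        if covering:
            ids = html.escape(" ".join(covering), quote=True)
            parts.append(f'<span class="a" data-annotations="{ids}">{segment}</span>')
        else:
            parts.append(segment)
    return "".join(parts)


page = render_html("The quick brown fox", [("a1", 4, 15), ("a2", 10, 19)])
print(page)
```

Because the spans never nest, two annotations that cross each other's boundaries simply share a segment ("brown" in the demo). Highlighting or mouse-over behaviour could then be driven from the `data-annotations` attribute with CSS or a small script.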

Import from CoNLL-U (Plus)

This requires some extra work on top of our TSV import (#1). CoNLL-U/X is a common format in the field, so supporting it would make it easy to get linguistically annotated data into STAM.
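As a rough illustration of the extra work involved, the Python sketch below reads plain CoNLL-U (ten tab-separated columns, `#` comment lines, blank lines between sentences), reconstructs a text from the FORM column and yields character offsets per token. It is a simplification under stated assumptions: `read_conllu` and the token dictionaries are hypothetical, multiword-token ranges and empty nodes are skipped, a space is inserted after every token unless MISC contains `SpaceAfter=No`, and the custom column declarations of CoNLL-U Plus are not handled.

```python
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines):
    """Return (text, tokens) reconstructed from CoNLL-U lines.

    Each token is a dict with begin/end character offsets into `text`
    (end exclusive) plus the non-empty annotation columns.
    """
    chunks, tokens, position = [], [], 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue                      # sentence boundary or comment line
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}")
        row = dict(zip(FIELDS, columns))
        if "-" in row["id"] or "." in row["id"]:
            continue                      # multiword range or empty node: skipped in this sketch
        form = row["form"]
        data = {key: value for key, value in row.items()
                if key not in ("id", "form") and value != "_"}
        tokens.append({"begin": position, "end": position + len(form), "data": data})
        chunks.append(form)
        position += len(form)
        if "SpaceAfter=No" not in row["misc"].split("|"):
            chunks.append(" ")            # default: tokens are space-separated
            position += 1
    return "".join(chunks), tokens


sample = [
    "# text = Hello world!\n",
    "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\t_\n",
    "2\tworld\tworld\tNOUN\t_\t_\t1\tvocative\t_\tSpaceAfter=No\n",
    "3\t!\t!\tPUNCT\t_\t_\t1\tpunct\t_\t_\n",
    "\n",
]
text, tokens = read_conllu(sample)
print(repr(text))
for token in tokens:
    print(text[token["begin"]:token["end"]], token["data"])
```

The head/deprel columns refer to other tokens by sentence-local ID, so a real import would also need to turn those into references between annotations rather than plain values.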
