Comments (8)
Thanks for the input @agravier. We already have a format to upload existing data (https://docs.kern.ai/docs/project-creation-and-data-upload#uploading-existing-labeled-data), but I agree that this requires UX improvement. We'll work on this, and I'd be happy to have your feedback again when that's implemented :)
from refinery.
Thanks for the heads up @jhoetter, I'll give it a try at the next occasion. Cheers
Hi @agravier,
thank you for reaching out to us and for your feedback. You are right, both options (1. multi-attribute embeddings and 2. "calculated" columns) aren't part of our current UI. Calculated columns are on our roadmap for 2022.
Hi! That point is 100% valid, and we thought about it too. We're thinking about the following, and I'd be curious what you think about it:
- currently, you have one programming interface, i.e. in the heuristics section
- in the near future (Q4), you'll be able to have a programming interface similar to that to write computed attributes, e.g.
    def word_a_cat_word_b(record):
        return str(record["word_a"]) + str(record["word_b"])
- also, we're continuing our work on our embedder library. Here, again, we want to provide a programmatic interface similar to the active learning templates, but with which you can compute your very own customized (and finetuned) embeddings, e.g.
    from embedders.classification.contextual import TransformerSentenceEmbedder

    def classification_word_a_cat_word_b_distilbert(record):
        embedder = TransformerSentenceEmbedder("distilbert-base-cased")
        return embedder.fit_transform(record["word_a_cat_word_b"], record["is_oxymoron"])
Of course, we're not 100% sure about the exact interface yet, but that is the general idea.
And thanks for trying out refinery, means a lot! :)
Thanks for getting back to me @JWittmeyer and @jhoetter. Sounds good, as long as the UX is there to make all this clear. Another couple of things that you may want to consider, from my trial: tabular data export (not that JSON is horrible, but the data lends itself to a tabular format) and "partially annotated input reconciliation", for when one of the columns of the imported data already contains some labels. Obviously this raises some more questions to present to the user about what to do with this data, such as which annotator to assign it to, etc.
I'll revisit in a few months, all the best, cheers!
This will first be solved by implementing #40. You'll be able to modify any attribute, in this case e.g. to create a concatenation of word_a and word_b (similar to this):
    def word_a_cat_word_b(record):
        return str(record["word_a"]) + str(record["word_b"])
Afterward, you can apply encoding to this attribute.
We'll ultimately provide an extensive interface to program embeddings, but that is a bit further down the road :)
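For illustration only, here is a minimal sketch of how an attribute function like the one above could be applied to a set of records outside of refinery. The helper `add_computed_attribute` and the sample records are hypothetical and are not part of refinery's API; the only assumption taken from this thread is that a record behaves like a dictionary keyed by attribute name.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def word_a_cat_word_b(record: Record) -> str:
    # Same logic as the attribute function quoted in this thread.
    return str(record["word_a"]) + str(record["word_b"])


def add_computed_attribute(
    records: List[Record],
    name: str,
    func: Callable[[Record], Any],
) -> List[Record]:
    # Hypothetical helper: returns copies of the records with one extra
    # attribute computed by `func`, leaving the input records unchanged.
    return [{**record, name: func(record)} for record in records]


# Made-up sample records, only used to exercise the sketch.
records = [
    {"word_a": "deafening", "word_b": "silence"},
    {"word_a": "open", "word_b": "secret"},
]

enriched = add_computed_attribute(records, "word_a_cat_word_b", word_a_cat_word_b)

for record in enriched:
    print(record["word_a_cat_word_b"])
```

Note that the function concatenates without a separator, so the first record yields "deafeningsilence". If the downstream embedding should see two distinct words, adding a space between the two parts is probably the safer choice.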
@agravier This is solved with the release of version 1.3.0. You can now do attribute modifications, which allow you to then create exactly the embeddings you like. Let us know what you think :)