huggingface / hub-docs
Docs of the Hugging Face Hub
Home Page: http://hf.co/docs/hub
License: Apache License 2.0
Hi,
First of all, I hope I'm opening this issue in the right place (I hesitated to post in this other repo, but it seems less active, with only a handful of issues in the past year: https://github.com/huggingface/model_card).
I'll illustrate my observation by talking about French models, but the logic applies to any language.
I found by chance the following model on the hub: https://huggingface.co/dbmdz/electra-base-french-europeana-cased-generator
An ELECTRA model for French, released more than a year ago, and I had never heard of it? How is that possible?
I realized it was simply because the model was not indexed correctly (it has no "fr" tag). This probably explains why it was downloaded only 12 times last month, which I think is a shame.
So I did a little more research to see whether I had missed any other unindexed French models, and here is the list I came up with:
This represents 24 models. Relative to what the "fr" filter reports (https://huggingface.co/models?language=fr), that is about 7.4% (24/(24+300)) of the French models that are not indexed.
So I think it would be important to improve this indexing.
I have two ideas to submit:
A slightly different but related topic is multilingual models. Should multilingual models be tagged with all the languages they cover, or not?
This convention has been adopted for the Helsinki-NLP models (an example: https://huggingface.co/Helsinki-NLP/opus-mt-af-fr is tagged with both "af" and "fr").
But this is not the case for the Geotrend models (an example: https://huggingface.co/Geotrend/bert-base-en-fr-cased is tagged with neither "en" nor "fr") or for the T-Systems ones (an example: https://huggingface.co/T-Systems-onsite/cross-en-fr-roberta-sentence-transformer is tagged with neither "en" nor "fr").
I haven't checked with datasets, but I guess the problem must apply there too. So I think this would be a point to harmonize.
Have a nice day :)
(I noticed recently that you changed the language filtering GET parameter from `filter` to `language`, but unfortunately my problem still persists.)
If I search for Danish models, then models like this also pop up. It is not a Danish model, but it has the `da` tag (here standing for "direct assessment"). This could be fixed by filtering on the `tag-green` class rather than by the general `tag` filtering that I guess is done currently.
Thanks!
Would have as input
Would output:
This should be similar in implementation to the image classification widget.
Is your feature request related to a problem? Please describe.
I tried to upload a 13GB dataset file (for this repo), and after waiting a couple of hours for it to upload, it gave a "Payload too large" error.
Describe the solution you'd like
I'd like to be able to upload large dataset files using the web UI.
Describe alternatives you've considered
I'm guessing there's some sort of command line tool for cases like this, but I'd prefer to not have to install something just to upload a file - it's bad UX. There's no fundamental technical reason why the web UI can't handle large files, so it doesn't seem like a good idea to put limits like this in place.
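(For what it's worth, here is a minimal sketch of the library route, assuming `huggingface_hub`'s `upload_file`; whether it copes with a 13GB file is a separate question, and the repo and file names below are made up.)

```python
from huggingface_hub import HfApi

api = HfApi()
api.upload_file(
    path_or_fileobj="data/train.jsonl",  # the large local file
    path_in_repo="train.jsonl",
    repo_id="my-username/my-dataset",
    repo_type="dataset",
)
```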
Additional info
If the team is for some reason adamant about not allowing upload of large files via the web UI, then at the very least it would be good to stop the user with an error when they pick the file, rather than after the upload is complete. I.e. check the size of the blob with JS before uploading rather than uploading and checking size on the server.
If it's actually already possible to upload large files via the web UI (and I've just misunderstood the process), then please consider this issue a request for better UX in guiding the user toward doing that. After creating the dataset repo I just clicked across to the "files and versions" tab, and then clicked "Add file > Upload file".
Thanks!
Is your feature request related to a problem? Please describe.
I am trying to share our hate speech measurement model, which predicts a continuous measure of hate speech severity. So it is doing text regression rather than classification, but the input is still just text that gets tokenized. Could you clarify whether I need to create a custom pipeline for this, or how else to proceed? I have one of our models uploaded for TF at https://huggingface.co/ucberkeley-dlab/hate-measure-roberta-large
The architecture is just a Transformer backbone and a 1D global average pooling layer, followed by a linear output node.
Describe the solution you'd like
I would like to support the hosted inference API so that individuals can get a continuous score prediction out of our model and for the demo widget to work.
Describe alternatives you've considered
I'm not clear on whether I need to create a new TextRegression task for HuggingFace, which feels like it would be a large lift, or whether there is an easier way to do this.
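(For reference, here is one way a custom pipeline could be sketched, by subclassing `transformers.Pipeline`. This is a hypothetical illustration on my part, not an existing transformers task, and it assumes the model returns a single continuous logit per input.)

```python
from transformers import Pipeline

class TextRegressionPipeline(Pipeline):
    """Hypothetical sketch: return a continuous score for a text input."""

    def _sanitize_parameters(self, **kwargs):
        # No extra parameters in this sketch.
        return {}, {}, {}

    def preprocess(self, text):
        return self.tokenizer(text, return_tensors=self.framework)

    def _forward(self, model_inputs):
        return self.model(**model_inputs)

    def postprocess(self, model_outputs):
        # Assumes one regression output per example.
        return {"score": float(model_outputs.logits[0][0])}
```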
Additional context
The preprint for our work is at https://arxiv.org/abs/2009.10277 and this is to allow the model to be used in the Jigsaw Toxic Severity Kaggle competition: https://www.kaggle.com/c/jigsaw-toxic-severity-rating/overview
I'm currently adding LayoutLMv2 and LayoutXLM to HuggingFace Transformers. These models, built by Microsoft, have impressive capabilities for understanding document images (scanned documents, such as PDFs). LayoutLM and its successor LayoutLMv2 are extensions of BERT that incorporate layout and visual information in addition to text. LayoutXLM is a multilingual version of LayoutLMv2.
It would be really cool to have inference widgets for the following tasks:
Document image understanding (also called form understanding) means understanding all pieces of information of a document image. Example datasets here are FUNSD, CORD, SROIE and Kleister-NDA.
The input is a document image:
The output should be the same image, but with colored bounding boxes indicating, for example, which parts of the image are questions (blue), which are answers (green), which are headers (orange), etc.
LayoutLMv2 solves this as a NER problem, using `LayoutLMv2ForTokenClassification`. First, an OCR engine is run on the image to get a list of words + corresponding coordinates. These are then tokenized and, together with the image, sent through the LayoutLMv2 model. The model then labels each token using its classification head.
Document visual question answering means, given an image + question, generate (or extract) an answer. For example, for the PDF document above, a question could be "what's the date at which this document was sent?", and the answer is "January 11, 1999".
An example dataset here is DocVQA, on which LayoutLMv2 obtains SOTA performance (who might have guessed).
LayoutLMv2 solves this as an extractive question answering problem, similar to SQuAD. I've defined a `LayoutLMv2ForQuestionAnswering`, which predicts the `start_positions` and `end_positions`.
Document image classification is fairly simple: given a document image, classify it (e.g. invoice/form/letter/email/etc.). An example dataset here is [RVL-CDIP](https://www.cs.cmu.edu/~aharley/rvl-cdip/). For this, I have defined a `LayoutLMv2ForSequenceClassification`, which just places a linear layer on top of the model in order to classify documents.
I don't think we can leverage the existing 'token-classification', 'question-answering' and 'image-classification' pipelines, as the inputs are quite different (document images instead of text). To ease the development of new pipelines, I have implemented a new `LayoutLMv2Processor`, which takes care of all the preprocessing required for LayoutLMv2. It combines a `LayoutLMv2FeatureExtractor` (for the image modality) and a `LayoutLMv2Tokenizer` (for the text modality). I would also argue that if we add other models in the future, they should all implement a processor that takes care of all the preprocessing (and possibly postprocessing). Processors are ideal for multi-modal models (they have previously been defined for CLIP and Wav2Vec2).
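(As a small illustration of the processor flow just described; this sketch assumes the public microsoft/layoutlmv2-base-uncased checkpoint, a local document image, and pytesseract/detectron2 being installed.)

```python
from PIL import Image
from transformers import LayoutLMv2ForTokenClassification, LayoutLMv2Processor

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
# Note: the token-classification head below is randomly initialized; in
# practice one would load a fine-tuned checkpoint.
model = LayoutLMv2ForTokenClassification.from_pretrained("microsoft/layoutlmv2-base-uncased")

image = Image.open("document.png").convert("RGB")
# The processor runs OCR (words + boxes) and tokenizes everything in one call.
encoding = processor(image, return_tensors="pt")
outputs = model(**encoding)
predicted_labels = outputs.logits.argmax(-1)
```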
Note that you're not expected to do all of the following steps. This PR helps track all the steps required to get a new task fully supported in the Hub 🔥
- `transformers` pipeline: huggingface/transformers#13828
- Integration guide: https://hf.co/docs/hub/adding-a-task
When I go to a repo like this one I can click the "use in transformers" button. This makes me think "grand! that'll be a quickstart!". It shows me this code:
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("oliverguhr/fullstop-punctuation-multilang-large")
model = AutoModelForTokenClassification.from_pretrained("oliverguhr/fullstop-punctuation-multilang-large")
This is a bit of a bummer. It tells me how to set up a tokenizer and a model, but it doesn't tell me at all how to use the model. It'd be much more pragmatic if it instead showed this code:
from transformers import pipeline
pipe = pipeline("token-classification", "oliverguhr/fullstop-punctuation-multilang-large")
pipe(["this is an example that you can actually run"])
This way, I don't need to search the docs for the type of pipeline that matches the model I'm dealing with, and I immediately have something that works in my notebook. Wouldn't it be better to generate the pipeline code in the docs?
Potentially you could still render the model/tokenizer setup as well, but it doesn't feel like part of the "getting started journey".
Naming is tentative
Note that you're not expected to do all of the following steps. This PR helps track all the steps required to get a new task fully supported in the Hub 🔥
- `transformers` pipeline
- Integration guide: https://hf.co/docs/hub/adding-a-task
We can do this with gensim or fasttext models for example for obtaining closest words with nearest neighbors. Example repo: https://huggingface.co/Hellisotherpeople/debate2vec
CC @mishig25 in case you would like to help with the widget; this is pretty much the same as the `text-classification` widget.
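(For context, a rough sketch of the kind of nearest-neighbor lookup such a widget would surface, assuming gensim's `KeyedVectors` API; the vectors filename below is made up.)

```python
from gensim.models import KeyedVectors

# Load word2vec-format vectors and query the closest words.
vectors = KeyedVectors.load_word2vec_format("debate2vec.vec")
print(vectors.most_similar("education", topn=5))
```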
Hey guys, huge fan of you and what you've done to make all our lives easier.
I think it would be really cool if the site https://huggingface.co/ had the option to favorite a model or dataset if you're a user.
Would have as input
Would output
This should be similar in implementation to the image classification widget.
Is your feature request related to a problem? Please describe.
I got this error:
Traceback (most recent call last):
  File "app.py", line 1, in <module>
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
ModuleNotFoundError: No module named 'transformers'
when I pushed this version to Spaces: https://huggingface.co/spaces/ttj/t0-generation/commit/efc071478da5be7f6a369b034ecda4844ed3ad22
Describe the solution you'd like
Pre-install common libraries.
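(As a stopgap, and assuming on my side that Spaces installs the Python dependencies listed in a requirements.txt at the root of the repo, declaring the missing libraries there should avoid this particular error:)
transformers
torch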
Hello,
I think this is a UX improvement and would give these pages better visibility. I would like a redirect (or link) from the models page to the task page, so that when people filter for a task, they can also get better information about it (and these resources would be more visible).
cc: @osanseviero @gary149 @beurkinger
https://ai.googleblog.com/2021/01/totto-controlled-table-to-text.html
as suggested by @mrm8488
Should be decently easy once huggingface/huggingface_hub#87 is merged
We are having more non-NLP tasks supported in the Inference API, but there are no code snippets at https://huggingface.co/superb/hubert-large-superb-er even though there are some at https://api-inference.huggingface.co/docs/python/html/detailed_parameters.html#audio-classification-task
This probably requires some internal changes, plus changing https://github.com/huggingface/huggingface_hub/tree/main/widgets/src/lib/inferenceSnippets, since it always expects the same type of input.
cc @mishig25
Would have as inputs:
as suggested by @patil-suraj (for CLIP?)
Should be decently easy once huggingface/huggingface_hub#87 is merged
Note that you're not expected to do all of the following steps. This PR helps track all the steps required to get a new task fully supported in the Hub 🔥
- `transformers` pipeline: https://github.com/huggingface/transformers/blob/master/src/transformers/pipelines/object_detection.py
- Integration guide: https://hf.co/docs/hub/adding-a-task
Is your feature request related to a problem? Please describe.
As discussed on Slack, it would be nice if we added a direct link on the hub to the documentation of a model (e.g. linking to https://huggingface.co/transformers/model_doc/bert.html for bert-base-uncased). This can probably be done based on the `config.json` of a model.
Describe the solution you'd like
A button could be added either next to "Deploy" and "Use in Transformers", or a link could be added within the "Use in Transformers" button.
cc @julien-c
As we do huggingface/huggingface_hub#744 and add library documentation based on docstrings, the hub docs might split into `huggingface_hub` and product usage.
Based on the existing content, what we can do is more self-contained use cases. An initial mental model, without creating additional content, would be:
Move to `huggingface_hub` as guides
Then, on the Hub:
- Model card
- Repositories
- CO2 Emissions
- Widgets
- Inference API
- Security
- Endpoints
- Adding a new task
- Integrating a new library (with parts of it linking to `huggingface_hub`)
This would also include Spaces, which can very likely be broken into more pieces.
WDYT @julien-c @LysandreJik @adrinjalali @muellerzr of this as a first step once we kick off the splitting?
In v0.9 of `flair`, a new zero-shot sequence labeling feature was included, which seems (a) very exciting and (b) very useful for real-world NER applications where acquiring labeled data is expensive.
It would be cool to include support for this in the Hugging Face Hub, possibly as a companion widget to the existing one for zero-shot classification.
Here's a link to the paper behind the TARS technique.
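(For concreteness, here is roughly what the flair v0.9 zero-shot tagging API looks like; this is a sketch, and the label set below is made up.)

```python
from flair.data import Sentence
from flair.models import TARSTagger

# Load the pretrained TARS tagger and define an ad-hoc label set.
tagger = TARSTagger.load("tars-ner")
tagger.add_and_switch_to_new_task(
    task_name="zero-shot-demo",
    label_dictionary=["person", "location"],
    label_type="ner",
)

sentence = Sentence("George Washington went to Washington.")
tagger.predict(sentence)
print(sentence.to_tagged_string())
```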
cc @osanseviero
Note that you're not expected to do all of the following steps. This issue helps track all the steps required to get a new task fully supported in the Hub 🔥
- `transformers` pipeline
- Integration guide: https://hf.co/docs/hub/adding-a-task
Note that you're not expected to do all of the following steps. This PR helps track all the steps required to get a new task fully supported in the Hub 🔥
- `transformers` pipeline
- Integration guide: https://hf.co/docs/hub/adding-a-task
What do you think about paying some percentage to the developers/owners of models if they get used through the Inference API?
It could make uploading more amazing models more attractive if it were possible to generate some passive income.
It would be like RapidAPI for NLP 🤗
Is your feature request related to a problem? Please describe.
It is hard to tell how big a model is without going to the repo and seeing the size of the weights file.
Describe the solution you'd like
It might prove useful to be able to sort models by size or number of parameters, especially for people with more modest compute budgets. Perhaps even display that information alongside a model's popularity.
Describe alternatives you've considered
Alternatives would be prior knowledge of the distilled/small model landscape. This feature might also help people discover new models they hadn't heard of by enabling filtering queries like "most popular and smallest model trained using MLM on wikitext".
Additional context
The datasets tab has a similar feature with the `Size` category filter.
Note that you're not expected to do all of the following steps. This PR helps track all the steps required to get a new task fully supported in the Hub 🔥
- `transformers` pipeline
- Integration guide: https://hf.co/docs/hub/adding-a-task
Example model: https://arxiv.org/pdf/2111.05610v1.pdf
cc @nateraw
Note that you're not expected to do all of the following steps. This PR helps track all the steps required to get a new task fully supported in the Hub 🔥
- `transformers` pipeline
- Integration guide: https://hf.co/docs/hub/adding-a-task
This is a request for a widget/inference API on the hub for multimodal models on VQA tasks:
Would have input as
This should be used for VisualBERT/LXMERT models, and might need Detectron or something similar to the Faster R-CNN model here: https://github.com/huggingface/transformers/tree/master/examples/research_projects/lxmert
Hello, I am trying to create a new Streamlit app on Hugging Face that uses PySpark.
I created an app.py file and a requirements.txt file, and a basic Streamlit app without PySpark ran seamlessly.
However, the problem starts when I add the line `spark = SparkSession.builder.appName("appName").getOrCreate()` to start the Spark session; I get the following error:
Exception: Java gateway process exited before sending its port number
The content of requirements.txt file:
streamlit
pyspark==3.1.2
spark-nlp==3.3.2
If anyone who has worked with PySpark on HF can help me, I would appreciate it!
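(In case it helps, and assuming on my part that this error means no Java runtime was found and that Spaces installs the Debian packages listed in a packages.txt file at the root of the repo, adding a JDK there may resolve it:)
default-jdk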
I would like to start documenting good practices of model repos to add to our documentation.
Some come to mind rather quickly
How do we want to encourage users to have multiple checkpoints in a single repo? There was a related discussion for GPT-J and for other contributions.
My suggestion
I'm just gathering ideas so any are welcome!
cc @patrickvonplaten @julien-c @LysandreJik @lewtun @NielsRogge I hope I did not forget anyone
It would be useful to be able to generate a Digital Object Identifier (DOI) for artifacts living on the Hub.
It would let people cite a specific dataset or a specific model, and make their own datasets and models citable. Some venues require DOIs for digital resources, and it would be nice not to have to use third parties for that.
Kaggle, for instance, currently has that feature for public datasets: https://www.kaggle.com/product-feedback/108594
Drawback: It costs money, because one has to go through approved agencies (e.g.: https://datacite.org/feemodel.html)
Automation would probably be straightforward: Creating DOIs with the Datacite REST API
Would have as inputs:
as suggested by @patil-suraj (for CLIP?)
Should be decently easy once huggingface/huggingface_hub#87 is merged
Should we show the same sentence, or say that no tokens were found?
Note that you're not expected to do all of the following steps. This PR helps track all the steps required to get a new task fully supported in the Hub 🔥
- `transformers` pipeline
- Integration guide: https://hf.co/docs/hub/adding-a-task
When I call the hosted text-generation API, the request fails if I set the `temperature` parameter to an integer value of `1` instead of a float value of `1.0`.
For example, this succeeds:
curl -i -X POST https://api-inference.huggingface.co/models/my_organization/my_model \
-H "Authorization: Bearer <<REDACTED>>" \
-H "Content-Type: application/json" \
-d \
'{
"inputs":"Once upon a time",
"parameters": {
"temperature": 1.0,
"max_new_tokens": 20
}
}'
But this request fails:
curl -i -X POST https://api-inference.huggingface.co/models/my_organization/my_model \
-H "Authorization: Bearer <<REDACTED>>" \
-H "Content-Type: application/json" \
-d \
'{
"inputs":"Once upon a time",
"parameters": {
"temperature": 1,
"max_new_tokens": 20
}
}'
...with this error message:
{"error":["value is not a valid float: `temperature` in `parameters`"]}
Both requests should succeed. The only difference between the two requests is the numeric formatting of the `temperature` parameter (with or without a decimal point and trailing zero).
Note that you're not expected to do all of the following steps. This issue helps track all the steps required to get a new task fully supported in the Hub 🔥
- `transformers` pipeline
- Integration guide: https://hf.co/docs/hub/adding-a-task
Would have as inputs:
as suggested by @patil-suraj (for CLIP?)
Should be decently easy once huggingface/huggingface_hub#87 is merged
For the DETR model, which will soon be part of HuggingFace Transformers (see huggingface/transformers#11653 (comment)), it would be cool to have object detection and image segmentation (actually panoptic segmentation) inference widgets.
Similar to the image classification widget, a user should be able to upload/drag an image to the widget, which is then annotated with bounding boxes and classes (in case of object detection), or turned into a segmentation map (in case of panoptic segmentation).
Here are 2 notebooks which illustrate what you can do with the head models of DETR:
- `DetrForObjectDetection`: https://colab.research.google.com/drive/170dlGN5s37uaYO32XKUHfPklGS8oB059?usp=sharing
- `DetrForSegmentation`: https://colab.research.google.com/drive/1hTGTPGBLPRY1QkLmG7P9air6v04tcXUL?usp=sharing
The models are already on the hub: https://huggingface.co/models?search=facebook/detr
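(As a sketch of what such a widget backend could call, assuming the `object-detection` pipeline that was being added to transformers around this time; the image path below is made up.)

```python
from transformers import pipeline

# Run DETR through the object-detection pipeline; the output contains the
# labels, scores, and bounding boxes the widget would need to draw.
detector = pipeline("object-detection", model="facebook/detr-resnet-50")
results = detector("cats.jpg")  # local path, URL, or PIL image
for result in results:
    print(result["label"], result["score"], result["box"])
```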
cc @LysandreJik
When a user selects a specific task on the Hugging Face Hub (for example, `image-to-text`):
That user is shown a series of models, with no guidance as to which model might be state of the art, or which might be the most performant for their use case.
To test the capabilities and behavior of each model, the user must find a Space or a Colab notebook, if one is available (not every model has one).
The user should be able to evaluate and compare models directly for a given task (e.g. `image-to-text`).
Note that you're not expected to do all of the following steps. This PR helps track all the steps required to get a new task fully supported in the Hub 🔥
- `transformers` pipeline
- Integration guide: https://hf.co/docs/hub/adding-a-task
@osanseviero thank you for looking into this issue.
Simply put, we built an API that receives questions or documents from a user through a web application on the front end, and on the back end downloads ML models from huggingface.co in order to answer those questions or process those documents, and to encode new documents for search.
In order to connect to huggingface.co to download models, we require an SSL certificate. Without the cert, the following error comes up:
The current fix requires that I download the certificate chain manually through my browser, as follows:
Note that the Hugging Face SSL certificate expires very soon!
To install the certificate, I copy it over to my container in my Dockerfile and add the following lines of code to the application:
```python
import requests
import certifi

try:
    print('Checking connection to Huggingface...')
    test = requests.get('https://huggingface.co')
    print('Connection to Huggingface OK.')
except requests.exceptions.SSLError as err:
    print('SSL Error. Adding custom certs to Certifi store...')
    # Append the manually downloaded certificate chain to certifi's CA bundle.
    cafile = certifi.where()
    with open('huggingface-co-chain.pem', 'rb') as infile:
        customca = infile.read()
    with open(cafile, 'ab') as outfile:
        outfile.write(customca)
    print('That might have worked.')
```
The main problem with this fix is that the certificate I download is only valid for a short period of time (one or two weeks).
- Ideally, we should be able to do this on the command line as part of the container build, but so far efforts to do so have not worked.
- The following command yields a chain of three certificates, while the ones downloaded from the browser have a chain of four:
openssl s_client -showcerts -verify 5 -connect huggingface.co:443 < /dev/null
- The missing certificate appears to be the zScaler root CA, which shows up in the browser but not on the command line.
Is your feature request related to a problem? Please describe.
In newer versions of huggingface_hub, text inputs and outputs left-justify their text, but we could add an attribute to automatically detect and adjust for right-to-left language text.
Describe the solution you'd like
Recommended:
- The `<form>` or the `<span ... role="textbox">` element should have the HTML attribute `dir="auto"` (no CSS equivalent).
- The `<p ... class="alert alert-success">` element should have `dir="auto"`.
Less critical:
- Select `.prose > {h1, h2, h3, h4, p}` and apply `dir="auto"`; the selection is so we don't mess with code/pre/table elements.
- The `<div>` around the bar chart could also have `dir="auto"`.
- Adding `direction: rtl` CSS to text inputs and outputs.
Is your feature request related to a problem? Please describe.
Currently, the following data fields on the hub are only displayed in MB:
- Size of the generated dataset:
- Size of downloaded dataset files:
- Total amount of disk used:
Figures like 1895.01 MB and 1611.50 MB become unwieldy to reason about as they grow, compared to 1.89501 GB or 1.6115 GB.
Describe the solution you'd like
Convert numbers to GB in dataset cards when MB > 1000.
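(A minimal sketch of this conversion rule; the helper name is hypothetical.)

```python
def format_size(size_mb: float) -> str:
    """Display a size in MB, switching to GB above 1000 MB."""
    if size_mb > 1000:
        return f"{size_mb / 1000:.5f} GB"
    return f"{size_mb:.2f} MB"

print(format_size(1895.01))  # 1.89501 GB
print(format_size(611.50))   # 611.50 MB
```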
Describe alternatives you've considered
I considered advocating truncating the size to 3 decimal places, as precision to 5 or 6 decimal places in GB (a number like 1.543210 GB) may be an unnecessary degree of precision for users.
Ultimately, though, I reasoned that more precision is often better.
Additional context
I'm happy to contribute to this. I didn't see exactly where this was handled in the current codebase, so any pointers appreciated. Also, let me know if I should be opening this up in the datasets repo instead...
Using the `huggingface_hub` library, I was able to collect some statistics on the 9,984 models that are currently hosted on the Hub. The main goal of this exercise was to find answers to the following questions:
A model with the `BertForSequenceClassification` architecture is likely to be about text classification; similarly for the other `ModelNameForXxx` architectures.
Without applying any filters on the architecture names, the number of models per criterion is shown in the table below:
| Has architecture | Has dataset | Has metric | Number of models |
|---|---|---|---|
| ✅ | ❌ | ❌ | 8129 |
| ✅ | ✅ | ❌ | 1241 |
| ✅ | ✅ | ✅ | 359 |
These numbers include models for which a task may not be easily inferred from the architecture alone. For example, `BertModel` would presumably be associated with a `feature-extraction` task, but these are not simple to evaluate.
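(For what it's worth, here is a rough sketch of how such counts can be collected, assuming `huggingface_hub`'s `list_models` with `fetch_config=True`; this is illustrative, not the exact script behind the numbers above.)

```python
from collections import Counter
from huggingface_hub import HfApi

# Count architecture names across Hub models by fetching each model's config.
api = HfApi()
architecture_counts = Counter()
for model in api.list_models(fetch_config=True):
    config = getattr(model, "config", None) or {}
    for arch in config.get("architectures") or []:
        architecture_counts[arch] += 1

print(architecture_counts.most_common(10))
```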
By applying a filter on the architecture name to contain any of "For", "MarianMTModel" (translation), or "LMHeadModel" (language modelling), we arrive at the following table:
| Has task | Has dataset | Has metric | Number of models |
|---|---|---|---|
| ✅ | ❌ | ❌ | 7452 |
| ✅ | ✅ | ❌ | 1150 |
| ✅ | ✅ | ✅ | 337 |
Some models either have no architecture (e.g. the info is missing from the `config.json` file, or the model belongs to another library like Flair), or have multiple ones:
| Number of architectures | Number of models |
|---|---|
| 0 | 1755 |
| 1 | 8125 |
| 2 | 1 |
| 3 | 3 |
Based on these counts, it thus makes sense to just focus on models with a single architecture.
For models with a single architecture, I extract the task names from the architecture name according to the following mappings:
The resulting frequency counts are shown below:
LanguageModeling 3250
Translation 1354
SequenceClassification 829
ConditionalGeneration 766
Model 655
QuestionAnswering 364
CTC 318
TokenClassification 286
PreTraining 163
MultipleChoice 37
MultiLabelSequenceClassification 17
ImageClassification 15
MultiLabelClassification 11
Generation 7
ImageClassificationWithTeacher 4
We can visualise which tasks are connected to which datasets as a graph. Here we show the top 10 tasks (measured by node connectivity), with the top 20 datasets marked in orange.
👋 Greetings! I'm not sure if this is the correct place to file an issue for huggingface.co, but I figured I'd try anyway. :)
Is your feature request related to a problem? Please describe.
Would it be possible to include models that are mobile-compatible as a filterable category in the `Other` section on the Hugging Face website? Meaning `.tflite`, `.ptl` for PyTorch, or Core ML models for iOS devices.
In competitors' hubs, models of varying sizes are colocated on a single page (example below), so you might see the base example, a Colab notebook to test out the model interactively, and a TF Lite implementation. You can also sort and filter based on model type, as well as framework version and fine-tunability (though Hugging Face's models would win that contest every time 😄).
Note that you're not expected to do all of the following steps. This PR helps track all the steps required to get a new task fully supported in the Hub 🔥
- `transformers` pipeline
- Integration guide: https://hf.co/docs/hub/adding-a-task