
hub-docs's Introduction

hub-docs

This repository gathers documentation and information that is hosted on the Hugging Face website.

You can access the Hugging Face Hub documentation in the docs folder at hf.co/docs/hub.

For some related components, check out the Hugging Face Hub JS repository

How to contribute to the docs

Just add/edit the Markdown files, commit them, and open a PR. The CI bot will then build a preview page and provide a URL for you to check the result!

For simple edits, you don't need a local build environment.

Previewing locally

# install doc-builder (if not done already)
pip install hf-doc-builder

# you may also need to install some extra dependencies
pip install black watchdog

# run `doc-builder preview` cmd
doc-builder preview hub {YOUR_PATH}/hub-docs/docs/hub/ --not_python_module

hub-docs's People

Contributors

beurkinger, cbensimon, christophe-rannou, coyotte508, davanstrien, dvsrepo, gary149, julien-c, kakulukian, krampstudio, lhoestq, lysandrejik, meg-huggingface, merveenoyan, michellehbn, mishig25, muellerzr, nimaboscarino, osanseviero, philschmid, pierrci, sbrandeis, severo, simoninithomas, smty2018, stefan-it, stevhliu, vaibhavs10, wauplin, whitphx


hub-docs's Issues

Tracking integration for Image to Text (Image Captioning, etc)

Tracking integration of task - Image to Text

Note that you're not expected to do all of the following steps. This issue helps track all the steps required to get a new task fully supported in the Hub 🔥

  • Integration with Inference API. Select at least one of the following:
    • Added a transformers pipeline
    • Added to Community Inference API for 3rd party library
    • Added to Community Inference API for generic
  • Added basic UI elements (icon, order specification, etc)
  • Added a widget #15

Integration guide: https://hf.co/docs/hub/adding-a-task

Document model repo best practices

I would like to start documenting good practices of model repos to add to our documentation.

Some come to mind rather quickly

  • One model per repo (avoid having multiple models in the same repo)
  • Add metadata to the model card
  • Add metrics to the metadata of the model card

How do we want to encourage users to have multiple checkpoints in a single repo? There was a related discussion in GPT-J and for other contributions

  • One branch per checkpoint?
  • One commit per checkpoint?

My suggestion

  • When using checkpoints for version control, use a commit per checkpoint
    • For example, Mistral has 600 checkpoints per model. Each checkpoint corresponds to a different training step. In that sense, I think it makes sense to have a commit/tag per checkpoint
  • When using checkpoints of a model with slightly different characteristics, use a branch per checkpoint
    • For example, GPT-J 6B has a half precision checkpoint and a single precision checkpoint.
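For illustration, here is a minimal sketch of the branch-per-checkpoint variant using today's huggingface_hub client (the repo name, branch name, and local path are hypothetical, and create_branch/upload_folder are assumed to be available in the installed version):

from huggingface_hub import HfApi

api = HfApi()
repo_id = "my-org/my-model"  # hypothetical repo

# one branch per checkpoint variant (e.g. half vs single precision)
api.create_branch(repo_id, branch="float16", exist_ok=True)
api.upload_folder(
    repo_id=repo_id,
    folder_path="./checkpoints/float16",  # hypothetical local path
    revision="float16",                   # push to the branch instead of main
)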

I'm just gathering ideas so any are welcome!

cc @patrickvonplaten @julien-c @LysandreJik @lewtun @NielsRogge I hope I did not forget anyone

Tracking integration for Image Segmentation

Tracking integration of task - Image Segmentation

Note that you're not expected to do all of the following steps. This issue helps track all the steps required to get a new task fully supported in the Hub 🔥

  • Integration with Inference API. Select at least one of the following:
    • Added a transformers pipeline huggingface/transformers#13828
    • Added to Community Inference API for 3rd party library
    • Added to Community Inference API for generic
  • Added basic UI elements (icon, order specification, etc)
  • Added a widget huggingface/huggingface_hub#378

Integration guide: https://hf.co/docs/hub/adding-a-task

Exploratory Analysis of Models on Hub

Using the huggingface_hub library, I was able to collect some statistics on the 9,984 models that are currently hosted on the Hub. The main goal of this exercise was to find answers to the following questions:

  • How many model architectures can be mapped to tasks that we wish to evaluate? For example a model with the BertForSequenceClassification architecture is likely to be about text classification; similarly for the other ModelNameForXxx architectures.
  • How many models have an architecture, dataset, and metric in their metadata?
  • Which tasks are most common?
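A rough sketch of how these counts could be gathered with huggingface_hub is shown below; the config/cardData field names are assumptions and may differ across library versions:

from collections import Counter
from huggingface_hub import HfApi

api = HfApi()
# full=True asks the API to also return the model config and card metadata where available
models = api.list_models(full=True)

counts = Counter()
for m in models:
    config = getattr(m, "config", None) or {}
    card = getattr(m, "cardData", None) or {}
    counts[(
        bool(config.get("architectures")),
        bool(card.get("datasets")),
        bool(card.get("metrics")),
    )] += 1

# keys are (has_architecture, has_dataset, has_metric) triples
print(counts)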

Number of models per dimension

Without applying any filters on the architecture names, the number of models per criterion is shown in the table below:

| Has architecture | Has dataset | Has metric | Number of models |
|---|---|---|---|
| ✅ | ❌ | ❌ | 8129 |
| ✅ | ✅ | ❌ | 1241 |
| ✅ | ✅ | ✅ | 359 |

These numbers include models for which a task may not be easily inferred from the architecture alone. For example BertModel would presumably be associated with a feature-extraction task, but these are not simple to evaluate.

By applying a filter on the architecture name to contain any of "For", "MarianMTModel" (translation), or "LMHeadModel" (language modelling), we arrive at the following table:

| Has task | Has dataset | Has metric | Number of models |
|---|---|---|---|
| ✅ | ❌ | ❌ | 7452 |
| ✅ | ✅ | ❌ | 1150 |
| ✅ | ✅ | ✅ | 337 |

Architecture frequencies

Some models either have no architecture (e.g. the info is missing from the config.json file or the model belongs to another library like Flair), or multiple ones:

| Number of architectures | Number of models |
|---|---|
| 0 | 1755 |
| 1 | 8125 |
| 2 | 1 |
| 3 | 3 |

Based on these counts, it thus makes sense to just focus on models with a single architecture.

Number of models per task

For models with a single architecture, I extract the task names from the architecture name according to the following mappings:

  • "MarianMTModel" => "Translation"
  • architectures containing "LMHeadModel", "LMHead", "MaskedLM", "CausalLM" => "LanguageModeling"
  • architectures containing "Model", "DPR", "Encoder" => "Model"

The resulting frequency counts are shown below:

LanguageModeling                    3250
Translation                         1354
SequenceClassification               829
ConditionalGeneration                766
Model                                655
QuestionAnswering                    364
CTC                                  318
TokenClassification                  286
PreTraining                          163
MultipleChoice                        37
MultiLabelSequenceClassification      17
ImageClassification                   15
MultiLabelClassification              11
Generation                             7
ImageClassificationWithTeacher         4

Fun stuff

We can visualise which tasks are connected to which datasets as a graph. Here we show the top 10 tasks (measured by node connectivity), with the top 20 datasets marked in orange.

[Figure: tasks2datasets graph]

Tracking integration for video classification

Tracking integration of task - video-classification

Note that you're not expected to do all of the following steps. This issue helps track all the steps required to get a new task fully supported in the Hub 🔥

  • Integration with Inference API. Select at least one of the following:
    • Added a transformers pipeline
    • Added to Community Inference API for 3rd party library
    • Added to Community Inference API for generic
  • Added basic UI elements (icon, order specification, etc)
  • Added a widget

Integration guide: https://hf.co/docs/hub/adding-a-task

Tracking integration for Zero-shot Image classification

Tracking integration of task - Zero-shot Image classification

Note that you're not expected to do all of the following steps. This issue helps track all the steps required to get a new task fully supported in the Hub 🔥

  • Integration with Inference API. Select at least one of the following:
    • Added a transformers pipeline
    • Added to Community Inference API for 3rd party library
    • Added to Community Inference API for generic
  • Added basic UI elements (icon, order specification, etc)
  • Added a widget huggingface/hub-docs#14

Integration guide: https://hf.co/docs/hub/adding-a-task

Add inference API snippets for non-NLP tasks

More non-NLP tasks are now supported in the Inference API, but there are no code snippets at https://huggingface.co/superb/hubert-large-superb-er even though there are some at https://api-inference.huggingface.co/docs/python/html/detailed_parameters.html#audio-classification-task

This probably requires some internal changes, plus changes to https://github.com/huggingface/huggingface_hub/tree/main/widgets/src/lib/inferenceSnippets since it always expects the same type of input.
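For reference, this is roughly the kind of snippet that could be shown for an audio classification model, mirroring the detailed_parameters example linked above (the token and file name are placeholders):

import requests

API_URL = "https://api-inference.huggingface.co/models/superb/hubert-large-superb-er"
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}

# audio tasks send the raw audio bytes as the request body instead of a JSON payload
with open("sample.flac", "rb") as f:
    data = f.read()

response = requests.post(API_URL, headers=headers, data=data)
print(response.json())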

cc @mishig25

Tracking integration for text-nearest

Tracking integration of task - text-nearest

Naming is tentative

Note that you're not expected to do all of the following steps. This issue helps track all the steps required to get a new task fully supported in the Hub 🔥

  • Integration with Inference API. Select at least one of the following:
    • Added a transformers pipeline
    • Added to Community Inference API for 3rd party library
    • Added to Community Inference API for generic
  • Added basic UI elements (icon, order specification, etc)
  • Added a widget

Integration guide: https://hf.co/docs/hub/adding-a-task

We can do this with gensim or fasttext models, for example, for obtaining the closest words via nearest neighbors. Example repo: https://huggingface.co/Hellisotherpeople/debate2vec

CC @mishig25 in case you would like to help with the widget, this is pretty much the same as the text-classification widget

DOI Generation for Artifacts

It would be useful to be able to generate a Digital Object Identifier to artifacts living on the Hub.

It would let people cite a specific dataset or a specific model, and make their own datasets and models citable. Some venues require DOIs for digital resources and it would be nice to not have to use 3rd parties for that.

Kaggle, for instance, currently has that feature for public datasets: https://www.kaggle.com/product-feedback/108594

Drawback: It costs money, because one has to go through approved agencies (e.g.: https://datacite.org/feemodel.html)

Automation would probably be straightforward: Creating DOIs with the Datacite REST API
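A hedged sketch of what that automation could look like against the DataCite REST API; the endpoint, payload shape, and credentials below are assumptions based on DataCite's public JSON:API documentation, not an existing Hub integration:

import requests

payload = {
    "data": {
        "type": "dois",
        "attributes": {
            "prefix": "10.XXXXX",  # placeholder prefix assigned by DataCite
            "titles": [{"title": "my-org/my-model"}],
            "creators": [{"name": "My Org"}],
            "publisher": "Hugging Face",
            "publicationYear": 2022,
            "types": {"resourceTypeGeneral": "Model"},
            "url": "https://huggingface.co/my-org/my-model",
            "event": "publish",
        },
    }
}

response = requests.post(
    "https://api.test.datacite.org/dois",  # test endpoint; production is api.datacite.org
    json=payload,
    headers={"Content-Type": "application/vnd.api+json"},
    auth=("REPOSITORY_ID", "PASSWORD"),  # placeholder credentials
)
print(response.status_code, response.json())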

New object detection and image segmentation widgets

For the DETR model, which will soon be part of HuggingFace Transformers (see huggingface/transformers#11653 (comment)), it would be cool to have object detection and image segmentation (actually panoptic segmentation) inference widgets.

Similar to the image classification widget, a user should be able to upload/drag an image to the widget, which is then annotated with bounding boxes and classes (in case of object detection), or turned into a segmentation map (in case of panoptic segmentation).

Here are 2 notebooks which illustrate what you can do with the head models of DETR:

DetrForObjectDetection: https://colab.research.google.com/drive/170dlGN5s37uaYO32XKUHfPklGS8oB059?usp=sharing
DetrForSegmentation: https://colab.research.google.com/drive/1hTGTPGBLPRY1QkLmG7P9air6v04tcXUL?usp=sharing

The models are already on the hub: https://huggingface.co/models?search=facebook/detr

cc @LysandreJik

Monetisation of models

What do you think about paying some percentage to the developers/owners of models when they are used through the Inference API?
It could make uploading more amazing models more attractive if it were possible to generate some passive income from them.
It would be like RapidAPI for NLP 🤗

Image to Image Widget

Would have as input

  • Image

Would output:

  • Image

This should be similar in implementation to the image classification widget.

Add docs on Hub organisations

As described in this forum comment, the Hub docs are currently a bit sparse on:

  • Searching for existing orgs
  • How to join an org
  • What features are available between free vs premium subscriptions

It would be nice to integrate this feedback into the docs.

Allow uploading large files via the web UI

Is your feature request related to a problem? Please describe.
I tried to upload a 13GB dataset file (for this repo), and after waiting a couple of hours for it to upload, it gave a "Payload too large" error.

Describe the solution you'd like
I'd like to be able to upload large dataset files using the web UI.

Describe alternatives you've considered
I'm guessing there's some sort of command line tool for cases like this, but I'd prefer to not have to install something just to upload a file - it's bad UX. There's no fundamental technical reason why the web UI can't handle large files, so it doesn't seem like a good idea to put limits like this in place.

Additional info
If the team is for some reason adamant about not allowing upload of large files via the web UI, then at the very least it would be good to stop the user with an error when they pick the file, rather than after the upload is complete. I.e. check the size of the blob with JS before uploading rather than uploading and checking size on the server.

If it's actually already possible to upload large files via the web UI (and I've just misunderstood the process), then please consider this issue a request for better UX in guiding the user toward doing that. After creating the dataset repo I just clicked across to the "files and versions" tab, and then clicked "Add file > Upload file".

Thanks!

Tracking integration for Image Question Answering

Tracking integration of task - Image Question Answering

Note that you're not expected to do all of the following steps. This issue helps track all the steps required to get a new task fully supported in the Hub 🔥

  • Integration with Inference API. Select at least one of the following:
    • Added a transformers pipeline
    • Added to Community Inference API for 3rd party library
    • Added to Community Inference API for generic
  • Added basic UI elements (icon, order specification, etc)
  • Added a widget #21

Integration guide: https://hf.co/docs/hub/adding-a-task

cc @mishig25 @NielsRogge

Text regression rather than classification

Is your feature request related to a problem? Please describe.
I am trying to share our hate speech measurement model, which predicts a continuous measure for hate speech severity. So it is doing text regression rather than classification, but the input remains just a text input that gets tokenized. Is it possible to clarify if I need to create a custom pipeline for this, or how else to proceed? I have one of our models uploaded for TF at https://huggingface.co/ucberkeley-dlab/hate-measure-roberta-large

The architecture is just a Transformer backbone, 1D global average pooling layer, followed by a linear output node.

Describe the solution you'd like
I would like to support the hosted inference API so that individuals can get a continuous score prediction out of our model and for the demo widget to work.

Describe alternatives you've considered
I'm not clear if I need to create a new TextRegression task for HuggingFace, which feels like it would be a large lift, or if there is an easier way to do this.

Additional context
The preprint for our work is at https://arxiv.org/abs/2009.10277 and this is to allow the model to be used in the Jigsaw Toxic Severity Kaggle competition: https://www.kaggle.com/c/jigsaw-toxic-severity-rating/overview

Feature request: widgets for document image understanding and document image VQA

I'm currently adding LayoutLMv2 and LayoutXLM to HuggingFace Transformers. These models, built by Microsoft, have impressive capabilities for understanding document images (scanned documents, such as PDFs). LayoutLM, and its successor, LayoutLMv2, are extensions of BERT that incorporate layout and visual information, besides just text. LayoutXLM is a multilingual version of LayoutLMv2.

It would be really cool to have inference widgets for the following tasks:

  • document image understanding
  • document image visual question answering
  • document image classification

Document image understanding

Document image understanding (also called form understanding) means understanding all pieces of information of a document image. Example datasets here are FUNSD, CORD, SROIE and Kleister-NDA.

The input is a document image:
[screenshot: example document image]

The output should be the same image, but with colored bounding boxes, indicating for example what part of the image are questions (blue), which are answers (green), which are headers (orange), etc.
[screenshot: the same document annotated with colored bounding boxes]

LayoutLMv2 solves this as an NER problem, using LayoutLMv2ForTokenClassification. First, an OCR engine is run on the image to get a list of words + corresponding coordinates. These are then tokenized and, together with the image, sent through the LayoutLMv2 model. The model then labels each token using its classification head.
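A rough sketch of that flow with the classes mentioned in this issue (the base checkpoint name is illustrative; a fine-tuned checkpoint would be needed for meaningful labels, and the processor requires pytesseract for the OCR step):

from PIL import Image
from transformers import LayoutLMv2Processor, LayoutLMv2ForTokenClassification

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
model = LayoutLMv2ForTokenClassification.from_pretrained("microsoft/layoutlmv2-base-uncased")

image = Image.open("document.png").convert("RGB")
# the processor runs OCR to get words + boxes, then tokenizes them together with the image
encoding = processor(image, return_tensors="pt")

outputs = model(**encoding)
predictions = outputs.logits.argmax(-1)  # one predicted label id per token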

Document visual question answering

Document visual question answering means, given an image + question, generate (or extract) an answer. For example, for the PDF document above, a question could be "what's the date at which this document was sent?", and the answer is "January 11, 1999".
Example datasets here are DocVQA, on which LayoutLMv2 obtains SOTA performance (who might have guessed).

LayoutLMv2 solves this as an extractive question answering problem, similar to SQuAD. I've defined a LayoutLMv2ForQuestionAnswering, which predicts the start_positions and end_positions.
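Sketched with the same processor (checkpoint name again illustrative; a DocVQA fine-tuned checkpoint would be needed in practice):

from PIL import Image
from transformers import LayoutLMv2Processor, LayoutLMv2ForQuestionAnswering

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
model = LayoutLMv2ForQuestionAnswering.from_pretrained("microsoft/layoutlmv2-base-uncased")

image = Image.open("document.png").convert("RGB")
question = "What's the date at which this document was sent?"
encoding = processor(image, question, return_tensors="pt")

outputs = model(**encoding)
start = outputs.start_logits.argmax(-1).item()
end = outputs.end_logits.argmax(-1).item()
answer = processor.tokenizer.decode(encoding.input_ids[0, start : end + 1])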

Document image classification

Document image classification is fairly simple: given a document image, classify it (e.g. invoice/form/letter/email/etc.). Example datasets here are [RVL-CDIP](https://www.cs.cmu.edu/~aharley/rvl-cdip/). For this, I have defined a LayoutLMv2ForSequenceClassification, which just places a linear layer on top of the model in order to classify documents.

Remarks

I don't think we can leverage the existing 'token-classification', 'question-answering' and 'image-classification' pipelines, as the inputs are quite different (document images instead of text). To ease the development of new pipelines, I have implemented a new LayoutLMv2Processor, which takes care of all the preprocessing required for LayoutLMv2. It combines a LayoutLMv2FeatureExtractor (for the image modality) and LayoutLMv2Tokenizer (for the text modality). I would also argue that if we have other models in the future, they all implement a processor that takes care of all the preprocessing (and possibly postprocessing). Processors are ideal for multi-modal models (they have been defined previously for CLIP and Wav2Vec2).

Tracking integration for Image to Text (Image Captioning, etc)

Tracking integration of task - Image to Text

Note that you're not expected to do all of the following steps. This issue helps track all the steps required to get a new task fully supported in the Hub 🔥

  • Integration with Inference API. Select at least one of the following:
    • Added a transformers pipeline
    • Added to Community Inference API for 3rd party library
    • Added to Community Inference API for generic
  • Added basic UI elements (icon, order specification, etc)
  • Added a widget #15

Integration guide: https://hf.co/docs/hub/adding-a-task

(Space) transformers is not installed in gradio SDK?

Is your feature request related to a problem? Please describe.

I got this error:

Traceback (most recent call last):
  File "app.py", line 1, in <module>
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
ModuleNotFoundError: No module named 'transformers'

when I pushed this version to spaces https://huggingface.co/spaces/ttj/t0-generation/commit/efc071478da5be7f6a369b034ecda4844ed3ad22

Describe the solution you'd like

Pre-install common libraries.

Tracking integration for Video to Text (video-text retrieval)

Tracking integration of task - Video Text Retrieval

Note that you're not expected to do all of the following steps. This issue helps track all the steps required to get a new task fully supported in the Hub 🔥

  • Integration with Inference API. Select at least one of the following:
    • Added a transformers pipeline
    • Added to Community Inference API for 3rd party library
    • Added to Community Inference API for generic
  • Added basic UI elements (icon, order specification, etc)
  • Added a widget

Integration guide: https://hf.co/docs/hub/adding-a-task

Example model: https://arxiv.org/pdf/2111.05610v1.pdf
cc @nateraw

Improve the tagging of models

Hi,

I hope, first of all, that I am opening this issue in the right place (I hesitated to post in that repo instead, but it seems less active, judging by its issues over the past year: https://github.com/huggingface/model_card).

I'll illustrate my observation by talking about French models, but the logic applies to any language.

I found by chance the following model on the hub: https://huggingface.co/dbmdz/electra-base-french-europeana-cased-generator
An ELECTRA model in French released more than a year ago and I had never heard of it? How is that possible?
I realized that it was simply because it was not referenced correctly (no "fr" tag). This probably explains why it was downloaded only 12 times last month. I think it's a shame.

So I did a little more research to see if I had missed any other French models that were not referenced, and here is the list I came up with:

This represents 24 models. If we calculate in relation to what is announced by the "fr" filter (https://huggingface.co/models?language=fr), about 7.5% (24/(24+300)) of the French models are not referenced.
So I think it would be important to improve the referencing.

I have two ideas to submit:

  • either allow users to add new tags to models.
    However, this could lead to abuse from malicious users tagging all the models they find in a given language, so there would need to be some safeguards: maybe only allow certain people to do this, or provide a place where people could report badly indexed models (the link to the model + the tag to add) and someone at Hugging Face would check manually and update things once a month, for example
  • propose tag suggestions to people adding their models on the Hub, based on the model name. I found the 24 models by simply typing "french" and "-fr-" in the search bar. This would complement what you are already doing with metadata: https://huggingface.co/docs/hub/main#how-is-a-models-type-of-inference-api-and-widget-determined

A slightly different but related topic is multilingual models. Should multilingual models be tagged with all the languages they contain or not?
This solution has been adopted for the Helsinki-NLP models (an example: https://huggingface.co/Helsinki-NLP/opus-mt-af-fr is tagged with "af" and "fr").
But this is not the case for the Geotrend models (an example: https://huggingface.co/Geotrend/bert-base-en-fr-cased contains neither "en" nor "fr") or for the T-Systems ones (an example: https://huggingface.co/T-Systems-onsite/cross-en-fr-roberta-sentence-transformer contains neither "en" nor "fr").
I haven't checked with datasets, but I guess the problem must apply there too. So I think this would be a point to harmonize.

Have a nice day :)

Tracking integration for Zero-Shot Sequence Labeling with TARS [FLAIR]

In v0.9 of flair, a new zero-shot sequence labeling feature was included, which seems (a) very exciting and (b) very useful for real-world NER applications where acquiring labeled data is expensive.

It would be cool to include support for this in the Hugging Face Hub, possibly as a companion widget to the existing one for zero-shot classification.

Here's a link to the paper behind the TARS technique.

cc @osanseviero

  • Integration with Inference API. Select at least one of the following:
    • Added a transformers pipeline
    • Added to Community Inference API for 3rd party library
    • Added to Community Inference API for generic
  • Added basic UI elements (icon, order specification, etc)
  • Added a widget

Improve right-to-left language support in inputs and outputs

Is your feature request related to a problem? Please describe.

In newer versions of HuggingFace_Hub, text inputs and outputs left-justify their text, whereas we could add an attribute to automatically detect and adjust for right-to-left language text.

Describe the solution you'd like

recommended

  • Model Page, Widget input: either the <form> or the <span ... role="textbox"> element should have HTML attribute dir="auto" (no css equivalent)
  • Model Page, Text Generation and Speech Recognition output: the <p ... class="alert alert-success"> element should have dir="auto"

less critical

  • Model and Dataset readme: I believe what we had earlier was JS that runs after the markdown element renders, selects .prose > {h1, h2, h3, h4, p}, and applies dir="auto"; the selection is scoped that way so we don't mess with code / pre / table elements
  • Model Page, Fill-Mask Widget: the <div> around the bar chart could also have dir="auto"
  • Spaces (probably will submit to Gradio) - the user has more control here, so I might just advise the user that they can add direction: rtl CSS to text inputs and outputs

Getting Error while Starting Sparksession

Hello, I am trying to create a new Streamlit app on Hugging Face that uses pyspark.

I've created an app.py file and a requirements.txt file, and I ran a basic Streamlit app without pyspark and it worked seamlessly.

However, the problem starts when I add the following line to start the Spark session: spark = SparkSession.builder.appName("appName").getOrCreate(). I get an error like the following:

Exception: Java gateway process exited before sending its port number

The content of requirements.txt file:

streamlit
pyspark==3.1.2
spark-nlp==3.3.2

If anyone who has worked with pyspark on HF can help me, I would appreciate it!

Widget for image captioning

Would have as input

  • Image

Would output

  • Text

This should be similar in implementation to the image classification widget.

Show `pipeline` in "use in transformers" instead.

When I go to a repo like this one I can click the "use in transformers" button. This makes me think "grand! that'll be a quickstart!". It shows me this code:

from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("oliverguhr/fullstop-punctuation-multilang-large")
model = AutoModelForTokenClassification.from_pretrained("oliverguhr/fullstop-punctuation-multilang-large")

This is a bit of a bummer. It tells me how to set up a tokenizer and a model, but it doesn't tell me at all how to use the model. It'd be much more pragmatic if instead it showed these lines of code:

from transformers import pipeline

pipe = pipeline("token-classification", "oliverguhr/fullstop-punctuation-multilang-large")
pipe(["this is an example that you can actually run"])

This way, I don't need to search the docs for the type of pipeline model that I'm dealing with and I immediately have something that works in my notebook. Wouldn't it be better to generate the pipeline code in the docs?

Potentially you could still render the model/tokeniser as well, but these don't feel part of the "getting started journey".

Text-generation API endpoint fails to handle integer-formatted numeric arguments in JSON request

When I call the hosted text-generation API, the request fails if I set the temperature parameter to an integer value of 1 instead of a float value of 1.0.

For example, this succeeds:

curl -i -X POST https://api-inference.huggingface.co/models/my_organization/my_model \
     -H "Authorization: Bearer <<REDACTED>>" \
     -H "Content-Type: application/json" \
     -d \
     '{
          "inputs":"Once upon a time",
          "parameters": {
            "temperature": 1.0,
            "max_new_tokens": 20
          }
     }'

But this request fails:

curl -i -X POST https://api-inference.huggingface.co/models/my_organization/my_model \
     -H "Authorization: Bearer <<REDACTED>>" \
     -H "Content-Type: application/json" \
     -d \
     '{
          "inputs":"Once upon a time",
          "parameters": {
            "temperature": 1,
            "max_new_tokens": 20
          }
     }'

...with this error message:

{"error":["value is not a valid float: `temperature` in `parameters`"]}

Both requests should succeed. The only difference between the two requests is the numeric formatting of the temperature parameter (with or without a decimal point and trailing zero).
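For reference, the equivalent request from Python, keeping temperature as an explicit float (model name and token are placeholders, as above):

import requests

API_URL = "https://api-inference.huggingface.co/models/my_organization/my_model"
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}

payload = {
    "inputs": "Once upon a time",
    # cast numeric parameters to float explicitly so the JSON carries 1.0 rather than 1
    "parameters": {"temperature": float(1), "max_new_tokens": 20},
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())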

🚀 Display (and enable filtering by) size / number of parameters

Is your feature request related to a problem? Please describe.
It is hard to tell how big a model is without going to the repo and seeing the size of the weights file.

Describe the solution you'd like
It might prove useful to be able to sort models by size or number of parameters, especially for people with more modest compute budgets. Perhaps even display that information alongside a model's popularity.

Describe alternatives you've considered
Alternatives would be prior knowledge of the distilled / small model landscape. This feature might also help people discover new models they hadn't heard of by enabling filtering queries like "most popular and smallest model trained using MLM on wikitext"

Additional context
The datasets tab has a similar feature with the Size category filter.

Tracking integration for Table to Text generation

Tracking integration of task - Table to Text generation

Note that you're not expected to do all of the following steps. This issue helps track all the steps required to get a new task fully supported in the Hub 🔥

  • Integration with Inference API. Select at least one of the following:
    • Added a transformers pipeline
    • Added to Community Inference API for 3rd party library
    • Added to Community Inference API for generic
  • Added basic UI elements (icon, order specification, etc)
  • Added a widget #13

Integration guide: https://hf.co/docs/hub/adding-a-task

SSL certificates renewed very frequently

@osanseviero thank you for looking into this issue.

Simply put, we built an API that receives questions or documents from a user through a web application on the front end, and on the back end downloads ML models from huggingface.co in order to answer those questions or process those documents, and to encode new documents for search.

In order to connect to huggingface.co to download models, we require an SSL certificate. Without the cert, the following error comes up

[screenshot: SSL certificate verification error]

The current fix requires that I download the certificate chain manually through my browser as follows

[screenshot: downloading the certificate chain through the browser]

Note the Hugging Face SSL certificate expires very soon!

[screenshot: certificate expiry date]

To install the certificate, I copy it over to my container in my Dockerfile and add the following lines of code to the application:

import requests
import certifi

try:
    print('Checking connection to Huggingface...')
    test = requests.get('https://huggingface.co')
    print('Connection to Huggingface OK.')
except requests.exceptions.SSLError as err:
    print('SSL Error. Adding custom certs to Certifi store...')
    cafile = certifi.where()
    with open('huggingface-co-chain.pem', 'rb') as infile:
        customca = infile.read()
    with open(cafile, 'ab') as outfile:
        outfile.write(customca)
    print('That might have worked.')

The main problem with this fix is that the certificate I download is only valid for a short period of time (one or two weeks).
  • Ideally, we should be able to do this on the command line as part of the container build, but so far efforts to do so have not worked.
  • The following command yields a chain of three certificates, while the ones downloaded from the browser have a chain of 4:

openssl s_client -showcerts -verify 5 -connect huggingface.co:443 < /dev/null

  • The missing certificate appears to be the zScaler root CA, which shows up in the browser but not on the command line.

Revamp of Hub documentation

As we do huggingface/huggingface_hub#744 and add library documentation based on docstrings, the Hub docs might split into huggingface_hub docs and product usage docs.

Based on the existing content, what we can do is create more self-contained use cases. An initial mental model, without creating additional content, would be:

Move to huggingface_hub as guides

Then on the Hub

Model card

  • What are model cards and why are they useful?
  • When sharing a model, what should I add to my model card?
  • Model card metadata
  • How are model tags determined?
  • Can I specify which framework supports my model?
  • How can I link a model to a dataset?
  • Can I access models programmatically?
  • Can I write LaTeX in my model card?
  • How is a model's type of inference API and widget determined?
  • What are all the possible task/widget types?

Repositories

  • What's a repository?
  • How can I explore the Hugging Face Hub?
  • How can I load/push from/to the Hub?
  • How can I rename or transfer a repo?
  • How can I fork or rebase a repository with LFS pointers?
  • List of license identifiers

CO2 Emissions

  • Why is it useful to calculate the carbon emissions of my model?
  • What information should I include about the carbon footprint of my model?
  • Carbon footprint metadata
  • How is the carbon footprint of my model calculated? 🌎

Widgets

  • What's a widget?
  • How can I control my model's widget example input?
  • How to create a new widget?

Inference API

  • What's the Inference API?
  • How can I control my model's widget Inference API parameters?

Security

Endpoints

Adding a new task

Integrating a new library (with parts of it linking to huggingface_hub)

This would also include Spaces, which can very likely break into more pieces

WDYT @julien-c @LysandreJik @adrinjalali @muellerzr of this as a first step once we kick off the splitting?

Tracking integration for Image to Image

Tracking integration of task - Image to image

Note that you're not expected to do all of the following steps. This issue helps track all the steps required to get a new task fully supported in the Hub 🔥

  • Integration with Inference API. Select at least one of the following:
    • Added a transformers pipeline
    • Added to Community Inference API for 3rd party library
    • Added to Community Inference API for generic
  • Added basic UI elements (icon, order specification, etc)
  • Added a widget #51

Integration guide: https://hf.co/docs/hub/adding-a-task

Display size of the generated dataset, downloaded dataset files, total amount of disk used in GB when MB >= 1000 in dataset cards

Is your feature request related to a problem? Please describe.
Currently, the following data fields on the hub are only displayed in MB.

  • Size of the generated dataset
  • Size of downloaded dataset files
  • Total amount of disk used

Figures like 1895.01 MB and 1611.50 MB become unwieldy to reason about as they grow, compared to 1.89501 GB or 1.6115 GB.

An example page for reference

Describe the solution you'd like
Convert numbers to GB in dataset cards when MB >= 1000.
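Something as simple as the following helper would cover it (a sketch, not tied to the actual codebase):

def format_size(size_mb: float) -> str:
    # display sizes in GB once they reach 1000 MB, otherwise keep MB
    if size_mb >= 1000:
        return f"{size_mb / 1000:.5f} GB"
    return f"{size_mb} MB"

print(format_size(1895.01))  # 1.89501 GB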

Describe alternatives you've considered
I considered advocating for truncating the size to 3 decimal points, as precision to 5 or 6 decimal points in GB (such as a number like 1.543210 GB) may be an unnecessary degree of precision to provide for users.

Ultimately though, I reasoned that more precision is often better.

Additional context
I'm happy to contribute to this. I didn't see exactly where this was handled in the current codebase, so any pointers appreciated. Also, let me know if I should be opening this up in the datasets repo instead...

Give users the ability to compare models' outputs for a given task.

🗒️ Motivation

When a user selects a specific task on the Hugging Face Hub - for example, image-to-text:

[screenshot: the image-to-text task filter on the Hub]

That user is shown a series of models, with no guidance as to which model might be state of the art, or which might be the most performant for their use case.

To test the capabilities and behavior of each model, the user must:

  • Open the link to each model in a new browser tab.
  • Read through each model's model card.
  • Test out each model with an image, if the model has a Space or a Colab notebook available (not every model does).
  • Cross-reference each model with the state-of-the-art leaderboards on Papers with Code.

🙏 Desired Behavior

The user should be able to:

  • Select a given task (for example, image-to-text).
  • Select one or more models to test, for that use case.
  • Input a piece of data, to test (an image, in the case of image-to-text).
  • View the output of each model, given that input data -- side by side, to compare the performance and behavior.

Tracking integration for Object Detection

Tracking integration of task - Object Detection

Note that you're not expected to do all of the following steps. This issue helps track all the steps required to get a new task fully supported in the Hub 🔥

Integration guide: https://hf.co/docs/hub/adding-a-task

Feature request: add link to model documentation

Is your feature request related to a problem? Please describe.
As discussed on Slack, it would be nice if we add a direct link to the documentation of a model on the hub (e.g. linking to https://huggingface.co/transformers/model_doc/bert.html for bert-base-uncased). This can probably be done based on the config.json of a model.
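A sketch of how the link target could be derived from config.json (the URL pattern simply follows the bert example above):

import json
from huggingface_hub import hf_hub_download

config_path = hf_hub_download("bert-base-uncased", "config.json")
with open(config_path) as f:
    model_type = json.load(f)["model_type"]  # "bert"

doc_url = f"https://huggingface.co/transformers/model_doc/{model_type}.html"
print(doc_url)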

Describe the solution you'd like
A button could be added either next to "Deploy" and "Use in Transformers", or a link could be added within the "Use in Transformers" button.

cc @julien-c

Language filtering on the Hub does not work correctly

(I noticed recently that you changed the language filtering GET parameter from filter to language, but unfortunately my problem still persists)

If I search for Danish models then models like this one also pop up, which is not a Danish model but has the da tag (for "direct assessment"). This could be fixed by filtering on the tag-green class rather than doing a general tag filtering, as I guess is done currently.

Thanks!

Include a tag for mobile-compatible models.

👋 Greetings! I'm not sure if this is the correct place to file an issue for huggingface.co, but figured I'd try anyway. :)

Issue Description

Is your feature request related to a problem? Please describe.
Would it be possible to include models that are mobile-compatible as a filterable category in the Other section on the Hugging Face website? Meaning .tflite, .ptl for PyTorch, or CoreML models for iOS devices.

[screenshot: the "Other" filter section on the Hub]

Additional context

In competitors' hubs, models of varying sizes are colocated on a single page (example below) - so you might see the base example, a Colab notebook to test out the model interactively, and a TF Lite implementation. You can also sort and filter based on model type, as well as framework version and fine-tunability (though HuggingFace's models would win that contest every time 😃).

[screenshots: a competitor hub's model page showing model variants, framework filters, and a Colab link]
