
passt's Introduction

al-folio


A simple, clean, and responsive Jekyll theme for academics. If you like the theme, give it a star!

Preview

User community

The vibrant community of al-folio users is growing! Academics around the world use this theme for their homepages, blogs, lab pages, as well as webpages for courses, workshops, conferences, meetups, and more. Check out the community webpages below. Feel free to add your own page(s) by sending a PR.

  • Academics
  • Labs
  • Courses: CMU PGM (S-19); CMU DeepRL (F-19, S-20, F-20, S-21, F-21, S-22); CMU MMML (F-20, F-22); CMU AMMML (S-22, S-23); CMU ASI (S-23); CMU Distributed Systems (S-21)
  • Conferences & workshops: ICLR Blog Post Track (2023); ML Retrospectives (NeurIPS: 2019, 2020; ICML: 2020); HAMLETS (NeurIPS: 2020); ICBINB (NeurIPS: 2020, 2021); Neural Compression (ICLR: 2021); Score Based Methods (NeurIPS: 2022); Images2Symbols (CogSci: 2022); Medical Robotics Junior Faculty Forum (ISMR: 2023)

Lighthouse PageSpeed Insights



Getting started

Want to learn more about Jekyll? Check out this tutorial. Why Jekyll? Read Andrej Karpathy's blog post!

Installation

For a hands-on walkthrough of al-folio installation, check out this cool video tutorial by one of the community members! 🎬 🍿

The preferred way of using this template is by clicking on Use this template above the file list. Then, create a new repository at github.com:<your-username>/<your-repo-name>. If you plan to upload your site to <your-github-username>.github.io, note that the name of your repository must be <your-github-username>.github.io or <your-github-orgname>.github.io, as stated in the GitHub Pages docs. For more information on how to deploy your site, check the Deployment section below. After you have created your new repository, clone it to your machine:

$ git clone git@github.com:<your-username>/<your-repo-name>.git
$ cd <your-repo-name>

Local setup using Docker (Recommended on Windows)

You need to take the following steps to get al-folio up and running on your local machine:

  • First, install docker and docker-compose.
  • Then, run the following command, which will pull a pre-built image from DockerHub and run your website:
$ docker-compose up

Note that the first time you run it, it will download a Docker image of roughly 300 MB.

Now, feel free to customize the theme however you like (don't forget to change the name!). After you are done, you can use the same command (docker-compose up) to render the webpage with all your changes. Also, make sure to commit your final changes.

To change the port number, edit the docker-compose.yml file.
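For reference, the relevant part of docker-compose.yml looks roughly like the sketch below. The jekyll service name and the 8080 port values are assumptions here, not guaranteed to match the file shipped with the theme; keep whatever your copy already contains and change only the host-side value.

services:
  jekyll:
    ports:
      - "8080:8080"   # host:container - change the left-hand number to serve on a different host port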

Build your own docker image:

Note: this approach is only necessary if you would like to build an older or very custom version of al-folio.

Build and run a new docker image using:

$ docker-compose -f docker-local.yml up

If you want to update Jekyll, install new Ruby packages, etc., all you have to do is build the image again by adding the --force-recreate argument to the end of the previous command! It will download Ruby and Jekyll and install all Ruby packages from scratch again.


Local Setup (Standard)

Assuming you have Ruby and Bundler installed on your system (hint: for ease of managing Ruby gems, consider using rbenv), run:

$ bundle install
$ bundle exec jekyll serve --lsi

Now, feel free to customize the theme however you like (don't forget to change the name!). After you are done, commit your final changes.


Deployment

Deploying your website to GitHub Pages is the most popular option. Starting with version v0.3.5, al-folio will automatically re-deploy your webpage each time you push new changes to your repository! ✨

For personal and organization webpages:

  1. The name of your repository MUST BE <your-github-username>.github.io or <your-github-orgname>.github.io.
  2. In _config.yml, set url to https://<your-github-username>.github.io and leave baseurl empty.
  3. Set up automatic deployment of your webpage (see instructions below).
  4. Make changes, commit, and push!
  5. After deployment, the webpage will become available at <your-github-username>.github.io.

For project pages:

  1. In _config.yml, set url to https://<your-github-username>.github.io and baseurl to /<your-repository-name>/.
  2. Set up automatic deployment of your webpage (see instructions below).
  3. Make changes, commit, and push!
  4. After deployment, the webpage will become available at <your-github-username>.github.io/<your-repository-name>/.
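As a concrete sketch, the two setups above correspond to _config.yml entries along the following lines (replace the placeholders with your own user and repository names):

# personal or organization webpage
url: https://<your-github-username>.github.io
baseurl:          # left empty

# project page
url: https://<your-github-username>.github.io
baseurl: /<your-repository-name>/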

To enable automatic deployment:

  1. Click on the Actions tab and enable GitHub Actions; do not worry about creating any workflows, as everything has already been set up for you.
  2. Go to Settings -> Actions -> General -> Workflow permissions, and give Read and write permissions to GitHub Actions.
  3. Make any other changes to your webpage, commit, and push. This will automatically trigger the Deploy action.
  4. Wait for a few minutes and let the action complete. You can see the progress in the Actions tab. If completed successfully, in addition to the master branch, your repository should now have a newly built gh-pages branch.
  5. Finally, in the Settings of your repository, in the Pages section, set the branch to gh-pages (NOT to master). For more details, see Configuring a publishing source for your GitHub Pages site.

If you keep your site on another branch, open .github/workflows/deploy.yml on that branch and change on->push->branches and on->pull_request->branches to point to it. This will trigger the action on pushes and pull requests to that branch, and the action will then deploy the website from the branch it was triggered on.
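For illustration, assuming your website lives on a branch called main (a hypothetical branch name), the trigger section of .github/workflows/deploy.yml would then look roughly like this, with the rest of the workflow left untouched:

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main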

Manual deployment to GitHub Pages:

If you need to manually re-deploy your website to GitHub pages, go to Actions, click "Deploy" in the left sidebar, then "Run workflow."

Deployment to another hosting server (non GitHub Pages):

If you decide to not use GitHub Pages and host your page elsewhere, simply run:

$ bundle exec jekyll build --lsi

which will (re-)generate the static webpage in the _site/ folder. Then simply copy the contents of the _site/ directory to your hosting server.

Note: Make sure to correctly set the url and baseurl fields in _config.yml before building the webpage. If you are deploying your webpage to your-domain.com/your-project/, you must set url: your-domain.com and baseurl: /your-project/. If you are deploying directly to your-domain.com, leave baseurl blank.
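For example, following the note above literally, a site served at your-domain.com/your-project/ would use these _config.yml values:

url: your-domain.com
baseurl: /your-project/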

Deployment to a separate repository (advanced users only):

Note: Do not try using this method unless you know what you are doing (make sure you are familiar with publishing sources). This approach allows you to keep the website's source code in one repository and the deployment version in a different repository.

Let's assume that your website's publishing source is a publishing-source subdirectory of a git-versioned repository cloned under $HOME/repo/. For a user site this could well be something like $HOME/<user>.github.io.

First, from the deployment repository directory, check out the git branch hosting your publishing source.

Then from the website sources dir (commonly your al-folio fork's clone):

$ bundle exec jekyll build --lsi --destination $HOME/repo/publishing-source

This will instruct jekyll to deploy the website under $HOME/repo/publishing-source.

Note: Jekyll will clean $HOME/repo/publishing-source before building!

The quote below is taken directly from the jekyll configuration docs:

Destination folders are cleaned on site builds

The contents of <destination> are automatically cleaned, by default, when the site is built. Files or folders that are not created by your site will be removed. Some files could be retained by specifying them within the <keep_files> configuration directive.

Do not use an important location for <destination>; instead, use it as a staging area and copy files from there to your web server.

If $HOME/repo/publishing-source contains files that you want jekyll to leave untouched, specify them under keep_files in _config.yml. In its default configuration, al-folio will copy the top-level README.md to the publishing source. If you want to change this behavior, add README.md under exclude in _config.yml.
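A minimal sketch of the corresponding _config.yml entries is shown below; the CNAME entry is only an illustrative example of a file you might want to preserve in the publishing source.

keep_files:
  - CNAME        # files jekyll must not delete when it cleans the destination
exclude:
  - README.md    # stop the theme from copying the top-level README.md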

Note: Do not run jekyll clean on your publishing source repo as this will result in the entire directory getting deleted, irrespective of the content of keep_files in _config.yml.


Upgrading from a previous version

If you installed al-folio as described above, you can configure a GitHub action to automatically sync your repository with the latest version of the theme.

Go to Settings -> Actions -> General -> Workflow permissions, give Read and write permissions to GitHub Actions, check "Allow GitHub Actions to create and approve pull requests", and save your changes.

Then go to Actions -> New workflow -> set up a workflow yourself, set up the following workflow, and commit your changes:

name: Sync from template
on:
  # cronjob trigger
  schedule:
  - cron:  "0 0 1 * *"
  # manual trigger
  workflow_dispatch:
jobs:
  repo-sync:
    runs-on: ubuntu-latest
    steps:
      # To use this repository's private action, you must check out the repository
      - name: Checkout
        uses: actions/checkout@v3
      - name: actions-template-sync
        uses: AndreasAugustin/[email protected]
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          source_repo_path: alshedivat/al-folio
          upstream_branch: master

You will receive a pull request within your repository if there are some changes available in the template.

Another option is to manually update your code by following the steps below:

# Assuming the current directory is <your-repo-name>
$ git remote add upstream https://github.com/alshedivat/al-folio.git
$ git fetch upstream
$ git rebase v0.9.0

If you have extensively customized a previous version, it might be trickier to upgrade. You can still follow the steps above, but git rebase may result in merge conflicts that must be resolved. See the git rebase manual and how to resolve conflicts for more information. If rebasing is too complicated, we recommend re-installing the new version of the theme from scratch and manually porting over your content and changes from the previous version.


FAQ

Here are some frequently asked questions. If you have a different question, please ask using Discussions.

  1. Q: After I create a new repository from this template and setup the repo, I get a deployment error. Isn't the website supposed to correctly deploy automatically?
    A: Yes, if you are using release v0.3.5 or later, the website will automatically and correctly re-deploy right after your first commit. Please make some changes (e.g., change your website info in _config.yml), commit, and push. Make sure to follow deployment instructions in the previous section. (Relevant issue: 209.)

  2. Q: I am using a custom domain (e.g., foo.com). My custom domain becomes blank in the repository settings after each deployment. How do I fix that?
    A: You need to add a CNAME file to the master or source branch of your repository. The file should contain your custom domain name. (Relevant issue: 130.)

  3. Q: My webpage works locally. But after deploying, it fails to build and throws Unknown tag 'toc'. How do I fix that?
    A: Make sure you followed the deployment instructions in the previous section. You should have set the deployment branch to gh-pages. (Related issue: 1438.)

  4. Q: My webpage works locally. But after deploying, it is not displayed correctly (CSS and JS is not loaded properly). How do I fix that?
    A: Make sure to correctly specify the url and baseurl paths in _config.yml. Set url to https://<your-github-username>.github.io or to https://<your.custom.domain> if you are using a custom domain. If you are deploying a personal or organization website, leave baseurl blank. If you are deploying a project page, set baseurl: /<your-project-name>/. If all previous steps were done correctly, all that remains is for your browser to fetch the site stylesheet again.

  5. Q: Atom feed doesn't work. Why?
    A: Make sure to correctly specify the url and baseurl paths in _config.yml. The RSS feed plugin requires the title, url, description, and author fields to be set correctly. Fill them in appropriately and try again.

  6. Q: My site doesn't work when I enable related_blog_posts. Why?
    A: This is probably due to the classifier reborn plugin, which is used to calculate related posts. If the error states Liquid Exception: Zero vectors can not be normalized..., it means that it could not calculate related posts for a specific post. This is usually caused by empty or minimal blog posts without meaningful words (i.e., only stop words), or by specific characters used in your posts. Note also that related posts are calculated for every page that uses layout: post, including the announcements. To change this behavior, simply add related_posts: false to the front matter of any page on which you don't want to display related posts.

Features

Publications

Your publications page is generated automatically from your BibTeX bibliography. Simply edit _bibliography/papers.bib. You can also add new *.bib files and customize the look of your publications however you like by editing _pages/publications.md.

Author annotation:

In publications, the author entry for yourself is identified by the string arrays last_name and first_name under the scholar key in _config.yml:

scholar:
  last_name: [Einstein]
  first_name: [Albert, A.]

If the entry matches one form of the last names and the first names, it will be underlined. Keep meta-information about your co-authors in _data/coauthors.yml and Jekyll will insert links to their webpages automatically. The co-author data format in _data/coauthors.yml is as follows:

"Adams":
  - firstname: ["Edwin", "E.", "E. P.", "Edwin Plimpton"]
    url: https://en.wikipedia.org/wiki/Edwin_Plimpton_Adams

"Podolsky":
  - firstname: ["Boris", "B.", "B. Y.", "Boris Yakovlevich"]
    url: https://en.wikipedia.org/wiki/Boris_Podolsky

"Rosen":
  - firstname: ["Nathan", "N."]
    url: https://en.wikipedia.org/wiki/Nathan_Rosen

"Bach":
  - firstname: ["Johann Sebastian", "J. S."]
    url: https://en.wikipedia.org/wiki/Johann_Sebastian_Bach

  - firstname: ["Carl Philipp Emanuel", "C. P. E."]
    url: https://en.wikipedia.org/wiki/Carl_Philipp_Emanuel_Bach

If the entry matches one of the combinations of the last names and the first names, it will be highlighted and linked to the url provided.

Buttons (through custom bibtex keywords):

There are several custom bibtex keywords that you can use to affect how the entries are displayed on the webpage:

  • abbr: Adds an abbreviation to the left of the entry. You can add links to these by creating a venue.yaml file in the _data folder and adding entries that match.
  • abstract: Adds an "Abs" button that expands a hidden text field when clicked to show the abstract text
  • arxiv: Adds a link to the arXiv website (Note: only add the arXiv identifier here - the link is generated automatically)
  • bibtex_show: Adds a "Bib" button that expands a hidden text field with the full bibliography entry
  • html: Inserts an "HTML" button redirecting to the user-specified link
  • pdf: Adds a "PDF" button redirecting to a specified file (if a full link is not specified, the file will be assumed to be placed in the /assets/pdf/ directory)
  • supp: Adds a "Supp" button to a specified file (if a full link is not specified, the file will be assumed to be placed in the /assets/pdf/ directory)
  • blog: Adds a "Blog" button redirecting to the specified link
  • code: Adds a "Code" button redirecting to the specified link
  • poster: Adds a "Poster" button redirecting to a specified file (if a full link is not specified, the file will be assumed to be placed in the /assets/pdf/ directory)
  • slides: Adds a "Slides" button redirecting to a specified file (if a full link is not specified, the file will be assumed to be placed in the /assets/pdf/ directory)
  • website: Adds a "Website" button redirecting to the specified link
  • altmetric: Adds an Altmetric badge (Note: if DOI is provided just use true, otherwise only add the altmetric identifier here - the link is generated automatically)
  • dimensions: Adds a Dimensions badge (Note: if DOI or PMID is provided just use true, otherwise only add the Dimensions' identifier here - the link is generated automatically)

You can implement your own buttons by editing the bib.html file.


Collections

This Jekyll theme implements collections to let you break up your work into categories. The theme comes with two default collections: news and projects. Items from the news collection are automatically displayed on the home page. Items from the projects collection are displayed on a responsive grid on the projects page.

You can easily create your own collections for apps, short stories, courses, or whatever your creative work is. To do this, edit the collections section in the _config.yml file, create a corresponding folder, and create a landing page for your collection, similar to _pages/projects.md.
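As a sketch, adding a hypothetical short_stories collection to _config.yml could look like the following; output and permalink are standard Jekyll collection options, but check the existing news and projects entries for the exact fields the theme expects.

collections:
  short_stories:
    output: true                       # generate a page for each item
    permalink: /short_stories/:path/   # URL pattern for the generated pages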


Layouts

al-folio comes with stylish layouts for pages and blog posts.

The iconic style of Distill

The theme allows you to create blog posts in the distill.pub style:

For more details on how to create distill-styled posts using <d-*> tags, please refer to the example.

Full support for math & code

al-folio supports fast math typesetting through MathJax and code syntax highlighting using GitHub style:

Photos

Photo formatting is made simple using Bootstrap's grid system. Easily create beautiful grids within your blog posts and project pages:


Other features

GitHub's repositories and user stats

al-folio uses github-readme-stats and github-profile-trophy to display GitHub repositories and user stats on the /repositories/ page.

Edit _data/repositories.yml and change the github_users and github_repos lists to include your own GitHub profile and repositories; they will then appear on the /repositories/ page.

You may also use the following snippets to display this on any other page.

<!-- code for GitHub users -->
{% if site.data.repositories.github_users %}
<div class="repositories d-flex flex-wrap flex-md-row flex-column justify-content-between align-items-center">
  {% for user in site.data.repositories.github_users %}
    {% include repository/repo_user.html username=user %}
  {% endfor %}
</div>
{% endif %}

<!-- code for GitHub trophies -->
{% if site.repo_trophies.enabled %}
{% for user in site.data.repositories.github_users %}
  {% if site.data.repositories.github_users.size > 1 %}
  <h4>{{ user }}</h4>
  {% endif %}
  <div class="repositories d-flex flex-wrap flex-md-row flex-column justify-content-between align-items-center">
  {% include repository/repo_trophies.html username=user %}
  </div>
{% endfor %}
{% endif %}

<!-- code for GitHub repositories -->
{% if site.data.repositories.github_repos %}
<div class="repositories d-flex flex-wrap flex-md-row flex-column justify-content-between align-items-center">
  {% for repo in site.data.repositories.github_repos %}
    {% include repository/repo.html repository=repo %}
  {% endfor %}
</div>
{% endif %}

Theming

A variety of beautiful theme colors have been selected for you to choose from. The default is purple, but you can quickly change it by editing the --global-theme-color variable in the _sass/_themes.scss file. Other color variables are listed there as well. The stock theme color options can be found in _sass/variables.scss. You can also add your own colors to this file, assigning each a name for ease of use across the template.

Social media previews

al-folio supports preview images on social media. To enable this functionality you will need to set serve_og_meta to true in your _config.yml. Once you have done so, all your site's pages will include Open Graph data in the HTML head element.

You will then need to configure what image to display in your site's social media previews. This can be configured on a per-page basis, by setting the og_image page variable. If for an individual page this variable is not set, then the theme will fall back to a site-wide og_image variable, configurable in your _config.yml. In both the page-specific and site-wide cases, the og_image variable needs to hold the URL for the image you wish to display in social media previews.
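In _config.yml this amounts to something like the sketch below; the image URL is a placeholder, and a page can override the site-wide value by setting the same og_image key in its front matter.

serve_og_meta: true                        # emit Open Graph tags on every page
og_image: https://example.com/preview.png  # site-wide fallback preview image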

Atom (RSS-like) Feed

It generates an Atom (RSS-like) feed of your posts, useful for Atom and RSS readers. The feed is available at /feed.xml relative to your homepage; e.g., assuming your website is mounted at the root, you can reach it at yourusername.github.io/feed.xml.

Related posts

By default, there will be a related posts section on the bottom of the blog posts. These are generated by selecting the max_related most recent posts that share at least min_common_tags tags with the current post. If you do not want to display related posts on a specific post, simply add related_posts: false to the front matter of the post. If you want to disable it for all posts, simply set enabled to false in the related_blog_posts section in _config.yml.
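The corresponding _config.yml section would look roughly like the sketch below; the key names follow the description above, while the numeric values are illustrative rather than the theme's actual defaults.

related_blog_posts:
  enabled: true       # set to false to disable related posts site-wide
  max_related: 5      # number of related posts to show
  min_common_tags: 1  # minimum shared tags for a post to count as related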

Contributing

Contributions to al-folio are very welcome! Before you get started, please take a look at the guidelines.

If you would like to improve documentation, add your webpage to the list below, or fix a minor inconsistency or bug, please feel free to send a PR directly to master. For more complex issues/bugs or feature requests, please open an issue using the appropriate template.

Maintainers

Our most active contributors are welcome to join the maintainers team. If you are interested, please reach out!


Maruan

Rohan Deb Sarkar

Amir Pourmand

George

License

The theme is available as open source under the terms of the MIT License.

Originally, al-folio was based on the *folio theme (published by Lia Bogoev under the MIT license). Since then, it has received a full rewrite of the styles and many additional features.

passt's People

Contributors

faroit, kkoutini


passt's Issues

EOF (End Of File) Error on num_workers>0

I am trying to finetune the model on the DCASE2020 dataset. I have prepared the sample ex_dcase.py file and dataset.py file, inspired by the ESC-50 dataset, but whenever I increase num_workers in the train or test dataloader, I receive an EOF error. Basically, two errors arise, namely:
Traceback (most recent call last):
File "", line 1, in
File "path\venv\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "path\venv\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
ERROR - passt_Dcase2020 - Failed after 0:00:12!

Also the following error :
Traceback (most recent calls WITHOUT Sacred internals):
File "ex_dcase.py", line 436, in default_command
return main()
File "ex_dcase.py", line 275, in main
trainer.fit(
File "path\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 608, in fit
call._call_and_handle_interrupt(
File "path\venv\lib\multiprocessing\popen_spawn_win32.py", line 93, in init
reduction.dump(process_obj, to_child)
File "path\venv\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_roll_func.<locals>.roll_func'

Can you help me fix the above error, or suggest any changes that could work?

Could not solve for environment specs

I cloned the repo. As per the README:

conda install mamba -n base -c conda-forge
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /opt/miniconda3

  added / updated specs:
    - mamba


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    conda-22.11.1              |   py39h2804cbe_1         873 KB  conda-forge
    fmt-9.1.0                  |       hffc8910_0         171 KB  conda-forge
    krb5-1.20.1                |       h127bd45_0         1.0 MB  conda-forge
    libarchive-3.5.2           |       h69ec738_3         1.5 MB  conda-forge
    libcurl-7.87.0             |       hbe9bab4_0         304 KB  conda-forge
    libedit-3.1.20191231       |       hc8eb9b7_2          94 KB  conda-forge
    libev-4.33                 |       h642e427_1          98 KB  conda-forge
    libmamba-1.1.0             |       h1254013_2         1.0 MB  conda-forge
    libmambapy-1.1.0           |   py39h8f82c16_2         214 KB  conda-forge
    libnghttp2-1.47.0          |       h232270b_1         816 KB  conda-forge
    libsolv-0.7.23             |       hb5ab8b9_0         373 KB  conda-forge
    libssh2-1.10.0             |       hb80f160_3         218 KB  conda-forge
    libxml2-2.9.14             |       h9d8dfc2_4         656 KB  conda-forge
    lz4-c-1.9.3                |       hbdafb3b_1         147 KB  conda-forge
    lzo-2.10                   |    h642e427_1000         154 KB  conda-forge
    mamba-1.1.0                |   py39hde45b87_2          48 KB  conda-forge
    openssl-1.1.1s             |       h03a7124_1         1.5 MB  conda-forge
    pybind11-abi-4             |       hd8ed1ab_3          10 KB  conda-forge
    reproc-14.2.4              |       h1a8c8d9_0          27 KB  conda-forge
    reproc-cpp-14.2.4          |       hb7217d7_0          20 KB  conda-forge
    yaml-cpp-0.7.0             |       hb7217d7_2         133 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         9.4 MB

The following NEW packages will be INSTALLED:

  fmt                conda-forge/osx-arm64::fmt-9.1.0-hffc8910_0 
  icu                conda-forge/osx-arm64::icu-70.1-h6b3803e_0 
  krb5               conda-forge/osx-arm64::krb5-1.20.1-h127bd45_0 
  libarchive         conda-forge/osx-arm64::libarchive-3.5.2-h69ec738_3 
  libcurl            conda-forge/osx-arm64::libcurl-7.87.0-hbe9bab4_0 
  libedit            conda-forge/osx-arm64::libedit-3.1.20191231-hc8eb9b7_2 
  libev              conda-forge/osx-arm64::libev-4.33-h642e427_1 
  libiconv           conda-forge/osx-arm64::libiconv-1.17-he4db4b2_0 
  libmamba           conda-forge/osx-arm64::libmamba-1.1.0-h1254013_2 
  libmambapy         conda-forge/osx-arm64::libmambapy-1.1.0-py39h8f82c16_2 
  libnghttp2         conda-forge/osx-arm64::libnghttp2-1.47.0-h232270b_1 
  libsolv            conda-forge/osx-arm64::libsolv-0.7.23-hb5ab8b9_0 
  libssh2            conda-forge/osx-arm64::libssh2-1.10.0-hb80f160_3 
  libxml2            conda-forge/osx-arm64::libxml2-2.9.14-h9d8dfc2_4 
  lz4-c              conda-forge/osx-arm64::lz4-c-1.9.3-hbdafb3b_1 
  lzo                conda-forge/osx-arm64::lzo-2.10-h642e427_1000 
  mamba              conda-forge/osx-arm64::mamba-1.1.0-py39hde45b87_2 
  pybind11-abi       conda-forge/noarch::pybind11-abi-4-hd8ed1ab_3 
  reproc             conda-forge/osx-arm64::reproc-14.2.4-h1a8c8d9_0 
  reproc-cpp         conda-forge/osx-arm64::reproc-cpp-14.2.4-hb7217d7_0 
  yaml-cpp           conda-forge/osx-arm64::yaml-cpp-0.7.0-hb7217d7_2 
  zstd               conda-forge/osx-arm64::zstd-1.5.2-h8128057_4 

The following packages will be UPDATED:

  ca-certificates    pkgs/main::ca-certificates-2022.10.11~ --> conda-forge::ca-certificates-2022.12.7-h4653dfc_0 
  libcxx                pkgs/main::libcxx-12.0.0-hf6beb65_1 --> conda-forge::libcxx-14.0.6-h2692d47_0 
  libzlib                                 1.2.12-ha287fd2_2 --> 1.2.13-h03a7124_4 
  openssl              pkgs/main::openssl-1.1.1s-h1a28f6b_0 --> conda-forge::openssl-1.1.1s-h03a7124_1 
  zlib                    pkgs/main::zlib-1.2.12-h5a0b063_2 --> conda-forge::zlib-1.2.13-h03a7124_4 

The following packages will be SUPERSEDED by a higher-priority channel:

  certifi            pkgs/main/osx-arm64::certifi-2022.12.~ --> conda-forge/noarch::certifi-2022.12.7-pyhd8ed1ab_0 
  conda              pkgs/main::conda-22.11.1-py39hca03da5~ --> conda-forge::conda-22.11.1-py39h2804cbe_1 


Proceed ([y]/n)? 


Downloading and Extracting Packages
                                                                                                                                     
Preparing transaction: done                                                                                                          
Verifying transaction: done                                                                                                          
Executing transaction: done                                                                                                          

But then the next mamba command fails :\

mamba env create -f environment.yml   

with

pkgs/r/osx-arm64                                              No change
pkgs/main/osx-arm64                                           No change
pkgs/main/noarch                                              No change
pkgs/r/noarch                                                 No change
conda-forge/osx-arm64                                4.7MB @ 351.1kB/s 13.6s
conda-forge/noarch                                  10.7MB @ 566.8kB/s 19.2s

                                                                                                                                     
Looking for: ['_libgcc_mutex==0.1=conda_forge', '_openmp_mutex==4.5=2_gnu', '_pytorch_select==0.1=cpu_0', 'appdirs==1.4.4=pyh9f0ad1d_0', 'audioread==2.1.9=py37h89c1867_4', 'blas==1.0=mkl', 'brotlipy==0.7.0=py37h5e8e339_1001', 'bzip2==1.0.8=h7f98852_4', 'c-ares==1.17.1=h7f98852_1', 'ca-certificates==2020.12.5=ha878542_0', 'cached-property==1.5.2=hd8ed1ab_1', 'cached_property==1.5.2=pyha770c72_1', 'certifi==2020.12.5=py37h89c1867_1', 'cffi==1.14.5=py37hc58025e_0', 'chardet==4.0.0=py37h89c1867_3', 'colorama==0.4.4=pyh9f0ad1d_0', 'cryptography==3.4.6=py37h5d9358c_0', 'cycler==0.10.0=py_2', 'decorator==4.4.2=py_0', 'docopt==0.6.2=py_1', 'ffmpeg==4.3.1=hca11adc_2', 'freetype==2.10.4=h0708190_1', 'gettext==0.19.8.1=h0b5b191_1005', 'gitdb==4.0.5=pyhd8ed1ab_1', 'gitpython==3.1.14=pyhd8ed1ab_0', 'gmp==6.2.1=h58526e2_0', 'gnutls==3.6.13=h85f3911_1', 'h5py==3.1.0=nompi_py37h1e651dc_100', 'hdf5==1.10.6=nompi_h6a2412b_1114', 'idna==2.10=pyh9f0ad1d_0', 'importlib-metadata==3.7.3=py37h89c1867_0', 'importlib_metadata==3.7.3=hd8ed1ab_0', 'intel-openmp==2020.2=254', 'joblib==1.0.1=pyhd8ed1ab_0', 'jpeg==9d=h36c2ea0_0', 'jsonpickle==1.4.1=pyh9f0ad1d_0', 'kiwisolver==1.3.1=py37h2527ec5_1', 'krb5==1.17.2=h926e7f8_0', 'lame==3.100=h7f98852_1001', 'lcms2==2.12=hddcbb42_0', 'ld_impl_linux-64==2.35.1=hea4e1c9_2', 'libblas==3.9.0=1_h86c2bf4_netlib', 'libcblas==3.9.0=5_h92ddd45_netlib', 'libcurl==7.75.0=hc4aaa36_0', 'libedit==3.1.20191231=he28a2e2_2', 'libev==4.33=h516909a_1', 'libffi==3.3=h58526e2_2', 'libflac==1.3.3=h9c3ff4c_1', 'libgcc-ng==9.3.0=h2828fa1_19', 'libgfortran-ng==9.3.0=hff62375_19', 'libgfortran5==9.3.0=hff62375_19', 'libgomp==9.3.0=h2828fa1_19', 'liblapack==3.9.0=5_h92ddd45_netlib', 'libllvm10==10.0.1=he513fc3_3', 'libnghttp2==1.43.0=h812cca2_0', 'libogg==1.3.4=h7f98852_1', 'libopenblas==0.3.12=pthreads_h4812303_1', 'libopus==1.3.1=h7f98852_1', 'libpng==1.6.37=h21135ba_2', 'librosa==0.8.0=pyh9f0ad1d_0', 'libsndfile==1.0.31=h9c3ff4c_1', 'libssh2==1.9.0=ha56f1ee_6', 'libstdcxx-ng==9.3.0=h6de172a_19', 'libtiff==4.2.0=hbd63e13_2', 'libvorbis==1.3.7=h9c3ff4c_0', 'libwebp-base==1.2.0=h7f98852_2', 'libzlib==1.2.11=h36c2ea0_1013', 'llvm-openmp==11.1.0=h4bd325d_1', 'llvmlite==0.36.0=py37h9d7f4d0_0', 'lz4-c==1.9.3=h9c3ff4c_1', 'matplotlib-base==3.3.4=py37h0c9df89_0', 'mkl==2020.2=256', 'mkl-service==2.3.0=py37h8f50634_2', 'munch==2.5.0=py_0', 'ncurses==6.2=h58526e2_4', 'nettle==3.6=he412f7d_0', 'ninja==1.10.2=h4bd325d_0', 'numba==0.53.0=py37h7dd73a4_1', 'numpy==1.20.1=py37haa41c4c_0', 'olefile==0.46=pyh9f0ad1d_1', 'openblas==0.3.12=pthreads_h04b7a96_1', 'openh264==2.1.1=h780b84a_0', 'openjpeg==2.4.0=hb52868f_1', 'openssl==1.1.1k=h7f98852_0', 'packaging==20.9=pyh44b312d_0', 'pandas==1.2.3=py37hdc94413_0', 'pillow==8.1.2=py37h4600e1f_1', 'pip==21.0.1=pyhd8ed1ab_0', 'pooch==1.3.0=pyhd8ed1ab_0', 'py-cpuinfo==7.0.0=pyh9f0ad1d_0', 'pycparser==2.20=pyh9f0ad1d_2', 'pyopenssl==20.0.1=pyhd8ed1ab_0', 'pyparsing==2.4.7=pyhd8ed1ab_1', 'pysocks==1.7.1=py37h89c1867_5', 'pysoundfile==0.10.3.post1=pyhd3deb0d_0', 'python==3.7.10=hffdb5ce_100_cpython', 'python-dateutil==2.8.1=py_0', 'python_abi==3.7=3_cp37m', 'pytz==2021.1=pyhd8ed1ab_0', 'readline==8.0=he28a2e2_2', 'requests==2.25.1=pyhd3deb0d_0', 'resampy==0.2.2=py_0', 'scikit-learn==0.24.1=py37h69acf81_0', 'scipy==1.6.1=py37h14a347d_0', 'setuptools==49.6.0=py37h89c1867_3', 'six==1.15.0=pyh9f0ad1d_0', 'smmap==3.0.5=pyh44b312d_0', 'sqlite==3.34.0=h74cdb3f_0', 'threadpoolctl==2.1.0=pyh5ca1d4c_0', 'tk==8.6.10=h21135ba_1', 'tornado==6.1=py37h5e8e339_1', 
'typing_extensions==3.7.4.3=py_0', 'urllib3==1.26.4=pyhd8ed1ab_0', 'wrapt==1.12.1=py37h5e8e339_3', 'x264==1!161.3030=h7f98852_1', 'xz==5.2.5=h516909a_1', 'zipp==3.4.1=pyhd8ed1ab_0', 'zlib==1.2.11=h36c2ea0_1013', 'zstd==1.4.9=ha95c52a_0']


Could not solve for environment specs
Encountered problems while solving:
  - nothing provides requested _libgcc_mutex ==0.1 conda_forge
  - nothing provides requested _openmp_mutex ==4.5 2_gnu
  - nothing provides requested audioread ==2.1.9 py37h89c1867_4
  - nothing provides requested blas ==1.0 mkl
  - nothing provides requested brotlipy ==0.7.0 py37h5e8e339_1001
  - nothing provides requested bzip2 ==1.0.8 h7f98852_4
  - nothing provides requested c-ares ==1.17.1 h7f98852_1
  - nothing provides requested ca-certificates ==2020.12.5 ha878542_0
  - nothing provides requested certifi ==2020.12.5 py37h89c1867_1
  - nothing provides requested cffi ==1.14.5 py37hc58025e_0
  - nothing provides requested chardet ==4.0.0 py37h89c1867_3
  - nothing provides requested cryptography ==3.4.6 py37h5d9358c_0
  - nothing provides requested ffmpeg ==4.3.1 hca11adc_2
  - nothing provides requested freetype ==2.10.4 h0708190_1
  - nothing provides requested gettext ==0.19.8.1 h0b5b191_1005
  - nothing provides requested gmp ==6.2.1 h58526e2_0
  - nothing provides requested gnutls ==3.6.13 h85f3911_1
  - nothing provides requested h5py ==3.1.0 nompi_py37h1e651dc_100
  - nothing provides requested hdf5 ==1.10.6 nompi_h6a2412b_1114
  - nothing provides requested importlib-metadata ==3.7.3 py37h89c1867_0
  - nothing provides requested intel-openmp ==2020.2 254
  - nothing provides requested jpeg ==9d h36c2ea0_0
  - nothing provides requested kiwisolver ==1.3.1 py37h2527ec5_1
  - nothing provides requested krb5 ==1.17.2 h926e7f8_0
  - nothing provides requested lame ==3.100 h7f98852_1001
  - nothing provides requested lcms2 ==2.12 hddcbb42_0
  - nothing provides requested ld_impl_linux-64 ==2.35.1 hea4e1c9_2
  - nothing provides requested libblas ==3.9.0 1_h86c2bf4_netlib
  - nothing provides requested libcblas ==3.9.0 5_h92ddd45_netlib
  - nothing provides requested libcurl ==7.75.0 hc4aaa36_0
  - nothing provides requested libedit ==3.1.20191231 he28a2e2_2
  - nothing provides requested libev ==4.33 h516909a_1
  - nothing provides requested libffi ==3.3 h58526e2_2
  - nothing provides requested libflac ==1.3.3 h9c3ff4c_1
  - nothing provides requested libgcc-ng ==9.3.0 h2828fa1_19
  - nothing provides requested libgfortran-ng ==9.3.0 hff62375_19
  - nothing provides requested libgfortran5 ==9.3.0 hff62375_19
  - nothing provides requested libgomp ==9.3.0 h2828fa1_19
  - nothing provides requested liblapack ==3.9.0 5_h92ddd45_netlib
  - nothing provides requested libllvm10 ==10.0.1 he513fc3_3
  - nothing provides requested libnghttp2 ==1.43.0 h812cca2_0
  - nothing provides requested libogg ==1.3.4 h7f98852_1
  - nothing provides requested libopenblas ==0.3.12 pthreads_h4812303_1
  - nothing provides requested libopus ==1.3.1 h7f98852_1
  - nothing provides requested libpng ==1.6.37 h21135ba_2
  - nothing provides requested libsndfile ==1.0.31 h9c3ff4c_1
  - nothing provides requested libssh2 ==1.9.0 ha56f1ee_6
  - nothing provides requested libstdcxx-ng ==9.3.0 h6de172a_19
  - nothing provides requested libtiff ==4.2.0 hbd63e13_2
  - nothing provides requested libvorbis ==1.3.7 h9c3ff4c_0
  - nothing provides requested libwebp-base ==1.2.0 h7f98852_2
  - nothing provides requested libzlib ==1.2.11 h36c2ea0_1013
  - nothing provides requested llvm-openmp ==11.1.0 h4bd325d_1
  - nothing provides requested llvmlite ==0.36.0 py37h9d7f4d0_0
  - nothing provides requested lz4-c ==1.9.3 h9c3ff4c_1
  - nothing provides requested matplotlib-base ==3.3.4 py37h0c9df89_0
  - nothing provides requested mkl ==2020.2 256
  - nothing provides requested mkl-service ==2.3.0 py37h8f50634_2
  - nothing provides requested ncurses ==6.2 h58526e2_4
  - nothing provides requested nettle ==3.6 he412f7d_0
  - nothing provides requested ninja ==1.10.2 h4bd325d_0
  - nothing provides requested numba ==0.53.0 py37h7dd73a4_1
  - nothing provides requested numpy ==1.20.1 py37haa41c4c_0
  - nothing provides requested openblas ==0.3.12 pthreads_h04b7a96_1
  - nothing provides requested openh264 ==2.1.1 h780b84a_0
  - nothing provides requested openjpeg ==2.4.0 hb52868f_1
  - nothing provides requested openssl ==1.1.1k h7f98852_0
  - nothing provides requested pandas ==1.2.3 py37hdc94413_0
  - nothing provides requested pillow ==8.1.2 py37h4600e1f_1
  - nothing provides requested pysocks ==1.7.1 py37h89c1867_5
  - nothing provides requested python ==3.7.10 hffdb5ce_100_cpython
  - nothing provides requested readline ==8.0 he28a2e2_2
  - nothing provides requested scikit-learn ==0.24.1 py37h69acf81_0
  - nothing provides requested scipy ==1.6.1 py37h14a347d_0
  - nothing provides requested setuptools ==49.6.0 py37h89c1867_3
  - nothing provides requested sqlite ==3.34.0 h74cdb3f_0
  - nothing provides requested tk ==8.6.10 h21135ba_1
  - nothing provides requested tornado ==6.1 py37h5e8e339_1
  - nothing provides requested wrapt ==1.12.1 py37h5e8e339_3
  - nothing provides requested x264 ==1!161.3030 h7f98852_1
  - nothing provides requested xz ==5.2.5 h516909a_1
  - nothing provides requested zlib ==1.2.11 h36c2ea0_1013
  - nothing provides requested zstd ==1.4.9 ha95c52a_0
  - package pytz-2021.1-pyhd8ed1ab_0 requires python >=3, but none of the providers can be installed

The environment can't be solved, aborting the operation

This is on an OSX Apple Silicon machine

Openmic2018

Hi authors!

Will you release the code for the openmic2018?

Thanks a lot.

Where is input normalization applied?

Hi Khaled,

Could you please point me to where normalization is applied to inputs? (for the esc50 case or any other cases)

I am talking about the channel means and stds, such as those written in the code below:

IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
IMAGENET_DEFAULT_STD = (0.229, 0.224, 0.225)
IMAGENET_INCEPTION_MEAN = (0.5, 0.5, 0.5)
IMAGENET_INCEPTION_STD = (0.5, 0.5, 0.5)


def _cfg(url='', **kwargs):
    return {
        'url': url,
        'num_classes': 1000, 'input_size': (3, 224, 224), 'pool_size': None,
        'crop_pct': .9, 'interpolation': 'bicubic', 'fixed_input_size': True,
        'mean': IMAGENET_INCEPTION_MEAN, 'std': IMAGENET_INCEPTION_STD,
        'first_conv': 'patch_embed.proj', 'classifier': 'head',
        **kwargs
    }

If the first training was done on ImageNet, then I guess the ImageNet channel means and stds are applied to AudioSet inputs when finetuning on that dataset, and also to ESC-50 inputs if further finetuning on that one. Am I correct?

Again, I am trying to refactor your code so that only the portion we need fits into our already existing training scripts. But I don't see where those means and standard deviations are applied, whether in the dataset or in AugmentMel.

Thanks a lot (again)

Antoine

OpenMic fine-tuned model?

Do you mind releasing the OpenMic fine-tuned model? So OpenMic style predictions can be made out of the box, without any training?

Pretrained models config

Hi, how can I find the configurations used for the pre-trained models?
e.g., u_patchout, s_patchout_t, s_patchout_f, etc.

Thank you!

RuntimeError: The size of tensor a (2055) must match the size of tensor b (99) at non-singleton dimension 3

I use a trained model for inference and I encounter this problem when the file length is long.
Traceback (most recent call last):
File "", line 1, in
File "/home/xingyum/anaconda3/envs/ba3l/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xingyum/models/PaSST/output/openmic2008/_None/checkpoints/src/hear21passt/hear21passt/wrapper.py", line 38, in forward
x, features = self.net(specs)
File "/home/xingyum/anaconda3/envs/ba3l/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xingyum/models/PaSST/output/openmic2008/_None/checkpoints/src/hear21passt/hear21passt/models/passt.py", line 507, in forward
x = self.forward_features(x)
File "/home/xingyum/models/PaSST/output/openmic2008/_None/checkpoints/src/hear21passt/hear21passt/models/passt.py", line 454, in forward_features
x = x + time_new_pos_embed
RuntimeError: The size of tensor a (2055) must match the size of tensor b (99) at non-singleton dimension 3

.net and .net_swa parameters in .ckpt file

We have finetuned the passt_s_swa_p16_s16_128_ap473 model on the DCASE 2020 dataset for scene classification. Now we are trying to use the finetuned model by loading params from the ckpt file using the state dictionary. But it has two types of params, .net and .net_swa. Which params are we supposed to use for the architecture?

test my own model

Hello, I would like to ask a few questions.
I see that the pre-trained models are all .pt files, while the model I trained without changing the default parameters is saved as a .ckpt file. That aside, when I use "passt_s_swa_p16_128_ap476" as a pre-training model to verify my fine-tuned model, some problems arise:
First of all, the checkpoint saves another batch of parameters prefixed with net_swa., which may be related to the use of SWA in the code. But SWA was also used for the pre-trained model described in the introduction, so why are there no net_swa. parameters when printing the pre-trained model? As a result, when I load my own model, I get an Unexpected key(s) in state_dict error. I think it may be caused by this part of the code. How can I solve this problem?
[screenshots of the relevant code]
In addition, I would like to ask: if I want to verify my own model on a single piece of audio, how should the script be written?

Fine tuning on novel dataset

Hello. Firstly, thank you for this great work! I've already had very promising results looking at the "scene" embeddings from these models, and looking to fine tune a model on a new dataset - similar to ESC50 & others. (as a side note, using scene embeddings & a logistic regression, I'm having acceptably good results, however I'm convinced true fine tuning would be significantly better).

I'm having a bit of trouble interpreting the example scripts. Are you able to give a simple explanation of what is required for fine-tuning (e.g., the data format, directories vs. JSON file, format of the labels CSV, etc.)? It's quite hard to reverse engineer this from the code. I have a directory of files with known labels, and simply want to fine tune a model on it. And once the data is in place, which functions/CLI scripts should be invoked?

Many thanks, and if I'm missing something obvious, apologies. I know the Audioset page has a few more details but it's still not crystal clear how to proceed. Cheers!

Wavmix for the ESC50 dataset

Hello, thanks a lot for you amazing work and for publishing the code!

I was trying to run the ex_esc50.py with wavmix=True but got the error:

RuntimeError: "nll_loss_forward_no_reduce_cuda_kernel_index" not implemented for 'Double'

since when using wavmix the ground truth is not an integer anymore.

Would it not be more appropriate to use the KL-divergence as the loss function instead of cross-entropy?

Installation issues

Hi, I am trying to install and run the PaSST-S method on my own data but I get this error when I run python ex_audioset.py help

File "ex_audioset.py", line 16, in <module>
    from helpers.mixup import my_mixup
ModuleNotFoundError: No module named 'helpers.mixup'

Training Logs

Hi authors, thanks for the great work!
Are there any log files from the training stage?
I didn't find any.

OpenMic2018

Hello author,

Thank you for the great job!
After calling this line of code:
python ex_openmic.py with trainer.precision=16 -p -m mongodb_server:27000:audioset21_balanced -c "OpenMIC PaSST base"

I'm running into an error with the openMic classification.

Traceback (most recent calls):
File "/home/user/Desktop/PaSST/src/sacred/sacred/observers/mongo.py", line 511, in parse_mongo_db_arg
g = re.match(get_pattern(), mongo_db).groupdict()
AttributeError: 'NoneType' object has no attribute 'groupdict'

Please, what am I missing or doing wrong?

No module named 'ba3l.ingredients'

Hi, I want to train PaSST on AudioSet.
But when I run ex_audioset.py, I get the error: "No module named 'ba3l.ingredients'".
I have already finished setting up the environment following the README.
How can I fix it?

FSD50K - validating on eval data

Hi! First off, excellent work with the module. It's showing great results so far in my project.
I'm having trouble, however, with an experiment. I am trying to fine-tune and train the model on subsets (3k samples for training and validating) and have created hdf5 files for that. The paths in config.basedatasets are corrected for this.

The problem that I run into is that when I run the command:
python ex_fsd50k.py evaluate_only with passt_s_swa_p16_s16_128_ap473
the program uses the evaluation data for validation. I confirmed this by making a change in fsd50k/dataset.py:

def __len__(self):
    if self.hdf5_file == "audioset_hdf5s/mp3/FSD50K.eval_mp3.hdf":
        return 300
    return self.length

which affects the number of validation batches.

I really don't understand what is going on. Isn't the model supposed to validate on the validation data?

Kindest regards, Ludvig.

Pre-trained models on ESC-50

Hi Khaled,

I want to use the following checkpoints.
[screenshot listing the checkpoints]

Just to make sure, when you say pre-trained models on ESC-50 in this case, you mean (in chronological order):

  1. Using a model trained on ImageNet
  2. To then train it on Audioset
  3. And later fine-tune it on ESC-50

If so, how can I know which config of default_cfgs in model.py was used for these checkpoints above?

Also, have you pre-trained on all ESC-50 folds at once? During cross-validation in machine learning with sklearn's GridSearch, the model is ultimately refit on all folds with the best hyperparameter config found. Shouldn't we do the same in deep learning?

Cheers

Antoine

Error when trying to pip install repo

Hi @kkoutini,

I get an error when running the following line after my conda env creation:
[screenshot of the command and the error]

Any idea?

This line was working a couple of months ago when I had created a first environment. But not anymore it seems.

Many thanks

Antoine

time_new_pos_embed

Hi Khaled,

I am playing with your code a bit and I struggle to understand these few lines below:

        # Adding Time/Freq information
        if first_RUN: print(" self.time_new_pos_embed.shape", self.time_new_pos_embed.shape)
        time_new_pos_embed = self.time_new_pos_embed
        if x.shape[-1] < time_new_pos_embed.shape[-1]:
            if self.training:
                toffset = torch.randint(1 + time_new_pos_embed.shape[-1] - x.shape[-1], (1,)).item()
                if first_RUN: print(f" CUT with randomoffset={toffset} time_new_pos_embed.shape",
                                    time_new_pos_embed.shape)
                time_new_pos_embed = time_new_pos_embed[:, :, :, toffset:toffset + x.shape[-1]]
            else:
                time_new_pos_embed = time_new_pos_embed[:, :, :, :x.shape[-1]]
            if first_RUN: print(" CUT time_new_pos_embed.shape", time_new_pos_embed.shape)
        else:
            warnings.warn(
                f"the patches shape:{x.shape} are larger than the expected time encodings {time_new_pos_embed.shape}, x will be cut")
            x = x[:, :, :, :time_new_pos_embed.shape[-1]]
        x = x + time_new_pos_embed

Especially the slicing of time_new_pos_embed with toffset. I understand the slicing in the first else and the second else, but I don't get why the slicing is randomized during training. If it's a position embedding, surely it shouldn't be random, right?

Many thanks in advance.

Antoine

audio inference

@kkoutini
Thanks for sharing this nice work. I want to know how to read an audio file and run full inference. Can you show me an example? How should I do the preprocessing?

Changing the depth of PASST.

I want to change the depth of the transformer while finetuning the model. I am using the following command (inspired by the ESC-50 example):

python3 ex_dcase.py with models.net.s_patchout_t=10 models.net.s_patchout_f=5 basedataset.fold=1 -p

I have already prepared the ex_dcase.py and dataset.py files for the DCASE2020 dataset (inspired by the ESC-50 files you provide), and I have already been able to finetune the whole model once. Now I want to add a depth parameter to the command line for the finetuning script, so that I can control how many blocks of the architecture are finetuned.
Currently I change the depth by editing the depth variable of the desired architecture here.
Please suggest the changes I need to make so that I can execute a command on the command line and only finetune selected layers.

RuntimeError: stft requires the return_complex parameter be given for real inputs

Hello!
I am using the following code:

from hear21passt.base import get_basic_model,get_model_passt
import torch
# get the PaSST model wrapper, includes Melspectrogram and the default pre-trained transformer
model = get_basic_model(mode="logits")
print(model.mel) # Extracts mel spectrogram from raw waveforms.
print(model.net) # the transformer network.

# example inference
model.eval()
model = model.cuda()
with torch.no_grad():
    # audio_wave has the shape of [batch, seconds*32000] sampling rate is 32k
    # example audio_wave of batch=3 and 10 seconds
    audio = torch.ones((3, 32000 * 10))*0.5
    audio_wave = audio.cuda()
    logits=model(audio_wave) 

I am getting the following error:

RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.

How can I solve this issue please?
Thank you!

Inference Issue

Hello,

First of all, thank you for the awesome and very well-written paper and repo.

I currently want to use the embedding of these pre-trained models for my project. The following is the inference code I wrote for fsd50k.

import torch
import numpy as np
import librosa
from hear21passt.base import get_basic_model, get_model_passt, get_scene_embeddings, get_timestamp_embeddings, load_model

model = get_basic_model(mode="logits")
model.net = get_model_passt(arch="fsd50k-n",  n_classes=200, fstride=16, tstride=16)
model.eval()
model = model.cuda()

audio, sr = librosa.load("../dataset/fsd50k/mp3/FSD50K.dev_audio/102863.mp3", sr = 32000, mono=True)
audio = torch.from_numpy(np.array([audio]))
audio_batch = torch.cat((audio, audio, audio), 0).cuda()

embed = get_scene_embeddings(audio_batch, model)
model(audio_batch)

When I check embed.shape I get torch.Size([3, 1295]), so I basically already get what I need. But when I double-check by trying to get the logits through model(), it gives me the following error:

RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_13937/329924078.py in <module>
----> 1 model(audio_batch)

/data/scratch/ngop/.envs/vqgan2/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/data/scratch/ngop/src/hear21passt/hear21passt/wrapper.py in forward(self, x)
     36         specs = self.mel(x)
     37         specs = specs.unsqueeze(1)
---> 38         x, features = self.net(specs)
     39         if self.mode == "all":
     40             embed = torch.cat([x, features], dim=1)

/data/scratch/ngop/.envs/vqgan2/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/data/scratch/ngop/src/hear21passt/hear21passt/models/passt.py in forward(self, x)
    525         if first_RUN: print("x", x.size())
    526 
--> 527         x = self.forward_features(x)
    528 
    529         if self.head_dist is not None:

/data/scratch/ngop/src/hear21passt/hear21passt/models/passt.py in forward_features(self, x)
    472             time_new_pos_embed = time_new_pos_embed[:, :, :, :x.shape[-1]]
    473             if first_RUN: print(" CUT time_new_pos_embed.shape", time_new_pos_embed.shape)
--> 474         x = x + time_new_pos_embed
    475         if first_RUN: print(" self.freq_new_pos_embed.shape", self.freq_new_pos_embed.shape)
    476         x = x + self.freq_new_pos_embed

RuntimeError: The size of tensor a (135) must match the size of tensor b (99) at non-singleton dimension 3

However, I tried a few other audio files in FSD50K; some gave me logits and the correct prediction, but some just give errors like this. What could the issue be? Do I need to worry about it, or can I just use the embeddings? My other question is whether the input batch size is fixed: for the model I loaded, I have to input a batch of 3 audio clips. Is there a way to use a different batch size?

Binarizing linear predictions

Dear authors,

Thank you for the great work!
I would like to know what is the best way to binarize the linear predicted probabilities in a way that :

0 : audio label is absent
1: audio label is present

If you have any suggestions for this binarization issue, it would be great to hear them.

And one more question: as I understood from the paper, the linear probability value for each label indicates the presence of that label in the input audio, and the probability value doesn't depend on the duration for which the label is active, i.e., whether it occurs for a very short or a long duration. Am I right?

Another question: is there any difference between feeding audio data that is typically 20-90 seconds long (and not monophonic) versus slicing it into chunks or running second-by-second predictions? I would like to know whether it is a good idea to run second-by-second predictions with PaSST.

It would be great for me to get your answers to the above-mentioned questions.

Anar Sultani

ImportError: cannot import name 'F1' from 'torchmetrics' (/app/anaconda3/lib/python3.7/site-packages/torchmetrics/__init__.py)

python ex_openmic.py
Traceback (most recent call last):
File "ex_openmic.py", line 5, in
from pytorch_lightning.callbacks import ModelCheckpoint
File "/root/work_project_2021/project_music2video/PaSST/src/pytorch-lightning/pytorch_lightning/init.py", line 65, in
from pytorch_lightning import metrics
File "/root/work_project_2021/project_music2video/PaSST/src/pytorch-lightning/pytorch_lightning/metrics/init.py", line 16, in
from pytorch_lightning.metrics.classification import ( # noqa: F401
File "/root/work_project_2021/project_music2video/PaSST/src/pytorch-lightning/pytorch_lightning/metrics/classification/init.py", line 19, in
from pytorch_lightning.metrics.classification.f_beta import F1, FBeta # noqa: F401
File "/root/work_project_2021/project_music2video/PaSST/src/pytorch-lightning/pytorch_lightning/metrics/classification/f_beta.py", line 16, in
from torchmetrics import F1 as _F1
ImportError: cannot import name 'F1' from 'torchmetrics' (/app/anaconda3/lib/python3.7/site-packages/torchmetrics/init.py)

envs:
Name: torch
Version: 1.12.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: /app/anaconda3/lib/python3.7/site-packages
Requires: typing-extensions
Required-by: torchvision, torchmetrics, torchaudio, timm, test-tube, Ba3l, pytorch-lightning
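This ImportError usually means the installed torchmetrics is newer than what the bundled pytorch-lightning expects; later torchmetrics releases renamed the F1 class (to F1Score), so the old import fails. Two hedged options are to pin an older torchmetrics release, or to monkey-patch the old name back before pytorch_lightning is imported, roughly as in this sketch:

import torchmetrics

# Hypothetical workaround (assumption: the installed torchmetrics provides
# F1Score but no longer exports F1). Pinning an older torchmetrics release
# is the cleaner alternative.
if not hasattr(torchmetrics, "F1") and hasattr(torchmetrics, "F1Score"):
    torchmetrics.F1 = torchmetrics.F1Score  # restore the name pytorch-lightning imports

import pytorch_lightning  # noqa: E402  (imported after the patch on purpose)

Other metrics renamed in the same torchmetrics release (FBeta, for example) may need the same treatment.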

The meaning of "swa"

When using your code to train a model, there is "swa": true in the config file. What does "swa" mean?

Mismatched versions of pytorch-lightning and sacred

Hello, after running the first command [WeChat screenshot of the command], I ran the next command [WeChat screenshot], but I encountered this issue [WeChat screenshots of the error traceback].
Is this caused by the wrong versions of pytorch-lightning and sacred? When I upgrade pytorch-lightning to the latest version, that issue is solved, but the sacred issue remains.
Could you please provide some help? Thank you very much!

Is it possible to install PaSST with python=3.6?

Hi, thanks so much for sharing the great work! I'd like to use PaSST for downstream tasks and integrate it into an existing conda environment with python=3.6 (it's kind of painful to upgrade Python from 3.6 to 3.7/3.8 due to many inconsistent packages). I know that python>=3.7 is required to install PaSST, but I'm wondering if it's possible to install it with python=3.6?

Inference ESC-50 fine-tuned model

Hello, authors.
Thank you for sharing the great work.

I tried to fine-tune the AudioSet pretrained model passt-s-f128-p16-s10-ap.476-swa.pt on the ESC-50 dataset using ex_esc50.py.
I got checkpoints saved in output/esc50/_None/checkpoints/epoch=4-step=2669.ckpt.
I want to load the checkpoint and run inference on an audio file. I tried to load the checkpoint and use passt_hear21 for inference, but I kind of lost track of the process.

Could you please share how to run inference on an audio file with the saved checkpoints?
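Not an official answer, but one possible sketch of loading the Lightning checkpoint weights into the hear21passt wrapper for inference. The "state_dict" key, the "net." prefix, and the placeholder waveform are assumptions about how the training module stores the transformer weights:

import torch
from hear21passt.base import get_basic_model, get_model_passt

# Sketch only: build the wrapper and swap in a 50-class transformer for ESC-50.
model = get_basic_model(mode="logits")
model.net = get_model_passt(arch="passt_s_swa_p16_128_ap476", n_classes=50)

# Assumption: the Lightning checkpoint stores the transformer weights under
# "state_dict" with keys prefixed by "net."; strip that prefix before loading.
ckpt = torch.load("output/esc50/_None/checkpoints/epoch=4-step=2669.ckpt",
                  map_location="cpu")
state = {k.replace("net.", "", 1): v
         for k, v in ckpt["state_dict"].items() if k.startswith("net.")}
model.net.load_state_dict(state, strict=False)

model.eval()
waveform = torch.randn(1, 5 * 32000)  # placeholder 5-second clip at 32 kHz
with torch.no_grad():
    logits = model(waveform)
print(logits.argmax(dim=-1))  # predicted ESC-50 class index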

Fixing weights for fine-tuning?

Hi Khaled,

Do you freeze the weights of the embeddings and attention blocks after loading the pretrained checkpoints for fine-tuning, or are they just an initialization that is further updated during fine-tuning?
I can't really find the answer in your code.

Many thanks.
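For reference while waiting for the authors' answer (this does not describe what the repo itself does): freezing the pretrained transformer so that only the classifier is updated would look roughly like the sketch below. The assumption that the classifier parameters have "head" in their names follows timm-style ViT models and should be checked against the actual parameter names:

# Sketch: freeze everything in the pretrained transformer except parameters
# whose name contains "head" (an assumption about the classifier's name).
for name, param in model.net.named_parameters():
    param.requires_grad = "head" in name

trainable = [n for n, p in model.net.named_parameters() if p.requires_grad]
print(f"{len(trainable)} trainable parameter tensors")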

From ViT models to audio

Hi Khaled,

In your code, there is the possibility to create a ViT architecture and load the corresponding pretrained weights (like "vit_tiny_patch16_224").

Do we agree that such architectures only work with inputs of a similar size (224×224, for example)? If so, how did you fine-tune a model on AudioSet that was initially trained on ImageNet (going from 224×224 to 128×998, for example)? Is this procedure in some code in your repo?

I read the AST paper, which I guess you took inspiration from, and they discuss this in some detail.
I was just wondering how I would do the whole process (ImageNet -> AudioSet -> ESC50) on my end.

Thanks a lot.

Antoine
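For reference, this is the general ViT recipe rather than necessarily the exact code used in this repo: when the input spectrogram size differs from the 224×224 images the ViT weights were trained on, the patch embedding is reused as-is and the positional embeddings are resized, typically by bilinear interpolation, before fine-tuning on AudioSet. A rough sketch, assuming a single leading class token and example grid sizes:

import torch
import torch.nn.functional as F

# Sketch: resize a ViT positional-embedding grid to a new (freq, time) grid.
# Assumptions: pos_embed has shape [1, 1 + H*W, D] with one leading class
# token, and the original grid is H x W patches (e.g. 14 x 14 for 224x224
# images with 16x16 patches).
def resize_pos_embed(pos_embed, old_grid=(14, 14), new_grid=(12, 99)):
    cls_tok, grid = pos_embed[:, :1], pos_embed[:, 1:]
    d = grid.shape[-1]
    grid = grid.reshape(1, *old_grid, d).permute(0, 3, 1, 2)   # [1, D, H, W]
    grid = F.interpolate(grid, size=new_grid, mode="bilinear",
                         align_corners=False)
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_grid[0] * new_grid[1], d)
    return torch.cat([cls_tok, grid], dim=1)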

The loop in the diagram

This is an amazing job! But I have a question: what does the loop in the diagram mean? I didn't find the loop operation in the paper or the code. Thanks!

Changing tdim for pretrained model

Thanks for sharing such great work! I want to use the pre-trained model, but changing input_tdim gives an error. My audio clips are relatively short, so I need a smaller input_tdim. How do I do that? The error I get is caused by the pretrained layer's size not matching the current size of the model (after changing input_tdim).

Inference on AudioSet

Thank you for the code and inference script.
I understand that the PaSST model has been trained on AudioSet with a sampling rate of 32 kHz.
I am trying to run inference using the pre-trained model.
Could you please let me know whether I have to retrain the model with AudioSet data at a 16 kHz sampling rate in order to run inference on 16 kHz data, or is there another way?

Also, I'm curious why you used 32 kHz instead of the already available 16 kHz AudioSet data.

Thanks in advance.
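Not an authoritative answer, but since the released checkpoints expect 32 kHz input, one straightforward option is to resample 16 kHz audio to 32 kHz before feeding the model rather than retraining. A quick sketch with torchaudio (the file name is a placeholder, and model is the wrapper from get_basic_model as shown later in this page); whether accuracy matches native 32 kHz data is something the authors would need to confirm:

import torch
import torchaudio

# Sketch: upsample a 16 kHz recording to the 32 kHz rate the pretrained
# PaSST checkpoints expect. "example_16k.wav" is a placeholder path.
waveform, sr = torchaudio.load("example_16k.wav")
if sr != 32000:
    waveform = torchaudio.transforms.Resample(orig_freq=sr, new_freq=32000)(waveform)

waveform = waveform.mean(dim=0, keepdim=True)  # mix down to mono if needed
with torch.no_grad():
    logits = model(waveform)                   # model from get_basic_model(...)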

kaggle

Excuse me, I would like to know how to set up PaSST on Kaggle. I have tried several times, but I failed.

Getting started with a custom dataset

Hi,

Thank you for your great work!

I want to use PaSST on my custom dataset for a different classification task.

Are there any minimal instructions or code for running the model on a different dataset? Which file should I start from?

Does PaSST support multi-channel audio WAV files?

Best

Difference between the ways of fine-tuning the pretrained models

I'm sorry to bother you. I want to ask about the difference between the two ways of getting the pre-trained models; I'm not sure whether I understand them correctly.
The first is in the "Getting a pre-trained model for fine tuning" part. The code is:

from hear21passt.base import get_basic_model, get_model_passt
import torch
# get the PaSST model wrapper, includes Melspectrogram and the default pre-trained transformer
model = get_basic_model(mode="logits")
print(model.mel) # Extracts mel spectrogram from raw waveforms.

# optional replace the transformer with one that has the required number of classes i.e. 50
model.net = get_model_passt(arch="passt_s_swa_p16_128_ap476",  n_classes=50)
print(model.net) # the transformer network.


# now model contains mel + the transformer pre-trained model ready to be fine tuned.
# It's still expecting input of the shape [batch, seconds*32000] sampling rate is 32k

model.train()
model = model.cuda()

The second is in the "Pre-trained models" part.

from models.passt import get_model
model  = get_model(arch="passt_s_swa_p16_128_ap476", pretrained=True, n_classes=527, in_channels=1,
                   fstride=10, tstride=10,input_fdim=128, input_tdim=998,
                   u_patchout=0, s_patchout_t=40, s_patchout_f=4)

I have two questions. With the first way of obtaining the pre-trained model, are only the transformer layers related to n_classes fine-tuned, while the other layers' weights stay unchanged? And with the second way, are the weights of all layers loaded and then trained again? Or are the two ways the same?

setup.py

Could you add a setup.py so that the package is pip-installable?
