harmony's Introduction

Harmony

Music Metadata Aggregator and MusicBrainz Importer

Features

  • Lookup of release metadata from multiple sources by URL and/or GTIN
  • Metadata providers convert source data into a common, harmonized representation
  • Additional sources can be supported by adding more provider implementations
  • Merging of harmonized metadata from your preferred providers
  • Seeding of MusicBrainz releases using the merged metadata
  • Resolving of external entity identifiers to MBIDs
  • Automatic guessing of title language and script
  • Permalinks which load snapshots of the originally queried source data

Usage

Most modules of this TypeScript codebase use web standards and should be able to run in modern browsers and other JavaScript runtimes. Only the Fresh server app and the CLI were written specifically for Deno.

The following instructions assume that you have the latest Deno version installed.

You can start a local development server with the following command:

deno task dev

You can now open the logged URL in your browser to view the landing page. Try making some code changes and see how the page automatically reloads.

For a production server you should set the PORT environment variable to your preferred port and DENO_DEPLOYMENT_ID to the current git revision (commit hash or tag name). Alternatively you can run the predefined task which automatically sets DENO_DEPLOYMENT_ID and runs server/main.ts with all permissions:

deno task server

Other environment variables which are used by the server are documented in the configuration module.

There is also a small command line app which can be used for testing:

deno task cli

Architecture

The entire code is written in TypeScript; the components of the web interface additionally use JSX syntax.

A brief explanation of the directory structure should give you a basic idea of how Harmony works:

  • harmonizer/: Harmonized source data representation and algorithms
    • types.ts: Type definitions of harmonized releases (and other entities)
    • merge.ts: Merge algorithm for harmonized releases (from multiple sources)
  • providers/: Metadata provider implementations, one per subfolder
    • base.ts: Abstract base classes from which all providers inherit
    • registry.ts: Registry which manages all supported providers, instantiated in mod.ts
  • lookup.ts: Combined release lookup which accepts GTIN, URLs and/or IDs for any supported provider from the registry
  • musicbrainz/: MusicBrainz specific code
    • seeding.ts: Release editor seeding
    • mbid_mapping.ts: Resolving of external IDs/URLs to MBIDs
  • server/: Web app to lookup releases and import them into MusicBrainz
    • routes/: Request handlers of the Fresh server (file-based routing)
    • static/: Static files which will be served
    • components/: Static Preact components which will be rendered as HTML by the server
    • islands/: Dynamic Preact components which will be re-rendered by the client
  • utils/: Various utility functions

Let us see what happens if someone looks up a release using the website:

  1. The Fresh app handles the request to the /release route in server/routes/release.tsx.
  2. A combined release lookup is initiated, which finds the matching provider(s) in the registry and calls their release lookup methods.
  3. Each requested provider fetches the release data and converts it into a harmonized release.
  4. Once all requested providers have been looked up, the individual releases are combined into one release using the merge algorithm.
  5. The route handler calls the MBID mapper, handles errors and renders the release page, including a hidden release seeder form.
  6. In order to create the release seed, the harmonized release is converted into the format expected by MusicBrainz where some data can only be put into the annotation.
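
The lookup and merge steps above can be sketched roughly as follows. This is only a minimal illustration, not Harmony's actual code: the `HarmonyRelease` and `Provider` shapes, `combinedLookup` and the naive `mergeReleases` are assumptions made up for this example.

```typescript
// Hypothetical, simplified shapes; the real definitions live in harmonizer/types.ts.
interface HarmonyRelease {
  title: string;
  gtin?: string;
  labels?: string[];
  sources: string[]; // names of the providers which contributed data
}

interface Provider {
  name: string;
  /** Whether this provider can handle the given URL. */
  supports(url: string): boolean;
  /** Fetches the release and converts it into the harmonized representation. */
  lookupRelease(url: string): Promise<HarmonyRelease>;
}

/** Naive merge: keep the first (preferred) release and fill in missing properties. */
function mergeReleases(releases: HarmonyRelease[]): HarmonyRelease {
  const preferred = releases[0];
  if (!preferred) throw new Error("No release to merge");
  const merged: HarmonyRelease = { ...preferred, sources: [...preferred.sources] };
  for (const other of releases.slice(1)) {
    merged.gtin ??= other.gtin;
    merged.labels ??= other.labels;
    merged.sources.push(...other.sources);
  }
  return merged;
}

/** Finds a matching provider for each URL, looks all of them up and merges the results. */
async function combinedLookup(urls: string[], registry: Provider[]): Promise<HarmonyRelease> {
  const lookups = urls.map((url) => {
    const provider = registry.find((candidate) => candidate.supports(url));
    if (!provider) throw new Error(`No provider supports ${url}`);
    return provider.lookupRelease(url);
  });
  return mergeReleases(await Promise.all(lookups));
}
```

The real merge algorithm also has to deal with conflicting values; this sketch only fills gaps in the preferred provider's data.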

All requests which are initiated by a provider will be cached by the base class using snap_storage (persisted in snaps.db and a snaps/ folder). Each snapshot contains the response body and can be accessed by request URL and a timestamp condition. This allows edit notes to contain permalinks which encode a timestamp and the necessary info to initiate the same lookup again, now with the underlying requests being cached.
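
The idea of accessing a snapshot by request URL and a timestamp condition can be illustrated with a small in-memory sketch. It does not use the snap_storage API; `SnapshotCache`, `store` and `latestBefore` are names invented for this example, and timestamps are assumed to be Unix seconds like the ts parameter of permalinks.

```typescript
interface Snapshot {
  url: string;
  timestamp: number; // Unix time in seconds (assumption)
  body: string;
}

class SnapshotCache {
  private readonly snapshots = new Map<string, Snapshot[]>();

  /** Stores a response body for the given URL, keeping snapshots sorted by time. */
  store(url: string, body: string, timestamp: number): void {
    const list = this.snapshots.get(url) ?? [];
    list.push({ url, timestamp, body });
    list.sort((a, b) => a.timestamp - b.timestamp);
    this.snapshots.set(url, list);
  }

  /** Returns the latest snapshot which is not newer than the given timestamp. */
  latestBefore(url: string, maxTimestamp: number): Snapshot | undefined {
    let result: Snapshot | undefined;
    for (const snapshot of this.snapshots.get(url) ?? []) {
      if (snapshot.timestamp <= maxTimestamp) result = snapshot;
    }
    return result;
  }
}
```

A permalink which encodes a timestamp between two stored snapshots would resolve to the older one, so the page shows the data as it was originally queried.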

Contributing

Your contributions are welcome, be it code, documentation or feedback.

If you want to contribute a bigger feature, please open a discussion first to be sure that your idea will be accepted.

Before submitting your changes, please make sure that they are properly formatted and pass the linting rules and type checking:

deno fmt --check
deno lint
deno task check

There is also a Deno task which combines the previous commands:

deno task ok

harmony's People

Contributors

aerozol, atj, kellnerd, monkeydo, mwiencek, phw

harmony's Issues

Allow to use standard provider when linking to Harmony with URL parameter

For integration with external tools it would be convenient to be able to easily link to Harmony with a URL or GTIN. Links like this should be supported:

With URL: https://harmony.pulsewidth.org.uk/release?url=https%3A%2F%2Fgoatgirl.bandcamp.com%2Falbum%2Fbelow-the-waste

With GTIN: https://harmony.pulsewidth.org.uk/release?gtin=191402047554

There are two separate behaviors:

  1. Linking with URL works and triggers a lookup, but all default providers are disabled. To enable a provider an explicit parameter for this provider has to be passed, e.g. deezer=. This is inconvenient for third-party tools, as they need to hardcode all possible providers. It would be better if the default providers would be used instead.
  2. The GTIN link behaves differently. It does not automatically trigger a lookup; instead it just shows the form with the default providers set properly.

The second case is more convenient and probably intentional. Having case 1. with url parameter behave the same would likely solve the issue.

Not sure whether both cases should trigger an automatic lookup.
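
For illustration, this is roughly how a third-party tool would have to build such links today. The helper `buildHarmonyLink` and its options are hypothetical; only the parameter names url and gtin and the empty provider parameters (e.g. deezer=) are taken from the links in this issue tracker.

```typescript
interface LookupLinkOptions {
  url?: string;
  gtin?: string;
  /** Provider names to enable explicitly, e.g. ["deezer"]. */
  providers?: string[];
}

/** Builds a link to the /release route of a Harmony instance. */
function buildHarmonyLink(base: string, options: LookupLinkOptions): string {
  const params: string[] = [];
  if (options.url) params.push(`url=${encodeURIComponent(options.url)}`);
  if (options.gtin) params.push(`gtin=${encodeURIComponent(options.gtin)}`);
  // Each enabled provider is passed as a parameter with an empty value.
  for (const provider of options.providers ?? []) {
    params.push(`${encodeURIComponent(provider)}=`);
  }
  return `${base}/release?${params.join("&")}`;
}
```

With the current behavior, the `providers` list has to be hardcoded by the caller for URL lookups, which is exactly the inconvenience described above.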

Add an `.env.example` file

One thing missing from the readme is how the .env file should be structured for usage. This could be done using a file in the root of the repository containing a proper .env file with dummy data.

add a button to clear the release lookup fields

when I'm adding multiple releases, I find it easiest to have my importer in one window and the artists' page in another, so I can just click and drag a link when moving to the next release. with how Harmony currently works, I've got to highlight the whole field and backspace before I can do this

an alternate option would be to clear the provider and GTIN fields at the top after looking up a release, but there might be a reason to show that even after the lookup. perhaps a second "new lookup" set of fields could work too? I'm up for any solutions~

Yandex Music

(as suggested in #5)

Yandex Music is a Russian music streaming service developed by Yandex. Users select musical compositions, albums, collections of musical tracks to stream to their device on demand and receive personalized recommendations. The service is also available in a web browser. It is available in Armenia, Azerbaijan, Belarus, Georgia, Israel, Kazakhstan, Kyrgyzstan, Moldova, Russia, Tajikistan, Turkmenistan and Uzbekistan. A subscription can only be paid for from the supported countries above, but the service is then available in all other countries. (wiki)

Example of an album: https://music.yandex.ru/album/12353342
Open JSON API: https://api.music.yandex.net/albums/12353342/ (or https://api.music.yandex.net/albums/12353342/with-tracks for additional info on tracks from album, such as the distributor of release). VPN might be needed to open those (mirror for the "with-tracks" response: https://www.jsonkeeper.com/b/YKSE)

The API supports neither GTIN nor ISRC. Also, the "label" section of the response takes its info from the P-line of the release and in most cases removes words such as "Productions", "Music" and "Publishing", and it splits one label into multiple ones if there's a slash in its name (like here).

API supports showing whether it's an album, single, podcast or an audiobook (since they all have a link of https://music.yandex.ru/album/album_id).

There's also an unofficial implementation of an API at https://github.com/MarshalX/yandex-music-api/releases, but a token is needed to use it.

Metadata Providers

List of sources/websites for which a metadata provider has been implemented or requested.

Leave a comment which includes at least a link to an example release for a quick provider request, or create a separate issue which is named after the provider and labeled with provider if you have more research to share.
Detailed requests for sources with an open API and good documentation are more likely to be implemented.
Edit: Let's be honest, every request which contains more details than just the name or URL of the source is probably worth its own issue already.

If you plan to work on a provider, be it doing more research or actually implementing it, please create a separate issue which is named after the provider and ask for it being assigned to you.

Open JSON API

  • #49
  • Deezer (GTIN, ISRC)
  • iTunes (per region; no GTIN)
  • Discogs (GTIN)
  • YouTube (GTIN) (#31)

Restricted JSON API

  • Spotify (requires token; GTIN, ISRC) (#16)
  • Tidal (requires token, per region; GTIN, ISRC) (#11)
  • Qobuz (documentation, requires app ID, IP-based 404s for unavailable regions; GTIN, ISRC)
  • Apple Music (requires paid token; GTIN, ISRC)
  • Beatport (requires token; GTIN, ISRC)
  • Soundcloud (requires authentication; GTIN, ISRC) (#35)
  • #12 (only accessible in some regions)
  • #14 (requires authentication)

Tokens and other secrets should not be included in this repository but loaded from environment variables.

HTML Scraping

  • Bandcamp (embedded JSON; GTIN)
  • Beatport (embedded JSON; GTIN, ISRC) (#2)
  • OTOTOY (#34)

Uncategorized Requests

Add support for d.ontun.es to submit ISRCs

Currently Harmony gives you the option to use MagicISRC to submit ISRCs but MagicISRC doesn't submit any edit note saying where those ISRCs come from. Knowing what release ISRCs were added from is very important, so it'd be nice if we also had the option to use d.ontun.es to submit ISRCs as it includes an edit note saying what release the ISRCs came from.

Deezer's API returns alternative release with different GTIN and availability

Lookup using all providers

Providers have returned multiple different GTIN: 794558049368 (Spotify, iTunes, Tidal), 197189103209 (Deezer)

Lookup without Deezer returns a release which is only available outside of Europe

Lookup using only Deezer returns a release which is only available in Europe with a different barcode

Does this happen because Deezer does not have the other release or because their API is queried from a European server?

We should probably adapt the Deezer provider to return an error message which contains the correct GTIN to lookup instead of returning the result.

GTINs with similar issues:

  • 00602527812519 returns a different Deezer release which is available in 0 regions
  • 00602527650579 even returns two other barcodes, one for Deezer (see case above), and one for iTunes (known issue)
  • 5099908794857: US vs outside of US (Deezer) releases
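
A minimal sketch of how such mismatches could be detected, assuming that GTINs which only differ by leading zeros identify the same product. The function names are hypothetical and not taken from the codebase.

```typescript
/** Removes leading zeros so that differently padded GTINs compare equal. */
function normalizeGtin(gtin: string): string {
  return gtin.trim().replace(/^0+/, "");
}

/** Groups provider names by the (normalized) GTIN which they returned. */
function groupByGtin(results: Record<string, string>): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const [provider, gtin] of Object.entries(results)) {
    const key = normalizeGtin(gtin);
    const providers = groups.get(key) ?? [];
    providers.push(provider);
    groups.set(key, providers);
  }
  return groups;
}

/** True if the providers returned more than one distinct GTIN. */
function hasGtinConflict(results: Record<string, string>): boolean {
  return groupByGtin(results).size > 1;
}
```

For the lookup above, the groups would be 794558049368 (Spotify, iTunes, Tidal) and 197189103209 (Deezer), so a provider could report the conflicting GTIN in its error message instead of returning the release.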

better handling of feat. artists

featured artists are handled very inconsistently across the various platforms, with Spotify removing feats and putting them in the artist field, Deezer keeping feat in the title and the artist field, and Apple Music only keeping feats in the track title. I think if a service has a featured artist, this should be reflected in the harmonized data, both on the track level and potentially on the release level (if all tracks have the same feat, especially for singles)

here's a decent cross section of the variants on this release

image
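
As a starting point, guessing featured artists from a title could look like the following sketch. The pattern and the names `featPattern` and `splitFeaturedArtists` are assumptions for illustration; splitting credits on "," and "&" is deliberately naive and will mis-split artist names which contain these characters.

```typescript
interface FeatSplit {
  title: string;
  featuredArtists: string[];
}

// Matches "(feat. X)", "[ft. X]" or "(featuring X & Y)" at the end of a title.
const featPattern = /\s*[(\[](?:feat\.?|ft\.?|featuring)\s+([^)\]]+)[)\]]\s*$/i;

/** Splits a trailing "feat." credit off the title and returns both parts. */
function splitFeaturedArtists(title: string): FeatSplit {
  const match = title.match(featPattern);
  if (!match) return { title, featuredArtists: [] };
  const credit = match[1] ?? "";
  const featuredArtists = credit.split(/\s*(?:,|&)\s*/).filter((name) => name.length > 0);
  return { title: title.slice(0, match.index).trim(), featuredArtists };
}
```

The extracted names could then be compared with the artist field of each provider to harmonize the credits on the track level.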

SoundCloud

I know it's mentioned in #5, but I figured I'd start up a ticket with a link to the API docs at least~

https://developers.soundcloud.com/docs

a couple notes about SoundCloud:

  • some title cleanup might be needed? especially stuff like "[FREE DL IN DESCRIPTION]", which is quite common
  • GTIN and ISRC are optional metadata
  • playlists and albums are both implemented as "sets", and perhaps both should be importable? I know a lot of artists don't properly set the type for what's pretty clearly an album (even Taylor Swift has some such examples)
  • Creative Commons Licenses should be detected and added when present
  • track downloads are optional and should be detected (sometimes with a Buy link that goes to a file hosting service like MediaFire or MEGA, or with a DL LINK IN DESCRIPTION)

No support for geo.music.apple.com links

When trying to put a geo.music.apple.com link, Harmony displays an error:
No provider supports https://geo.music.apple.com/XX/album/_/1234567890?mt=1&app=music&ls=1&at=1000lHKX
where XX is region code (e.g. US), and 1234567890 is the album's ID
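
A sketch of a pattern which would also accept these links, assuming the URL shape from the error message above. `appleAlbumPattern` and `parseAppleAlbumUrl` are hypothetical names, not the provider's actual implementation.

```typescript
interface AppleAlbumLink {
  region: string;
  id: string;
}

// Accepts music.apple.com, itunes.apple.com and the geo. prefixed variants.
// The optional path segment before the ID is the album slug (or "_").
const appleAlbumPattern =
  /^https:\/\/(?:geo\.)?(?:music|itunes)\.apple\.com\/([a-z]{2})\/album\/(?:[^/?#]+\/)?(?:id)?(\d+)/i;

/** Extracts region code and album ID, or returns undefined for unsupported URLs. */
function parseAppleAlbumUrl(url: string): AppleAlbumLink | undefined {
  const match = url.match(appleAlbumPattern);
  if (!match) return undefined;
  return { region: (match[1] ?? "").toUpperCase(), id: match[2] ?? "" };
}
```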

Optimize SQLite DB to improve cache performance

The amount of cached data is growing and the underlying SQLite database of my SnapStorage library is likely becoming a bottleneck. First observed 3 weeks ago:

What is concerning me more are the processing times, each provider takes about 1000 ms while these durations should go down to about 10-20 ms for cached results… This generally seems to happen for all permalinks right now, but most are “only” taking 300-400 ms to process, I will investigate.
Edit: Restarting the app did not help unfortunately. Using my local server brings the processing times down to about 20 ms once the API results are cached. The main difference is that my local server’s snapshot directory only contains 25M of data (400k sqlite DB) while the pulsewidth server uses 150M already (3M sqlite DB), but the performance can’t scale that badly!?

Latest numbers: 476M of data (compressed), 9.7M / 18.3M SQLite DB (compressed / uncompressed), 79k rows in uri, 82k rows in snap

See kellnerd/snap_storage#2 for details.

avoid using Various Artists when not all providers use it

I just recently imported a single (permalink) with 4 release artists and found that Spotify gave the release artist as "Various Artists", yet all other providers listed out all the artists.

image

I seem to remember seeing somewhere that Spotify does this, but I can't say for certain if they do this in general or not. either way, I think Harmony should err on the side of crediting the artists listed rather than Various Artists, at least in cases like this
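
One possible rule, sketched under the assumption that each provider's release artist credit is available as a list of names in preference order. `pickReleaseArtists` is a hypothetical helper, not part of the merge algorithm.

```typescript
/**
 * Picks the release artist credit from the candidates of all providers.
 * Prefers the first credit which lists actual artists over "Various Artists".
 */
function pickReleaseArtists(candidates: string[][]): string[] {
  const isVariousArtists = (credit: string[]) =>
    credit.length === 1 && (credit[0] ?? "").toLowerCase() === "various artists";
  const specific = candidates.find((credit) => credit.length > 0 && !isVariousArtists(credit));
  // Only fall back to the preferred provider's credit if nobody lists actual artists.
  return specific ?? candidates[0] ?? [];
}
```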

Roadmap

This is just a loosely ordered list of things I already have on my radar, to be cleaned up later™️.

Harmonized Data

  • Merge missing properties into the preferred provider's data
  • Check for conflicting properties during merge
    • Warning for duration
    • Error for GTIN
    • Error for incompatible medium track counts
    • Merge tracks if the total track count matches (1 medium vs N media)
    • Skip missing tracklist! (by far the most common error in the test phase logs)
    • Merge empty medium into medium with tracks
  • Release date quality ranking, plausibility checks for each provider (new attribute "date.warning"), merge strategy "prefer latest"
  • Generally warn about pre-release data
  • Guess featured artists from titles (#39)
  • Copyright notices
  • Explicitness of tracks
  • Optional title cleanup (numeric prefix, ETI style etc.)
    • Deezer allows crediting the same artist multiple times
  • Customizable search & replace rules
  • Audiobook / audio drama mode?
  • Preserve catalog numbers
  • Improve language detection (skip too short inputs, try alternatives?)

Providers

  • iTunes: Ensure that collection.trackCount equals the number of returned tracks
  • iTunes: Warn about responses which contain multiple release variants for a UPC
  • iTunes: Country list for optional region lookups
  • iTunes: Use region from URL
  • iTunes: Try to use artist (or ISRC?) region for canonical (region-specific) URL
  • iTunes: Try next region instead of throwing if JSON parsing fails (see screenshot)
  • Spotify: Pad GTIN with zeros if no results are found (example)
  • Deezer: Truncate padded GTIN
  • Bandcamp: Try band URL as label URL
  • Bandcamp: Add untitled hidden tracks, only their count is available as OG meta header
    • Extract more than just trAlbum into snapshots -> wrapper object, avoid deserializing JSON
  • Bandcamp: Try band URL as label URL, extract label from packages
  • Bandcamp: Check whether band is part of the release artist before using it as label
  • iTunes: Warn about missing tracklist
  • Bandcamp: VA releases
  • Bandcamp: /track URLs (#7)
  • Bandcamp: Extract release ISRCs from all /track URLs (expensive, only if there is no better source)
  • iTunes: Drop " - Single" from title (#9)
  • Bandcamp: Extract track images, from embedded player (only for pre-releases so far)
  • #46
  • iTunes: Show not only the URL with the last region when all lookup attempts failed
  • Bandcamp: no tracks https://2nxmusic.bandcamp.com/album/stolen-lullabies
  • Bandcamp: custom domains (#8)
  • Deezer: API sometimes returns too many tracks: https://musicbrainz.org/edit/112474481 or https://www.deezer.com/album/303245

MusicBrainz

  • Suggest existing release group
  • Find release group or similar releases, reuse recordings? Similar query as MB duplicates tab?
  • Resolve external links to MBIDs
    • Don't resolve ambiguous URLs to MBIDs
    • Allow two URL rels if for the same target entity (e.g. download and streaming)
    • Cache pending requests in a map, parallel resolving of all identifiers
    • Use resolved MBID of release artist for unresolved but identically named track artists
    • Combine release and track artists which share identifiers or names to avoid inconsistent results in edge cases (#54)
  • Guess release group types (#15)
  • Create edit note
    • Add permalink / homepage / repository URL (and version?)
  • Optionally fill the annotation with additional data (make the sections configurable)
    • Copyright notice
    • Availability
    • Release and track level credits (text only so far)
    • Explicitness (show, but do not seed for tracks; add to release disambiguation?)
  • Detect European releases (special country XE)
  • Target seeder at existing release
  • Use ampersand for last joinphrase by default
  • Support track URLs for other providers and suggest to look their release up

Infrastructure

  • URL lookup -> GTIN -> parallel GTIN lookups
  • Support provider-specific messages
  • Return all provider error messages if no lookup was successful
  • Allow to choose and exclude providers (Provider preferences)
  • Allow providers to return multiple releases, i.e. different variants (e.g. for Bandcamp)
  • Cache management: https://github.com/kellnerd/snap_storage
    • Invalidation strategy: FIFO/LRU/TTL? Maximum age (optional)
    • In memory and/or long-time cache? JSON files with compression or Redis?
    • Cache multiple versions with timestamps (daily? only if there have been changes?)
    • Let the requester know how old the data is and whether it is from the cache
    • Permalinks to specific cached version (include GTIN, enabled providers, optional additional URLs or ProviderName=ProviderId pairs)
  • Optimize lookups (perform no GTIN lookup if ID was already looked up)
    • These repeated lookups also skew the calculated processing time for the initial provider (e.g. Deezer track requests are now cached)
  • Use as few requests as possible (only make additional API calls for a provider if data is missing, e.g. iTunes regions or Deezer ISRCs)
  • Lookup by metadata (label and catno, title, artist, track count etc.) for providers without GTIN
  • Create provider feature categories (e.g. streaming, physical, with GTIN/ISRC, GTIN lookup, scraper, audio drama, Japanese etc.)
  • Lookup the entire discography of a given artist/label
  • Make MusicBrainz base url configurable (environment variable)
  • Deduplicate lookup ReleaseOptions.regions option by using an ordered set
  • Manage lookup state: Each provider "Example" is split into two classes ExampleProvider and ExampleReleaseLookup, where ExampleReleaseLookup has a (readonly) property provider
    • Splits general request logic and release processing logic
    • Possible to store release lookup state as class properties
    • Separation of unrelated tasks once we add artist/label lookups later, e.g. as ExampleArtistLookup and ExampleLabelLookup
  • Warn that available regions may not be accurate before the release date has passed (anywhere on earth, UTC-12)
  • Extract provider URLs from link shortener pages
  • Extract provider IDs and GTIN from a-tisket URLs
  • Write more test cases...
  • Preserve URL blurb (for Beatport)
  • Improve logging of AggregateErrors, they make it a PITA to find the real issue

Web Interface

  • Display header with logo and description
    • Harmony: Music Metadata Aggregator and MusicBrainz Importer/Seeder
    • Design banner logo and icon
  • Display footer with version, repo URL and support URL (environment variables DENO_DEPLOYMENT_ID, REPO_BASE_URL, optional COMMIT_BASE_URL, SUPPORT_URL)
  • Add OpenGraph meta tags
  • Allow to choose and exclude providers (persistent provider checkboxes)
  • Persist preferred regions input
  • Show provider and alternative values for interesting properties
    • Improve track length comparison, Deezer truncates instead of rounding
  • Settings page/section with persisted checkboxes
  • Multiple URL inputs (dynamic form)
  • Provider URL detection on the frontend (URLPattern polyfill for Firefox and Safari? https://caniuse.com/mdn-api_urlpattern)
  • CSS
  • Provider icons (external links or data URIs? inline TSX SVG? SVG sprite built with TSX)
  • Post-submission route/page ("release actions"):
    • ISRC submission (kepstin/tatsumo/custom?)
    • Artwork (ECAU)
    • External links (for artists, maybe for labels?) (#33)
  • Dynamic region list display: count, compact flags, detailed list
  • Group regions by continent
  • Serve documentation, written in Markdown
  • Use HTTPS by passing key and cert options to start()
  • Support X-Forwarded-Proto proxy header
  • Trim GTIN input to avoid unnecessary errors

seed track URLs to MusicBrainz recordings

mostly for individual track pages, but I believe Bandcamp can have different licenses per track (tho I don't know if that'd be a recording or work URL...), for example

Support release group types

It would be good if providers could set the primary type and if this would be seeded when submitting to MB.

Not all providers will support this, but it is sometimes possible to at least detect singles and EPs. If in doubt a provider should likely keep this field empty.

Some notes on specific implementations:

  1. iTunes: No specific support, but the suffixes - Single and - EP seem to be commonly added to singles / EPs. These should be stripped (see #9) and then can be used for seeding the primary type as well. a-tisket does this.
  2. Spotify: Releases have the field album_type, which is one of album, single or compilation. Maybe it is too broad to use the album type (better leave it empty and have the user decide), but single and compilation should be fine to use.
  3. Bandcamp: At least standalone tracks could be detected as "Single".
  4. Tidal: no types specified
  5. Guessing the release type from title might work in many cases. Most of the MB submission user scripts do this, see https://github.com/murdos/musicbrainz-userscripts/blob/master/lib/mbimport.js#L302-L325
  6. For any provider making use of the MusicAlbum schema there is a MusicAlbumReleaseType. Theoretically this supports the types AlbumRelease, BroadcastRelease, EPRelease and SingleRelease. But e.g. Bandcamp does not make full use of this and seems to use AlbumRelease generally, except for standalone tracks it uses SingleRelease.

Generally it seems that if specific types, in particular single or EP, are detectable, this could be seeded. In most cases a source type of "album", if given, might be too unspecific and better kept out.

In the release editor the primary type can be seeded using the field type.
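
The iTunes case could be handled roughly like this. It is only a sketch: `extractTypeFromTitle` is a hypothetical helper and only covers the two suffixes mentioned above.

```typescript
type PrimaryType = "Single" | "EP";

interface TypedTitle {
  title: string;
  /** Left undefined if the type cannot be detected. */
  primaryType?: PrimaryType;
}

/** Strips a trailing " - Single" or " - EP" suffix and returns it as primary type. */
function extractTypeFromTitle(title: string): TypedTitle {
  const match = title.match(/^(.*\S)\s+-\s+(Single|EP)$/);
  if (!match) return { title };
  return { title: match[1] ?? title, primaryType: match[2] as PrimaryType };
}
```

Titles without a recognized suffix keep an empty type, which matches the suggestion that a provider should leave the field empty if in doubt.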

Artist link apple/itunes. Difference?

Starting with https://www.deezer.com/fr/album/10882160 and harmony gives
https://music.apple.com/gb/artist/505840851

This leads to MB not autodetecting the service:
Screenshot from 2024-06-09 14-46-11

Correct for autodetection would be https://itunes.apple.com/gb/artist/id505840851
I don't know if these are two separate services or just URL redundancy for iTunes. If it's the same service, changing the output URL in Harmony should easily fix it, or is there some technical reason against it?

For now I'll stick with the itunes link :)
https://musicbrainz.org/artist/2e21383f-f71e-4367-bfa8-5a02c74643a8
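
If both URL forms indeed point to the same entity (which this issue leaves open), the conversion itself would be simple. The following sketch uses a hypothetical `toItunesArtistUrl` helper and only relies on the two URLs quoted above.

```typescript
/** Rewrites a music.apple.com artist URL into the itunes.apple.com form with "id" prefix. */
function toItunesArtistUrl(url: string): string {
  const match = url.match(/^https:\/\/music\.apple\.com\/([a-z]{2})\/artist\/(?:[^/?#]+\/)?(\d+)/i);
  // Leave everything which is not an Apple Music artist URL untouched.
  if (!match) return url;
  return `https://itunes.apple.com/${match[1]}/artist/id${match[2]}`;
}
```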

Support setting CC license for Bandcamp provider

Bandcamp contains many Creative Commons licensed releases, see for example https://aeonsable.bandcamp.com/album/aenigma-2023

The Bandcamp importer user script supports reading the license information and sets the license URL when seeding, see https://github.com/murdos/musicbrainz-userscripts/blob/master/bandcamp_importer.user.js#L188-L195

Similar could be done in the Bandcamp provider.

See https://community.metabrainz.org/t/harmony-music-metadata-aggregator-and-musicbrainz-importer/698641/12

Barcode collision

Sometimes barcodes are not as unique as they should be...

635669065024 returns two different releases (different artists, but same label) with the iTunes, Spotify and Tidal providers. Deezer's API only returns one of them (YBC III), for the others it seems to be random which one is the first result that gets returned.

The iTunes provider at least warns about this, the other providers currently ignore this issue silently.

Select the release with the matching GTIN if iTunes API returns multiple

https://harmony.pulsewidth.org.uk/release?gtin=197875266348&itunes=&region=GB&ts=1717477988

iTunes: The API also returned 1 other result, which was skipped: https://music.apple.com/gb/album/1702051779

The other result would have been the correct one with GTIN 197875266348.

iTunes: Extracted GTIN 197985529395 (from artwork URL) does not match the looked up value 197875266348

In this case, both image URLs contain the corresponding barcode, but this is not always the case unfortunately:

https://harmony.pulsewidth.org.uk/release?gtin=882951718827&itunes=&region=GB&ts=1717495471

iTunes: The API also returned 1 other result, which was skipped: https://music.apple.com/gb/album/600624295

That would've been the correct result 🫤

Another example where GTIN would help: https://harmony.pulsewidth.org.uk/release?gtin=822603266801&itunes=&region=GB&ts=1717435544
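
A sketch of the proposed selection, assuming each API result carries an optional GTIN (for example extracted from the artwork URL, where available). `ApiResult` and `selectByGtin` are hypothetical names.

```typescript
interface ApiResult {
  url: string;
  gtin?: string; // unknown if it could not be determined for this result
}

/** Prefers the result whose GTIN matches the looked up value, ignoring zero padding. */
function selectByGtin(results: ApiResult[], lookedUpGtin: string): ApiResult | undefined {
  const normalize = (gtin: string) => gtin.replace(/^0+/, "");
  const wanted = normalize(lookedUpGtin);
  const exact = results.find((result) => result.gtin !== undefined && normalize(result.gtin) === wanted);
  // Fall back to the first result if no GTIN matches (or none is known).
  return exact ?? results[0];
}
```

This only helps when the GTIN of each result is actually known, which is not always the case as the second example shows.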

Spotify provider

Implement a Spotify provider based on the Spotify Web API.

Implementation notes:

  • General API access should be very similar to the Tidal provider, including the client credentials auth flow.
  • Individual regions don't seem to be queried separately. Instead the API when queried without a "market" set returns a list of all markets the release is available on.
  • Primary type (#15) could be supported, at least for single and compilation types.
  • Fetching the full track list for an album can involve multiple calls, the initial result from the album request only contains the first page of tracks.
  • ISRCs are available, but seem to require a separate call, as the track info returned as part of the album excludes this data.
  • Pad GTIN with zeros if no results are found (example). a-tisket already does this. See also #6
  • Spotify has a concept of Track Relinking, where tracks not being available in a specific market get swapped out with a similar track that is available. Not sure about the implications, we'll need some examples for this. Might be that if the data gets queried without region that the relinking is not indicated. If this can be detected it would at least be good to show a warning.
  • Copyright information gets returned with separate entries for © and ℗. Because this is clearly separated, the entries do not always contain the corresponding symbol. If we just import the text, the entries cannot be distinguished. The provider should add the symbols based on type if not present in the given text.
  • Similar to Deezer label info is a single text that sometimes contains multiple labels separated by /.
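
The note about copyright symbols could be implemented along these lines. The sketch assumes that each entry has a text and a type which is either "C" (©) or "P" (℗); these values should be checked against the Spotify Web API documentation, and `formatCopyright` is a hypothetical helper.

```typescript
interface Copyright {
  text: string;
  type: "C" | "P"; // assumed values: "C" = ©, "P" = ℗
}

/** Prefixes the text with the symbol for its type unless it is already present. */
function formatCopyright({ text, type }: Copyright): string {
  const symbol = type === "C" ? "©" : "℗";
  const trimmed = text.trim();
  if (trimmed.startsWith(symbol)) return trimmed;
  // Also accept the common ASCII spellings "(C)" and "(P)" and replace them.
  const ascii = new RegExp(`^\\(${type}\\)\\s*`, "i");
  if (ascii.test(trimmed)) return trimmed.replace(ascii, `${symbol} `);
  return `${symbol} ${trimmed}`;
}
```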

Related to #5

Providers using an OAuth token should try to refresh the token on 401 responses

Providers using OAuth tokens (currently Tidal and Spotify) persist the token for the token lifetime, then do a refresh. This usually works fine. But should the token become invalid for any reason on the server side, this will block any requests until the currently stored token has expired.

It would be better if the providers attempted to refresh the token when they get a 401 Unauthorized status response and retried the current request once. Only if the retry with the new token also fails should the error be raised.
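
The proposed behavior can be sketched as follows. `TokenClient`, `AuthorizedRequest` and `MinimalResponse` are hypothetical and independent of the existing provider base classes; how a new token is obtained is left to the injected `requestNewToken` callback.

```typescript
interface MinimalResponse {
  status: number;
}

/** A request which is performed with the given access token. */
type AuthorizedRequest = (token: string) => Promise<MinimalResponse>;

class TokenClient {
  private token: string | undefined;
  private readonly requestNewToken: () => Promise<string>;

  constructor(requestNewToken: () => Promise<string>) {
    this.requestNewToken = requestNewToken;
  }

  private async getToken(forceRefresh = false): Promise<string> {
    let token = this.token;
    if (forceRefresh || token === undefined) {
      token = await this.requestNewToken();
      this.token = token;
    }
    return token;
  }

  /** Performs the request and retries it exactly once with a fresh token on 401. */
  async send(request: AuthorizedRequest): Promise<MinimalResponse> {
    let response = await request(await this.getToken());
    if (response.status === 401) {
      response = await request(await this.getToken(true));
      if (response.status === 401) {
        throw new Error("Unauthorized, even with a refreshed token");
      }
    }
    return response;
  }
}
```

Retrying only once avoids an endless loop when the credentials themselves are wrong.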

Handle incomplete releases

For some releases (pre-releases?), Tidal's API does not return all tracks:

The missing tracks are not shown on tidal.com/browse/album pages at all, on listen.tidal.com pages they are displayed greyed out.

Since the API returns at least the correct track count we could try to fill the tracklist (for single medium releases) with [unknown] tracks to allow for these releases being combined with other sources which have the track titles and lengths.
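
A sketch of this idea, assuming that the returned tracks carry their correct track numbers and that the total track count is known. `Track` and `fillMissingTracks` are simplified, hypothetical names.

```typescript
interface Track {
  number: number;
  title: string;
  length?: number; // in milliseconds, unknown for placeholder tracks
}

/** Fills gaps in the tracklist of a single medium with "[unknown]" placeholder tracks. */
function fillMissingTracks(tracks: Track[], trackCount: number): Track[] {
  const byNumber = new Map(tracks.map((track) => [track.number, track] as const));
  const filled: Track[] = [];
  for (let number = 1; number <= trackCount; number++) {
    filled.push(byNumber.get(number) ?? { number, title: "[unknown]" });
  }
  return filled;
}
```

The placeholders have no length, so a merge with another source could take title and length from there.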

Some digital releases reuse the physical release's GTIN

Originally reported on the forums:

It seems for Bandcamp Harmony is lacking the check if a barcode is used for another edition like the userscript does:

https://harmony.pulsewidth.org.uk/release?bandcamp=consvmer%2Fseelenfrieden&ts=1718342804

According to the listing at Apple Music it should be 3617389461901

I would say this is a data error and it should be sufficient to unset the digital release GTIN only in case of a reused GTIN. If it is different from all physical release GTINs on the Bandcamp page (or when there are no physical packages) it should still be fine to use it.
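
The suggested check could look like this sketch, where `Package` stands for a physical package listed on the Bandcamp page and `digitalGtin` is a hypothetical helper.

```typescript
interface Package {
  name: string;
  gtin?: string;
}

/** Returns the GTIN for the digital release, or undefined if a physical package reuses it. */
function digitalGtin(releaseGtin: string | undefined, physicalPackages: Package[]): string | undefined {
  if (releaseGtin === undefined) return undefined;
  const reused = physicalPackages.some((pkg) => pkg.gtin === releaseGtin);
  return reused ? undefined : releaseGtin;
}
```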

Copy to clipboard buttons

These buttons to quickly copy data have been requested for (unavailable) region lists and external links, but they also make sense for other data:

  • Region lists
  • External links
  • Annotation (or its individual sections: copyright, credits, availability...)
  • Tracklist (for the MB track parser)

Tidal: support video releases

Tidal also provides videos as separate entities. They come with title, cover image, duration, release date, ISRC and copyright info. They seem to be well suited to being added as releases on their own.

Examples:

API provides the /videos/{id} endpoint, see https://developer.tidal.com/reference/web-api?spec=catalogue&ref=get-video .

Example response for https://tidal.com/browse/video/358461354

{
  "resource": {
    "artifactType": "video",
    "id": "358461354",
    "title": "My Boy Only Breaks His Favorite Toys (Lyric Video)",
    "image": [
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/1024x256.jpg",
        "width": 1024,
        "height": 256
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/1080x720.jpg",
        "width": 1080,
        "height": 720
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/160x107.jpg",
        "width": 160,
        "height": 107
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/160x160.jpg",
        "width": 160,
        "height": 160
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/320x214.jpg",
        "width": 320,
        "height": 214
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/320x320.jpg",
        "width": 320,
        "height": 320
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/480x480.jpg",
        "width": 480,
        "height": 480
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/640x428.jpg",
        "width": 640,
        "height": 428
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/750x500.jpg",
        "width": 750,
        "height": 500
      },
      {
        "url": "https://resources.tidal.com/images/931df7cf/57ce/47f8/9a6e/c7cea3e19287/750x750.jpg",
        "width": 750,
        "height": 750
      }
    ],
    "releaseDate": "2024-04-19",
    "artists": [
      {
        "id": "3557299",
        "name": "Taylor Swift",
        "picture": [
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/1024x256.jpg",
            "width": 1024,
            "height": 256
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/1080x720.jpg",
            "width": 1080,
            "height": 720
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/160x107.jpg",
            "width": 160,
            "height": 107
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/160x160.jpg",
            "width": 160,
            "height": 160
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/320x214.jpg",
            "width": 320,
            "height": 214
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/320x320.jpg",
            "width": 320,
            "height": 320
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/480x480.jpg",
            "width": 480,
            "height": 480
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/640x428.jpg",
            "width": 640,
            "height": 428
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/750x500.jpg",
            "width": 750,
            "height": 500
          },
          {
            "url": "https://resources.tidal.com/images/03a7ff5b/e309/4c66/9df7/d469d8049c3d/750x750.jpg",
            "width": 750,
            "height": 750
          }
        ],
        "main": true
      }
    ],
    "duration": 208,
    "trackNumber": 0,
    "volumeNumber": 0,
    "isrc": "USUMV2400558",
    "copyright": "© 2024 Taylor Swift",
    "properties": {},
    "tidalUrl": "https://tidal.com/browse/video/358461354"
  }
}
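
To illustrate how such a response could be turned into a single-track release, here is a rough sketch. The `TidalVideo` interface only covers fields visible in the example above, while `VideoRelease` and `convertVideo` are hypothetical and not Harmony's harmonized release type. It also assumes that `duration` is given in seconds, which the example (208 for a lyric video) suggests but does not prove.

```typescript
interface TidalImage {
  url: string;
  width: number;
  height: number;
}

// Subset of the fields seen in the example response.
interface TidalVideo {
  id: string;
  title: string;
  image: TidalImage[];
  releaseDate: string; // "YYYY-MM-DD"
  artists: Array<{ id: string; name: string; main: boolean }>;
  duration: number; // assumed to be seconds
  isrc: string;
  copyright: string;
  tidalUrl: string;
}

// Hypothetical target shape for illustration only.
interface VideoRelease {
  title: string;
  artists: Array<{ name: string; tidalId: string }>;
  releaseDate: { year?: number; month?: number; day?: number };
  tracks: Array<{ number: number; title: string; lengthMs: number; isrc: string }>;
  copyright: string;
  coverUrl?: string;
  url: string;
}

function convertVideo(video: TidalVideo): VideoRelease {
  const [year, month, day] = video.releaseDate.split("-").map(Number);
  // Pick the image with the largest area as cover candidate.
  const cover = [...video.image].sort(
    (a, b) => b.width * b.height - a.width * a.height,
  )[0];

  return {
    title: video.title,
    artists: video.artists.map((artist) => ({ name: artist.name, tidalId: artist.id })),
    releaseDate: { year, month, day },
    tracks: [{
      number: 1,
      title: video.title,
      lengthMs: video.duration * 1000,
      isrc: video.isrc,
    }],
    copyright: video.copyright,
    coverUrl: cover?.url,
    url: video.tidalUrl,
  };
}
```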

seed artist URLs to MB artist

one feature I miss from a-tisket is how it can seed an edit to add artist URLs from the services it supports to the MusicBrainz artist

Beatport: Warn about catalog numbers which look like GTINs

While Beatport is one of the few shops providing catalog numbers, they aren't always the actual catalog number.
For example, Cold Transmission Music has catalog numbers in the form of CT\d+ but on Beatport it's the GTIN for some reason.
Maybe have a suspicious catalog number warning and tell the user to do some research?

Scanner also has a known catalog number format. Beatport is wrong.
No idea how to detect that one though. It looks kinda derived from the GTIN?
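
For the first case (a plain GTIN in the catalog number field), a check could look like the sketch below. It treats a value as suspicious if it consists of 8, 12, 13 or 14 digits and has a valid GTIN check digit; the function names are made up for this example. It does not help with the second case, where the value only looks derived from the GTIN.

```typescript
/** Validates the trailing check digit of a digit-only GTIN string. */
function hasValidGtinCheckDigit(digits: string): boolean {
  const payload = digits.slice(0, -1);
  let sum = 0;
  // Going from right to left, the payload digits are weighted 3, 1, 3, 1, ...
  for (let i = 0; i < payload.length; i++) {
    const digit = Number(payload.charAt(payload.length - 1 - i));
    sum += digit * (i % 2 === 0 ? 3 : 1);
  }
  const expected = (10 - (sum % 10)) % 10;
  return expected === Number(digits.charAt(digits.length - 1));
}

/** Returns true if the catalog number is most likely a GTIN. */
function looksLikeGtin(catalogNumber: string): boolean {
  const candidate = catalogNumber.replace(/[\s-]/g, "");
  if (!/^(\d{8}|\d{12,14})$/.test(candidate)) return false;
  return hasValidGtinCheckDigit(candidate);
}
```

A provider could emit a warning message whenever `looksLikeGtin` returns true, asking the user to verify the catalog number.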

Normalize and merge copyright lines

Continuing the discussion from #22 (comment)

We can factor out the copyright normalization logic and reuse it for other providers, e.g. as suggested for Tidal in https://community.metabrainz.org/t/harmony-music-metadata-aggregator-and-musicbrainz-importer/698641/15
But I'd do this after merging this PR. It also needs some further research. I know Tidal includes the copyright text both with and without the © symbol. What I'm unsure about is whether this strictly contains copyright © info, or whether it can sometimes also contain phonographic copyright ℗ info. Spotify has those separated, which makes it easier.

I fully agree, this is enough for its own PR and it needs more research.
Tidal also has a copyright property at the track level by the way, which should also be considered if it is different from the release level copyright. So far they were identical for the releases which I have checked, but maybe a compilation has different values there.

For starters I have a commit in the dev branch which displays the alternative copyright values.
When we have more examples we can decide how the release merge algorithm should handle these, one possibility would be to keep all and deduplicate them.
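
The "keep all and deduplicate" option could be sketched like this. It is only one possible approach under assumptions which still need the research mentioned above: `(C)` and `(P)` prefixes are treated as spellings of © and ℗, and a line without a symbol is dropped when the same text already exists with a symbol. Both function names are hypothetical.

```typescript
const LEADING_SYMBOL = /^[©℗]\s*/;

/** Normalizes whitespace and the spelling of a leading © or ℗ symbol. */
function normalizeCopyright(line: string): string {
  return line
    .replace(/\s+/g, " ")
    .trim()
    .replace(/^\(c\)\s*/i, "© ")
    .replace(/^\(p\)\s*/i, "℗ ")
    .replace(/^([©℗])\s*/, "$1 ");
}

/** Merges copyright lines from multiple sources, removing duplicates. */
function mergeCopyrights(lines: string[]): string[] {
  const unique = [
    ...new Set(lines.map(normalizeCopyright).filter((line) => line.length > 0)),
  ];

  // Texts (without symbol, lowercased) which exist with a © or ℗ symbol.
  const textsWithSymbol = new Set(
    unique
      .filter((line) => LEADING_SYMBOL.test(line))
      .map((line) => line.replace(LEADING_SYMBOL, "").toLowerCase()),
  );

  // Assumption: a symbol-less line is redundant if a line with a symbol
  // carries the same text. © and ℗ variants are kept as separate lines.
  return unique.filter((line) =>
    LEADING_SYMBOL.test(line) || !textsWithSymbol.has(line.toLowerCase())
  );
}
```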

YouTube

continuing from discussion here.

so, after a very brief search, it seems there's no official YouTube Music API, only one for YouTube (and a few unofficial ones for YouTube Music)

a few items to be aware of specific to YouTube with examples where applicable:

  • metadata in the description is not at all standardized, perhaps save for distributed content (i.e. from distributors like DistroKid). this is probably only an issue if we want to eventually add relationships with Harmony (once that's possible of course)
  • a single video can have different titles, descriptions, and possibly different audio tracks per region
  • often an album will be released as a single video with chapter markers for each track (I believe these come from the video description)
  • video titles are not standardized, sometimes containing the artist name or [official video] or other nonsense, which maybe should or shouldn't be removed from MusicBrainz submissions? could probably add release disambiguations based on these (such as Visualizer, Lyric Video, Music Video, etc.)
  • there are categories for YouTube videos (including Music and Entertainment, to name a couple). I don't know if those would be important to use, as some "music" releases might be non-music (i.e. about Music Production or about Music), and some music videos might not be categorized as such (as well as some podcasts and other items people might want to import). I don't think these categories are visible on the video pages, but might be in the API
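
for the chapter marker case, a rough sketch of how a tracklist could be extracted from a video description. it assumes that chapters are written as one line per track starting with a timestamp (like `0:00 Intro`), which is a common convention but not guaranteed; `parseChapters` and the `Chapter` type are made-up names.

```typescript
interface Chapter {
  title: string;
  /** Start offset from the beginning of the video in seconds. */
  startSeconds: number;
}

// Matches lines like "0:00 Intro", "3:45 - Second Song" or "1:02:03 Outro".
const CHAPTER_LINE = /^\s*(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\s+[-–]?\s*(.+?)\s*$/;

function parseChapters(description: string): Chapter[] {
  const chapters: Chapter[] = [];
  for (const line of description.split(/\r?\n/)) {
    const match = CHAPTER_LINE.exec(line);
    if (!match) continue;
    const [, hours, minutes, seconds, title] = match;
    chapters.push({
      title,
      startSeconds: Number(hours ?? 0) * 3600 + Number(minutes) * 60 + Number(seconds),
    });
  }
  return chapters;
}
```

track lengths could then be derived from the difference between consecutive start offsets, with the last one needing the total video duration.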

Improve artist matching and combining

  • Match individual artists and not full artist credit arrays
  • Match release and track artists in merge algorithm
  • Combine identifiers of all matched artists

This should solve multiple problems with consistent display of artists as linked entities and MBID resolving.
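
As a starting point for discussion, matching individual artists and combining their identifiers could look roughly like this. The `Artist` and `ExternalId` shapes are simplified assumptions, and matching by normalized name is only a naive placeholder criterion.

```typescript
// Simplified, hypothetical shapes; Harmony's real types may differ.
interface ExternalId {
  provider: string;
  id: string;
}

interface Artist {
  name: string;
  externalIds: ExternalId[];
}

const normalizeName = (name: string) => name.normalize("NFKC").trim().toLowerCase();

/** Two artists match if they share an identifier or have the same normalized name. */
function isSameArtist(a: Artist, b: Artist): boolean {
  const sharesId = a.externalIds.some((x) =>
    b.externalIds.some((y) => x.provider === y.provider && x.id === y.id)
  );
  return sharesId || normalizeName(a.name) === normalizeName(b.name);
}

/** Adds the identifiers of matching source artists to each target artist. */
function combineArtistIds(target: Artist[], source: Artist[]): Artist[] {
  return target.map((artist) => {
    const matches = source.filter((candidate) => isSameArtist(artist, candidate));
    const allIds = [artist, ...matches].flatMap((match) => match.externalIds);
    // Deduplicate identifiers by provider and ID.
    const uniqueIds = new Map(
      allIds.map((entry): [string, ExternalId] => [`${entry.provider}:${entry.id}`, entry]),
    );
    return { ...artist, externalIds: [...uniqueIds.values()] };
  });
}
```

Because each artist is matched on its own, this works regardless of the order or number of artists in the credits of the other source.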

OTOTOY

I honestly couldn't find any API for OTOTOY, but since it is a Japanese store, most of the help pages aren't in English, so there might be one.

https://ototoy.jp

that said, perhaps it could be scraped for data, especially since it's one of the few stores I know that shows catalog numbers (for example, here).

important note, OTOTOY does keep separate pages for Lossless and High-Resolution releases, which would be the same MusicBrainz release (all other data being the same, of course)

SyntaxError: Unexpected end of JSON input

The Tidal API seems to return invalid JSON occasionally.

Unfortunately the logs don't contain the release lookup URL if only a single provider failed (error is currently handled in a place where we don't have access to that information), but here are some extracts of log lines which are likely correlated and might give a hint (although a Tidal lookup for that barcode works for me currently):

Jun 23 03:37:21 harmony deno[103430]: harmony.lookup [INFO] Beatport: Search returned no matching results
Jun 23 03:37:23 harmony deno[103430]: harmony.lookup [INFO] iTunes: API returned no results: https://itunes.apple.com/lookup?entity=song&limit=200&upc=5063381083869&country=jp
Jun 23 03:37:23 harmony deno[103430]: harmony.lookup [ERROR] SyntaxError: Unexpected end of JSON input
Jun 23 03:37:23 harmony deno[103430]:     at parse (<anonymous>)
Jun 23 03:37:23 harmony deno[103430]:     at packageData (ext:deno_fetch/22_body.js:370:14)
Jun 23 03:37:23 harmony deno[103430]:     at consumeBody (ext:deno_fetch/22_body.js:247:12)
Jun 23 03:37:23 harmony deno[103430]:     at eventLoopTick (ext:core/01_core.js:168:7)
Jun 23 03:37:23 harmony deno[103430]:     at async TidalProvider.query (file:///home/harmony/harmony/providers/Tidal/mod.ts:99:21)
Jun 23 03:37:23 harmony deno[103430]:     at async TidalReleaseLookup.getRawTracklist (file:///home/harmony/harmony/providers/Tidal/mod.ts:191:66)
Jun 23 03:37:23 harmony deno[103430]:     at async TidalReleaseLookup.convertRawRelease (file:///home/harmony/harmony/providers/Tidal/mod.ts:209:24)
Jun 23 03:37:23 harmony deno[103430]:     at async TidalReleaseLookup.getRelease (file:///home/harmony/harmony/providers/base.ts:290:19)
Jun 23 03:37:23 harmony deno[103430]:     at async Function.allSettled (<anonymous>)
Jun 23 03:37:23 harmony deno[103430]:     at async CombinedReleaseLookup.getProviderReleaseMapping (file:///home/harmony/harmony/lookup.ts:184:26)
Jun 23 03:37:23 harmony deno[103430]: harmony.lookup [INFO] Beatport: Search returned no matching results

If someone wants to work on this and needs more log samples, just let me know.
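
One way to get more useful logs would be to parse the response body manually and attach the request URL to the error. The following is a sketch under the assumption that the provider can read the body as text first; `JsonBodyError` and `parseJsonBody` are hypothetical names and not part of the current code.

```typescript
/** Error which remembers which URL returned an unparsable body. */
class JsonBodyError extends Error {
  readonly url: string;
  readonly bodyLength: number;
  readonly reason: unknown;

  constructor(url: string, bodyLength: number, reason: unknown) {
    super(`Failed to parse JSON response from ${url} (${bodyLength} characters)`);
    this.name = "JsonBodyError";
    this.url = url;
    this.bodyLength = bodyLength;
    this.reason = reason;
  }
}

/** Parses a response body which was already read as text. */
function parseJsonBody<T>(body: string, url: string): T {
  try {
    return JSON.parse(body) as T;
  } catch (error) {
    // An empty or truncated body ends up here as a SyntaxError.
    throw new JsonBodyError(url, body.length, error);
  }
}
```

A caller would use `parseJsonBody(await response.text(), url)` instead of `await response.json()`, so the log line shows which lookup failed and whether the body was empty.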
