johnwmillr / lyricsgenius

Download song lyrics and metadata from Genius.com 🎶🎤

Home Page: http://www.johnwmillr.com/scraping-genius-lyrics/

License: MIT License

Python 100.00%

download-lyrics genius-api genius-lyrics lyrics python scraping-lyrics song-lyrics

lyricsgenius's Issues

set up as a pypi module?

If this is actually considered, it will require:

adding a setup.py
moving the configuration from the config file into the Python code

Artist.save_lyrics failing

Describe the bug
Using the code artist.save_lyrics() I am given an error when running the script

Expected behavior
I expected the lyrics of a chosen song to be saved to a file

To Reproduce
Describe the steps required to reproduce the behavior.
Use the following code:

import lyricsgenius as genius

api = genius.Genius("MY TOKEN") # Replaced my api token with "MY TOKEN"
artist = api.search_artist("Ariana Grande", max_songs=1)
song = api.search_song("thank u, next", artist.name)
artist.add_song(song)
artist.save_lyrics()

Include the error message associated with the bug.

Traceback (most recent call last):
  File "C:\Users\sebfa\PycharmProjects\TTS\main.py", line 8, in <module>
    artist.save_lyrics()
  File "C:\Users\sebfa\PycharmProjects\TTS\venv\lib\site-packages\lyricsgenius\artist.py", line 109, in save_lyrics
    filename = "Lyrics_{}.{}".format(self.artist.replace(" ", ""), format_)
AttributeError: 'Artist' object has no attribute 'artist'

Version info

Package version: 1.0.0
OS: Windows 10

Additional context
Add any other context about the problem here.

1 def songsAreSame(s1, s2):
2     from difflib import SequenceMatcher as sm # For comparing similarity of lyrics
3     seqA = sm(None, s1.lyrics, s2['lyrics'])
4     seqB = sm(None, s2['lyrics'], s1.lyrics)
5     return seqA.ratio() > 0.5 or seqB.ratio() > 0.5

I'm curious about the purpose of the second SequenceMatcher on line 4 (line 80 in artist.py). Could it be one cause of the bottleneck that occurs during the JSON writing (line 101 of artist.py), which is mentioned in the comment above that line? If the second SequenceMatcher is necessary, I believe a permutation approach to the lyric checks could reduce the time it takes to write to file.

E.g - A temp list would be created and "Song A" would be compared with "B" and "C", then "A" would be removed from the temp list and "B" would be compared with only "C"
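The pairwise idea described above can be sketched roughly as follows. This is only an illustration, not the code in artist.py: the function name find_duplicate_pairs and the dictionary of lyrics are made up for the example, and the 0.5 threshold is taken from the snippet above. Note that SequenceMatcher.ratio() may give slightly different values depending on argument order, which could be why the original compares in both directions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_duplicate_pairs(lyrics_by_title, threshold=0.5):
    """Return (title_a, title_b) pairs whose lyrics look like duplicates.

    Hypothetical helper: lyrics_by_title maps a song title to its lyrics.
    """
    duplicates = []
    # combinations() yields every unordered pair exactly once:
    # (A, B), (A, C), (B, C), so B is never compared against A again.
    for (title_a, text_a), (title_b, text_b) in combinations(lyrics_by_title.items(), 2):
        if SequenceMatcher(None, text_a, text_b).ratio() > threshold:
            duplicates.append((title_a, title_b))
    return duplicates


songs = {
    "A": "you say yes I say no",
    "B": "you say yes I say no no",
    "C": "zzzz qqqq xxxx",
}
pairs = find_duplicate_pairs(songs)
```

With n songs this still performs n*(n-1)/2 comparisons, so it halves the work rather than changing how it scales.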

Unicode problem with entering artist name

Sometimes, inputting an artist name results in the "Did you mean..." prompt because Genius.com returns \u200b[artist], i.e. the name prefixed with a zero-width space.

Maybe re.sub would help?
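A minimal sketch of the re.sub idea, assuming the only problem is invisible zero-width characters in the returned name. The helper name strip_zero_width and the exact set of characters removed are assumptions for this example, not part of the package.

```python
import re

# Zero-width space, non-joiner, joiner, and byte-order mark (an assumed set).
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")


def strip_zero_width(text):
    """Remove invisible zero-width characters before comparing names."""
    return ZERO_WIDTH.sub("", text)


cleaned = strip_zero_width("\u200bTaylor Swift")
```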

How to avoid "SKIPPING `song name` (already found in artist collection)"?

Hi, great wrapper!

I'm trying to grab all the Radiohead lyrics from Genius to do some analysis on them. When I try and save all the songs to the json file I get the message

SKIPPING song name (already found in artist collection)

In my case it's

SKIPPING "Morning Bell/Amnesiac" (already found in artist collection)
SKIPPING "Hunting Bears" (already found in artist collection)
SKIPPING "Feral" (already found in artist collection)

How can I avoid this? I need the data for these three songs.

Thanks!

artist.save_lyrics failing

import lyricsgenius as genius
access_token = 'XXXX'
api = genius.Genius(access_token)
artist = api.search_artist("The Beatles", max_songs=3)
artist.save_lyrics(format_='json', filename='out.json')

.\python\lyrics>py -3 ./genius.py
Searching for songs by The Beatles...

Song 1: "12-Bar Original"
Song 2: "1822!"
"1 [Booklet]" is not valid. Skipping.
"20 Greatest Hits - Art and Tracklist" is not valid. Skipping.
Song 3: ""Abbey Road" side two"

Reached user-specified song limit (3).
Done. Found 3 songs.
Traceback (most recent call last):
  File "./genius.py", line 19, in
    artist.save_lyrics(format_='json', filename='out.json')
  File "C:\Python3\lib\site-packages\lyricsgenius\artist.py", line 129, in save_lyrics
    lyrics_to_write['songs'][-1]['album'] = song.album
  File "C:\Python3\lib\site-packages\lyricsgenius\song.py", line 45, in album
    if 'album' in self._body and 'name' in self._body['album']:
TypeError: argument of type 'NoneType' is not iterable

.\python\lyrics>py -3 --version
Python 3.6.2

Originally posted by @robot3498712 in #71 (comment)

Package won't install

Describe the bug
When I try to install globally using pip install lyricsgenius, I get the following output:

Collecting lyricsgenius
  Using cached https://files.pythonhosted.org/packages/9d/4e/8cd3ff464d5c08e745bfae7c8ea96e64a3584e248ed8b57b9c2d102150d1/lyricsgenius-1.0.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-kJMjH9/lyricsgenius/setup.py", line 21, in <module>
        with open(path.join(this_directory, 'README.md'), encoding='utf-8') as f:
    TypeError: 'encoding' is an invalid keyword argument for this function
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-kJMjH9/lyricsgenius/

Expected behavior
A global pip install would work without errors.

To Reproduce
Describe the steps required to reproduce the behavior.

Open terminal
pip install lyricsgenius

Include the error message associated with the bug.

TypeError: 'encoding' is an invalid keyword argument for this function
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-kJMjH9/lyricsgenius/

Version info

Package version: Latest
OS: macOS

Additional context
I'm coming from a Node background so this could easily be something I'm doing but I tried this with pipenv, virtualenv, global pip install, and on an AWS Cloud9 instance (to make sure my global pip isn't muddied) and I got similar results each time so I'm thinking there could be an issue at play.
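For what it's worth, this TypeError is what Python 2's built-in open() raises when given encoding=, so pip may be running setup.py under Python 2. One possible workaround in setup.py, sketched here under that assumption, is io.open, which accepts encoding= on both Python 2 and 3. The function name read_long_description is made up for the example.

```python
import io
from os import path


def read_long_description(directory, filename="README.md"):
    """Read the README as text in a way that works on Python 2 and 3."""
    # io.open is the Python 3 style open(); unlike Python 2's builtin,
    # it understands the encoding keyword.
    with io.open(path.join(directory, filename), encoding="utf-8") as f:
        return f.read()
```

In setup.py this would be called with the directory containing setup.py itself.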

Write more tests

There really need to be more unit tests for this package. Help wanted!

The package currently runs with continuous integration on Travis-CI.

Searching songs is slow

Searching for songs is pretty slow. How can we speed things up?

Not enough documentation

Need to add more documentation to pretty much all of the functions.

Hi, I'm new to Python. Do you have any idea what this error might mean?

TypeError: 'NoneType' object is not subscriptable

Originally posted by @raunakdaga in #55 (comment)

Skipping songs taking longer than fetching one

First of all, thanks for the nice program, seems to work well for the most part.
I'm trying to build a corpus of lyrics for a project at my university, so I try to fetch all the songs of the artists I want to incorporate.
Once the program fetched most of the songs, it seems to find many duplicates and attempts to skip, but skipping takes way longer than fetching a song.
Is there any way to speed up the skipping process?
Best regards.

Trouble with cyrillic

TRY:

import lyricsgenius as genius
api = genius.Genius('token')
song = api.search_song('Возможно')

print(song.lyrics)

Possible Solution:

api.py

110: lyrics = html.find("div", class_="lyrics").get_text().encode('ascii','ignore').decode('ascii')

change to

110: lyrics = html.find("div", class_="lyrics").get_text()

Add more usage examples and documentation

Is your feature request related to a problem? Please describe.
Users aren't aware of what features are available in lyricsgenius and are requesting features that are already a part of the package.

Describe the solution you'd like
Add documentation to the README that includes examples for more use cases. Eventually it would be nice to have a dedicated documentation site.

Song and Artist info should export as JSON

It really makes the most sense to export (i.e. save) Song and Artist objects in JSON format.

Let the user pass a function to handle the verse, chorus (and other) tags

The artist tags in the middle of the lyrics may be very helpful, depending on the application. Maybe another solution is to pass the whole lyrics, with the [tags] unparsed.
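One way this could look, as a rough sketch only: the caller passes a function that receives each [tag] and returns its replacement. The name process_tags, the handler parameter, and the regular expression are all assumptions for illustration, not the package's API.

```python
import re

# Matches a bracketed tag such as [Verse 1] or [Chorus] on a single line.
SECTION_TAG = re.compile(r"\[[^\]\n]*\]")


def process_tags(lyrics, handler=None):
    """Replace each [tag] with handler(tag); by default tags are removed."""
    if handler is None:
        def handler(tag):
            return ""
    return SECTION_TAG.sub(lambda match: handler(match.group(0)), lyrics)


lyrics = "[Verse 1]\nYou say yes\n[Chorus]\nHello hello"
no_tags = process_tags(lyrics)
upper_tags = process_tags(lyrics, handler=str.upper)
```

Passing handler=lambda tag: tag would leave the lyrics untouched, which covers the "whole lyrics with the [tags] unparsed" option.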

Installation - AUR package link

I created a package of LyricsGenius for Arch Linux and published it to AUR.
Maybe you could put Arch Linux installation instructions under "Installation" like this:

Install the AUR package for Arch Linux manually:

curl -L -O https://aur.archlinux.org/cgit/aur.git/snapshot/python-lyricsgenius.tar.gz
tar -xvf python-lyricsgenius.tar.gz
cd python-lyricsgenius
makepkg -si

Speed issues

Hello! I've been attempting to use this wrapper (thank you for putting this up!), but I've been noticing that a lot of times the search_artist function slows to a crawl and takes quite a long time to return any results. Is this to avoid some sort of rate limiting? Is there anything that I can do on my end to improve the speed at which lyrics are returned? Thanks again!

EDIT: I think the speed issues were a result of some of the first songs not having any lyrics. Those results seem to take a lot longer than results with lyrics.

Genius API returns non-songs masquerading as songs

The Genius API includes entries the site refers to as songs that aren't actually songs.

For example, searching for Taylor Swift will return entries for liner notes and a booklet along with actual song lyrics.

My wrapper needs to be able to identify and reject these non-song entries. From what I can tell, the Genius API does not flag these items as non-songs — their type is still listed as "song" in the JSON object.

Is it legal to scrape the HTML pages containing the lyrics?

Hi @johnwmillr!

Pretty cool work on LyricsGenius, but I have a question: is it legal to scrape the lyrics from the Genius HTML pages? Reading the support forums, I found:

https://genius.com/discussions/277279-Get-the-lyrics-of-a-song

What do you think? The safe approach for building web interfaces seems to be to just embed the genius viewer. But it is more flexible if you have direct access to the lyrics contents.

Is it possible to add a timeout parameter for api.search_song()?

I'm scraping lyrics of a list of songs, got a Read Timed Out error. Is it possible to change timeout parameter from 5 to 30?

error message:
ReadTimeout: HTTPSConnectionPool(host='api.genius.com', port=443): Read timed out. (read timeout=5)

Version info

Package version: 0.9.5
OS: MacOS Mojave 10.14

Command line interface should use argparse

LyricsGenius/lyricsgenius/__main__.py

Line 17 in 6a91cd2

# There must be a standard way to handle "--" inputs on the command line

The current method for accepting inputs from the command line uses ad-hoc string parsing. The proper way to parse command line inputs is argparse.

Switching to argparse should be fairly straightforward, maybe a good first issue.
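A rough sketch of what an argparse version might look like. The positional layout (a search type followed by search terms) follows the command-line usage quoted in these issues, but the argument names and the --max-songs flag are hypothetical and not taken from __main__.py.

```python
import argparse


def build_parser():
    """Build a parser for: lyricsgenius {song,artist} TERMS... [--max-songs N]"""
    parser = argparse.ArgumentParser(prog="lyricsgenius")
    parser.add_argument("search_type", choices=["song", "artist"],
                        help="what to search for")
    parser.add_argument("terms", nargs="+",
                        help="song title and/or artist name")
    # Hypothetical option, shown only to illustrate "--" style inputs.
    parser.add_argument("--max-songs", type=int, default=None,
                        help="limit the number of songs fetched for an artist")
    return parser


args = build_parser().parse_args(["song", "Hello, Goodbye", "The Beatles"])
```

argparse also generates --help output and reports invalid input automatically, which the hand-rolled parsing would otherwise need to do itself.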

_result_is_lyrics customization

🏷 Enhancement

I agree with most of the filters being applied to reject songs, but having the ability to pass in a list of extra lyric filters or customize the existing criteria could provide additional value to users.

def _result_is_lyrics(self, song_title):
    """Returns False if result from Genius is not actually song lyrics"""
    regex = re.compile(
        r"(tracklist)|(track list)|(album art(work)?)|(liner notes)|(booklet)|(credits)|(remix)|(interview)|(skit)", re.IGNORECASE)
    return not regex.search(song_title)
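As a sketch of the requested customization, the excluded terms could live in a list that callers can extend. The default terms below are copied from the method above; the standalone function result_is_lyrics and its extra_terms parameter are hypothetical.

```python
import re

# Same patterns as the regex in _result_is_lyrics, split into a list.
DEFAULT_EXCLUDED = [
    "tracklist", "track list", r"album art(work)?", "liner notes",
    "booklet", "credits", "remix", "interview", "skit",
]


def result_is_lyrics(song_title, extra_terms=()):
    """Return False if the title matches a default or user-supplied filter."""
    # User-supplied terms are escaped so they are matched literally.
    terms = DEFAULT_EXCLUDED + [re.escape(term) for term in extra_terms]
    pattern = re.compile("|".join("({})".format(term) for term in terms),
                         re.IGNORECASE)
    return not pattern.search(song_title)
```

For example, result_is_lyrics("Hey Jude (Live)", extra_terms=["(live)"]) would reject live versions without touching the defaults.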

UnicodeEncodeError when parsing Genius.com search results

Occasionally my code barfs when it encounters a character the ascii codec can't encode.

python genius/genius.py --search_song "Begin Again"

Searching for "Begin Again"...
Traceback (most recent call last):
    File "genius/genius.py", line 397, in <module>
        song = G.search_song(sys.argv[2])                                
    File "genius/genius.py", line 147, in search_song
        found_title  = str(search_hit['title']).translate(None,' ').lower()
UnicodeEncodeError: 'ascii' codec can't encode character u'\u200b' in position 0: ordinal not in range(128)

I assume there is a standard easy fix to this issue. So, I should fix it.

Link to blog is wrong

In the README the link to the blog post is wrong.

Searching for song or artist name requires exact match

My code in the search_song() and search_artist() functions requires an exact match between the user's query and the result returned from the Genius.com search.

Here's an example of the issue:

python genius.py --search_song "Hello Goodbye" "The Beatles"
    Searching for "Hello Goodbye" by The Beatles...
    Specified song was not first result :(

search_song() didn't find "Hello Goodbye" because the top result from Genius.com was "Hello, Goodbye" (note the comma).

Whereas this works:

python genius.py --search_song "Hello, Goodbye" "The Beatles"
   Searching for "Hello, Goodbye" by The Beatles...
   Done.

      "Hello, Goodbye" by The Beatles:
      You say yes, I say no
      You say stop and I say go go go, oh no
      You say goodbye and I say hello
      Hello h...

One simple fix would be stripping any punctuation and capitalization from both the user's search term and the Genius.com search results.
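That fix might look something like the sketch below, where both the query and the search result are normalized before comparison. The function name normalize is an assumption for this example.

```python
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    no_punctuation = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punctuation.lower().split())


same_song = normalize("Hello Goodbye") == normalize("Hello, Goodbye")
```

Note that string.punctuation only covers ASCII, so typographic quotes or dashes in a title would need extra handling.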

Error while searching for all lyrics by Kanye West

From a comment on my blog:

I'm looking to use it to analyze how an artist's lyrics change over different albums. My first thought was just to pull all of the artist's songs, but I believe there is a song in their directory with missing lyrics that is causing the search to quit.

So is there either a.) a way to keep the search from stopping or b.) a way to pull songs by album instead of by artist?

I got the error when using the search function on Kanye West. The search runs up to "All Falls Down", then prints AttributeError: 'NoneType' object has no attribute 'get_text' and stops. Looking on the website, the next song on his list of songs is "All Falls Down (Live)", which is marked "Missing Lyrics", so I assumed this caused the error.

So this probably has to do with calling the get_text() function when there aren't actually lyrics available.

Character not recognized

Describe the bug
When I try to scrape lyrics of the top 10 popular Kanye songs, it doesn't recognize one character.

Expected behavior
return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode character '\u0150' in position 2716: character maps to <undefined>

This error will pop up, and I think this means it encountered the character u0150.

To Reproduce
Describe the steps required to reproduce the behavior.

if scrape_mode is True:
    artist = genius.search_artist("Kanye West", max_songs=10, sort="popularity")
    lyrics = ''

    for i in range(10):
        with open('Kanye.txt', 'a') as file:
            file.write(artist.songs[i].lyrics)

Include the error message associated with the bug.

Version info

Package version [import lyricsgenius; print(lyricsgenius.__version__)]
OS: [e.g. macOS, Windows, etc.]

Additional context
Add any other context about the problem here.
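A 'charmap' UnicodeEncodeError on write usually means the file was opened with the platform's default encoding (often cp1252 on Windows), which cannot represent the character. A minimal sketch of a workaround, assuming UTF-8 output is acceptable, is to pass encoding="utf-8" explicitly. The helper append_lyrics and the temporary path are made up for the example.

```python
import os
import tempfile


def append_lyrics(path, lyrics):
    """Append lyrics to a text file, always encoding as UTF-8."""
    # An explicit encoding avoids depending on the platform default.
    with open(path, "a", encoding="utf-8") as lyrics_file:
        lyrics_file.write(lyrics + "\n")


path = os.path.join(tempfile.mkdtemp(), "Kanye.txt")
append_lyrics(path, "\u0150 example line")
with open(path, encoding="utf-8") as lyrics_file:
    content = lyrics_file.read()
```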

JSON is not well formatted

When viewing any of the JSON files exported by any of the save() functions in a Quicklook preview or trying to open the file in Sublime, I get a warning: JSON is not well formatted: Unexpected EOF. The JSON files can still be read into Python just fine using the json module, but I should figure out why I get this warning.

Add option for rate-limiting the API requests

It'd be good to have the option to limit the request rate of any API requests.

More generally, LyricsGenius should have a system in place for handling error responses from the Genius API.
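One simple form of rate limiting is to enforce a minimum interval between consecutive requests. The class below is a sketch under that assumption; RateLimiter and its parameters are hypothetical and say nothing about any limits Genius itself applies.

```python
import time


class RateLimiter:
    """Block until at least min_interval seconds have passed since the last call."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling limiter.wait() immediately before each API request would then space the requests out.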

save_lyrics() got an unexpected keyword argument 'format'

save_lyrics() got an unexpected keyword argument 'format'
I can't seem to choose which format to save the lyrics in.

Is there a way to add a parameter to make the search_artist function return nothing if an artist is not found?

Currently, when running in Jupyter, I get a dialogue box that suggests another artist. Is there a way to avoid this dialogue box and have the function return something like "not found"? I'm scraping hundreds of artists, so dealing with the dialogue box is a bit difficult.

Remove the header from non-English lyrics

On some (if not all) non-English lyrics, there is a header, in accordance with the Genius guide.

You can check two of them here and here.

The "save lyrics" methods should be Song and Artist class methods

It'd make sense to at least have the option to do the following:

# Save lyrics for a single song
song = api.search_song("Hello, Goodbye", "The Beatles")
song.save_lyrics()

# Save all lyrics from a given artist
artist = api.search_artist("The Beatles")
artist.save_lyrics()

Currently you save lyrics by calling api.save_artist_lyrics(artist).

Is there a way to just return a JSON object with save_lyrics and not actually download the file?

Is your feature request related to a problem? Please describe.
Write a clear and concise description of what the problem is -- e.g. "I'm always frustrated when [...]"

Describe the solution you'd like
Write a clear and concise description of what you want to happen.

Describe alternatives you've considered
Write a clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

FileNotFoundError when saving song with a "/"

I was trying to save all the lyrics for songs from an artist, but the save_lyrics() function stopped once it hit a song that has a "/" in the song title.

Here is the error message I received:
FileNotFoundError: [Errno 2] No such file or directory: 'lyrics_arianagrande_blessed/rainbow.json'

To reproduce:
artist_name = "{Ariana Grande}"
artist = api.search_artist(artist_name)
artist.save_lyrics()

(The song is the 32nd song of hers pulled up.)
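The "/" in the title is being treated as a path separator. A possible fix, sketched here, is to replace characters that are unsafe in file names before building the path. The function sanitize_filename and the choice of "_" as the replacement are assumptions for the example.

```python
import re


def sanitize_filename(name, replacement="_"):
    """Replace path separators and other characters unsafe in file names."""
    # Covers "/" and "\" plus the characters Windows forbids in file names.
    return re.sub(r'[\\/:*?"<>|]', replacement, name)


filename = "lyrics_arianagrande_{}.json".format(sanitize_filename("blessed/rainbow"))
```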

Can't find certain songs

Describe the bug
When searching for certain songs, no songs are returned. Examples include:

"Sunflower" by Post Malone and Swae Lee
"The Glorious Five" by Logic

Expected behavior
Results should show for songs that are easily searchable using the genius.com UI.

To Reproduce
Describe the steps required to reproduce the behavior.

From the CLI, run the following command: lyricsgenius song "Sunflower" "Post Malone"

Error message associated with the bug:

Searching for "Sunflower" by Post Malone...
Could not find specified song. Check spelling?
Could not find specified song. Check spelling?

Version info

1.0.2
OS: macOS

Additional context
Doesn't appear to be an issue with special characters or too many characters (in the song or artist).

PyPI now supports Markdown

PyPI supposedly now supports Markdown: Markdown Descriptions on PyPI.

I'd like to remove the README.rst file and update the version on PyPI.

Song titles and artist names need to be too exact when searching

Describe the bug
Searching for a song by a given artist requires both the song title and the artist name inputs to be very close to the exact song title and artist name.

Expected behavior
Searching for "problems" by "jay z" should get the song "99 problems", but the search fails, even though the search works on Genius.com. Searching for "99 problems" without an artist argument does find the correct song.

To Reproduce
Describe the steps required to reproduce the behavior.

song = api.search_song("99 problems", "jay z")
song is None, but it should have found the song.

Additional context
The lyricsgenius search should be just as flexible as the Genius.com search.

Error message

Hi, I get an error message while using your code:

import lyricsgenius as genius
api = genius.Genius('----my api code ---')
artist = api.search_artist('Andy Shauf', max_songs=3)

Error message:

Traceback (most recent call last):
File "C:/Users/Chris/AppData/Local/Programs/Python/Python37-32/top2000/181208 top2000.py", line 3, in
artist = api.search_artist('Andy Shauf', max_songs=3)
File "C:\Users\Chris\AppData\Local\Programs\Python\Python37-32\lib\site-packages\lyricsgenius\api.py", line 283, in search_artist
found_name = artist_info['artist']['name']
TypeError: 'NoneType' object is not subscriptable

Can you help me with this?
Many thanks!

search_artist should use Genius's list-songs endpoint from the artist's page

Is your feature request related to a problem? Please describe.
The current Genius.search_artist method relies on a heuristic for finding songs by the requested artist. This method is slow, inefficient, and may miss songs that belong to the artist.

Describe the solution you'd like
Use the same endpoint Genius.com uses when listing songs on an artist's page.

Here is an example of the all songs endpoint for Jay-Z:

https://genius.com/api/artists/2/songs?page=5&sort=popularity

Additional context
Not sure if this API endpoint is publicly listed by Genius, but the endpoint returns a 200 when I make a request to it.

Song.save_lyrics doesn't include song title in default file name

Describe the bug
The Song.save_lyrics method saves a file name with artist name but not song title, potentially overwriting different songs by the same artist.

Expected behavior
Default file name should be f"Lyrics_{song.title}_{song.artist}.txt".

To Reproduce
Describe the steps required to reproduce the behavior.

song = api.search_song("99 problems Jay-z")
song.save_lyrics()

Additional context
Problem is an issue if saving multiple songs individually, potentially by the same artist. I should provide a save_songs method that accepts a list of songs.

Song search needs titles check

Describe the bug
The search_song method doesn't check that it's returning the correct song.

Expected behavior
Searching for "99 problems" returns a Drake song, "All Me", instead of the expected "99 Problems" by Jay-Z.

To Reproduce
Describe the steps required to reproduce the behavior.

song = api.search_song("99 problems")
print(song)

Additional context
I should probably add a check in to make sure we're not missing a search result that actually matches the song name.

Search is case sensitive

Search is case sensitive, but shouldn't be.
For example:
song = api.search_song('lose yourself', 'Eminem')
returns no results, whereas if you search by url:
https://genius.com/search?q=lose%20yourself
it returns the correct result.
To fix, edit the _clean() function:

def _clean(self, s):
    return s.translate(str.maketrans('','',punctuation)).replace('\u200b', " ").strip().lower()

I.e. just add .lower()

Encoding error during saving of lyrics for an artist

I tried fetching the lyrics of the French rapper Nekfeu and saving them in txt format, but I got this error:

Traceback (most recent call last):
  File "lyrics_fetch.py", line 6, in <module>
    artist.save_lyrics(format = "txt")
  File "D:\Anaconda3\lib\site-packages\lyricsgenius\artist.py", line 134, in save_lyrics
    lyrics_file.write(lyrics_to_write)
  File "D:\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 20121: character maps to <undefined>

Artist search fails on "Tupac"

Artist search fails when searching for "Tupac" because Genius.com lists him as "2Pac".

The artist page for 2Pac has an AKA section that includes "Tupac". It would probably be possible to check if the user's search term is included in the AKA section of the first artist search result, continuing with the search if a match is found.
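The AKA check could be sketched as below, assuming the alternate names are already available as a list of strings. How to obtain them from the artist page or an API response is not shown, and the function name matches_artist is hypothetical.

```python
def matches_artist(query, name, alternate_names=()):
    """Return True if query equals the artist name or any AKA, ignoring case."""
    wanted = query.strip().lower()
    candidates = [name] + list(alternate_names)
    return any(wanted == candidate.strip().lower() for candidate in candidates)


found = matches_artist("Tupac", "2Pac", alternate_names=["Tupac", "Makaveli"])
```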

Lyrics only?

Is there a way to only pull down the lyrics?
I'm having to sort through the files to remove year/album/artist/etc, and I got to thinking that there just has to be a better way of doing it.

Feature request: add support for the Genius annotations

If we're using the Genius API we really should allow the user to access the lyric annotations, not just the lyrics themselves. It'd take some thought to figure how to properly organize and structure the lyrics, but those decisions may be guided by how Genius already formats their API responses.

Would the lyrics be keys in a dictionary corresponding to the annotation? Would the annotations just be stored sequentially in a list? What's the best format?

Classes within genius.py should be split into separate files

I dunno, should they? What's the standard? Guess it couldn't hurt. Makes more sense than just having a genius.py file all by itself in the genius/ directory.

You'd have something like this:

genius/
    genius.py
    api.py
    Song.py
    Artist.py

It's not an issue, but..

I have a small question. Is there a Node.js version of this, or could somebody convert it?

I'm making a project that gets lyrics, and this is exactly what I need - but it's Python :(

Thanks!
