johnwmillr / lyricsgenius
Download song lyrics and metadata from Genius.com 🎶🎤
Home Page: http://www.johnwmillr.com/scraping-genius-lyrics/
License: MIT License
If this is actually considered, this will require
Describe the bug
Calling artist.save_lyrics() raises an error when I run the script.
Expected behavior
I expected the lyrics of a chosen song to be saved to a file
To Reproduce
Describe the steps required to reproduce the behavior.
Use the following code:
import lyricsgenius as genius
api = genius.Genius("MY TOKEN") # Replaced my api token with "MY TOKEN"
artist = api.search_artist("Ariana Grande", max_songs=1)
song = api.search_song("thank u, next", artist.name)
artist.add_song(song)
artist.save_lyrics()
Include the error message associated with the bug.
Traceback (most recent call last):
  File "C:\Users\sebfa\PycharmProjects\TTS\main.py", line 8, in <module>
    artist.save_lyrics()
  File "C:\Users\sebfa\PycharmProjects\TTS\venv\lib\site-packages\lyricsgenius\artist.py", line 109, in save_lyrics
    filename = "Lyrics_{}.{}".format(self.artist.replace(" ", ""), format_)
AttributeError: 'Artist' object has no attribute 'artist'
Version info
Additional context
Add any other context about the problem here.
1 def songsAreSame(s1, s2):
2     from difflib import SequenceMatcher as sm  # For comparing similarity of lyrics
3     seqA = sm(None, s1.lyrics, s2['lyrics'])
4     seqB = sm(None, s2['lyrics'], s1.lyrics)
5     return seqA.ratio() > 0.5 or seqB.ratio() > 0.5
I'm curious about the purpose of the second SM on line 4 (line 80 in artist.py). Wouldn't it be one possible cause of the bottleneck that occurs during the JSON writing (line 101 in artist.py), which is mentioned in the comment above that line? If the second SM is necessary, I believe a pairwise approach to the lyric checks, where each pair of songs is compared only once, could reduce the time it takes to write the file.
E.g. a temp list would be created and "Song A" would be compared with "B" and "C"; then "A" would be removed from the temp list and "B" would be compared only with "C".
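The pairwise scheme described above can be sketched with itertools.combinations, which yields every unordered pair exactly once. This is only an illustration: the function name, the dict-of-strings input, and the sample lyrics are assumptions, and the 0.5 threshold is borrowed from the snippet above. Note that SequenceMatcher.ratio() may depend on argument order, so dropping the reversed comparison can change results in edge cases.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_duplicate_pairs(lyrics_by_title, threshold=0.5):
    """Return title pairs whose lyrics similarity exceeds the threshold.

    Each unordered pair is compared once, so n songs need n*(n-1)/2
    comparisons rather than n*n.
    """
    duplicates = []
    for (title_a, text_a), (title_b, text_b) in combinations(lyrics_by_title.items(), 2):
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio > threshold:
            duplicates.append((title_a, title_b))
    return duplicates


# Hypothetical sample data: "A" and "B" are near-identical, "C" is unrelated.
songs = {
    "A": "you say yes, I say no",
    "B": "you say yes, I say no!",
    "C": "completely different words here",
}
print(find_duplicate_pairs(songs))  # [('A', 'B')]
```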
Sometimes, inputting an artist will result in the "Did you mean..." because Genius.com returns \u200b[artist].
Maybe re.sub would help?
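A minimal sketch of the re.sub idea, assuming the stray character is a zero-width space (U+200B) or a close relative. The helper name and the exact set of characters stripped are assumptions for illustration, not part of the package.

```python
import re

# Zero-width characters that can hide in scraped names (assumed set).
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")


def strip_zero_width(name):
    """Remove zero-width characters and surrounding whitespace from a name."""
    return ZERO_WIDTH.sub("", name).strip()


print(strip_zero_width("\u200bAriana Grande") == "Ariana Grande")  # True
```

Applying the same cleanup to both the user's query and the name returned by Genius before comparing them should avoid the spurious "Did you mean..." prompt in this case.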
Hi, great wrapper!
I'm trying to grab all the Radiohead lyrics from Genius to do some analysis on them. When I try to save all the songs to the JSON file I get the message
SKIPPING "song name" (already found in artist collection)
In my case it's
SKIPPING "Morning Bell/Amnesiac" (already found in artist collection)
SKIPPING "Hunting Bears" (already found in artist collection)
SKIPPING "Feral" (already found in artist collection)
How can I avoid this? I need the data for these three songs.
Thanks!
import lyricsgenius as genius
access_token = 'XXXX'
api = genius.Genius(access_token)
artist = api.search_artist("The Beatles", max_songs=3)
artist.save_lyrics(format_='json', filename='out.json')
.\python\lyrics>py -3 ./genius.py
Searching for songs by The Beatles...
Song 1: "12-Bar Original"
Song 2: "1822!"
"1 [Booklet]" is not valid. Skipping.
"20 Greatest Hits - Art and Tracklist" is not valid. Skipping.
Song 3: ""Abbey Road" side two"
Reached user-specified song limit (3).
Done. Found 3 songs.
Traceback (most recent call last):
  File "./genius.py", line 19, in <module>
    artist.save_lyrics(format_='json', filename='out.json')
  File "C:\Python3\lib\site-packages\lyricsgenius\artist.py", line 129, in save_lyrics
    lyrics_to_write['songs'][-1]['album'] = song.album
  File "C:\Python3\lib\site-packages\lyricsgenius\song.py", line 45, in album
    if 'album' in self._body and 'name' in self._body['album']:
TypeError: argument of type 'NoneType' is not iterable
.\python\lyrics>py -3 --version
Python 3.6.2
Originally posted by @robot3498712 in #71 (comment)
Describe the bug
When I try to install globally using pip install lyricsgenius, I get the following output:
Collecting lyricsgenius
Using cached https://files.pythonhosted.org/packages/9d/4e/8cd3ff464d5c08e745bfae7c8ea96e64a3584e248ed8b57b9c2d102150d1/lyricsgenius-1.0.0.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-kJMjH9/lyricsgenius/setup.py", line 21, in <module>
    with open(path.join(this_directory, 'README.md'), encoding='utf-8') as f:
TypeError: 'encoding' is an invalid keyword argument for this function
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-kJMjH9/lyricsgenius/
Expected behavior
A global pip install would work without errors.
To Reproduce
Describe the steps required to reproduce the behavior.
pip install lyricsgenius
Include the error message associated with the bug.
TypeError: 'encoding' is an invalid keyword argument for this function
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-kJMjH9/lyricsgenius/
Version info
Additional context
I'm coming from a Node background, so this could easily be something I'm doing wrong. However, I tried this with pipenv, virtualenv, a global pip install, and an AWS Cloud9 instance (to make sure my global pip isn't muddied) and got similar results each time, so I think there could be a real issue at play.
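This TypeError is what Python 2's built-in open() raises when given an encoding keyword, which suggests pip was running under Python 2. A hedged sketch of a reader that works on both Python 2 and 3 uses io.open instead. The helper name is hypothetical, and this is not necessarily how the package's setup.py was changed.

```python
import io
from os import path


def read_long_description(directory, filename="README.md"):
    """Read a text file as UTF-8 on both Python 2 and Python 3."""
    # io.open accepts `encoding` on Python 2; on Python 3 it is the same
    # function as the built-in open().
    with io.open(path.join(directory, filename), encoding="utf-8") as f:
        return f.read()


# In a setup.py this would typically be called as (illustrative only):
#   this_directory = path.abspath(path.dirname(__file__))
#   long_description = read_long_description(this_directory)
```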
There really need to be more unit tests for this package. Help wanted!
The package currently runs with continuous integration on Travis-CI.
Searching for songs is pretty slow. How can we speed things up?
Need to add more documentation to pretty much all of the functions.
Hi, I'm new to Python. Do you have any idea what this error might mean?
TypeError: 'NoneType' object is not subscriptable
Originally posted by @raunakdaga in #55 (comment)
First of all, thanks for the nice program, seems to work well for the most part.
I'm trying to build a corpus of lyrics for a project at my university, so I try to fetch all the songs of the artists I want to incorporate.
Once the program fetched most of the songs, it seems to find many duplicates and attempts to skip, but skipping takes way longer than fetching a song.
Is there any way to speed up the skipping process?
Best regards.
import lyricsgenius as genius
api = genius.Genius('token')
song = api.search_song('Возможно')
print(song.lyrics)
api.py
110: lyrics = html.find("div", class_="lyrics").get_text().encode('ascii','ignore').decode('ascii')
change to
110: lyrics = html.find("div", class_="lyrics").get_text()
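To illustrate why the proposed change matters, the sketch below applies the same encode/decode round trip to plain strings. The helper name is an assumption. Every non-ASCII character is silently dropped, so Cyrillic lyrics come back empty.

```python
def ascii_roundtrip(text):
    """Mimic the original line: drop every character outside ASCII."""
    return text.encode("ascii", "ignore").decode("ascii")


cyrillic_title = "\u0412\u043e\u0437\u043c\u043e\u0436\u043d\u043e"  # "Возможно"

print(repr(ascii_roundtrip(cyrillic_title)))    # ''
print(repr(ascii_roundtrip("Hello, Goodbye")))  # 'Hello, Goodbye'
```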
Is your feature request related to a problem? Please describe.
Users aren't aware of what features are available in lyricsgenius and are requesting features that are already a part of the package.
Describe the solution you'd like
Add documentation to the README that includes examples for more use cases. Eventually it would be nice to have a dedicated documentation site.
It really makes the most sense to export (i.e. save) Song and Artist objects in JSON format.
The artist tags in the middle of the lyrics may be very helpful, depending on the application. Maybe another solution is to pass the whole lyrics, with the [tags] unparsed.
I created a package of LyricsGenius for Arch Linux and published it to AUR.
Maybe you could put Arch Linux installation instructions under "Installation" like this:
Install the AUR package for Arch Linux manually:
curl -L -O https://aur.archlinux.org/cgit/aur.git/snapshot/python-lyricsgenius.tar.gz
tar -xvf python-lyricsgenius.tar.gz
cd python-lyricsgenius
makepkg -si
Hello! I've been attempting to use this wrapper (thank you for putting this up!), but I've been noticing that a lot of times the search_artist function slows to a crawl and takes quite a long time to return any results. Is this to avoid some sort of rate limiting? Is there anything that I can do on my end to improve the speed at which lyrics are returned? Thanks again!
EDIT: I think the speed issues were a result of some of the first songs not having any lyrics. Those results seem to take a lot longer than results with lyrics.
The Genius API includes entries the site refers to as songs that aren't actually songs.
For example, searching for Taylor Swift will return entries for liner notes and a booklet along with actual song lyrics.
My wrapper needs to be able to identify and reject these non-song entries. From what I can tell, the Genius API does not flag these items as non-songs — their type is still listed as "song" in the JSON object.
Hi @johnwmillr!
Pretty cool work on LyricsGenius, but I have a question: is it legal to scrape the lyrics from Genius HTML pages? Reading the support forums I found:
https://genius.com/discussions/277279-Get-the-lyrics-of-a-song
What do you think? The safe approach for building web interfaces seems to be to just embed the genius viewer. But it is more flexible if you have direct access to the lyrics contents.
I'm scraping lyrics for a list of songs and got a Read Timed Out error. Is it possible to change the timeout parameter from 5 to 30?
error message:
ReadTimeout: HTTPSConnectionPool(host='api.genius.com', port=443): Read timed out. (read timeout=5)
Version info
LyricsGenius/lyricsgenius/__main__.py
Line 17 in 6a91cd2
The current method for accepting inputs from the command line uses ad-hoc string parsing. The proper way to parse command line inputs is argparse.
Switching to argparse should be fairly straightforward, maybe a good first issue.
🏷 Enhancement
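A rough sketch of what an argparse-based entry point might look like. The argument names (search_type, terms, --max-songs) are assumptions chosen for illustration and do not reflect the package's actual command line.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical `lyricsgenius` command line."""
    parser = argparse.ArgumentParser(
        prog="lyricsgenius",
        description="Download song lyrics from Genius.com.",
    )
    parser.add_argument("search_type", choices=["song", "artist"],
                        help="whether the terms name a song or an artist")
    parser.add_argument("terms", nargs="+",
                        help="song title and optional artist, or an artist name")
    parser.add_argument("--max-songs", type=int, default=None,
                        help="limit the number of songs for an artist search")
    return parser


# Parse an explicit argument list here; a real entry point would call
# parse_args() with no arguments to read sys.argv.
args = build_parser().parse_args(["song", "Hello, Goodbye", "The Beatles"])
print(args.search_type, args.terms, args.max_songs)
```

argparse also generates --help output and rejects unknown search types, which the ad-hoc parsing would have to do by hand.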
I agree with most of the filters being applied to reject songs, but having the ability to pass in a list of extra lyric filters or customize the existing criteria could provide additional value to users.
def _result_is_lyrics(self, song_title):
    """Returns False if result from Genius is not actually song lyrics"""
    regex = re.compile(
        r"(tracklist)|(track list)|(album art(work)?)|(liner notes)|(booklet)|(credits)|(remix)|(interview)|(skit)", re.IGNORECASE)
    return not regex.search(song_title)
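One way the request could be met is to build the pattern from a list of terms that callers can extend or replace. This is a standalone sketch, not the package's method: extra_terms and excluded_terms are hypothetical parameters, and the default list simply mirrors the pattern above.

```python
import re

# Default terms copied from the existing pattern; each one is a regex fragment.
DEFAULT_EXCLUDED_TERMS = [
    "tracklist", "track list", r"album art(work)?", "liner notes",
    "booklet", "credits", "remix", "interview", "skit",
]


def result_is_lyrics(song_title, extra_terms=(), excluded_terms=None):
    """Return False if the title matches any excluded term (case-insensitive).

    `excluded_terms` replaces the defaults; `extra_terms` adds to them.
    """
    terms = list(DEFAULT_EXCLUDED_TERMS if excluded_terms is None else excluded_terms)
    terms.extend(extra_terms)
    if not terms:
        return True  # nothing to filter on
    pattern = re.compile("|".join("({})".format(t) for t in terms), re.IGNORECASE)
    return not pattern.search(song_title)


print(result_is_lyrics("1 [Booklet]"))                                   # False
print(result_is_lyrics("Song (Remix)", excluded_terms=["booklet"]))     # True
print(result_is_lyrics("All Falls Down (Live)", extra_terms=["live"]))  # False
```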
Occasionally my code barfs when it encounters a character the ascii codec can't encode.
python genius/genius.py --search_song "Begin Again"
Searching for "Begin Again"...
Traceback (most recent call last):
  File "genius/genius.py", line 397, in <module>
    song = G.search_song(sys.argv[2])
  File "genius/genius.py", line 147, in search_song
    found_title = str(search_hit['title']).translate(None,' ').lower()
UnicodeEncodeError: 'ascii' codec can't encode character u'\u200b' in position 0: ordinal not in range(128)
I assume there is a standard easy fix to this issue. So, I should fix it.
In the README the link to the blog post is wrong.
My code in the search_song() and search_artist() functions requires an exact match between the user's query and the result returned from the Genius.com search.
Here's an example of the issue:
python genius.py --search_song "Hello Goodbye" "The Beatles"
Searching for "Hello Goodbye" by The Beatles...
Specified song was not first result :(
search_song() didn't find "Hello Goodbye" because the top result from Genius.com was "Hello, Goodbye" (note the comma).
Whereas this works:
python genius.py --search_song "Hello, Goodbye" "The Beatles"
Searching for "Hello, Goodbye" by The Beatles...
Done.
"Hello, Goodbye" by The Beatles:
You say yes, I say no
You say stop and I say go go go, oh no
You say goodbye and I say hello
Hello h...
One simple fix would be stripping any punctuation and capitalization from both the user's search term and the Genius.com search results.
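A minimal sketch of that normalization, assuming Python 3. The function name is illustrative, and whitespace is also collapsed so that removed punctuation does not leave double spaces.

```python
from string import punctuation

# Translation table that deletes all ASCII punctuation.
_STRIP_PUNCTUATION = str.maketrans("", "", punctuation)


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    return " ".join(text.translate(_STRIP_PUNCTUATION).lower().split())


print(normalize("Hello Goodbye") == normalize("Hello, Goodbye"))  # True
```

Comparing normalize(query) with normalize(result_title) would let "Hello Goodbye" match "Hello, Goodbye" while still rejecting unrelated titles.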
From a comment on my blog:
I'm looking to use it to analyze how an artist's lyrics change over different albums. My first thought was just to pull all of the artist's songs, but I believe there is a song in their directory with missing lyrics that is causing the search to quit.
So is there either a.) a way to avoid the search from stopping or b.) a way to pull songs by album instead of by artist?
I got the error when using the search function on Kanye West. The search will run up to "All Falls Down", then it prints AttributeError: 'NoneType' object has no attribute 'get_text' and stops. Looking on the website, the next song on his list of songs is "All Falls Down (Live)" and says it is "Missing Lyrics", so I assumed this caused the error.
So this probably has to do with calling the get_text() function when there aren't actually lyrics available.
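A hedged sketch of the kind of guard that would avoid this crash: check for a missing element before calling get_text(). The helper and the FakeElement stand-in are hypothetical. In the real code the element would come from the HTML parser's find() call, which returns None when nothing matches.

```python
def lyrics_or_none(lyrics_element):
    """Return the element's text, or None when no lyrics element was found."""
    if lyrics_element is None:
        return None
    return lyrics_element.get_text()


class FakeElement:
    """Stand-in for a parsed HTML element, used only for this demo."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


for element in (FakeElement("You say yes, I say no"), None):
    lyrics = lyrics_or_none(element)
    if lyrics is None:
        print("Missing lyrics, skipping.")
    else:
        print(lyrics)
```

With a check like this, an artist search can skip songs marked "Missing Lyrics" and continue instead of stopping.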
Describe the bug
When I try to scrape lyrics of the top 10 popular Kanye songs, it doesn't recognize one character.
Expected behavior
return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode character '\u0150' in position 2716: character maps to <undefined>
This error will pop up, and I think this means it encountered the character u0150.
To Reproduce
Describe the steps required to reproduce the behavior.
if scrape_mode is True:
    artist = genius.search_artist("Kanye West", max_songs=10, sort="popularity")
    lyrics = ''
    for i in range(10):
        with open('Kanye.txt', 'a') as file:
            file.write(artist.songs[i].lyrics)
Include the error message associated with the bug.
Version info
import lyricsgenius; print(lyricsgenius.__version__)
Additional context
Add any other context about the problem here.
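The 'charmap' error typically appears when a file is opened with the platform's default encoding (for example cp1252 on Windows), which cannot represent characters such as U+0150. A minimal sketch of a workaround, assuming Python 3, is to pass an explicit encoding when opening the file. The helper name and the temporary path are illustrative.

```python
import os
import tempfile


def append_lyrics(path, text):
    """Append text to a file, always encoding it as UTF-8."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text)


sample = "caf\u00e9 \u0150"  # includes U+0150, the character from the traceback
path = os.path.join(tempfile.mkdtemp(), "Kanye.txt")
append_lyrics(path, sample)

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, encoding="utf-8") as f:
    print(f.read() == sample)  # True
```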
When viewing any of the JSON files exported by any of the save() functions in a Quicklook preview or trying to open the file in Sublime, I get a warning: JSON is not well formatted: Unexpected EOF. The JSON files can still be read into Python just fine using the json module, but I should figure out why I get this warning.
It'd be good to have the option to limit the request rate of any API requests.
More generally, LyricsGenius should have a system in place for handling error responses from the Genius API.
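As one possible shape for the rate-limiting option, the sketch below enforces a minimum delay between calls. The Throttle class and its parameters are hypothetical, it is single-threaded only, and it does not cover handling of API error responses.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (single-threaded sketch)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleep just long enough to respect the minimum interval."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


throttle = Throttle(min_interval=0.2)
for _ in range(3):
    throttle.wait()  # call this right before each API request
```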
Currently, when running in Jupyter, I get a dialogue box that suggests another artist. Is there a way to avoid this dialogue box and have the function return something like "not found"? I'm scraping hundreds of artists, so dealing with the dialogue box is a bit difficult.
On some (if not all) non-english lyrics, there is a header, in accordance with the genius guide.
It'd make sense to at least have the option to do the following:
# Save lyrics for a single song
song = api.search_song("Hello, Goodbye", "The Beatles")
song.save_lyrics()
# Save all lyrics from a given artist
artist = api.search_artist("The Beatles")
artist.save_lyrics()
Currently you save lyrics by calling api.save_artist_lyrics(artist).
I was trying to save all the lyrics for songs from an artist, but the save_lyrics() function stopped once it hit a song that has a "/" in the song title.
Here is the error message I received:
FileNotFoundError: [Errno 2] No such file or directory: 'lyrics_arianagrande_blessed/rainbow.json'
To reproduce:
artist_name = "{Ariana Grande}"
artist = api.search_artist(artist_name)
artist.save_lyrics()
(The song is the 32nd song of hers pulled up.)
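The failure happens because the "/" in the title is interpreted as a directory separator. A minimal sketch of a sanitizing step follows. The helper name, the set of replaced characters, and the file name pattern (inferred from the error message above) are assumptions.

```python
import re


def safe_filename(name, replacement="_"):
    """Replace characters that are invalid in file names on common systems."""
    cleaned = re.sub(r'[\\/:*?"<>|]', replacement, name)
    return cleaned.strip() or "untitled"


filename = "lyrics_{}_{}.json".format(
    safe_filename("arianagrande"), safe_filename("blessed/rainbow")
)
print(filename)  # lyrics_arianagrande_blessed_rainbow.json
```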
Describe the bug
When searching for certain songs, no songs are returned. Examples include:
Expected behavior
Results should show for songs that are easily searchable using the genius.com UI.
To Reproduce
Describe the steps required to reproduce the behavior.
lyricsgenius song "Sunflower" "Post Malone"
Error message associated with the bug:
Searching for "Sunflower" by Post Malone...
Could not find specified song. Check spelling?
Could not find specified song. Check spelling?
Version info
Additional context
Doesn't appear to be an issue with special characters or too many characters (in the song or artist).
PyPI supposedly now supports Markdown: Markdown Descriptions on PyPI.
I'd like to remove the README.rst file and update the version on PyPI.
Describe the bug
Searching for a song by a given artist only succeeds when the song name and artist name inputs are very close to the exact song title and artist name.
Expected behavior
Searching for "problems" by "jay z" should get the song "99 problems", but the search fails, even though the search works on Genius.com. Searching for "99 problems" without an artist argument does find the correct song.
To Reproduce
Describe the steps required to reproduce the behavior.
song = api.search_song("99 problems", "jay z")
song is None, but it should have found the song.
Additional context
The lyricsgenius search should be just as flexible as the Genius.com search.
Hi, I get an error message while using your code:
import lyricsgenius as genius
api = genius.Genius('----my api code ---')
artist = api.search_artist('Andy Shauf', max_songs=3)
Error message:
Traceback (most recent call last):
  File "C:/Users/Chris/AppData/Local/Programs/Python/Python37-32/top2000/181208 top2000.py", line 3, in <module>
    artist = api.search_artist('Andy Shauf', max_songs=3)
  File "C:\Users\Chris\AppData\Local\Programs\Python\Python37-32\lib\site-packages\lyricsgenius\api.py", line 283, in search_artist
    found_name = artist_info['artist']['name']
TypeError: 'NoneType' object is not subscriptable
Can you help me with this?
Many thanks!
Is your feature request related to a problem? Please describe.
The current Genius.search_artist method relies on a heuristic for finding songs by the requested artist. This method is slow, inefficient, and may miss songs that belong to the artist.
Describe the solution you'd like
Use the same endpoint Genius.com uses when listing songs on an artist's page.
Here is an example of the all songs endpoint for Jay-Z:
Additional context
Not sure if this API endpoint is publicly listed by Genius, but the endpoint returns a 200 when I make a request to it.
Describe the bug
The Song.save_lyrics method saves a file name with the artist name but not the song title, potentially overwriting different songs by the same artist.
Expected behavior
Default file name should be f"Lyrics_{song.title}_{song.artist}.txt".
To Reproduce
Describe the steps required to reproduce the behavior.
song = api.search_song("99 problems Jay-z")
song.save_lyrics()
Additional context
This is a problem when saving multiple songs individually, particularly songs by the same artist. I should provide a save_songs method that accepts a list of songs.
Describe the bug
The search_song method doesn't check that it's returning the correct song.
Expected behavior
Searching for "99 problems" returns a Drake song, "All Me", instead of the expected "99 Problems" by Jay-Z.
To Reproduce
Describe the steps required to reproduce the behavior.
print(song)
Additional context
I should probably add a check in to make sure we're not missing a search result that actually matches the song name.
Search is case sensitive, but shouldn't be.
For example:
song = api.search_song('lose yourself', 'Eminem')
returns no results, whereas if you search by url:
https://genius.com/search?q=lose%20yourself
it returns the correct result.
To fix, edit the _clean() function:
def _clean(self, s):
    return s.translate(str.maketrans('','',punctuation)).replace('\u200b', " ").strip().lower()
I.e. just add .lower()
I tried fetching the lyrics of the French rapper Nekfeu and saving them in txt format, but I got this error:
Traceback (most recent call last):
  File "lyrics_fetch.py", line 6, in <module>
    artist.save_lyrics(format = "txt")
  File "D:\Anaconda3\lib\site-packages\lyricsgenius\artist.py", line 134, in save_lyrics
    lyrics_file.write(lyrics_to_write)
  File "D:\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 20121: character maps to <undefined>
Artist search fails when searching for "Tupac" because Genius.com lists him as "2Pac".
The artist page for 2Pac has an AKA section that includes "Tupac". It would probably be possible to check if the user's search term is included in the AKA section of the first artist search result, continuing with the search if a match is found.
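The AKA check could look roughly like the sketch below. It assumes the alternate names have already been obtained as a list of strings. How they are fetched from Genius is not shown, and the function name is hypothetical.

```python
def matches_artist(query, name, alternate_names=()):
    """Return True if the query equals the artist name or any alternate name.

    Comparison is case-insensitive and ignores surrounding whitespace.
    """
    wanted = query.strip().lower()
    candidates = [name] + list(alternate_names)
    return any(wanted == candidate.strip().lower() for candidate in candidates)


# Hypothetical data for the first search result.
print(matches_artist("Tupac", "2Pac", ["Tupac", "Makaveli"]))  # True
print(matches_artist("Tupac", "2Pac"))                         # False
```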
Is there a way to only pull down the lyrics?
I'm having to sort through the files to remove year/album/artist/etc, and I got to thinking that there just has to be a better way of doing it.
If we're using the Genius API we really should allow the user to access the lyric annotations, not just the lyrics themselves. It'd take some thought to figure how to properly organize and structure the lyrics, but those decisions may be guided by how Genius already formats their API responses.
Would the lyrics be keys in a dictionary corresponding to the annotation? Would the annotations just be stored sequentially in a list? What's the best format?
I dunno, should they? What's the standard? Guess it couldn't hurt. Makes more sense than just having a genius.py file all by itself in the genius/ directory.
You'd have something like this:
genius/
    genius.py
    api.py
    Song.py
    Artist.py
I got a little question here. Is there a node.js version of this, or can somebody convert it?
I'm making a project that gets lyrics, and this is exactly what I need - but it's Python :(
Thanks!