abhishek-vinjamoori / subtitleextractor
This repository is aimed at downloading subtitles from popular Internet Services.
DevTools listening on ws://127.0.0.1:12342/devtools/browser/9676a5c1-9792-4c93-9d67-a99799503e4c
Traceback (most recent call last):
  File "C:\Users\stern\OneDrive\Documents\Subtitle Extractor\SubtitleExtractor-master\setup.py", line 115, in <module>
    main()
  File "C:\Users\stern\OneDrive\Documents\Subtitle Extractor\SubtitleExtractor-master\setup.py", line 106, in main
    updateServicesservices
  File "C:\Users\stern\OneDrive\Documents\Subtitle Extractor\SubtitleExtractor-master\setup.py", line 47, in amazonUpdate
    amazonDriver = webdriver.Chrome()
  File "C:\Users\stern\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 81, in __init__
    desired_capabilities=desired_capabilities)
  File "C:\Users\stern\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "C:\Users\stern\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "C:\Users\stern\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\stern\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: session not created exception
from unknown error: Runtime.executionContextCreated has invalid 'context': {"auxData":{"frameId":"525EC51DA5BAF86B951CED57002827AD","isDefault":true,"type":"default"},"id":1,"name":"","origin":"://"}
(Session info: chrome=75.0.3770.142)
(Driver info: chromedriver=2.22.397933 (1cab651507b88dec79b2b2a22d1943c01833cc1b),platform=Windows NT 10.0.17763 x86_64)
Currently the subtitles are downloaded in XML format.
Convert them to .srt
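A minimal sketch of such a conversion, assuming the downloaded XML is a transcript made of <text start="..." dur="..."> elements with times in seconds. That layout, the function names, and the sample data are assumptions for illustration; the formats actually returned by each service may differ.

```python
import html
import xml.etree.ElementTree as ET


def seconds_to_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def xml_to_srt(xml_text):
    """Convert the assumed <text start= dur=> transcript XML to SRT text."""
    root = ET.fromstring(xml_text)
    blocks = []
    for index, node in enumerate(root.iter("text"), start=1):
        start = float(node.attrib["start"])
        end = start + float(node.attrib.get("dur", 0))
        # Assumption: entities may be double-escaped, so unescape once more.
        caption = html.unescape("".join(node.itertext())).strip()
        blocks.append(
            f"{index}\n{seconds_to_srt_time(start)} --> "
            f"{seconds_to_srt_time(end)}\n{caption}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


# Made-up sample input in the assumed layout.
SAMPLE_XML = (
    '<transcript>'
    '<text start="0.5" dur="2.0">Hello there</text>'
    '<text start="3" dur="1.25">General &amp;amp; Kenobi</text>'
    '</transcript>'
)
```

Calling xml_to_srt(SAMPLE_XML) yields numbered cues with HH:MM:SS,mmm timestamps; the result can then be written to a .srt file.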
createSoupObject is duplicated in bbc, comedycentral, crackle, crunchyroll, fox, hulu, netflix, and youtube, even though it already exists in common, so it can be refactored.
deleteUnnecessaryfiles is common to all, but already appears in common.
downloadXMLTranscript is common to bbc, crackle, and youtube
convertXMLToSrt is common to bbc, crunchyroll, and youtube
downloadDfxpTranscript is common to amazon, comedycentral and netflix
getContentID1 and getContentID2 are common to comedycentral, fox, and hulu, although hulu has a different version.
getShowJson, getShowDetails, getSubtitleUrl, and processShowName are common to comedycentral and fox. comedycentral and fox seem to share a lot of code.
BBC and crackle share the same getTitle code.
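One way to remove the duplication between comedycentral and fox might be a shared base class that holds the common methods, with each service overriding only what differs. The sketch below is hypothetical: the class names, the serviceName attribute, the describe method, and the body of processShowName are placeholders, not the repository's actual code.

```python
class SharedShowService:
    """Hypothetical base class for services whose helpers are identical."""

    serviceName = "generic"  # placeholder; overridden by each service

    def processShowName(self, rawName):
        # Placeholder logic: lower-case the name and join words with dashes.
        return "-".join(rawName.lower().split())

    def describe(self, rawName):
        # Combines the shared helper with the per-service attribute.
        return f"{self.serviceName}:{self.processShowName(rawName)}"


class ComedyCentral(SharedShowService):
    serviceName = "comedycentral"


class Fox(SharedShowService):
    serviceName = "fox"
```

With this layout a fix to processShowName is made once and reaches both services.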
There are multiple instances across many files where code is never called or has no purpose. Example:
firefox_profile = webdriver.FirefoxProfile()
firefox_profile.set_preference('permissions.default.stylesheet', 2)
firefox_profile.set_preference('permissions.default.image', 2)
firefox_profile.set_preference('dom.ipc.plugins.enabled.libflashplayer.so', 'false')
^ This block of code is present in some files; however, firefox_profile is never used because a Chrome driver is created instead: driver = webdriver.Chrome()
There are also some unnecessary pass statements at the end of functions, which serve no purpose.
Multiple blocks of code have been commented out in numerous locations in the repository (possibly for testing while the code was being written); these should also be removed to keep the code clean.
I wanted to start contributing to this repo. Can the admin please provide pat-tags to the issues?
========================== RESTART: C:\se\setup.py ==========================
Warning (from warnings module):
  File "C:\Python36\lib\site-packages\bs4\__init__.py", line 146
    warnings.warn("You provided Unicode markup but also provided a value for from_encoding. Your from_encoding will be ignored.")
UserWarning: You provided Unicode markup but also provided a value for from_encoding. Your from_encoding will be ignored.
Traceback (most recent call last):
  File "C:\se\setup.py", line 115, in <module>
    main()
  File "C:\se\setup.py", line 106, in main
    updateServicesservices
  File "C:\se\setup.py", line 70, in amazonUpdate
    pageSource = BeautifulSoup(pageSource, "lxml", from_encoding="utf8").pre.text
  File "C:\Python36\lib\site-packages\bs4\__init__.py", line 165, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
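The FeatureNotFound error means the lxml package is not installed in that Python environment; installing it (for example with pip install lxml) is the direct fix. As a fallback, a sketch like the following picks lxml when it is importable and otherwise uses html.parser, which ships with Python. The helper name chooseParser is hypothetical and not part of the repository.

```python
import importlib.util


def chooseParser(preferred="lxml", fallback="html.parser"):
    """Return the preferred parser name if its package is importable."""
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


# Intended use (requires beautifulsoup4; kept as a comment so the helper
# itself runs without it):
#   soup = BeautifulSoup(pageSource, chooseParser())
# Leaving out from_encoding when pageSource is already a str also avoids
# the UserWarning shown in the log above.
```

The two parsers can build slightly different trees from the same markup, so the .pre.text lookup should be re-checked if the fallback is used.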
Downloading Subtitles from Crackle throws an Authorization Error.
When I try to download from Hulu, I get this message:
C:\Python\lib\site-packages\bs4\__init__.py:203: UserWarning: You provided Unicode markup but also provided a value for from_encoding. Your from_encoding will be ignored.
  warnings.warn("You provided Unicode markup but also provided a value for from_encoding. Your from_encoding will be ignored.")
Unable to get the subtitles. Please try again and open an issue to request for support for this video.
Subtitles not downloaded.
Can someone help me fix it?
Some junk text is being prepended to the start of the subtitle file, which causes the error.
In comedycentral.py on line 181:
SubsUrl += str(self.contentID)
The variable contentID is never initialized and therefore the script will throw an error when getSubtitleUrl() is called.
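A possible fix is to give contentID a default in the constructor and fail with a clear message when it has not been set. The class below is a hypothetical stand-in, not the actual code in comedycentral.py, and the base URL is a placeholder.

```python
class ServiceSketch:
    """Hypothetical stand-in for the class in comedycentral.py."""

    def __init__(self):
        # Initialise the attribute so later reads cannot raise AttributeError.
        self.contentID = None

    def getSubtitleUrl(self, baseUrl="https://example.invalid/subs/"):
        # baseUrl is a placeholder, not the service's real endpoint.
        if self.contentID is None:
            raise ValueError("contentID has not been set; fetch the content ID first")
        SubsUrl = baseUrl
        SubsUrl += str(self.contentID)
        return SubsUrl
```

Calling getSubtitleUrl() before the content ID is fetched then raises a descriptive ValueError instead of an AttributeError.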
In crackle.py on lines 140 and 142:
titleString = self.soup.programme.display_title.title.string
...
titleString += self.soup.programme.display_title.subtitle.string
The variable self.soup is never initialized.
-Lenovo-Z50-70:~/Desktop/Meh/SubtitleExtractor-master$ ./SubtitleExtractor.py
Downloading Subtitles
Paste the link here : https://www.youtube.com/watch?v=sl_U0wMcLzY
youtube
Detected YouTube
Processing....
Title - STOP! -- YOU NEED AT LEAST 200 IQ TO PLAY THIS GAME
<<<------ Choose the corressponding number for selecting the language ----->>>
1
An unknown error occurred
Error with language selection.
See if it is possible to get auto-generated subtitles.
Youtube
These modules share identical or very similar functions that could be moved into common:
Fox and Hulu share the first half of getSubtitles.
BBC and Crackle have very similar getEpisodeID functions that can be merged into one by passing searchStringList as a function parameter and making some minor tweaks.
Fox and Comedy Central have the same getSubtitleUrl function.
Crackle, BBC, and YouTube have nearly the same downloadXMLTranscript function. The YouTube one is a bit different; I am not sure how to merge them, and it may not be possible.
Crunchyroll, YouTube, and BBC have exactly the same convertXMLToSrt function. The Netflix module has a different version, but I think the function shared by these three can be reused there.
The Amazon, Hulu, and Netflix getTitle functions differ only slightly.
The getTitle functions of comedycentral and fox are identical, as are those of bbc and crackle.
Comedy Central's and Fox's getContentID1/2, getShowJson, getShowDetails, and processShowName functions are identical.
Amazon, Netflix, and Comedy Central have similar downloadDfxpTranscript functions.
comedycentral, crunchyroll, fox, netflix, and youtube have nearly the same standardCheck function.
P.S. I evaluated the newamazon module, as it seems newer.
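The getEpisodeID suggestion could look like the sketch below: one function that takes searchStringList as a parameter, so each module passes its own markers. The matching logic (return the URL path segment that follows the first marker found) and the sample URL are assumptions for illustration; the real functions may extract the ID differently.

```python
def getEpisodeID(url, searchStringList):
    """Return the path segment after the first marker found in url, else None."""
    for marker in searchStringList:
        position = url.find(marker)
        if position == -1:
            continue
        remainder = url[position + len(marker):]
        # Keep only the part up to the next slash or query string.
        for separator in ("/", "?"):
            remainder = remainder.split(separator, 1)[0]
        return remainder or None
    return None


# Made-up URL and markers, for illustration only.
exampleID = getEpisodeID(
    "https://example.invalid/episode/b0abc123/some-title",
    ["/episode/", "/programmes/"],
)
```

Each service module would then call the shared function with its own marker list instead of carrying a private copy.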