digital-monad / ttds
Group coursework for the Text Technologies for Data Science course.
Create the script that builds the index file. The script should generate a positional inverted index over the terms of the song lyric corpus, following a hierarchical format like the one used by the movie quote search group. This should allow us to display the actual matching lyric lines as results, rather than just the song title or the entirety of the lyrics.
Proposed structure (up for discussion):
term1 : {
    song1 : {
        [(line0, pos2), (line0, pos13), (line12, pos0)]
    },
    song2 : {
        [(line3, pos5), (line10, pos3)]
    }
}
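As a starting point for discussion, the proposed structure could be built roughly as follows. This is a minimal sketch, not the final script: the function name `build_index`, the `songs` input format (song id mapped to raw lyrics with one lyric line per text line), and the lowercase whitespace tokenisation are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(songs):
    """Build {term: {song_id: [(line_no, pos), ...]}} from raw lyrics.

    `songs` is assumed to map a song id to its lyrics as one string,
    with one lyric line per text line (hypothetical input format).
    """
    index = defaultdict(lambda: defaultdict(list))
    for song_id, lyrics in songs.items():
        for line_no, line in enumerate(lyrics.splitlines()):
            # Placeholder tokenisation: lowercase and split on whitespace.
            for pos, token in enumerate(line.lower().split()):
                index[token][song_id].append((line_no, pos))
    # Convert to plain dicts so the result can be pickled without the lambda.
    return {term: dict(postings) for term, postings in index.items()}


songs = {
    "song1": "hello darkness my old friend\nhello again",
    "song2": "say hello",
}
index = build_index(songs)
print(index["hello"])
```

Because each posting stores the line number as well as the position within the line, a search hit can be mapped straight back to the lyric line to display.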
Use BERT libraries to write a script that expands search terms.
The base search algorithm - given a query Q, return a ranked list of documents D ordered by their relevance to the query using BM25.
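A rough sketch of what BM25 ranking could look like is given below. It assumes documents are already tokenised into lists of terms, uses the common defaults `k1=1.5` and `b=0.75`, and uses the non-negative IDF variant `log(1 + (N - df + 0.5) / (df + 0.5))`. The function name `bm25_rank` and the input format are hypothetical, and a real implementation would read term frequencies from the inverted index rather than rescanning every document.

```python
import math
from collections import Counter


def bm25_rank(query_terms, docs, k1=1.5, b=0.75):
    """Rank documents against a query with Okapi BM25.

    `docs` is assumed to map a document id to its list of tokens.
    Returns [(doc_id, score), ...] sorted by descending score;
    documents that match no query term are omitted.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation: longer documents are penalised via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "a": ["love", "love", "me"],
    "b": ["love", "you", "so", "much"],
    "c": ["hate"],
}
ranking = bm25_rank(["love"], docs)
print([doc_id for doc_id, _ in ranking])
```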
This should be responsive for both mobile and desktop usage - this feature will be graded for usability
These are grouped into 1 issue so that there is consistency between the 3 algorithms
Parse the input search query. Determine which search is required (boolean, BM25 ranked, proximity etc.) and pass it to the relevant search function.
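One possible shape for this dispatcher is sketched below. The query syntax is an assumption, since the issue does not fix one: proximity queries are written as `#N(term1, term2)`, boolean queries contain the uppercase operators `AND`, `OR` or `NOT`, and anything else falls through to ranked (BM25) search. The names `classify_query` and `dispatch` and the `handlers` mapping are hypothetical.

```python
import re

# Assumed syntax: "#5(love, heart)" means the two terms within 5 positions.
PROXIMITY_RE = re.compile(r"^#(\d+)\(\s*(\w+)\s*,\s*(\w+)\s*\)$")
# Assumed syntax: uppercase boolean operators as standalone words.
BOOLEAN_RE = re.compile(r"\b(AND|OR|NOT)\b")


def classify_query(query):
    """Return (kind, arguments) for a raw query string."""
    query = query.strip()
    match = PROXIMITY_RE.match(query)
    if match:
        return "proximity", {
            "distance": int(match.group(1)),
            "terms": [match.group(2), match.group(3)],
        }
    if BOOLEAN_RE.search(query):
        return "boolean", {"expression": query}
    return "ranked", {"terms": query.split()}


def dispatch(query, handlers):
    """Call the search function registered for the detected query kind."""
    kind, args = classify_query(query)
    return handlers[kind](**args)


print(classify_query("#5(love, heart)"))
print(classify_query("love AND NOT hate"))
print(classify_query("never gonna give you up"))
```

Keeping classification separate from dispatch means the three search functions can be developed and tested independently of the parser.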
Create a function that applies preprocessing to arbitrary input text. It should include some subset of the following:
Compress the information from the CSV file into a binary format?
Use the Spotify API and scrape Genius Lyrics to get song lyric data, as per this guide. For now, just put it into the format from the guide, i.e. a pandas DataFrame (not sure how big this will have to be).
Functions to be rewritten:
3 collections should be inserted into MongoDB: LyricsMetaData, InvertedIndex, SongsMetaData
SongsMetaData - csv (display frontend)
LyricsMetaData - csv (display frontend)
InvertedIndex - pickle
Missing: index_writer.py, which inserts the pickle index file into MongoDB via PyMongo
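A possible outline for index_writer.py is sketched below. It assumes the pickle holds the hierarchical `{term: {song: [(line, pos), ...]}}` structure, and that `collection` is a PyMongo collection (for example `MongoClient()["ttds"]["InvertedIndex"]`, where the database name is a guess). The function names and the document layout are hypothetical. Postings are stored as a list of sub-documents rather than keyed by song, on the assumption that song identifiers may contain characters that are awkward as MongoDB field names.

```python
import pickle


def index_to_documents(index):
    """Turn {term: {song_id: [(line, pos), ...]}} into one document per term."""
    docs = []
    for term, postings in index.items():
        docs.append({
            "_id": term,  # the term doubles as the primary key
            "postings": [
                # Tuples become lists so the layout is plain arrays.
                {"song": song_id, "positions": [list(p) for p in positions]}
                for song_id, positions in postings.items()
            ],
        })
    return docs


def write_index(pickle_path, collection):
    """Load the pickled index and insert it into `collection`.

    Only `collection.insert_many` is used, so any object with that
    method works. Returns the number of documents written.
    """
    with open(pickle_path, "rb") as fh:
        index = pickle.load(fh)
    docs = index_to_documents(index)
    if docs:  # insert_many rejects an empty list
        collection.insert_many(docs)
    return len(docs)
```

Very frequent terms may produce documents that approach MongoDB's 16 MB document limit, in which case the postings for a term would need to be split across several documents.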
Because the data collection script is finally complete, we now need to obtain as much data as possible.
This means collecting artists' initials from the JSON files and translating them into CSV.
These will then be used to create the index files.