lbryio / lighthouse.js

Lighthouse - A lightning fast search for the LBRY blockchain

Home Page: https://lbryio.github.io/lighthouse/

License: MIT License

JavaScript 98.45% Shell 1.55%

lighthouse.js's Introduction

Lighthouse is a lightning-fast advanced search engine API, with autocomplete capabilities, for publications on lbrycrd. The official Lighthouse instance is live at https://lighthouse.lbry.com

What does Lighthouse consist of?

Elasticsearch as a backend db server.
LBRYimport, an importer that imports the claims into the Elasticsearch database.
Lighthouse API server, which serves the API and does all calculations about what to send to the end user.

API Documentation / Usage example

To make a simple search by string:

https://lighthouse.lbry.com/search?s=stringtosearch

To get autocomplete suggestions:

https://lighthouse.lbry.com/autocomplete?s=stringtocomp
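As a rough sketch, the two request URLs above could be built programmatically so that the search string is always encoded. The helper names here are hypothetical and not part of Lighthouse. The optional size and from parameters are assumed from the queries quoted in the issue reports rather than from the API documentation.

```javascript
// Base URL taken from the examples above; helper names are hypothetical.
const LIGHTHOUSE_BASE = "https://lighthouse.lbry.com";

// Build a Lighthouse URL with the query in the `s` parameter.
// `extra` holds optional parameters such as size and from (assumed names).
function buildLighthouseUrl(endpoint, query, extra = {}) {
  const url = new URL(endpoint, LIGHTHOUSE_BASE);
  url.searchParams.set("s", query); // URLSearchParams handles the encoding
  for (const [key, value] of Object.entries(extra)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const searchUrl = (query, extra) => buildLighthouseUrl("/search", query, extra);
const autocompleteUrl = (query) => buildLighthouseUrl("/autocomplete", query);

console.log(searchUrl("stringtosearch"));
console.log(autocompleteUrl("stringtocomp"));
```

Note that URLSearchParams encodes spaces as "+" rather than "%20"; both forms are standard for query strings.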

The full API documentation

Installation

Prerequisites

To get started, clone the Git repository:

git clone https://github.com/lbryio/lighthouse

Make sure elasticsearch is running and run (from the lighthouse dir):

./gendb.sh

Install dependencies:

npm install --production=false

Build and run Lighthouse:

npm run prod

You are now up and running! You can connect to Lighthouse at http://localhost:50005; the API documentation is here. Lighthouse will continue syncing in the background. It usually takes about 15 minutes before all claims are up to date in the database.

Contributing

Contributions to this project are welcome, encouraged, and compensated. For more details, see lbry.com/faq/contributing

License

This project is MIT Licensed © LBRYio, Filip Nyquist

Security

We take security seriously. Please contact [email protected] regarding any security issues. Our PGP key is here if you need it.

Contact

The primary contact for this project is @tiger5226 ([email protected])

lighthouse.js's Issues

Implement the search logic and sorting into the API

Return more than 10 search results

Lighthouse currently returns 10 results. What do we need to do to increase this number? https://lighthouse.lbry.io/search?s=bitcoin

We need to make sure that the app can support more than 10 results by paginating: lbryio/lbry-desktop#1261

Filters

Filtering server side

Is there any news / plans for filter integration,
like search only for a specific content type: audio / video / files,
or channel and tags...
(if the daemon ever supports it) 🙃

lbryio/lbry-desktop#664 (comment)

Autocomplete Query is not returning the proper results

The autocomplete query suffers from the same problems that search did previously. The value section of the Elasticsearch document is of type nested, which means the query also needs to be a nested query. As a result, the query is actually searching only the name of a claim instead of searching for the best autocomplete term across the main fields of title, description, and author. Below is a current example:

https://lighthouse.lbry.io/autocomplete?s=test%20a

Result:

[
"make-a-test-tube-thunderstorm",
"Make a Test Tube Thunderstorm!",
"NurdRage",
"a-test-of-wills-charles-todd-mobi",
"A Test of Wills By Charles Todd Mobi Format",
"upload-test-12-11-17-a",
"spee.ch",
"how-to-detect-a-secret-nuclear-test",
"How To Detect A Secret Nuclear Test",
"minutephysics"
]

Now if you look at the first result make-a-test-tube-thunderstorm and search that with

https://lighthouse.lbry.io/search?s=test%20a

Result:

[
{"name":"test","claimId":"a607349ce83d3a86bac87a967ce7f9647e1ba736","value":{"claimType":"streamType","stream":{"metadata":{"preview":"","license":"Creative Commons Attribution 4.0 International","licenseUrl":"","thumbnail":"","nsfw":false,"author":"test","description":"test","language":"en","title":"test","version":"_0_1_0"},"source":{"sourceType":"lbry_sd_hash","source":"29ad218c61c599499b22c17228371b5fe9a6e725edc9ef691a7819b3e7406500467852920629fbcddbcacf13d31579d4","version":"_0_0_1","contentType":"image/png"},"version":"_0_0_1"},"version":"_0_0_1"}},
{"name":"make-a-test-tube-thunderstorm","claimId":"40d193673b0730907449a3dde387b2cdb0314eff",
...

You can see that the claim name test is first but then "name":"make-a-test-tube-thunderstorm" is second. So it is only searching the name field.
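A minimal sketch of what a nested autocomplete query might look like, assuming the index mapping mirrors the document shape in the search result above (a nested value object containing stream.metadata.title, description, and author). The function name is hypothetical, and this is not the query Lighthouse actually ships.

```javascript
// Build an Elasticsearch request body that searches inside the nested
// `value` object instead of only the top-level claim name.
// Field paths are assumed from the sample search result above.
function buildAutocompleteQuery(text) {
  return {
    query: {
      nested: {
        path: "value", // the nested object must be named explicitly
        query: {
          multi_match: {
            query: text,
            type: "phrase_prefix", // match the last term as a prefix
            fields: [
              "value.stream.metadata.title",
              "value.stream.metadata.description",
              "value.stream.metadata.author",
            ],
          },
        },
      },
    },
  };
}

console.log(JSON.stringify(buildAutocompleteQuery("test a"), null, 2));
```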

The internal server error should be a separate issue. We should not hit an error by entering a query. I will create another issue for this.

Lastly, since the Elasticsearch query needs to be modified and getting this right takes some time, I don't think this is a level 1, so I increased it to a level 2.

lighthouse documentation is not working

Try the autocomplete function here, it fails:

https://lbryio.github.io/lighthouse/

Searching FreeKeene/freekeene/free keene does not return correct result

I would expect lbry://@FreeKeene in the results

ide files accidentally committed

https://github.com/lbryio/lighthouse/pull/84

pulled in IDE files. These should be removed.

cannot find claim with search terms from title

cannot get this claim to show in search results: lbry://tbg-48Bm1Nlos-E#6f8e02e095aa6bf17e1e511b164974ba087fece4 when searching for Marathon / Retro Games / Live Stream

claimTrieCache.json should be generated on first boot

claimTrieCache.json should not exist in git but get generated on first launch.

searching "retro games" brings back irrelevant results / searching with quotes

See below for results searching "retro games". This may be a more general issue when searching with quotes

Internal server error on autocomplete

In app example

This is needed for the new release that uses the autocomplete.

Auto-deploy/release lighthouse

https://github.com/semantic-release/semantic-release

This is a really cool and popular package. Then we just need to process the github webhook when a tagged release is created to download, unzip and startup the binary.

Acceptance Criteria

Definition of Done

searching for "aquarium" does not return correct result

I'd expect this to return: lbry://lobstersr#057bbbff837c8c108dc6b5a8af20bcb3ab8ad31b

"john cleese" search term does not return @JohnCleese channel

Looks like search has issues with multi term queries

autocomplete is producing errors on production

Try the following query:

https://lighthouse.lbry.io/autocomplete?s=fillerino

A query should not cause an internal server error.

Return all claims for a search term and not only from winning claim.

If I search for one it should show all the claims not only the winning claim. That's how content which is not winning could be discovered. I remember bringing it up earlier, but I forgot what was the verdict on it.

Additional input from @tzarebczan and @kauffj..?

Might be related to #32.

Clean up and lint importer code

The importer needs to add the block that the claim was made in, as depth won't work in a plain, non-updating, all-claims database.

Get chainquery merged and live into production.

@tiger5226 is working on integrating chainquery into lighthouse. When my review is done the next step is deployment to a live server, this means starting over with a new elasticsearch database.

My main idea was to redeploy Lighthouse to a new server, get it synced up, and then change the DNS for lighthouse.lbry.io to the new one. Then let the DNS change propagate and remove the old server.

Feel free to add your ideas down below!

Partial searches for channel not returning

Searching for the channel name would not return results unless you used @ in your search. It also had to be an exact match to get results.

Add channel parameter to search api

This will be to support searching within a specific channel for the app.

Prepare lighthouse for app integration.

Search currently seems to work as it should, and I'm starting to make the final changes for it to be lbry-app ready. Ideally, lbry-app should need minimal to no changes to connect to Lighthouse. To make this happen, the following needs to be done.

Required changes to Lighthouse:

Update to POST so it matches the pre-existing call.
Alter the response to return only name, claimId, and value.

Nice-to-have features to add:

add getClaimsInBlock
add getSuggestedClaims
add getTotalClaimsValue
Adding claim statistics calls.

Update lighthouse documentation to follow our format.

Update the lighthouse README.md to follow the standards that we have set for documentation.
See "Repository standards" https://lbry.tech/contribute/

spent claims show up in search results

When lighthouse queries chainquery for updated/new claims it doesn't do a check if it is spent.

It needs to check if the claim is spent and then delete it from elastic search so it is no longer searchable. The app will not resolve a spent claim and will only show that the position is available if someone wants to use it. cd0f980
Lighthouse also needs to delete claims from elastic search that are expired. cd0f980

Add weight for Claims with bids that are controlling

The elastic search query needs to be adjusted such that when there is a choice between two claims of equal weight in the search results, the one that is controlling gets a higher score which then means it returns at a higher place in the search results.

This also originated from #58, but since it is separate from claim updates not being synced, I pulled it into its own issue.

Write documentation and add autodoc creation to the API server

if sync fails it never runs again

The sync process should continue to try on schedule, even if it hits an error. In the current state, if it hits an error, the only way to launch the sync process again is to restart the lighthouse service on the machine.

This issue is triggered by the investigation of #68. The claim trie had not been synced since the last deployment because it failed once when elastic search connection was not instantly available. Making sure it can retry later, is important to avoiding this in the future.
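One way to get this behavior, sketched under assumptions: wrap each sync run in a try/catch and always schedule the next run, so a single failure (for example, Elasticsearch not being reachable yet) cannot end the loop. The names startSync and syncOnce are hypothetical and do not come from the Lighthouse code.

```javascript
// Run `syncOnce` repeatedly, waiting `intervalMs` between runs.
// Errors are logged and the next run is still scheduled.
// Returns a function that stops the loop.
function startSync(syncOnce, intervalMs, log = console.error) {
  let stopped = false;
  let timer = null;

  async function tick() {
    try {
      await syncOnce();
    } catch (err) {
      log("sync failed, will retry on schedule:", err.message);
    } finally {
      // Reschedule whether the run succeeded or failed.
      if (!stopped) timer = setTimeout(tick, intervalMs);
    }
  }

  tick();
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Scheduling with setTimeout after each run, rather than setInterval, avoids overlapping runs when one sync takes longer than the interval.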

Exclude blocked content from search results

List of blocked outpoints is at api.lbry.io/file/list_blocked

searching nintendo does not bring up all results

Searching for "nintendo" does not bring up lbry://re2-lA-q8DMm5r8#8e1e8b42451a0c0dc3db4e8b4f645f2ad802e665 which has "Nintendo Hardware and Revisitation - Rerez Talks" as the title.

Create base of the API server with Nodejs v8 and koa

searching with lbry://xxxxx fails, bad request

I just copy/pasted this URL, lbry://ee-8blgmbXfDEc, into search and noticed a bad request returned in the network logs. The top result is the exact URL, but at the bottom you see:

Search Results for lbry://ee-8blgmbXfDEc ?

lbry://ee-8blgmbXfDEc Be the first

I can see users copy/pasting URLs and running into this issue if they don't click Enter immediately.

Searching "Danielo" does not return @Danielo channel

This channel was created on spee.ch recently. A user noticed that he wasn't able to search for it in the app.

Ansible config for easy deployment

Make the importer a module of the API server so we can control it from inside the API server.

Search ranking improvement

When searching @imineblocks you get two channels with the same name. The channel with only one item ranks higher than the other channel with 300 items and the vanity name claim.

The one with the vanity name and more items should rank on top. The search result should also show which channel has the vanity name claim when multiple channels of the same name are listed.

Irrelevant results

While searching for "Simpsons," the search result included this video and I can't find any mention of the search term:

lbry://acl-0dLJ2uwkSXI#a1b5f77ced4dcc042478ca2d32c7de14d6c3c524

Weight search results by effective_amount

Prompted by the fact that if I search "disaster", "It's A Disaster" is currently the third result.
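A possible shape for such a weighting, as a sketch only: wrap the existing text query in an Elasticsearch function_score query that adds a dampened effective_amount to the relevance score. It assumes each indexed claim carries a numeric effective_amount field, which the issue does not confirm. The function name is hypothetical.

```javascript
// Wrap a text query so claims with a larger effective_amount score higher.
// Assumes a numeric `effective_amount` field exists on indexed claims.
function withEffectiveAmountBoost(baseQuery) {
  return {
    query: {
      function_score: {
        query: baseQuery,
        field_value_factor: {
          field: "effective_amount",
          modifier: "log1p", // dampen very large amounts
          missing: 0,        // claims without the field get no boost
        },
        boost_mode: "sum",   // add the boost to the text relevance score
      },
    },
  };
}

const body = withEffectiveAmountBoost({ match: { name: "disaster" } });
console.log(JSON.stringify(body, null, 2));
```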

Add better error handling and logging!

Add better error handling and logging so we know when something goes wrong!

Change the API doc to swagger instead of jsdoc.

The documentation should be changed to swagger, which will make us follow the OpenAPI spec.
This will make it easier for people to make API clients and give them a better overview of the API.
The code for this is close to ready, it needs some final fixing and swagger checks.

Search queries with certain parameters return "400 Bad Request"

I am currently working on lbryio/lbry-desktop#1743 and am running into a few issues. For a piece of content, I am taking the title, and performing a search to build a list of recommended content.

Example:

title: What's So Great About Jojo's Bizarre Adventure (500k Special!)
query: https://lighthouse.lbry.io/search?s=What%27s%20So%20Great%20About%20Jojo%27s%20Bizarre%20Adventure%20(500k%20Special!)&size=20&from=0
response: Bad Request

It seems ( or ) is what's causing it. If I just do a search with any query and include ( or ) I see the same result. Still looking to see what other characters cause this.
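If the server passes the search string to an Elasticsearch query_string query (an assumption; the issue does not say which query type is used), parentheses are reserved syntax and would need escaping. The sketch below escapes the reserved characters listed in the Elasticsearch query string documentation. The function name is hypothetical.

```javascript
// Escape characters that are reserved in Elasticsearch query_string syntax.
// "<" and ">" cannot be escaped at all, so they are removed.
function escapeQueryString(input) {
  return input
    .replace(/[<>]/g, "")
    .replace(/[+\-=!(){}\[\]^"~*?:\\\/]|&&|\|\|/g, "\\$&");
}

console.log(escapeQueryString("Jojo's Bizarre Adventure (500k Special!)"));
```

Escaping on the server keeps clients such as the desktop app from having to know which characters are special.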

README should include information about live lighthouse instance(s)

How do I connect to them? What is the API signature and/or where are the API docs located?

(I know the answers to these, but we're fine with people using this service other than us.)

Get config values from ~/.lbrycrd/lbrycrd.conf

Get config values from ~/.lbrycrd/lbrycrd.conf instead of set values.

channels don't show up in search results

Can't seem to get any channels to show up in search results.

breaking change from Chainquery

The query used to update the claims will no longer work when OdyseeTeam/chainquery#39 is deployed. The change needed is minor: modified will change to modified_at.

lighthouse.lbry.io should redirect to github repo

Make ./gendb.sh idempotent

Right now it errors if you run it a second time, saying "index exists". It should detect that it already created the db, and not try to do it a second time.

Ideally, this should be compatible with ansible somehow. Ansible needs to know when something changed vs nothing changed. For example, the script could exit with code 1 if nothing changed. Or maybe print a message to that effect.
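The idempotent check could look roughly like the following, written in JavaScript for consistency with the rest of the project rather than as a change to gendb.sh itself. It relies on the Elasticsearch REST behavior that HEAD on an index returns 200 if it exists and 404 if not. The function name, the index name, and the host in the usage example are placeholders.

```javascript
// Create an index only if it does not exist yet.
// Returns "unchanged" or "created" so a caller (for example an Ansible
// task wrapper) can tell whether anything was modified.
async function ensureIndex(baseUrl, indexName, settings, fetchImpl = globalThis.fetch) {
  const indexUrl = `${baseUrl}/${encodeURIComponent(indexName)}`;

  const head = await fetchImpl(indexUrl, { method: "HEAD" });
  if (head.status === 200) return "unchanged";
  if (head.status !== 404) {
    throw new Error(`Unexpected status ${head.status} checking ${indexUrl}`);
  }

  const put = await fetchImpl(indexUrl, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(settings),
  });
  if (!put.ok) throw new Error(`Index creation failed with status ${put.status}`);
  return "created";
}

// Usage (placeholder host and index name):
// ensureIndex("http://localhost:9200", "claims", {}).then(console.log);
```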

Query breaks elastic search with pagination release.

Problematic Query

https://lighthouse.lbry.io/search?s=test&size=100&from=10000

Error: [query_phase_execution_exception] Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.

Elastic Search https://www.elastic.co/guide/en/elasticsearch/reference/6.2/index-modules.html

The limit on from + size is set by index.max_result_window, which defaults to 10000.

We need to prevent the edge case where someone might pass a size + from greater than 10000.
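A small guard along these lines could be applied to the size and from parameters before the query reaches Elasticsearch. This is a sketch: the function name is hypothetical, and clamping is only one option (rejecting the request with a 400 is another).

```javascript
// Clamp pagination so that from + size never exceeds the result window.
// 10000 is the Elasticsearch default for index.max_result_window.
function clampWindow(from, size, maxWindow = 10000) {
  const safeFrom = Math.min(Math.max(Math.trunc(from) || 0, 0), maxWindow);
  const safeSize = Math.min(Math.max(Math.trunc(size) || 0, 0), maxWindow - safeFrom);
  return { from: safeFrom, size: safeSize };
}

console.log(clampWindow(10000, 100)); // the problematic query from this issue
console.log(clampWindow(9950, 100));
```

With the problematic query above, the clamped size becomes 0, so the request returns an empty page instead of an internal error.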
