lighthouse.js's Introduction

Lighthouse - A lightning fast search for the LBRY blockchain

Lighthouse is a lightning-fast advanced search engine API for publications on the LBRY blockchain (lbrycrd), with autocomplete capabilities. The official Lighthouse instance is live at https://lighthouse.lbry.com

What does Lighthouse consist of?

  1. Elasticsearch as a backend db server.
  2. LBRYimport, an importer that imports the claims into the Elasticsearch database.
  3. Lighthouse API server, which serves the API and does all calculations about what to send to the end user.

API Documentation / Usage example

To make a simple search by string:

https://lighthouse.lbry.com/search?s=stringtosearch

To get autocomplete suggestions:

https://lighthouse.lbry.com/autocomplete?s=stringtocomp
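
Search strings with spaces or punctuation need to be URL-encoded before they go into the `s` parameter. The sketch below builds both request URLs shown above; the helper name `buildLighthouseUrl` is illustrative and not part of Lighthouse.

```javascript
// Base URL taken from the usage examples above.
const LIGHTHOUSE_BASE = 'https://lighthouse.lbry.com';

// Build a request URL for an endpoint ("search" or "autocomplete").
// encodeURIComponent keeps spaces, "&", and similar characters
// from breaking the query string.
function buildLighthouseUrl(endpoint, text) {
  return `${LIGHTHOUSE_BASE}/${endpoint}?s=${encodeURIComponent(text)}`;
}

console.log(buildLighthouseUrl('search', 'test a'));
console.log(buildLighthouseUrl('autocomplete', 'test a'));
```

The resulting URL can be passed to any HTTP client.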

The full API documentation

Installation

Prerequisites

To get started, clone the repository:

git clone https://github.com/lbryio/lighthouse

Make sure elasticsearch is running and run (from the lighthouse dir):

./gendb.sh

Install dependencies:

npm install --production=false

Build and run Lighthouse:

npm run prod

You are now up and running! You can connect to Lighthouse at http://localhost:50005; the API documentation is here. Lighthouse will continue syncing in the background. It usually takes about 15 minutes before all claims are up to date in the database.

Contributing

Contributions to this project are welcome, encouraged, and compensated. For more details, see lbry.com/faq/contributing

License

This project is MIT Licensed © LBRYio, Filip Nyquist

Security

We take security seriously. Please contact [email protected] regarding any security issues. Our PGP key is here if you need it.

Contact

The primary contact for this project is @tiger5226 ([email protected])

lighthouse.js's Issues

Filters

Filtering server side

Is there any news or are there any plans for filter integration,
such as searching only for a specific content type (audio / video / files),
or by channel and tags...
(if the daemon ever supports it) 🙃

lbryio/lbry-desktop#664 (comment)

Autocomplete Query is not returning the proper results

The autocomplete query suffers from the same problems that search did previously. The value section of the Elasticsearch document is of type nested, which means the query also needs to be a nested query. As a result, autocomplete only searches the name of a claim instead of searching for the best autocomplete term across the main fields of title, description, and author. Below is an example of the current behavior:

https://lighthouse.lbry.io/autocomplete?s=test%20a

Result:

[
"make-a-test-tube-thunderstorm",
"Make a Test Tube Thunderstorm!",
"NurdRage",
"a-test-of-wills-charles-todd-mobi",
"A Test of Wills By Charles Todd Mobi Format",
"upload-test-12-11-17-a",
"spee.ch",
"how-to-detect-a-secret-nuclear-test",
"How To Detect A Secret Nuclear Test",
"minutephysics"
]

Now if you look at the first result make-a-test-tube-thunderstorm and search that with

https://lighthouse.lbry.io/search?s=test%20a

Result:

[
{"name":"test","claimId":"a607349ce83d3a86bac87a967ce7f9647e1ba736","value":{"claimType":"streamType","stream":{"metadata":{"preview":"","license":"Creative Commons Attribution 4.0 International","licenseUrl":"","thumbnail":"","nsfw":false,"author":"test","description":"test","language":"en","title":"test","version":"_0_1_0"},"source":{"sourceType":"lbry_sd_hash","source":"29ad218c61c599499b22c17228371b5fe9a6e725edc9ef691a7819b3e7406500467852920629fbcddbcacf13d31579d4","version":"_0_0_1","contentType":"image/png"},"version":"_0_0_1"},"version":"_0_0_1"}},
{"name":"make-a-test-tube-thunderstorm","claimId":"40d193673b0730907449a3dde387b2cdb0314eff",
...

You can see that the claim name test is first but then "name":"make-a-test-tube-thunderstorm" is second. So it is only searching the name field.

The internal server error should be a separate issue. We should not hit an error by entering a query. I will create another issue for this.

Lastly, the Elasticsearch query needs to be modified, and getting this right takes some time. I don't think this is a level 1, so I increased it to a level 2.
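
As a rough sketch of the fix described in this issue, the request body below wraps the metadata fields in an Elasticsearch nested query while still matching the top-level claim name. The field paths are assumptions read off the sample search result above (`value.stream.metadata.*`), and `buildAutocompleteQuery` is a hypothetical helper, not Lighthouse's actual code.

```javascript
// Field paths assumed from the sample search result in this issue.
const AUTOCOMPLETE_FIELDS = [
  'value.stream.metadata.title',
  'value.stream.metadata.description',
  'value.stream.metadata.author',
];

// Build an Elasticsearch request body for an autocomplete lookup.
function buildAutocompleteQuery(text) {
  return {
    query: {
      bool: {
        should: [
          // Top-level claim name: the only field the current query reaches.
          { match_phrase_prefix: { name: text } },
          // Fields under the nested "value" mapping need a nested query.
          {
            nested: {
              path: 'value',
              query: {
                multi_match: {
                  query: text,
                  type: 'phrase_prefix',
                  fields: AUTOCOMPLETE_FIELDS,
                },
              },
            },
          },
        ],
      },
    },
  };
}

console.log(JSON.stringify(buildAutocompleteQuery('test a'), null, 2));
```

Whether `phrase_prefix` is the right match type for autocomplete here is a judgment call; the point of the sketch is only the `nested` wrapper with `path: 'value'`.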

Auto-deploy/release lighthouse

https://github.com/semantic-release/semantic-release

This is a really cool and popular package. Then we just need to process the github webhook when a tagged release is created to download, unzip and startup the binary.
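
A minimal sketch of the webhook side, assuming the GitHub webhook is configured with a shared secret and delivers `release` events. It only validates the `X-Hub-Signature-256` header and decides whether a delivery should trigger a deploy; the download, unzip, and restart steps are left out. Both function names are hypothetical.

```javascript
const crypto = require('crypto');

// Verify the X-Hub-Signature-256 header that GitHub sends when a
// webhook secret is configured. rawBody must be the unparsed request body.
function isValidSignature(secret, rawBody, signatureHeader) {
  const expected =
    'sha256=' + crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader || '');
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

// Return the tag to deploy, or null when the delivery is not a
// newly published release. eventName comes from the X-GitHub-Event header.
function releaseTagToDeploy(eventName, payload) {
  if (eventName !== 'release' || payload.action !== 'published') return null;
  return payload.release.tag_name;
}
```

Rejecting unsigned or mismatched deliveries before acting on them keeps arbitrary requests from triggering a redeploy.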

Acceptance Criteria

Definition of Done

  • Tested against acceptance criteria
  • Tested against the assumptions of the user story
  • The project builds without errors
  • Unit tests are written and passing
  • Tests on devices/browsers listed in the issue have passed
  • QA performed & issues resolved
  • Refactoring completed
  • Any configuration or build changes documented
  • Documentation updated
  • Peer Code Review performed

Get chainquery merged and live into production.

@tiger5226 is working on integrating chainquery into lighthouse. When my review is done, the next step is deployment to a live server, which means starting over with a new Elasticsearch database.

My main idea is to redeploy lighthouse to a new server, get it synced up, and then change the DNS for lighthouse.lbry.io to point to the new one. Then we let the DNS change propagate and remove the old server.

Feel free to add your ideas down below!

Prepare lighthouse for app integration.

Search currently seems to work as it should, and I'm starting to make the final changes for it to be lbry-app ready. Ideally, lbry-app should need minimal to no changes to connect to lighthouse. To make this happen, the following needs to be done.

Required changes to lighthouse:

  • Updating to POST so that it matches the pre-existing call.

  • Altering the response to only return name, claimId, and value.

Nice-to-have features to add:

  • add getClaimsInBlock
  • add getSuggestedClaims
  • add getTotalClaimsValue
  • Adding claim statistics calls.
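
The response change listed under the required changes could be as small as the mapping below. This is a sketch that assumes each stored document carries extra fields beyond the three the app expects; `toAppResult` and the extra field in the example are hypothetical.

```javascript
// Keep only the three fields the app expects from each search hit.
function toAppResult(doc) {
  const { name, claimId, value } = doc;
  return { name, claimId, value };
}

// Example: the "height" field is a made-up extra that gets dropped.
const trimmed = [{ name: 'test', claimId: 'a607', value: {}, height: 1 }].map(toAppResult);
console.log(trimmed);
```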

spent claims show up in search results

When lighthouse queries chainquery for updated or new claims, it doesn't check whether a claim is spent.

  • It needs to check if the claim is spent and then delete it from elastic search so it is no longer searchable. The app will not resolve a spent claim and will only show that the position is available if someone wants to use it. cd0f980

  • Lighthouse also needs to delete claims from elastic search that are expired. cd0f980
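
One way to picture the two points above is a partition step during sync: claims that are spent or expired go to a delete list, and everything else is indexed. The `bidState` field and its `'Spent'` / `'Expired'` values are assumptions about what the chainquery rows expose, and `partitionClaims` is a hypothetical helper.

```javascript
// States that should remove a claim from the search index (assumed values).
const REMOVED_STATES = new Set(['Spent', 'Expired']);

// Split freshly queried claims into documents to index and IDs to delete.
function partitionClaims(claims) {
  const toIndex = [];
  const toDelete = [];
  for (const claim of claims) {
    if (REMOVED_STATES.has(claim.bidState)) {
      toDelete.push(claim.claimId);
    } else {
      toIndex.push(claim);
    }
  }
  return { toIndex, toDelete };
}
```

The caller would then issue index and delete operations against Elasticsearch for the two lists.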

Add weight for Claims with bids that are controlling

The elastic search query needs to be adjusted such that when there is a choice between two claims of equal weight in the search results, the one that is controlling gets a higher score which then means it returns at a higher place in the search results.

This also originated from #58, but since it is separate from claim updates not being synced, I pulled it into its own issue.
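
A possible shape for this adjustment is to wrap the existing query in a `bool` query whose `should` clause adds score for controlling claims. The sketch assumes the indexed document has a boolean `is_controlling` field, which the source does not confirm; the boost value is arbitrary.

```javascript
// Wrap an existing Elasticsearch query so that controlling claims
// score higher. "is_controlling" is an assumed field name.
function withControllingBoost(baseQuery, boost = 2) {
  return {
    bool: {
      // The original query still decides which documents match.
      must: [baseQuery],
      // Matching this clause is optional and only adds to the score.
      should: [{ term: { is_controlling: { value: true, boost } } }],
    },
  };
}

console.log(JSON.stringify(withControllingBoost({ match: { name: 'test' } })));
```

Because the extra clause sits in `should`, non-controlling claims still match; they simply rank below an otherwise equal controlling claim.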

if sync fails it never runs again

The sync process should continue to try on schedule, even if it hits an error. In the current state, if it hits an error, the only way to launch the sync process again is to restart the lighthouse service on the machine.

This issue is triggered by the investigation of #68. The claim trie had not been synced since the last deployment because it failed once when elastic search connection was not instantly available. Making sure it can retry later, is important to avoiding this in the future.
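
The retry behavior asked for here can be sketched as a loop that always schedules the next run, whether or not the current one failed. `startSyncLoop`, `syncOnce`, and `onError` are hypothetical names; the real sync function is not shown in this issue.

```javascript
// Run syncOnce repeatedly, waiting intervalMs between runs.
// A failed run is reported and the next run is still scheduled.
function startSyncLoop(syncOnce, intervalMs, onError = console.error) {
  let stopped = false;
  let timer = null;

  async function tick() {
    try {
      await syncOnce();
    } catch (err) {
      // For example: Elasticsearch was not reachable yet.
      onError(err);
    } finally {
      if (!stopped) timer = setTimeout(tick, intervalMs);
    }
  }

  tick();

  // Return a function that stops future runs.
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Scheduling from `finally` rather than with `setInterval` also avoids overlapping runs when a sync takes longer than the interval.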

searching with lbry://xxxxx fails, bad request

I just copy/pasted this URL: lbry://ee-8blgmbXfDEc into search and noticed a bad request returned in the network logs. The top result is the exact URL, but at the bottom you see:

Search Results for lbry://ee-8blgmbXfDEc ?

lbry://ee-8blgmbXfDEc Be the first

I can see users copy/pasting URLs and running into this issue if they don't click Enter immediately.

Search ranking improvement

When searching @imineblocks you get two channels with the same name. The channel with only one item ranks higher than the other channel with 300 items and the vanity name claim.

The one with the vanity name and more items should rank on top. The search result should also show which channel has the vanity name claim when multiple channels of the same name are listed.

Irrelevant results

While searching for "Simpsons," the search result included this video and I can't find any mention of the search term:

lbry://acl-0dLJ2uwkSXI#a1b5f77ced4dcc042478ca2d32c7de14d6c3c524

Change the API doc to swagger instead of jsdoc.

The documentation should be changed to swagger, which will make us follow the OpenAPI spec.
This will make it easier for people to make API clients and give them a better overview of the API.
The code for this is close to ready; it needs some final fixes and swagger checks.

Search queries with certain parameters return "400 Bad Request"

I am currently working on lbryio/lbry-desktop#1743 and am running into a few issues. For a piece of content, I am taking the title, and performing a search to build a list of recommended content.

Example:

title: What's So Great About Jojo's Bizarre Adventure (500k Special!)
query: https://lighthouse.lbry.io/search?s=What%27s%20So%20Great%20About%20Jojo%27s%20Bizarre%20Adventure%20(500k%20Special!)&size=20&from=0
response: Bad Request

It seems ( or ) is what's causing it. If I just do a search with any query and include ( or ) I see the same result. Still looking to see what other characters cause this.
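
If the search text ends up in an Elasticsearch `query_string` query (an assumption; this issue does not confirm how the query is built), parentheses are reserved characters there and would explain the error. The sketch below escapes the reserved characters with a backslash and drops `<` and `>`, which cannot be escaped. `escapeQueryString` is a hypothetical helper.

```javascript
// Escape characters that are reserved in Elasticsearch query_string syntax.
function escapeQueryString(text) {
  return text
    // "<" and ">" cannot be escaped, so they are removed entirely.
    .replace(/[<>]/g, '')
    // Prefix every other reserved character with a backslash.
    .replace(/[+\-=&|!(){}\[\]^"~*?:\\\/]/g, '\\$&');
}

console.log(escapeQueryString("What's So Great About Jojo's Bizarre Adventure (500k Special!)"));
```

The escaped text still needs to be URL-encoded if it travels in a query string.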

Make ./gendb.sh idempotent

Right now it errors if you run it a second time, saying "index exists". It should detect that it already created the db, and not try to do it a second time.

Ideally, this should be compatible with ansible somehow. Ansible needs to know when something changed vs nothing changed. For example, the script could exit with code 1 if nothing changed. Or maybe print a message to that effect.
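
The check-before-create logic is sketched here in JavaScript rather than shell, to stay in the project's language. The `es` object is a hypothetical adapter with `indexExists` and `createIndex` methods; it stands in for whatever client or HTTP calls the script actually uses.

```javascript
// Create the index only when it is missing and report whether
// anything changed, so a caller can surface that to Ansible.
async function ensureIndex(es, indexName, settings) {
  if (await es.indexExists(indexName)) {
    return { changed: false };
  }
  await es.createIndex(indexName, settings);
  return { changed: true };
}
```

A wrapper script could then print a "changed" or "unchanged" message, or pick its exit code, based on the returned flag.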

Query breaks elastic search with pagination release.

Problematic Query

https://lighthouse.lbry.io/search?s=test&size=100&from=10000

Error: [query_phase_execution_exception] Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.

Elastic Search https://www.elastic.co/guide/en/elasticsearch/reference/6.2/index-modules.html

Defaults to index.max_result_window which defaults to 10000.

We need to prevent the edge case where someone might pass a size + from greater than 10000.
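
One way to guard this edge case is to clamp the two parameters before the query is built, so that `from + size` never exceeds the window. This is a sketch; `clampPagination` is a hypothetical helper, and rejecting the request with a clear error would be an equally valid choice.

```javascript
// Default value of index.max_result_window, as quoted in the error above.
const MAX_RESULT_WINDOW = 10000;

// Coerce from/size to non-negative integers and cap them so that
// from + size stays within the result window.
function clampPagination(from, size, maxWindow = MAX_RESULT_WINDOW) {
  const safeFrom = Math.min(Math.max(0, Math.trunc(Number(from)) || 0), maxWindow);
  const safeSize = Math.min(Math.max(0, Math.trunc(Number(size)) || 0), maxWindow - safeFrom);
  return { from: safeFrom, size: safeSize };
}

console.log(clampPagination('10000', '100'));
```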

Include full URLs in search results

URLs are typically what will ultimately be resolved or used by a service or application utilizing the search API. So it'd probably make sense to just return these.
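
As a sketch, a URL could be derived from fields the search results already carry. The `lbry://name#claimId` shape follows the URL quoted in the "Irrelevant results" issue; whether the API should return this form or a shorter one is left open, and `toLbryUrl` is a hypothetical helper.

```javascript
// Build a URL of the form lbry://<name>#<claimId> from a search result.
function toLbryUrl({ name, claimId }) {
  return `lbry://${name}#${claimId}`;
}

console.log(
  toLbryUrl({
    name: 'make-a-test-tube-thunderstorm',
    claimId: '40d193673b0730907449a3dde387b2cdb0314eff',
  })
);
```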
