myaku's People

Contributors

nickmcl

myaku's Issues

Only update queries whose cached content changed on first page cache updates

Currently, the content for a query in the first page cache is rewritten by the crawler and rescore services whenever they make any change to a found lexical item for that query in the database.

Most of the time, the changes these services make to found lexical items in the database do not affect the first page results for a query that matches that found lexical item. In those cases, the services waste time rewriting the first page cache with exactly the same data it already holds.

The services should be smarter about this so that they don't rewrite the first page cache for queries when nothing will change from the currently cached data for those queries.
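One way to picture the check is a compare-before-write step. The sketch below is only an illustration: it assumes the cache behaves like a mapping from query to first page content, and `update_first_page_cache` is a hypothetical name, not a function from the project.

```python
from typing import Any, Dict


def update_first_page_cache(
    cache: Dict[str, Any], query: str, new_page: Any
) -> bool:
    """Write new_page for query only if it differs from the cached value.

    Hypothetical helper: `cache` stands in for the real first page cache.
    Returns True if a write happened, False if it was skipped.
    """
    if query in cache and cache[query] == new_page:
        # Nothing changed for this query, so skip the rewrite.
        return False
    cache[query] = new_page
    return True
```

If reading the cached value is itself expensive, comparing a stored hash of the content instead of the content itself might be a cheaper variant of the same idea.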

Fix titles that are only white space characters

If a title is only white space, it will not be rendered in the search result tiles and thus will be unclickable.

In that case, the tile should fall back to other contextual data, such as the blog section title, or at minimum show a placeholder so that the article link stays clickable.
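A minimal sketch of such a fallback follows. The function name, the `section_title` parameter, and the placeholder text are all assumptions made for illustration; the project may model titles differently.

```python
from typing import Optional


def display_title(
    title: Optional[str],
    section_title: Optional[str] = None,
    placeholder: str = "(untitled)",
) -> str:
    """Pick a non-blank string to render as the clickable tile title.

    Hypothetical helper. str.strip() with no arguments also removes
    Unicode whitespace such as the ideographic space (U+3000).
    """
    if title and title.strip():
        return title
    if section_title and section_title.strip():
        return section_title
    return placeholder
```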

Fix &amp; showing up in titles instead of &

Beautiful Soup replaces "&" characters in text with "&amp;". Currently, the crawlers do not account for this at all, so whenever they parse text containing "&" characters from HTML, the resulting strings are stored with each "&" replaced by "&amp;".

This needs to be fixed so that parsed strings are stored with "&" characters intact.
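As a rough sketch, the standard library's `html.unescape` can turn the entity back into the plain character before a string is stored. Where exactly the escaping happens in the crawlers is not stated here, so `clean_parsed_text` is a hypothetical hook, not existing project code.

```python
import html


def clean_parsed_text(text: str) -> str:
    """Convert HTML character references such as "&amp;" back to "&".

    Hypothetical post-processing step applied to strings parsed from HTML.
    """
    return html.unescape(text)
```

Note that this would also unescape other references such as `&lt;`, and it would alter text that is meant to contain a literal `&amp;`, so fixing the escaping at its source may be the cleaner option.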

Don't include blocks of symbol characters in article previews

The article preview generator can currently include long strings of symbols in article previews like:

☆★☆★☆★☆★☆★☆★☆★☆★☆★☆★☆★☆★☆★☆★☆★☆★☆★☆★☆★

Although small groups of symbols like 顔文字 can be fine in previews, including long strings of symbols like that almost certainly will make for a bad preview, so long strings of symbols should be excluded during preview creation.
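One possible filter, sketched under assumptions: treat any character whose Unicode general category starts with "S" as a symbol, and drop runs longer than a threshold. The threshold of 8 and the function names are illustrative guesses, not values from the project.

```python
import itertools
import unicodedata

MAX_SYMBOL_RUN = 8  # Assumed threshold; tune against real previews.


def _is_symbol(ch: str) -> bool:
    # Unicode general categories Sm, Sc, Sk, and So are all symbols.
    return unicodedata.category(ch).startswith("S")


def strip_long_symbol_runs(text: str, max_run: int = MAX_SYMBOL_RUN) -> str:
    """Remove runs of consecutive symbol characters longer than max_run."""
    parts = []
    for is_sym, group in itertools.groupby(text, key=_is_symbol):
        run = "".join(group)
        if is_sym and len(run) > max_run:
            continue  # Drop the long symbol block entirely.
        parts.append(run)
    return "".join(parts)
```

Short groups pass through untouched because they stay under the threshold, and punctuation is not counted as a symbol, so a typical emoticon is broken into runs of only one or two symbol characters.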

Add surrounding sentences to article preview in search result tiles if the sentence matching the query is very short

Currently, the article preview in the search result tiles shows only the sentence containing the matched term for the search. This means that even if that sentence is only a few characters long, it is all that will be displayed for the preview.

Previews with only a few characters are basically useless. To always get a meaningful preview, when the sentence containing the matched term is very short, the preview should also display the sentences preceding and/or following it.
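The expansion could look roughly like the sketch below. It assumes the article is already split into a list of sentences and that the index of the matching sentence is known; `build_preview`, `min_len`, and `max_len` are hypothetical names and limits.

```python
from typing import List


def build_preview(
    sentences: List[str], match_index: int, min_len: int = 20, max_len: int = 120
) -> str:
    """Grow the preview around the matching sentence until it is long enough.

    Neighbors are added only while the preview is shorter than min_len and
    only if adding them keeps the total within max_len.
    """
    start = end = match_index
    length = len(sentences[match_index])
    while length < min_len:
        grew = False
        # Try the following sentence first, then the preceding one.
        if end + 1 < len(sentences) and length + len(sentences[end + 1]) <= max_len:
            end += 1
            length += len(sentences[end])
            grew = True
        if (
            length < min_len
            and start > 0
            and length + len(sentences[start - 1]) <= max_len
        ):
            start -= 1
            length += len(sentences[start])
            grew = True
        if not grew:
            break  # No neighbor fits, so keep what we have.
    return "".join(sentences[start : end + 1])
```

Preferring the following sentence over the preceding one is an arbitrary choice in this sketch; either order would satisfy the request.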

Prioritize capturing full quotes in article previews when possible

The article preview generator currently does nothing special with regard to quote characters. They are treated like any other character.

This means that article previews can easily be generated that only take a portion of a quote when the full quote could have been used while still staying within the article preview character limits.

When a preview already includes part of a quote, the article preview generator should prioritize expanding the preview to include the full quote.
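A simplified sketch of that expansion step is shown below. It assumes the preview is a `[start, end)` span into the article text and considers only the Japanese bracket pairs 「」 and 『』; the function name and the counting heuristic are assumptions, and nested quotes are not handled.

```python
from typing import Tuple

# Assumed set of quote pairs; extend as needed.
QUOTE_PAIRS = {"「": "」", "『": "』"}


def expand_to_full_quote(
    text: str, start: int, end: int, max_len: int
) -> Tuple[int, int]:
    """Widen the [start, end) span to cover a partially included quote.

    The span is only widened if the result stays within max_len characters.
    """
    for open_q, close_q in QUOTE_PAIRS.items():
        span = text[start:end]
        opens, closes = span.count(open_q), span.count(close_q)
        if opens > closes:
            # The quote starts inside the span but ends after it.
            close_pos = text.find(close_q, end)
            if close_pos != -1 and close_pos + 1 - start <= max_len:
                end = close_pos + 1
        elif closes > opens:
            # The quote ends inside the span but starts before it.
            open_pos = text.rfind(open_q, 0, start)
            if open_pos != -1 and end - open_pos <= max_len:
                start = open_pos
    return start, end
```

For example, with the text 彼は「今日は行けない」と言った。 and a span covering only 彼は「今日, the span would be widened to end just after the closing 」 as long as the character limit allows it.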
