
completely's Introduction

Description

Completely is a Java autocomplete library.

Autocomplete involves predicting a word or phrase that the user may type based on a partial query. The goal is to provide instant feedback and avoid unnecessary typing as the user formulates queries. Performance is a key issue since each keystroke from the user could invoke a query, and each query should be answered within a few milliseconds. What's more, because users often make spelling mistakes while typing, autocomplete should tolerate errors and differences in representation.

Needless to say, a standard sequential search is bound to be ineffective for anything other than small data sets. By contrast, Completely relies on text preprocessing to create an in-memory index for efficiently answering searches in large data sets. All in all, there are three fundamental components at play:

  • Analyzer function to filter, tokenize and/or transform text prior to indexing;
  • Index data structure for storing the mapping of text to the corresponding sources;
  • Automaton engine for text matching when searching.

Together these can be used to tackle a variety of use cases; the choice and combination of components depends on the application at hand.
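As a rough illustration of how the three components fit together, here is a minimal, self-contained sketch. It does not use Completely's API: the class name `ComponentsSketch`, the lower-casing whitespace analyzer, the `TreeMap`-based index, and the plain prefix test that stands in for an automaton are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from Completely's API.
final class ComponentsSketch {
    // Analyzer: lower-cases the text and splits it into whitespace-separated tokens.
    static final Function<String, List<String>> ANALYZER = text -> {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    };

    // Index: sorted map from each token to the source texts that contain it.
    private final NavigableMap<String, Set<String>> index = new TreeMap<>();

    void add(String source) {
        for (String token : ANALYZER.apply(source)) {
            index.computeIfAbsent(token, k -> new TreeSet<>()).add(source);
        }
    }

    // Matching: a simple prefix test stands in for the automaton.
    // Results for several query tokens are unioned.
    Set<String> search(String query) {
        Set<String> results = new TreeSet<>();
        for (String token : ANALYZER.apply(query)) {
            // Keys sharing a prefix are contiguous in a sorted map.
            for (Map.Entry<String, Set<String>> e : index.tailMap(token, true).entrySet()) {
                if (!e.getKey().startsWith(token)) {
                    break;
                }
                results.addAll(e.getValue());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        ComponentsSketch engine = new ComponentsSketch();
        engine.add("Western Sahara");
        engine.add("Samoa");
        engine.add("Portugal");
        System.out.println(engine.search("sa")); // prints [Samoa, Western Sahara]
    }
}
```

Swapping the prefix test for an approximate matcher, or the sorted map for a trie, changes the matching behavior without touching the other two parts, which is the point of keeping them separate.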

Download

All release artifacts are available for download from the Maven central repository.

Build from source

Building Completely requires Maven 3 and Java 11, or newer.

Download the source code:

git clone https://github.com/fmmfonseca/completely.git

Build the JAR package:

mvn clean package -DskipTests

Run the sample

Install artifacts to the local repository:

mvn install

Execute the main class:

mvn exec:java -pl sample

References

  • Bořivoj Melichar. Approximate String Matching by Finite Automata;
  • Gonzalo Navarro. A Guided Tour to Approximate String Matching;
  • Leonid Boytsov. Indexing Methods for Approximate Dictionary Searching: Comparative Analysis;
  • Marios Hadjieleftheriou and Divesh Srivastava. Approximate String Processing;
  • Surajit Chaudhuri and Raghav Kaushik. Extending Autocompletion To Tolerate Errors;

License

Released under The Apache Software License, Version 2.0


completely's Issues

Sorting concept

Currently sorting works using the Comparator in the AutocompleteEngine.
There can be a default comparator, and one can be given to the search() method to override the default.

After sorting, there is also the limit feature to cut the result at some point.

This flexibility requires 4 public methods in the interface right now.

Unfortunately, the Comparator logic is not enough for my use case, and I believe it won't be enough for others either.

No one is forced to use it: it can be left as null, and custom sorting can then be applied after searching, in the user's code.

What I need is to include the search query in the comparison. It is not enough to look at the results in isolation.
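One way to get a query-aware ordering today is to sort in the caller's code with a comparator that closes over the query. The sketch below is self-contained and does not touch the AutocompleteEngine interface; the class name `QueryAwareSortSketch`, the three-level ranking (exact match, prefix match, everything else), and the `sortAndLimit` helper are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of post-search sorting that takes the query into account.
final class QueryAwareSortSketch {
    // The returned comparator captures the query: exact matches come first,
    // then prefix matches, then the rest; ties are broken alphabetically.
    static Comparator<String> byRelevanceTo(String query) {
        String q = query.toLowerCase(Locale.ROOT);
        return Comparator
                .comparingInt((String candidate) -> rank(candidate.toLowerCase(Locale.ROOT), q))
                .thenComparing(Comparator.naturalOrder());
    }

    private static int rank(String candidate, String query) {
        if (candidate.equals(query)) {
            return 0;
        }
        if (candidate.startsWith(query)) {
            return 1;
        }
        return 2;
    }

    // Sorts a copy of the results and cuts it at the given limit.
    static List<String> sortAndLimit(List<String> results, String query, int limit) {
        List<String> sorted = new ArrayList<>(results);
        sorted.sort(byRelevanceTo(query));
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    public static void main(String[] args) {
        List<String> results = List.of("Sahara Desert", "Western Sahara", "Sahara");
        System.out.println(sortAndLimit(results, "sahara", 2)); // prints [Sahara, Sahara Desert]
    }
}
```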

Upload to Maven Central

It would be great to be able to include this library as a Maven-style dependency in a project without having to clone and "install" it locally. Uploading it to Maven Central, or one of its equivalents/mirrors, would certainly help.

Regarding reverse searching

I have the following strings to index:
a. [Panasonic lcd item] [SampleRecord that I saved [count: 10, location: delhi]]
b. [Iphone 7 buy] [SampleRecord that I saved [count: 100, location: delhi]]
c. [5 seater sofa set] [SampleRecord that I saved [count: 50, location: new york]]
d. [iphone 7 buy] [SampleRecord that I saved [count: 400, location: jersey]]
Here count is the number of times a particular text was searched, and location is where the search was made.

I want to search like this: "Top 100 searches made in delhi"

Your code should return records 'a' and 'b', not 'c' and 'd'.

Also, if someone searches for "iphone 7 delhi", it should return 'c' 'd'.
Can you suggest how to do this with your code?
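The "top searches in a location" part of this request can be handled outside the index, as a filter and sort over the stored records. The following is only a sketch under that assumption: `SearchRecord` is a hypothetical stand-in for the SampleRecord mentioned above, and `topSearches` is an illustrative helper, not a Completely method.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: filter stored records by location, then rank by count.
final class LocationFilterSketch {
    // Stand-in for the issue's SampleRecord; field names are assumptions.
    static final class SearchRecord {
        final String text;
        final int count;
        final String location;

        SearchRecord(String text, int count, String location) {
            this.text = text;
            this.count = count;
            this.location = location;
        }
    }

    // Returns up to `limit` records for the location, most searched first.
    static List<SearchRecord> topSearches(List<SearchRecord> records, String location, int limit) {
        return records.stream()
                .filter(r -> r.location.equalsIgnoreCase(location))
                .sorted(Comparator.comparingInt((SearchRecord r) -> r.count).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SearchRecord> records = List.of(
                new SearchRecord("Panasonic lcd item", 10, "delhi"),
                new SearchRecord("Iphone 7 buy", 100, "delhi"),
                new SearchRecord("5 seater sofa set", 50, "new york"),
                new SearchRecord("iphone 7 buy", 400, "jersey"));
        for (SearchRecord r : topSearches(records, "delhi", 100)) {
            System.out.println(r.text + " (" + r.count + ")");
        }
    }
}
```

With the four records from the issue, this yields records 'b' and 'a' for "delhi", ordered by count.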

Autocompletion breaks for first character after space characters

Hey,

I have been having an issue which can be reproduced even with the sample completely application. Basically, whenever a search term consists of two words, Completely will stop working if only one character is entered for the second word. To show this using an example from your sample application, here's what happens if you keep adding one character between every search:

Query: "Western"

  • Western Sahara

Query: "Western "

  • Western Sahara

Query: "Western S"
No Results

Query: "Western Sa"

  • Western Sahara

In other words:

When searching "Western S", I would expect "Western Sahara" to be returned; however, Completely returns nothing. Once one more character is added, in this case the letter "a", Completely becomes functional again. I have looked at your source code but have not been able to see why this happens, though I may have just missed something obvious.

Support multiple lookup types using one index

There are different concepts built-in already for index lookups. It's very flexible. Some examples:

  • exact match, lower case:
    using a HashMultiMap in the Index, EqualityAutomaton in the index lookup, and a LowerCaseTransformer as the Analyzer in the engine
  • starts-with exact match, lower case:
    using a PatriciaTrie in the Index, EqualityAutomaton in the index lookup, and a LowerCaseTransformer as the Analyzer in the engine
  • starts-with fuzzy match, lower case:
    using a PatriciaTrie in the Index, EditDistanceAutomaton in the index lookup, and a LowerCaseTransformer as the Analyzer in the engine

In my use case, with a million indexed entries, I want to perform "exact starts-with" matching first. If that brings good results, fine, I take them. If not, I go ahead and try the "fuzzy starts-with" matching.

For this I currently either need 2 indexes (not an option, and technically not necessary), or some ugly syntax.

The problem is that the IndexLookupStrategy.lookup() method is not flexible. One option would be to pass in a closure that describes how the lookup should be done, so that it can be controlled from the outside.

My current solution is this:
Create a PatriciaTrie instance.
Create 2 instances of IndexAdapter, give both the same trie. One uses the EqualityAutomaton and the other the EditDistanceAutomaton.
Create 2 engines, one per IndexAdapter.
Now feed my data for indexing only to 1 engine.
Now both engines are ready for querying.

What do you think?
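The "exact first, fuzzy as a fallback" idea over a single shared structure can be sketched independently of the library. The code below is an assumption-laden illustration: it uses a `TreeSet` instead of a PatriciaTrie, a linear scan with a prefix edit distance instead of an EditDistanceAutomaton, and the names `TwoStageLookupSketch`, `lookup`, and `prefixEditDistance` are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch: one shared index, two lookup strategies tried in order.
final class TwoStageLookupSketch {
    // Smallest Levenshtein distance between the query and any prefix of the candidate.
    static int prefixEditDistance(String query, String candidate) {
        int[] prev = new int[candidate.length() + 1];
        for (int j = 0; j <= candidate.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= query.length(); i++) {
            int[] curr = new int[candidate.length() + 1];
            curr[0] = i;
            for (int j = 1; j <= candidate.length(); j++) {
                int cost = query.charAt(i - 1) == candidate.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = curr;
        }
        // The last row holds the distance to every candidate prefix; take the minimum.
        int best = prev[0];
        for (int d : prev) {
            best = Math.min(best, d);
        }
        return best;
    }

    static List<String> lookup(NavigableSet<String> index, String query, int maxDistance) {
        // Stage 1: exact starts-with, using the sorted order of the set.
        List<String> exact = new ArrayList<>();
        for (String key : index.tailSet(query, true)) {
            if (!key.startsWith(query)) {
                break;
            }
            exact.add(key);
        }
        if (!exact.isEmpty()) {
            return exact;
        }
        // Stage 2: fuzzy starts-with over the same index (linear scan for simplicity).
        List<String> fuzzy = new ArrayList<>();
        for (String key : index) {
            if (prefixEditDistance(query, key) <= maxDistance) {
                fuzzy.add(key);
            }
        }
        return fuzzy;
    }

    public static void main(String[] args) {
        NavigableSet<String> index = new TreeSet<>(List.of("sahara", "samoa", "portugal"));
        System.out.println(lookup(index, "sah", 1)); // prints [sahara]
        System.out.println(lookup(index, "szh", 1)); // prints [sahara]
    }
}
```

Passing the strategy in as a function argument, as the issue suggests, would let the caller decide the order and the fallback condition instead of hard-coding them as done here.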

AutocompleteEngine.search() should require an input

Currently, the search method accepts

  • null (explicitly, with the annotation @Nullable)
  • empty string

Running a search(null) throws an NPE, so that's against the documentation.
Running a search("") returns 0 results, that's as expected.

Instead of fixing the NPE case, I recommend not permitting either of those inputs. Why? They are user errors. Searching for nothing is useless, since it is clear from the start that nothing can be found. The best option is therefore to document this and to throw an IllegalArgumentException.
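A minimal sketch of the proposed contract, written as a standalone class rather than a patch to AutocompleteEngine: `SearchInputSketch`, `requireQuery`, and the placeholder `search` body are hypothetical names used only to show the validation step.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of rejecting null and empty queries up front.
final class SearchInputSketch {
    // Throws IllegalArgumentException for null or empty input; returns the query otherwise.
    static String requireQuery(String query) {
        if (query == null || query.isEmpty()) {
            throw new IllegalArgumentException("query must not be null or empty");
        }
        return query;
    }

    static List<String> search(String query) {
        requireQuery(query);
        // Placeholder: a real engine would consult its index here.
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        try {
            search("");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```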
