
License: MIT

SimplifiedSearch

Simple way to add ranked fuzzy matching search.
It is meant for when you have up to a few thousand products, locations or similar items and want to add, with minimal work, a search that most users will perceive as smart.

Intended use case

Searching through lists of short phrases like country names or the subject line in emails.
Data in databases must first be loaded into memory in order to be searched.

.NET support

Tested with: .NET Framework 4.8, .NET 6.0, .NET 8.0

Quickstart

Install

NuGet
> dotnet add package SimplifiedSearch

Code

Use the extension method .SimplifiedSearchAsync(searchTerm, propertyToSearchLambda).
The propertyToSearchLambda argument is optional. When it is omitted, all properties are searched (or the value itself, if the items are strings, enums, ints, etc.).

using SimplifiedSearch;

IList<Country> countries = GetListOfCountries(); // your own data, already loaded into memory
IList<Country> matches = await countries.SimplifiedSearchAsync("thaiwan", x => x.CountryName);
foreach (var country in matches)
{
    Console.WriteLine(country.CountryName);
}
// output:
// Taiwan
// Thailand

Customization

New in version 1.3.0.

// Create searcher with custom selection of final result.
public class MyCustomSelector : SimplifiedSearch.SearchPipelines.ResultSelectors.IResultSelector
{
    public Task<IList<T>> RunAsync<T>(IList<SimilarityRankItem<T>> rankedList) => ...
}
SimplifiedSearchFactory.Instance.Add("MyCustomSearcher",
    c => c.ResultSelector = new MyCustomSelector());
var simplifiedSearch = SimplifiedSearchFactory.Instance.Create("MyCustomSearcher");
var searchResults = await simplifiedSearch.SimplifiedSearchAsync(list, "searchTerm");

// Override the default searcher, also used by the extension methods.
SimplifiedSearchFactory.Instance.Add(SimplifiedSearchFactory.DefaultName,
    c => c.ResultSelector = new MyCustomSelector());
var defaultSearchResults = await list.SimplifiedSearchAsync("searchTerm");

Acknowledgements

Inspiration

Lucene.NET is the main inspiration for SimplifiedSearch.
SimplifiedSearch was originally started with the goal of delivering results similar to those of a specific Lucene analyzer and query setup.

Enablers

Fastenshtein: provides the Levenshtein distance calculation needed for fuzzy search.
License: MIT https://github.com/DanHarltey/Fastenshtein/blob/master/LICENSE.
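To show what "distance calculation" means here, below is a textbook Levenshtein edit distance in C#. This is a minimal sketch for illustration only. It is not Fastenshtein's code or API, and the `Levenshtein.Distance` name is hypothetical.

```csharp
using System;

// Illustrative sketch only: textbook Levenshtein distance.
// Not Fastenshtein's implementation; the class/method names are hypothetical.
public static class Levenshtein
{
    // Minimum number of single-character insertions, deletions and
    // substitutions needed to turn 'a' into 'b'.
    public static int Distance(string a, string b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));

        // Only two rows of the dynamic-programming matrix are kept.
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1), // insert, delete
                    previous[j - 1] + cost);                         // substitute/keep
            }
            // The row just computed becomes "previous" for the next iteration.
            var tmp = previous; previous = current; current = tmp;
        }
        return previous[b.Length];
    }
}
```

On lowercase input, `Distance("thaiwan", "taiwan")` is 1 and `Distance("thaiwan", "thailand")` is 2. That is consistent with the Quickstart output order, although the library's actual ranking may combine the distance with other signals.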

Unidecode.NET: provides the ASCII folding needed to match accented characters to their approximate ASCII equivalents (â, å, à, á, ä ≈ a).
License: MIT https://github.com/thecoderok/Unidecode.NET/blob/master/LICENSE.
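The idea of ASCII folding can be approximated with .NET's built-in Unicode normalization. The sketch below decomposes each character (NFD) and drops the combining accent marks. This is a simpler, swapped-in technique, not what Unidecode.NET does. It only handles letters that decompose into a base letter plus diacritics, so characters such as "ø", "æ" or "ß" pass through unchanged; full transliteration covers cases like these. The `AsciiFolding.Fold` name is hypothetical.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical sketch: approximate ASCII folding via Unicode decomposition.
// Not Unidecode.NET; letters without a decomposition (ø, æ, ß) are kept as-is.
public static class AsciiFolding
{
    public static string Fold(string input)
    {
        // NFD splits e.g. "å" into "a" + U+030A COMBINING RING ABOVE.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Drop the combining marks (accents), keep everything else.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        // Recompose whatever is left into canonical form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```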

Contributing

Bug reports, feature requests and pull requests are welcome.

  • The focus of the project is on making the simple use case work well, not on supporting many special cases.
  • For significant changes, open an issue for discussion before putting a lot of work into the change.
  • Follow the established code format.

Contributors

tommysor, dependabot[bot]

Issues

Support word order relevance

Consider a list of items of the form "Book 1 Page 2".
Searching for multiple words (like "Book 1") would return many results.
Relevance would be equal for "Book 1 Page 2" and "Book 2 Page 1" (both items contain both search terms).

Search results would be much more relevant if word order were taken into consideration.

const string book1Page2 = "Book 1 Page 2";
const string book2Page1 = "Book 2 Page 1";

var list = new[]
{
    book1Page2,
    book2Page1
};

var actual = await list.SimplifiedSearchAsync("book 2");

var actual1 = actual[0];
Assert.Equal(book2Page1, actual1);

Assert.True(actual.Count >= 2, "Did not get a second result");
var actual2 = actual[1];
Assert.Equal(book1Page2, actual2);
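One possible way to take word order into account is a tie-breaking bonus. The sketch below counts how many consecutive pairs of search words also appear next to each other, in the same order, in the candidate text. It is a hypothetical sketch and not part of SimplifiedSearch. The `WordOrder.AdjacentPairBonus` name and the whitespace-only tokenization are assumptions.

```csharp
using System;

// Hypothetical sketch, not part of SimplifiedSearch: word-order bonus.
// Counts search-term pairs (t, t+1) that appear adjacent and in order
// in the candidate. Tokenization is a simple lowercase split on spaces.
public static class WordOrder
{
    static string[] Tokenize(string text) =>
        text.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    public static int AdjacentPairBonus(string searchTerm, string candidate)
    {
        string[] terms = Tokenize(searchTerm);
        string[] words = Tokenize(candidate);
        int bonus = 0;
        for (int t = 0; t + 1 < terms.Length; t++)
        {
            for (int w = 0; w + 1 < words.Length; w++)
            {
                if (words[w] == terms[t] && words[w + 1] == terms[t + 1])
                {
                    bonus++;
                    break; // count each search-term pair at most once
                }
            }
        }
        return bonus;
    }
}
```

For the search "book 2", "Book 2 Page 1" gets a bonus of 1 and "Book 1 Page 2" gets 0. Sorting equally relevant hits by this bonus would therefore produce the order the test above expects. Single-word searches always get 0, so the bonus only affects multi-word queries.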

Add adversarial tests

  • Many words in index
  • Many words in searchTerm
  • One very long word in index
  • Injection attack

Prioritize exact match over ASCII-folded match

Search results would be more relevant if they took into account the original word (before ASCII folding).

const string asciiName = "Nina";
const string accentedName = "Niña";

var list = new[]
{
    asciiName,
    accentedName
};

var actual = await list.SimplifiedSearchAsync(accentedName);

var actual1 = actual[0];
Assert.Equal(accentedName, actual1);
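A simple way to get this behaviour is to match on the folded form and then use the original, unfolded text as a tie-breaker. The sketch below is hypothetical. Its folding uses Unicode decomposition as a stand-in for Unidecode.NET. To keep it short, "matching" means the folded strings are equal, not fuzzy matching. The names `ExactMatchTieBreak` and `Rank` are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical sketch: prefer an exact (unfolded) match among items that
// already match after ASCII folding. Not SimplifiedSearch's implementation.
public static class ExactMatchTieBreak
{
    // Stand-in folding: NFD decomposition, drop accents, lowercase.
    static string Fold(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s.Normalize(NormalizationForm.FormD))
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        return sb.ToString().ToLowerInvariant();
    }

    public static List<string> Rank(IEnumerable<string> items, string searchTerm)
    {
        string foldedTerm = Fold(searchTerm);
        return items
            // Keep items that match once accents are folded away.
            .Where(item => Fold(item) == foldedTerm)
            // Items equal to the original search term come first (true > false);
            // OrderBy is stable, so ties keep their input order.
            .OrderByDescending(item =>
                string.Equals(item, searchTerm, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

With `{ "Nina", "Niña" }`, searching for "Niña" returns "Niña" first and searching for "Nina" returns "Nina" first. Both items are still returned in each case.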

Return hits from short search terms

Currently, the full search only starts once the search term is at least 4 characters long.
It would be more user-friendly to start returning hits for shorter search terms, especially when the list being searched consists of short strings.
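One option for short terms is a plain word-prefix match, used instead of fuzzy matching. The sketch below is hypothetical and is not how SimplifiedSearch currently behaves. The `ShortTermSearch.PrefixMatches` name and the rule that whole-word matches rank above prefix-only matches are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: word-prefix matching for short search terms.
// Whole-word matches are ranked above prefix-only matches.
public static class ShortTermSearch
{
    public static List<string> PrefixMatches(IEnumerable<string> items, string term)
    {
        return items
            .Select(item => new
            {
                Item = item,
                Words = item.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            })
            // Keep items where some word starts with the term (case-insensitive).
            .Where(x => x.Words.Any(w => w.StartsWith(term, StringComparison.OrdinalIgnoreCase)))
            // Items containing the term as a whole word come first; stable otherwise.
            .OrderByDescending(x => x.Words.Any(w => w.Equals(term, StringComparison.OrdinalIgnoreCase)))
            .Select(x => x.Item)
            .ToList();
    }
}
```

For example, "ch" returns "Chad", "Chile" and "China" from a list of countries. For "new", "New Zealand" is ranked above "Newfoundland".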

Way to identify a best result?

Hello,

I am discovering the project. It fits well for my use cases. I have a special need though.

I would like to be able to identify a single best result from a search.

private static List<string> GetSampleData()
{
    var data = new List<string>();
    data.Add("Internals");
    data.Add("Super internal");
    data.Add("Extra Internals");
    data.Add("Extra things");
    return data;
}

[Fact]
public async Task Search1_ExpectBestResult()
{
    var data = GetSampleData();
    // here my query is precise enough I would like only one match
    var result = await data.SimplifiedSearchAsync("extra internal"); // add option to only get the "best" match?
    Assert.Collection(result, x => Assert.Equal("Extra Internals", x));
    // FAILS: Collection: ["Extra Internals", "Super internal", "Extra things", "Internals"]
}

[Fact]
public async Task Search2_ExpectBestResult()
{
    var data = GetSampleData();
    // here my query is precise enough I would like only one match
    var result = await data.SimplifiedSearchAsync("internals"); // add option to only get the "best" match?
    Assert.Collection(result, x => Assert.Equal("Internals", x));
    // FAILS: Collection: ["Internals", "Extra Internals", "Super internal"]
}

[Fact]
public async Task Search3_NoBestResult()
{
    var data = GetSampleData();
    // here my query is not precise enough to find one best match
    var result = await data.SimplifiedSearchAsync("extra");
    Assert.Collection(
        result,
        x => Assert.Equal("Extra Internals", x),
        x => Assert.Equal("Extra things", x));
}

I do not know (yet) how the search system works. Is there a ranking for search results that I might use to get that result?
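If a relevance score per item were available, one way to pick a single "best" result would be to accept the top item only when it beats the runner-up by a margin. The sketch below assumes such scores exist. The extension method shown above returns only items, so the `(Item, Score)` pairs, the margin value, and the `BestResult.TryGetBest` name are all hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: choose one "best" result only when it clearly
// outranks the runner-up. Scores and margin are assumptions; they are
// not exposed by the SimplifiedSearchAsync extension method.
public static class BestResult
{
    public static bool TryGetBest<T>(
        IReadOnlyList<(T Item, double Score)> ranked, double margin, out T best)
    {
        best = default(T);
        if (ranked.Count == 0) return false;

        // Sort defensively, highest score first.
        var ordered = ranked.OrderByDescending(r => r.Score).ToList();

        // A lone result, or a clear lead over the second, counts as "best".
        if (ordered.Count == 1 || ordered[0].Score - ordered[1].Score >= margin)
        {
            best = ordered[0].Item;
            return true;
        }
        return false; // no single result stands out
    }
}
```

When this returns false, the caller would fall back to showing the full ranked list. That is the behaviour the Search3_NoBestResult test describes.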
