marisa-trie's Issues

Does marisa trie support wild card search

What steps will reproduce the problem?
1. This is not an issue, but I would like to ask about a possible improvement: 
does marisa-trie support wildcard search?

If the user types "cat*r", does marisa-trie return all the keys matching the 
pattern, such as 'cater', 'caterpiller', 'catara', 'catira', etc.?

Original issue reported on code.google.com by [email protected] on 19 Oct 2011 at 1:47
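
The issue does not say whether the library offers wildcard matching, so the following is only a sketch of one way to layer it on top of a prefix lookup. It assumes glob semantics (under which "cat*r" matches only keys that end in "r") and uses a sorted Python list as a stand-in for the trie's predictive search; `wildcard_search` is a hypothetical helper, not part of marisa-trie.

```python
import bisect
import fnmatch


def wildcard_search(sorted_keys, pattern):
    """Return the keys matching a glob pattern.

    Only keys sharing the pattern's literal prefix are scanned; the sorted
    list stands in for a trie's predictive (prefix) search.
    """
    # The literal prefix is everything before the first wildcard character.
    cut = len(pattern)
    for i, ch in enumerate(pattern):
        if ch in "*?[":
            cut = i
            break
    prefix = pattern[:cut]

    # In a sorted list, keys with a common prefix form one contiguous run.
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        if fnmatch.fnmatchcase(key, pattern):
            matches.append(key)
    return matches


keys = sorted(["cat", "catara", "cater", "caterpillar", "catira", "dog"])
print(wildcard_search(keys, "cat*r"))
```

The prefix step keeps the scan small when the pattern starts with literal characters; a pattern that begins with a wildcard degenerates to a full scan.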

vector-test caused a segmentation fault

What steps will reproduce the problem?
1. Compile tests/vector-test.cc like:
$ g++ vector-test.cc -o vector-test -I../lib/ -lmarisa
(g++ 3.4.6)
2. Run the test.
3. A segmentation fault occurs.

What is the expected output? What do you see instead?
No segmentation fault occurs.

What version of the product are you using? On what operating system?

marisa-trie-0.2.0-beta4
CentOS 4.8

Please provide any additional information below.
The error seems to occur at vector-test.cc line 114.

Original issue reported on code.google.com by [email protected] on 10 May 2011 at 9:25

failing win 64bit build

Marisa-trie under win64 compiles but fails in marisa-test (segfault in 
TestTinyTrie).

I guess the cause is that size_t is a 64-bit value on win64 and marisa-trie 
assumes it is a 32-bit value.

Original issue reported on code.google.com by [email protected] on 5 Aug 2013 at 7:01

License questions

1. License contains a reference to grnxx: "grnxx - An open-source fulltext 
search engine and column store."  Is this intentional?

2. The BSD part contains an "<ORGANIZATION>" placeholder, which should probably 
be replaced with some organization name. 

3. Is it OK that I'm referring to you personally and to marisa-trie C++ library 
in a wrapper README (https://github.com/kmike/marisa-trie)? License says that 
"Neither the name of the <ORGANIZATION> nor the names of its contributors may 
be used to endorse or promote products derived from this software without 
specific prior written permission."

Original issue reported on code.google.com by [email protected] on 12 Apr 2013 at 6:47

Find All Prefixes doesn't work on Linux Mint 13 after installing through pip, but Filter By Prefix does

What steps will reproduce the problem?
1. Put 98000 dictionary words in a list and load them into a trie.
2. Find all prefixes of a given key.

What is the expected output? What do you see instead?
I expected to see the prefixes. Instead, I saw single letters returned (the 
first letter of the prefix).

Filtering by prefix did work; I just used a list comprehension to remove the index.


What version of the product are you using? On what operating system?

I installed today through pip on Linux Mint 13 LTS Cinnamon.


Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 25 Apr 2014 at 7:35

Stability of 0.2 version

Hi,
very nice job indeed.
I am wondering if version 0.2 is stable enough to be used in my projects 
(academic research in Health care)?
Do you have a roadmap of future developments?

best regards, and once again thanks for this enormous job

Original issue reported on code.google.com by [email protected] on 18 Oct 2012 at 2:29

Typo at Project Home

"The biggest advantage of libmarisa is that its **dictioanry** size is 
considerably more compact than others. See below for the dictionary size of 
other implementations."


Original issue reported on code.google.com by [email protected] on 28 Aug 2012 at 6:32

marisa-trie fails to save dictionary (mingw32)

It is reported that saving doesn't work under mingw32.

Issue #10 is a related issue.
http://code.google.com/p/marisa-trie/issues/detail?id=10

See also the following.
https://github.com/kmike/marisa-trie/issues/1#issuecomment-8135066

Original issue reported on code.google.com by [email protected] on 31 Aug 2012 at 1:21

sequential indexes

Hi ,

Is there any mechanism in marisa-trie by which I can retrieve the search 
candidates in order? For example, if I search for 'b',

the search returns:

       id
b       7
ban    12
bang   13
ben    56
beng   57

The ids are not sequential. Is there a mechanism by which I can retrieve the 
ids in sequential order?
Note that the strings are returned in sequential order.

Original issue reported on code.google.com by [email protected] on 27 Feb 2012 at 1:59
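
If "sequential" here means consecutive numbers that follow key order, one possible workaround is to keep a small side table that renumbers the ids. The sketch below assumes the key-to-id pairs have already been collected from a search; `consecutive_ids` is a hypothetical helper, and the ids are the ones quoted in the issue.

```python
def consecutive_ids(key_to_id):
    """Map each original id to the key's 0-based rank in sorted key order."""
    ordered_keys = sorted(key_to_id)
    return {key_to_id[key]: rank for rank, key in enumerate(ordered_keys)}


# Ids copied from the example in the issue.
trie_ids = {"b": 7, "ban": 12, "bang": 13, "ben": 56, "beng": 57}
remap = consecutive_ids(trie_ids)
for key, old_id in trie_ids.items():
    print(key, old_id, remap[old_id])
```

The cost is one extra dictionary entry per key, so this is practical only when the renumbered subset is small compared with the whole trie.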

Compilation fails under certain circumstances in Xcode with stdio.h colliding with system include

The Xcode compiler gets 'confused' and can't include <cstdio>, complaining that 
FILE is not a member of global namespace std.
After digging around for hours, we suspected that the file 'stdio.h' in marisa 
trie 'confuses' the compiler so that it stops looking for the 'real' stdio.h.
Although marisa trie includes stdio.h in two different ways (once with 
"stdio.h" and other times with <stdio.h>, to distinguish between user and 
system includes), this still didn't resolve our problem.

We had to rename the marisa trie stdio.h into stdio_xx.h (example name) to 
avoid this issue and change the sources accordingly.

What steps will reproduce the problem?
1. Create a project that uses marisa trie
2. Compile and 'install' the project into a directory
3. Include the marisa trie headers 

What is the expected output? What do you see instead?
The expected output was successful compilation. Instead, we saw an error saying 
that <cstdio> in marisa's stdio.h couldn't be included.

What version of the product are you using? On what operating system?
0.2.4 on OSX/iOS

Please provide any additional information below.

We'd be very happy if marisa could use 'non standard' include file names ... 
e.g. rename stdio.h, iostream, ... to something 'custom'

Thanks

Original issue reported on code.google.com by [email protected] on 17 Aug 2013 at 9:25

Alternative Python bindings

Hello,

I've created an alternative Python binding for marisa-trie: 
https://github.com/kmike/marisa-trie/

It is implemented in Cython and seems to be several times faster than the 
included SWIG bindings. It is also possible to install these bindings just by 
running "pip install marisa-trie", without manually downloading and compiling 
the library. The interface is closer to https://github.com/kmike/datrie than to 
the original bindings; for example, there is no Agent class.

I'll be glad if you include a link to my bindings somewhere in wiki or docs. 

Thanks for the marisa-trie, that's an impressive trie library!

Original issue reported on code.google.com by [email protected] on 17 Aug 2012 at 10:31

Marisa 0.2.4: Shared library doesn't build in MSys/MinGW64

What steps will reproduce the problem?
1. ./configure --prefix=/e/SDK/env-gcc-4.8-64bit --enable-sse2 --enable-sse3 
--enable-ssse3 --enable-sse4 --enable-sse4.1 --enable-sse4.2
2. make
3. make install

What is the expected output? What do you see instead?
I expect both shared and static libraries to be built, but only the static 
library gets built.

What version of the product are you using? On what operating system?
Marisa 0.2.4, MSys, MinGW64, Windows 8.1 64bit.

Please provide any additional information below.
Configuring with --enable-static=no produces a makefile that does nothing.
Are the rules for the shared library missing?

Original issue reported on code.google.com by [email protected] on 27 Nov 2014 at 12:56

Mapper constructor mistake

What steps will reproduce the problem?
1. Save a marisa dictionary using the save function.
2. Open the dictionary file and read its entire contents into a buffer.
3. Call the map function.

What is the expected output? What do you see instead?

The map call should succeed. Instead, I see an exception.

What version of the product are you using? On what operating system?

marisa-0.1.4, Win7

Please provide any additional information below.

It seems something went wrong when I used the map function, and I found the 
following in the constructor:

Mapper::Mapper(const void *ptr, std::size_t size)
    : ptr_(ptr), origin_(NULL), avail_(size), size_(0),
      file_(NULL), map_(NULL) {
  MARISA_THROW_IF((ptr != NULL) && (size != 0), MARISA_PARAM_ERROR);
}

The condition for throwing the exception should use (ptr == NULL) instead. I 
also checked the newest beta version, and it is correct there.

Original issue reported on code.google.com by [email protected] on 30 Sep 2011 at 9:21

marisa-trie does not build under mingw32

I'm not a mingw user myself; a build error was reported here: 
https://github.com/kmike/marisa-trie/issues/1

The stat.h header from mingw: 
http://gitorious.org/mingw/mingw-runtime/blobs/6e654ca0ceb56a42ebaa23bd43b50d62c4e4c0c1/include/sys/stat.h

_stat64 is indeed defined only #if __MSVCRT_VERSION__ >= 0x0601

Mingw default for this define is the following (for compatibility with older 
Windows):

#define __MSVCRT_VERSION__ 0x0600

so _stat64 is not defined under mingw and marisa-trie build fails.

Similar issue: 
http://www.mail-archive.com/[email protected]/msg00741.html 
- the suggested fix was to manually define less restrictive __MSVCRT_VERSION__.

Original issue reported on code.google.com by [email protected] on 28 Aug 2012 at 12:55

Maximum keys in a marisa trie

Hey,

I have quite a large input consisting of about 2^32 keys that I would like to 
build a marisa trie for. I have a server with lots of memory (400GB RAM), and I 
was wondering whether that is possible at all. 

Right now I am getting marisa/grimoire/trie/../vector/bit-vector.h:52: 
MARISA_SIZE_ERROR: size_ == MARISA_UINT32_MAX: when trying to build. Is there 
any way to do this without having to make lots of smaller tries?

Thanks!

Original issue reported on code.google.com by [email protected] on 11 Dec 2013 at 9:54

What do id and weight mean in the union below from keyset.h

What do id and weight mean in the union below, from keyset.h?

  union Union {
    UInt32 id;
    float weight;
  } union_;

Does id mean the id I get when I query marisa-trie for a string?
If I set a weight, does the search behave differently?
If so, in what way? For example, if 10 words start with a given prefix and a 
weight is set, do I get all 10 words, sorted by weight? Is my interpretation 
right?

If I want to use weight, what does id convey?
Please respond as soon as possible.

Original issue reported on code.google.com by [email protected] on 9 Dec 2011 at 2:37

Stepwise walking

I'm storing values in a trie with this encoding scheme:

<utf8-encoded unicode key> + chr(255) + <binary_value>

This works perfectly, but common prefix search is suboptimal with the current 
marisa-trie API.
Ideally, this should be implemented like this:

* walk to a char in a key;
* if char is not walkable, exit loop;
* test if data separator (chr(255)) is walkable;
* if it is walkable, add the current key to a list of prefixes.

Instead of this, it can be implemented like this now (pseudocode):

        while ind <= key_len:
            prefix = key[:ind]
            ag.set_query(prefix + _VALUE_SEPARATOR)
            if trie.predictive_search(ag):
                result.append(prefix)
            ind += 1

        return result

This is suboptimal because:

* there is no fail-fast if the current prefix is not in a trie;
* trie is walked from the root for each prefix.

Stepwise API would be great for making this more efficient.

Original issue reported on code.google.com by [email protected] on 23 Aug 2012 at 1:36
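
The four steps proposed in this issue can be sketched on a toy trie to show what a stepwise API would make possible. This is an illustration only: the nested-dict trie, `build_trie`, and `prefixes` are hypothetical stand-ins, not marisa-trie functions, and Python strings are used in place of UTF-8 bytes.

```python
VALUE_SEPARATOR = chr(255)  # the data separator from the encoding scheme above


def build_trie(entries):
    """Build a nested-dict trie from 'key + separator + value' strings."""
    root = {}
    for entry in entries:
        node = root
        for ch in entry:
            node = node.setdefault(ch, {})
    return root


def prefixes(root, key):
    """Collect every prefix of `key` that is stored as a key in the trie."""
    found = []
    node = root
    for i, ch in enumerate(key):
        node = node.get(ch)  # walk one character down from the current node
        if node is None:
            break  # fail fast: no longer prefix can exist
        if VALUE_SEPARATOR in node:  # is the separator walkable from here?
            found.append(key[: i + 1])
    return found


trie = build_trie(
    word + VALUE_SEPARATOR + value
    for word, value in [("b", "1"), ("ban", "2"), ("bang", "3"), ("ben", "4")]
)
print(prefixes(trie, "bangle"))
```

Each character of the key is visited once and the walk stops at the first dead end, which addresses both drawbacks listed in the issue.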

hardcoded architecture list for word size

This was reported in Debian.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=739126

Could you consider applying the patch or using the stdint.h method?

---from---
marisa fails to build from source on s390x due to testsuite failures. It
appears that it uses a hardcoded list of architectures to determine the
size of the size_t type, which doesn't include s390x. The patch below
fixes the issue:

--- marisa-0.2.4.orig/lib/marisa/base.h
+++ marisa-0.2.4/lib/marisa/base.h
@@ -30,7 +30,7 @@ typedef uint64_t marisa_uint64;

 #if defined(_WIN64) || defined(__amd64__) || defined(__x86_64__) || \
     defined(__ia64__) || defined(__ppc64__) || defined(__powerpc64__) || \
-    defined(__sparc64__) || defined(__mips64__) || defined(__aarch64__)
+    defined(__sparc64__) || defined(__mips64__) || defined(__aarch64__) || defined(__s390x__)
  #define MARISA_WORD_SIZE 64
 #else  // defined(_WIN64), etc.
  #define MARISA_WORD_SIZE 32

BTW, __sparc64__ doesn't exist and should be replaced by (__sparc__ && 
__arch64__)

That said, I don't really see the point of using a hardcoded
architecture list to determine the size of the size_t type. This can be
done the following way:

| #include <stdint.h>
|
| #if SIZE_MAX == UINT64_MAX
|  #define MARISA_WORD_SIZE 64
| #else
|  #define MARISA_WORD_SIZE 32
| #endif

However as marisa is using autotools, the best way to do that would be
to add a test in configure.
------


Original issue reported on code.google.com by [email protected] on 22 Feb 2014 at 5:28

Map values

Please implement a structure that can map strings to arbitrary objects. Thank 
you.

Original issue reported on code.google.com by fsqcds on 27 Apr 2014 at 9:01
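
Since other issues on this page mention that a query yields an integer id for each key, one possible way to get a string-to-object map is to use that id as an index into a side list of values. The sketch below assumes this approach and uses a sorted list with binary search as a stand-in for the trie lookup; `StringMap` and its methods are hypothetical names, not part of marisa-trie.

```python
import bisect


class StringMap:
    """Map strings to arbitrary objects via integer key ids and a side list."""

    def __init__(self, items):
        pairs = sorted(items.items())
        self._keys = [key for key, _ in pairs]        # stand-in for the trie
        self._values = [value for _, value in pairs]  # values[i] is for id i

    def key_id(self, key):
        """Stand-in for a trie lookup: return the key's id, or None."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return i
        return None

    def get(self, key, default=None):
        """Return the object stored for `key`, or `default` if absent."""
        i = self.key_id(key)
        return default if i is None else self._values[i]


mapping = StringMap({"ban": [1, 2], "b": {"x": 1}, "bang": None})
print(mapping.get("ban"))
```

Keeping the values outside the key structure means they can be any Python object, at the cost of one extra indirection per lookup.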

Inherit Exception from std::exception

The Cython wrapper is unable to provide detailed exception info because 
marisa::Exception is not a subclass of std::exception. Could you please make it 
inherit from std::exception? The attached patch works for me.

Original issue reported on code.google.com by [email protected] on 30 Aug 2012 at 6:19
