lucidworks / auto-phrase-tokenfilter
Lucene Auto Phrase TokenFilter implementation
License: Other
synonym: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load conf for core synonym: Plugin init failure for [schema.xml] fieldType "text_autophrase": Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.core.StopFilterFactory
Here is the error message.
[ivy:resolve] WARN: ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: :: UNRESOLVED DEPENDENCIES ::
[ivy:resolve] WARN: ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: :: org.restlet.jee#org.restlet;2.1.1: not found
[ivy:resolve] WARN: :: org.restlet.jee#org.restlet.ext.servlet;2.1.1: not found
[ivy:resolve] WARN: ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] report for com.lucidworks.demo#autophrase-tokenfilter;working@norvar default produced in C:\Users\xxxxx.ivy2\cache\com.lucidworks.demo-autophrase-tokenfilter-default.xml
[ivy:resolve] report for com.lucidworks.demo#autophrase-tokenfilter;working@norvar compile produced in C:\Users\xxxxx.ivy2\cache\com.lucidworks.demo-autophrase-tokenfilter-compile.xml
[ivy:resolve] report for com.lucidworks.demo#autophrase-tokenfilter;working@norvar test produced in C:\Users\xxxxx.ivy2\cache\com.lucidworks.demo-autophrase-tokenfilter-test.xml
[ivy:resolve] resolve done (11201ms resolve - 118ms download)
[ivy:resolve]
[ivy:resolve] :: problems summary ::
[ivy:resolve] :::: WARNINGS
[ivy:resolve] module not found: org.restlet.jee#org.restlet;2.1.1
[ivy:resolve] ==== central: tried
[ivy:resolve] http://repo1.maven.org/maven2/org/restlet/jee/org.restlet/2.1.1/org.restlet-2.1.1.pom
[ivy:resolve] -- artifact org.restlet.jee#org.restlet;2.1.1!org.restlet.jar:
[ivy:resolve] http://repo1.maven.org/maven2/org/restlet/jee/org.restlet/2.1.1/org.restlet-2.1.1.jar
[ivy:resolve] module not found: org.restlet.jee#org.restlet.ext.servlet;2.1.1
[ivy:resolve] ==== central: tried
[ivy:resolve] http://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.1.1/org.restlet.ext.servlet-2.1.1.pom
[ivy:resolve] -- artifact org.restlet.jee#org.restlet.ext.servlet;2.1.1!org.restlet.ext.servlet.jar:
[ivy:resolve] http://repo1.maven.org/maven2/org/restlet/jee/org.restlet.ext.servlet/2.1.1/org.restlet.ext.servlet-2.1.1.jar
[ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] :: UNRESOLVED DEPENDENCIES ::
[ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] :: org.restlet.jee#org.restlet;2.1.1: not found
[ivy:resolve] :: org.restlet.jee#org.restlet.ext.servlet;2.1.1: not found
[ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]
[ivy:resolve]
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
BUILD FAILED
Hello,
Could we have precompiled versions of this? I appreciate the fact that compiling from source code is foolproof, but as a .NET developer who just has to get a simple Solr index to work on his computer, I'd like to not have to install ant and compile my own jars!
Edit: Compiling fails out of the box. Had to resort to lucidworks/query-autofiltering-component#4 to fix.
When I compile against Solr 5.1.0, javac throws this error:
[javac] /Java/auto-phrase-tokenfilter/src/main/java/com/lucidworks/analysis/AutoPhrasingQParserPlugin.java:113: error: no suitable constructor found for WhitespaceTokenizer(StringReader)
[javac] WhitespaceTokenizer wt = new WhitespaceTokenizer( new StringReader( input ));
[javac] ^
[javac] constructor WhitespaceTokenizer.WhitespaceTokenizer(AttributeFactory) is not applicable
[javac] (actual argument StringReader cannot be converted to AttributeFactory by method invocation conversion)
[javac] constructor WhitespaceTokenizer.WhitespaceTokenizer() is not applicable
[javac] (actual and formal argument lists differ in length)
[javac] Note: /Java/auto-phrase-tokenfilter/src/main/java/com/lucidworks/analysis/AutoPhrasingTokenFilter.java uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 1 error
Build successful
After resolving all the dependencies, I got this error.
This is the full stack trace in Solr:
java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [brands-core]
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.solr.core.CoreContainer$2.run(CoreContainer.java:472)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Unable to create core [brands-core]
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:737)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:443)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:434)
    ... 5 more
Caused by: org.apache.solr.common.SolrException: Error Instantiating queryParser, com.lucidworks.analysis.AutoPhrasingQParserPlugin failed to instantiate org.apache.solr.search.QParserPlugin
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:659)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:723)
    ... 7 more
Caused by: org.apache.solr.common.SolrException: Error Instantiating queryParser, com.lucidworks.analysis.AutoPhrasingQParserPlugin failed to instantiate org.apache.solr.search.QParserPlugin
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:588)
    at org.apache.solr.core.PluginBag.createPlugin(PluginBag.java:122)
    at org.apache.solr.core.PluginBag.init(PluginBag.java:217)
    at org.apache.solr.core.PluginBag.init(PluginBag.java:206)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:764)
    ... 9 more
Caused by: java.lang.ClassCastException: class com.lucidworks.analysis.AutoPhrasingQParserPlugin
    at java.lang.Class.asSubclass(Class.java:3404)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:475)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:422)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:567)
    ... 13 more
Does the edismax query parser have the same problem with sending each parsed token to its own token stream as described in LUCENE-2605?
Does the AutoPhrasingQParserPlugin require the use of the default lucene parser internally or can it be swapped for the edismax query parser with minimal other modifications?
Documentation related to these questions would be appreciated.
Does it handle single-term synonyms? If yes, where do we have to provide the synonyms?
Do I need to add changes in synonyms.txt, autophrases.txt, or both?
If this software is meant to be open source, could you kindly add licensing information? Many thanks.
Hi, we've added AutoPhraseTokenFilter on our Solr 5.1.0 installation and everything works fine, but Solr throws an error when it tries to index attachments. Here's the trace:
Caused by: java.lang.IllegalArgumentException: position increments (and gaps) must be >= 0 (got 65534) for field 'tf_attachments_field_library_attachments'
Depends on #4
In Lucene 5.0.0, TokenFilterFactories can be discovered automatically if they are included in a file called org.apache.lucene.analysis.util.TokenFilterFactory in the directory META-INF/services.
For reference, see how it is done in the lucene-analyzers-common jar. Then, CustomAnalyzer.Builder can be used to easily build analysis pipelines including this component.
If I understand the code correctly, phraseMap should be read-only. However, it gets altered because references to the phrase lists are leaked into currentPhrases, to which other phrases are added. Not only does this leak memory, but I wouldn't be surprised if this causes actual bugs with recognizing the phrases (false positives). To fix this, the phraseMap.get calls need to be wrapped into CharArraySet.copy. I filed a pull request in the fork from emergecds, but the same changes apply here.
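The leak described above can be illustrated without Lucene. The following is a minimal sketch, assuming plain `HashMap`/`HashSet` as stand-ins for the filter's phrase map and `CharArraySet`; the class and method names (`PhraseLookupSketch`, `candidatesLeaky`, `candidatesCopied`) are hypothetical and are not taken from the project's source. It only shows why handing out the stored set lets later additions alter the shared map, and why returning a copy avoids that.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-in for the filter's phrase bookkeeping (assumption: plain Java
// collections replace Lucene's CharArraySet so the sketch runs on its own).
class PhraseLookupSketch {
    // Hypothetical shared map: first word -> phrases starting with that word.
    private final Map<String, Set<String>> phraseMap = new HashMap<>();

    PhraseLookupSketch(List<String> phrases) {
        for (String phrase : phrases) {
            String firstWord = phrase.split(" ")[0];
            phraseMap.computeIfAbsent(firstWord, k -> new HashSet<>()).add(phrase);
        }
    }

    // Leaky variant: returns the very set stored in the map.
    Set<String> candidatesLeaky(String firstWord) {
        return phraseMap.get(firstWord);
    }

    // Safe variant: returns a copy, so callers may add to it freely.
    Set<String> candidatesCopied(String firstWord) {
        Set<String> stored = phraseMap.get(firstWord);
        return stored == null ? new HashSet<>() : new HashSet<>(stored);
    }

    // Number of phrases currently stored under the given first word.
    int storedCount(String firstWord) {
        Set<String> stored = phraseMap.get(firstWord);
        return stored == null ? 0 : stored.size();
    }

    public static void main(String[] args) {
        PhraseLookupSketch lookup =
                new PhraseLookupSketch(Arrays.asList("seat cushion", "seat belt"));

        Set<String> current = lookup.candidatesCopied("seat");
        current.add("income tax");                      // changes only the copy
        System.out.println(lookup.storedCount("seat")); // prints 2

        current = lookup.candidatesLeaky("seat");
        current.add("income tax");                      // changes the shared map entry
        System.out.println(lookup.storedCount("seat")); // prints 3
    }
}
```

In the leaky variant the map entry for "seat" grows with every unrelated phrase added by the caller, which matches both the memory growth and the risk of false positives mentioned above.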
Hello,
I'd like to use this plugin for Solr 7.x (SolrCloud). I was able to build the plugin for Solr 7.1.0, passing all tests. However, when I install it and index content, I get an error regarding the field that uses the plugin:
Remote error message: Exception writing document id 36 to the index; possible analysis error: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=15,lastStartOffset=4 for field 'autophrase_field'
This error only occurs when the phrase being indexed is longer than 2 tokens.
Is this a known issue?
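To make the error message above easier to read, here is a minimal sketch of the three conditions it names: startOffset must be non-negative, endOffset must be >= startOffset, and start offsets must not go backwards. The class `OffsetCheckSketch` and the sample token offsets are hypothetical; the sequence is chosen only to reproduce the reported numbers and is not a claim about what the filter actually emits.

```java
import java.util.Arrays;
import java.util.List;

class OffsetCheckSketch {
    // Returns a description of the first violation, or null if all offsets are consistent.
    // Each int[] holds {startOffset, endOffset} of one token, in emission order.
    static String firstViolation(List<int[]> offsets) {
        int lastStartOffset = 0;
        for (int[] token : offsets) {
            int start = token[0];
            int end = token[1];
            if (start < 0 || end < start || start < lastStartOffset) {
                return "startOffset=" + start + ",endOffset=" + end
                        + ",lastStartOffset=" + lastStartOffset;
            }
            lastStartOffset = start;
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical order: two single tokens first, then a phrase token reaching back to offset 0.
        List<int[]> backwards = Arrays.asList(
                new int[] {0, 3}, new int[] {4, 9}, new int[] {0, 15});
        System.out.println(firstViolation(backwards));
        // prints startOffset=0,endOffset=15,lastStartOffset=4

        // Same tokens with the phrase token emitted first: no violation.
        List<int[]> ordered = Arrays.asList(
                new int[] {0, 15}, new int[] {0, 3}, new int[] {4, 9});
        System.out.println(firstViolation(ordered)); // prints null
    }
}
```

A check like this can be run over the offsets shown on the Solr analysis screen for the failing field to see which token breaks the ordering.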
Is SOLR5 (5.3.1) compatible with this filter? I've run into issues compiling it against 5.3.1 libraries:
Does anyone have a fork of this working against 5.x libraries?
Thanks!
Hello,
I am using auto-phrase-tokenfilter with Solr 5, and when I use the request handler I get the following exception: java.lang.NoSuchMethodError: org.apache.lucene.analysis.core.WhitespaceTokenizer
Any help?
When searching for text:seat cushion, the query parser does not emit "seat cushion" or "seat cushions". Therefore the search falls back to a single-token search. This would work better if stemmed query tokens/phrases were checked against stemmed versions of the phrases in the autophrases.txt file when deciding whether to emit a phrase or a single token. I think this should happen in the incrementToken() method (if I understand the code).
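The suggestion above can be sketched as follows. This is only an illustration of comparing stemmed query text against stemmed phrase entries; the class `StemmedPhraseMatchSketch`, its methods, and the toy stemmer (lowercase, strip one trailing "s") are assumptions made for the example and do not come from the plugin. A real setup would presumably reuse whatever stemmer the field's analysis chain already applies.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StemmedPhraseMatchSketch {
    // Toy stemmer (assumption): lowercases and strips a single trailing "s".
    static String stem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        return (t.length() > 3 && t.endsWith("s")) ? t.substring(0, t.length() - 1) : t;
    }

    // Stems every whitespace-separated token of a phrase and rejoins with single spaces.
    static String stemPhrase(String phrase) {
        StringBuilder sb = new StringBuilder();
        for (String token : phrase.trim().split("\\s+")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(stem(token));
        }
        return sb.toString();
    }

    // Pre-stems the phrase list once, e.g. the entries of autophrases.txt.
    static Set<String> stemAll(List<String> phrases) {
        Set<String> stemmed = new HashSet<>();
        for (String phrase : phrases) {
            stemmed.add(stemPhrase(phrase));
        }
        return stemmed;
    }

    // True if the query text matches a known phrase after stemming both sides.
    static boolean matches(Set<String> stemmedPhrases, String queryText) {
        return stemmedPhrases.contains(stemPhrase(queryText));
    }

    public static void main(String[] args) {
        Set<String> phrases = stemAll(Arrays.asList("seat cushions"));
        System.out.println(matches(phrases, "seat cushion"));  // prints true
        System.out.println(matches(phrases, "Seat Cushions")); // prints true
        System.out.println(matches(phrases, "seat belt"));     // prints false
    }
}
```

Because both the phrase list and the query pass through the same `stem` function, singular and plural forms collapse to the same key regardless of which form appears in the phrase file.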
Any chance of this working on Solr 3.6? I compiled it and it's loading when I start Solr, but if I add a com.lucidworks.analysis.AutoPhrasingTokenFilterFactor to a fieldType in my schema, I get the following error on solr/tomcat startup:
SEVERE: java.lang.NoClassDefFoundError: org/apache/lucene/analysis/util/ResourceLoaderAware
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:274)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:398)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:429)
    at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:83)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
    at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:1017)
    at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:60)
    at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:453)
    at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:433)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:490)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:123)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:478)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:332)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:216)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:424)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4076)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4730)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:583)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:822)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
    at org.apache.catalina.core.StandardService.start(StandardService.java:525)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.analysis.util.ResourceLoaderAware
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 57 more
Hi, we've added AutoPhraseTokenFilter on our SOLR 4.10.3 installation and everything works fine but SOLR throws an error when it tries to index a big Excel 2010 file (1MB, around 5000 lines spread across 5 tabs). Here's the trace:
SearchApiException while indexing: "500" Status: Server Error: Server Error{"responseHeader":{"status":500,"QTime":193},"error":{"msg":"Exception writing document id 81pf49-index_par_d_faut_des_n_uds-6294 to the index; possible analysis error.","trace":"org.apache.solr.common.SolrException: Exception writing document id 81pf49-index_par_d_faut_des_n_uds-6294 to the index; possible analysis error.\r\n\tat org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)\r\n\tat org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)\r\n\tat org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)\r\n\tat org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:926)\r\n\tat org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1080)\r\n\tat org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:692)\r\n\tat org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)\r\n\tat org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)\r\n\tat org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)\r\n\tat org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:99)\r\n\tat org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)\r\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\r\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1976)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)\r\n\tat 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\r\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\r\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\r\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\r\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\r\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\r\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\r\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\r\n\tat org.eclipse.jetty.server.Server.handle(Server.java:368)\r\n\tat org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\r\n\tat org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\r\n\tat org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)\r\n\tat org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)\r\n\tat org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)\r\n\tat org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)\r\n\tat org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\r\n\tat org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\r\n\tat 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\r\n\tat java.lang.Thread.run(Unknown Source)\r\nCaused by: java.lang.IllegalArgumentException: position increments (and gaps) must be >= 0 (got 65536) for field 'tm_attachments_field_fichier'\r\n\tat org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:633)\r\n\tat org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359)\r\n\tat org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318)\r\n\tat org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:239)\r\n\tat org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:457)\r\n\tat org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1511)\r\n\tat org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)\r\n\tat org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)\r\n\t... 40 more\r\n","code":500}} in SearchApiSolrConnection->checkResponse() (line 541 of C:\Program Files (x86)\Zend\Apache2\htdocs\atrium\sites\all\modules\search_api_solr\includes\solr_connection.inc).
Does the filter have an issue with large files, or is there a problem with my config?
Thank you!
I was finally able to build the project using the fork for Solr 5 with the necessary change that I checked in. However, I am still not able to use it successfully: when I add the QueryParser to solrconfig.xml, Solr is not able to start up and throws this error:
java.lang.ClassNotFoundException: org.apache.lucene.analysis.util.ResourceLoaderAware