tamingtext / book

Taming Text Book Source Code

Home Page: http://www.tamingtext.com
Taming Text, by Grant Ingersoll, Thomas Morton and Drew Farris, is designed to teach software engineers the basic concepts of working with text to solve search and Natural Language Processing problems. The book focuses on teaching using existing open source libraries like Apache Solr, Apache Mahout and Apache OpenNLP to manipulate text. To learn more, visit http://www.manning.com/ingersoll.

Getting Started
---------------

Throughout this document, TT_HOME is the directory containing the checkout of the Taming Text code base. Taming Text uses Maven for building and running the code. To get started, you will need:

1. JDK 1.6+
2. Maven 3.0 or higher
3. The OpenNLP English models, available at http://maven.tamingtext.com/opennlp-models/models-1.5. Place the models in the TT_HOME directory in a directory named opennlp-models. This can be done by using the following commands on UNIX from the TT_HOME directory:

       mkdir opennlp-models
       cd opennlp-models
       wget -nd -np -r http://maven.tamingtext.com/opennlp-models/models-1.5/
       rm index.html*

   or, on Windows, using wget (https://eternallybored.org/misc/wget/) and 7-Zip (http://www.7-zip.org/), both added to the PATH environment variable:

       md opennlp-models
       cd opennlp-models
       wget -nd -np -r http://maven.tamingtext.com/opennlp-models/models-1.5/
       del index.htm*

4. WordNet 3.0, placed in the TT_HOME directory. This can be done by using the following commands on UNIX from the TT_HOME directory:

       wget -nd -np -m http://maven.tamingtext.com/wordnet/
       rm index.html*
       tar -xf WordNet-3.0.tar.gz

   or, on Windows, using wget and 7-Zip as above:

       wget -nd -np -r http://maven.tamingtext.com/wordnet/
       del index.html*
       7z x WordNet-3.0.tar.gz
       7z x WordNet-3.0.tar

Building the Source
-------------------

Prior to building the source, for those previously unfamiliar with Maven, it may be wise to read http://maven.apache.org/guides/getting-started/maven-in-five-minutes.html to avoid future hassles. To build the source, in TT_HOME:

    mvn clean package

Running the Examples
--------------------

Many of the examples can be run via the 'tt' script in the TT_HOME/bin directory. Running this script without arguments will display a list of the example names. Some of the examples are powered by preconfigured instances of Solr. These can be started with the TT_HOME/bin/start-solr.sh script, which takes a single argument: the name of the instance to start. Available instances include solr-qa, solr-clustering and solr-tagging.
Hello, I am a translator; I would like to translate this book from English into Indonesian.
I had the same problem that was mentioned in this forum post.
The solution seems to be to modify frankenstein.cmd to include target/dependency in CLASSPATH, as per frankenstein.sh.
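For reference, the Unix script's classpath loop can be sketched roughly like this (paths assume `mvn dependency:copy-dependencies` has populated target/dependency; frankenstein.cmd would need the equivalent with `.\target\dependency\*.jar`):

```shell
# Build a classpath from the project's compiled classes plus every jar
# that Maven copied into target/dependency (a sketch of the .sh approach).
CLASSPATH=./target/classes:./target/test-classes
for jar in ./target/dependency/*.jar; do
  CLASSPATH="$CLASSPATH:$jar"
done
echo "$CLASSPATH"
```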
Hi,
I've set up an environment with the answer source code.
While running Solr with these parameters (-Xmx1024m -Dsolr.solr.home=c:\KMS\QA-taming\tamingText-src\apache-solr\solr-qa -Dsolr.data.dir=c:\KMS\QA-taming\tamingText-src\apache-solr\solr-qa\data -Dmodel.dir=c:\KMS\QA-taming\opennlp-models -Dwordnet.dir=c:\KMS\QA-taming\WordNet-3.0)
I'm getting the below error.
Please assist.
Regards,
Moshe
Apr 07, 2014 7:10:06 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ClassCastException: com.tamingtext.texttamer.solr.SentenceTokenizerFactory cannot be cast to org.apache.solr.analysis.TokenizerFactory
at org.apache.solr.schema.IndexSchema$5.init(IndexSchema.java:966)
at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:148)
at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:986)
at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:60)
at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:453)
at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:433)
at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:490)
at org.apache.solr.schema.IndexSchema.&lt;init&gt;(IndexSchema.java:123)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:481)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:335)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:165)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:653)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1239)
at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:466)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
at org.mortbay.jetty.Server.doStart(Server.java:222)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.mortbay.start.Main.invokeMain(Main.java:194)
at org.mortbay.start.Main.start(Main.java:534)
at org.mortbay.start.Main.start(Main.java:441)
I get the following test failures when building on Windows 7 x64, JDK 7.0_17, Maven 3.0.5. I get the same errors regardless of whether I use the Windows command line or build from Cygwin.
https://gist.github.com/developmentalmadness/5110401
https://gist.github.com/developmentalmadness/5110276
https://gist.github.com/developmentalmadness/5110299
I wouldn't worry too much, except I can't run the first example, frankenstein.cmd, either:
C:\dev\github\tamingtextbook>"C:\Program Files\Java\jdk1.7.0_17\bin\java" -Xms512m -Xmx1024m -classpath ";.\target\test-classes" com.tamingtext.frankenstein.Frankenstein
Exception in thread "main" java.lang.NoClassDefFoundError: opennlp/tools/sentdetect/SentenceDetector
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2451)
at java.lang.Class.getMethod0(Class.java:2694)
at java.lang.Class.getMethod(Class.java:1622)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: opennlp.tools.sentdetect.SentenceDetector
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 6 more
C:\dev\github\tamingtextbook>ENDLOCAL
Page 269, lines 4-5, state:
To answer a question like "When was Einstein born?" they suggest patterns like "&lt;NAME&gt; was born in &lt;LOCATION&gt;".
From the surrounding context, it looks like "When" should be replaced with "Where".
(I'll take a free copy as payment for my amazing editorial skills. grin)
When I run "mvn clean package" I get an error message like the one below on a Windows 8 PC.
cygwin warning:
MS-DOS style path detected: C:\apache-maven-3.0.4/boot/
Preferred POSIX equivalent is: /cygdrive/c/apache-maven-3.0.4/boot/
CYGWIN environment variable option "nodosfilewarning" turns off this warning.
Consult the user's guide for more details about POSIX paths:
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Taming Text Source 0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
Downloading: http://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/2.4.1/maven-clean-plugin-2.4.1.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.510s
[INFO] Finished at: Wed Dec 26 22:56:21 EST 2012
[INFO] Final Memory: 7M/309M
[INFO] ------------------------------------------------------------------------
[ERROR] Plugin org.apache.maven.plugins:maven-clean-plugin:2.4.1 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.maven.plugins:maven-clean-plugin:jar:2.4.1: Could not transfer artifact org.apache.maven.plugins:maven-clean-plugin:pom:2.4.1 from/to central (http://repo.maven.apache.org/maven2): Connection to http://repo.maven.apache.org refused: connect: Address is invalid on local machine, or port is not valid on remote machine -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException
Hi
I tried the TrainMaxent program.
When I execute the program, it says CategoryDataStream cannot be cast to opennlp.tools.util.ObjectStream.
Any hint about the root cause of this problem?
Hello,
I am trying to run the solr-qa instance, but I get the following error message:
Error loading class 'com.tamingtext.texttamer.solr.SentenceTokenizerFactory'.
I am able to run the solr-clustering and solr-tagging instances correctly.
My system is CentOS.
Find below the output of "bin/start-solr.sh solr-qa"
https://gist.github.com/liberisp/7099468
Thank you in advance
Not sure where I should put it.
Just checked this morning and it is hacked.
If the removeConflicts method from NameFinderTest is called with an empty list, an exception is thrown.
There's really no need to do anything in removeConflicts unless more than one item is passed into the method via the list argument, so exit early if the list has fewer than two entries.
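A minimal sketch of the suggested guard. The method name matches the report, but the int[]{start, end} span representation and the overlap rule are illustrative stand-ins for the book's actual types:

```java
import java.util.ArrayList;
import java.util.List;

public class RemoveConflictsDemo {
    // Exit early when fewer than 2 spans: nothing can conflict, and this
    // also avoids the empty-list exception described in the report.
    static void removeConflicts(List<int[]> spans) {
        if (spans.size() < 2) {
            return;
        }
        // Keep a span only if it does not overlap the previously kept one
        // (assumes the spans arrive sorted by start offset).
        List<int[]> kept = new ArrayList<>();
        kept.add(spans.get(0));
        for (int i = 1; i < spans.size(); i++) {
            int[] prev = kept.get(kept.size() - 1);
            int[] cur = spans.get(i);
            if (cur[0] >= prev[1]) {   // starts after previous ends: no overlap
                kept.add(cur);
            }
        }
        spans.clear();
        spans.addAll(kept);
    }

    public static void main(String[] args) {
        removeConflicts(new ArrayList<>()); // previously threw; now a no-op
        List<int[]> spans = new ArrayList<>();
        spans.add(new int[]{0, 5});
        spans.add(new int[]{3, 8});   // overlaps the first span: dropped
        spans.add(new int[]{8, 12});
        removeConflicts(spans);
        System.out.println(spans.size());
    }
}
```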
What is the reason for getting incomplete sentences in answers?
The sentences are cut off at random points. How can this be fixed?
hi,
While reading your book and following the instructions, I found an execution error, so I am writing to you.
The /book/bin/frankenstein.cmd file has an error. Line 6 of your source reads:

    for %%i in (..\lib\*.jar) do set CLASSPATH=!CLASSPATH!;%%i

You should change it like this:

    for %%i in (.\target\dependency\*.jar) do set CLASSPATH=!CLASSPATH!;%%i

thanks.
java.io.FileNotFoundException: c:\projects\SolrWatson\TT-Home\WordNet-3.0\dict\verb.idx (The system cannot find the file specified)
That error pops up in several surefire reports.
From here: http://www.shiffman.net/teaching/a2z/wordnet/
"Just found the way to fix this: rename all index.noun, index.verb... to noun.idx, verb.idx..."
That response is from 2 years ago.
I copied in the index.noun etc. files, renamed them noun.idx etc., then got this failure:
FileNotFoundException: c:\projects\SolrWatson\TT-Home\WordNet-3.0\dict\verb.dat (The system cannot find the file specified)
Did the same thing to data.noun, etc.
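The rename described above can be sketched as a small shell loop (the WNHOME default, the four-POS file set, and copying rather than renaming are assumptions; adjust to your layout):

```shell
# Copy WordNet's index.noun / data.noun style files to the noun.idx /
# noun.dat names that the failing tests look for.
dict="${WNHOME:-WordNet-3.0}/dict"
for pos in noun verb adj adv; do
  if [ -f "$dict/index.$pos" ]; then cp "$dict/index.$pos" "$dict/$pos.idx"; fi
  if [ -f "$dict/data.$pos" ]; then cp "$dict/data.$pos" "$dict/$pos.dat"; fi
done
```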
Those tests now run fine. But for now I won't know whether the system works, since I'm running on Win7 64-bit with this failure, which nothing I do resolves:
Failed to set permissions of path: \tmp\hadoop-Admin\mapred\staging\Admin1270388141.staging to 0700
which is launched by the ExtractTrainingDataTest at the line: TrainClassifier.main(trainArgs);
Rerunning still fails, even after giving that directory and its subdirectories full permissions. Perhaps org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:653) either doesn't check first, or doesn't understand Windows, or something. I'll have to wait till I replicate this experience on a *nix box.
Added later: with my WordNet changes, it builds fine on Ubuntu. I have yet to run the exercises, and have no clue what those changes do to WordNet's behavior.
Hello,
I have followed the readme to build the code downloaded from master branch yesterday. I keep getting the following errors during the test phase of maven build. It seems only 3 of the unit tests failed. I want to check if these are known issues or if there is any workaround so that I can complete the build and test out the sample QA system.
I am using Windows 2008 R2, JDK 1.7.0_05, Maven 3.2.2
Thanks
-Jimmy
Running com.tamingtext.carrot2.Carrot2ExampleTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.192 sec
Running com.tamingtext.classifier.bayes.BayesUpdateRequestProcessorTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.566 sec
Running com.tamingtext.classifier.bayes.ExtractTrainingDataTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.961 sec <<< FAILURE!
Running com.tamingtext.classifier.mlt.MoreLikeThisQueryTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.565 sec
Running com.tamingtext.fuzzy.LevenshteinDistanceTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.171 sec
Running com.tamingtext.fuzzy.OverlapMeasuresTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.737 sec
Running com.tamingtext.fuzzy.TrieNodeTest
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.182 sec
Running com.tamingtext.mahout.VectorExamplesTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.134 sec
Running com.tamingtext.opennlp.AnswerTypeTest
Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 2.956 sec <<< FAILURE!
Running com.tamingtext.opennlp.ChunkParserTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.611 sec
Running com.tamingtext.opennlp.NameFinderTest
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.591 sec
Running com.tamingtext.opennlp.ParserTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.751 sec
Running com.tamingtext.opennlp.POSTaggerTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.843 sec
Running com.tamingtext.qa.PassageRankingComponentTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 50.974 sec
Running com.tamingtext.qa.QATest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 53.369 sec <<< FAILURE!
Running com.tamingtext.sentences.SentenceDetectionTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.329 sec
Running com.tamingtext.snowball.SnowballStemmerTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.189 sec
Running com.tamingtext.solr.SolrJTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.192 sec
Running com.tamingtext.texttamer.solr.NameFilterTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.377 sec
Running com.tamingtext.texttamer.solr.SentenceTokenizerTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.457 sec
Running com.tamingtext.tika.TikaTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.889 sec
Running com.tamingtext.util.StringUtilTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.221 sec
Results :
Tests in error:
Tests run: 43, Failures: 0, Errors: 4, Skipped: 0
I am getting a build failure during the mvn package command.
What I don't get about this error is that it could not resolve the Carrot2 artifact in the OpenNLP repository. Why would OpenNLP have Carrot2?
I am enjoying this book quite a bit (I read it yesterday cover to cover) and am looking forward to testing.
Thanks,
Eric
Error Below:
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 24.332s
[INFO] Finished at: Wed Jan 16 11:15:15 EST 2013
[INFO] Final Memory: 6M/81M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project taming-text: Could not resolve dependencies for project com.tamingtext:taming-text:jar:0.1-SNAPSHOT: Could not find artifact org.carrot2:carrot2-core:jar:3.6.0-SNAPSHOT in opennlp (http://opennlp.sourceforge.net/maven2/) -> [Help 1]
The repositories hosting various components seem to be awfully slow. Here is a sample of some of the download speeds being reported by "mvn clean package":
Downloaded: http://repo.maven.apache.org/maven2/org/apache/ant/ant-junit/1.7.1/ant-junit-1.7.1.pom (4 KB at 0.2 KB/sec)
Downloaded: http://repo.maven.apache.org/maven2/org/apache/ant/ant-parent/1.7.1/ant-parent-1.7.1.pom (5 KB at 0.3 KB/sec)
Downloaded: http://repo.maven.apache.org/maven2/org/apache/ant/ant/1.7.1/ant-1.7.1.pom (10 KB at 0.6 KB/sec)
It's not my network; I checked. Network speeds tested using speedtest-cli:
Download: 24.57 Mbit/s
Upload: 10.10 Mbit/s
So it appears that download speed is being throttled on the server side. Looking inside the pom.xml showed that the maven2 repo was pointing at this URL:
http://people.apache.org/maven-snapshot-repository
which probably works, but is not backed by enough hardware compared to http://repo1.maven.org/maven2/ (the central maven2 repo, according to http://www.mkyong.com/maven/where-is-maven-central-repository/).
After switching the url and rerunning "mvn clean package", the download speeds are significantly higher.
Downloaded: http://repo1.maven.org/maven2/org/codehaus/jackson/jackson-mapper-asl/1.4.0/jackson-mapper-asl-1.4.0.pom (2 KB at 15.4 KB/sec)
Downloaded: http://repo1.maven.org/maven2/com/googlecode/json-simple/json-simple/1.1/json-simple-1.1.pom (2 KB at 19.6 KB/sec)
While in the pom.xml, I also noticed that the sf.net repository is configured for OpenNLP. Nowadays OpenNLP is also available on Maven Central, so both repository definitions identified by the ids below can be safely removed from the pom.xml, which will result in a much faster setup.
opennlp
apache.maven2.snapshot.repository
It still takes upwards of an hour to run the command to completion. You may want to mention that in the README.
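For reference, switching the repository URL as described above would look roughly like this in the pom.xml (a sketch; the id value is illustrative, and only the url element matters):

```xml
<repositories>
  <repository>
    <id>central</id>
    <url>http://repo1.maven.org/maven2/</url>
  </repository>
</repositories>
```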
When trying to run the curl samples against the Word docs from Chapter 3, a lazy loading error occurs.
curl "http://localhost:8983/solr/update/extract?&extractOnly=true" \
-F "myfile=@src/test/resources/sample-word.doc"
Causes error similar to the following...
Error 500: lazy loading error
org.apache.solr.common.SolrException: lazy loading error
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:260)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.extraction.ExtractingRequestHandler'
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:394)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:419)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:455)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:251)
... 21 more
Caused by: java.lang.ClassNotFoundException: solr.extraction.ExtractingRequestHandler
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:378)
... 24 more
When running the command

curl "http://localhost:8983/solr/update/extract?&extractOnly=true" -F "myfile=@src/test/resources/sample-word.doc"

I got a Solr exception:
C:\_Work\_git\book>curl "http://localhost:8983/solr/update/extract?&extractOnly=true" -F "myfile=@src/test/resources/sample-word.doc"

HTTP ERROR 500
Problem accessing /solr/update/extract. Reason:

lazy loading error
org.apache.solr.common.SolrException: lazy loading error
	at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:260)
	...
Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.extraction.ExtractingRequestHandler'
	at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:394)
	...
Caused by: java.lang.ClassNotFoundException: solr.extraction.ExtractingRequestHandler
	... (stack trace identical to the one reported above)
readme is not very windows friendly
Some of us are stuck with Windows, but the readme does not even give a hint on how to download the opennlp-models and WordNet on a Windows machine. At least a hint about GNU Wget (or a link to https://eternallybored.org/misc/wget/) would be nice.

Error: Could not find or load main class com.tamingtext.frankenstein.Frankenstein
Hi,
I got this problem on Mac OS X when trying to run the frankenstein.sh shell script in the bin folder.
I tried mvn package after that, but it didn't work; I thought it was a dependency issue or something. Thanks in advance.
Making things work with JDK 1.8
I checked out the code, did a mvn eclipse:eclipse, and imported the project into a workspace. Now I am running into this with Eclipse:
Unbound classpath container: 'JRE System Library [JavaSE-1.6]' in project 'taming-text'
Is it possible to build the bits with 1.8?

UnsupportedOperationException from SplitInput
Exception in thread "main" java.lang.UnsupportedOperationException
	at java.util.Collections$SingletonSet$1.remove(Collections.java:3087)
	at java.util.AbstractCollection.clear(AbstractCollection.java:396)
	at org.apache.mahout.common.IOUtils.close(IOUtils.java:137)
	at com.tamingtext.util.SplitInput.countLines(SplitInput.java:583)
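This failure can be reproduced with plain java.util collections, independent of Mahout. A minimal sketch (class and method names are illustrative, not Mahout's actual source) of a close helper that removes entries as it closes them:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;

public class CloseDemo {
    // Mimics the post-0.4 contract: close each element and remove it from
    // the collection on success, so the caller must pass a mutable collection.
    static void closeAll(Collection<? extends Closeable> closeables) throws IOException {
        for (Iterator<? extends Closeable> it = closeables.iterator(); it.hasNext();) {
            it.next().close();
            it.remove(); // throws UnsupportedOperationException on immutable collections
        }
    }

    public static void main(String[] args) throws IOException {
        Closeable noop = () -> { };

        // A mutable collection works fine.
        Collection<Closeable> ok = new ArrayList<>();
        ok.add(noop);
        closeAll(ok);
        System.out.println("mutable collection: closed, remaining=" + ok.size());

        // An immutable singleton set fails, as seen in SplitInput.countLines.
        try {
            closeAll(Collections.singleton(noop));
        } catch (UnsupportedOperationException e) {
            System.out.println("singleton set: UnsupportedOperationException");
        }
    }
}
```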
The semantics of IOUtils.close(..) have changed slightly between Mahout 0.4 and 0.6: close() must be passed a mutable Collection, because it removes elements from the collection as it successfully closes them. As such, Collections.singleton(writable) is no longer a valid argument and results in an UnsupportedOperationException.

data import command on page 149 does not complete
I followed the directions on the github main page to download the code, ran mvn package, and went to the bin directory and ran ./start-solr.sh solr-clustering &. So far so good.
But when I went to http://localhost:8983/solr/dataimport?command=full-import, the data import could not complete. The text in the status message was "Indexing failed. Rolled back all changes." In the console, I found this error message:
SEVERE: Exception thrown while getting data
java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.startribune.com/sports/index.rss2
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626)
... For whatever reason, the script seems unable to reach the URL http://www.startribune.com/sports/index.rss2. However, I can reach the URL from a browser window. Is there a known issue with reaching this page from the Solr example?
I am trying to do Mahout clustering and have been getting errors when clustering some other documents (the docs that came with Solr 4.6.1), so now I am trying to follow the book's examples exactly to guarantee that the clustering process runs correctly.
“NoSuchMethodErrors” due to multiple versions of commons-codec:commons-codec:jar
Issue description
Hi, there are multiple versions of commons-codec:commons-codec in book-master. As shown in the following dependency tree, according to Maven's "nearest wins" strategy only commons-codec:commons-codec:1.6 can be loaded; commons-codec:commons-codec:1.2, 1.5 and 1.4 will be shadowed.
However, several methods defined in shadowed version commons-codec:commons-codec:1.2, commons-codec:commons-codec:1.5 and commons-codec:commons-codec:1.4 are referenced by client project via org.apache.mahout:mahout-core:0.6, org.apache.mahout:mahout-integration:0.6, org.apache.solr:solr-solrj:3.6.0, org.apache.tika:tika-parsers:0.10 and org.carrot2:carrot2-core:3.6.0 but missing in the actually loaded version commons-codec:commons-codec:1.6.
For instance, the following missing method(defined in commons-codec:commons-codec:1.2, commons-codec:commons-codec:1.5 and commons-codec:commons-codec:1.4) are actually referenced by book-master, which will introduce a runtime error(i.e., "NoSuchMethodError") into book-master.
- <org.apache.commons.codec.binary.Base64: java.lang.String encodeToString(byte[])> is invoked by book-master via the following path:
path:
- <com.tamingtext.util.SplitInput: void splitFile(org.apache.hadoop.fs.Path)> com.tamingtext:taming-text:0.1-SNAPSHOT
- <org.apache.hadoop.fs.FileSystem: org.apache.hadoop.fs.FSDataInputStream open(org.apache.hadoop.fs.Path)> org.apache.hadoop:hadoop-core:0.20.204.0
- <org.apache.hadoop.hdfs.HftpFileSystem: org.apache.hadoop.fs.FSDataInputStream open(org.apache.hadoop.fs.Path,int)> org.apache.hadoop:hadoop-core:0.20.204.0
- <org.apache.hadoop.hdfs.HftpFileSystem: java.net.HttpURLConnection openConnection(java.lang.String,java.lang.String)> org.apache.hadoop:hadoop-core:0.20.204.0
- <org.apache.hadoop.hdfs.HftpFileSystem: java.lang.String updateQuery(java.lang.String)> org.apache.hadoop:hadoop-core:0.20.204.0
- <org.apache.hadoop.security.token.Token: java.lang.String encodeToUrlString()> org.apache.hadoop:hadoop-core:0.20.204.0
- <org.apache.hadoop.security.token.Token: java.lang.String encodeWritable(org.apache.hadoop.io.Writable)> org.apache.hadoop:hadoop-core:0.20.204.0
- <org.apache.commons.codec.binary.Base64: java.lang.String encodeToString(byte[])>
Suggested fixes:
- Change the direct dependency on commons-codec:commons-codec from 1.6 to 1.4, since version 1.4 includes the missing methods above and is compatible with the other versions of commons-codec in the project.
- Use dependency management in the pom file to unify the version of commons-codec:commons-codec at 1.4.
Please let me know which solution you prefer; I can submit a PR to fix it.
Thank you very much for your attention.
Best regards,

Dependency tree:
[INFO] | | \- (commons-codec:commons-codec:jar:1.2:compile - omitted for conflict with 1.6)
[INFO] | | \- (commons-codec:commons-codec:jar:1.6:compile - omitted for duplicate)
[INFO] | +- commons-codec:commons-codec:jar:1.6:compile
[INFO] | +- (commons-codec:commons-codec:jar:1.4:compile - omitted for conflict with 1.6)
[INFO] | | \- (commons-codec:commons-codec:jar:1.5:compile - omitted for conflict with 1.6)
[INFO] | | +- (commons-codec:commons-codec:jar:1.4:compile - omitted for conflict with 1.6)
[INFO] | | +- (commons-codec:commons-codec:jar:1.2:compile - omitted for conflict with 1.6)
[INFO] | +- (commons-codec:commons-codec:jar:1.4:compile - omitted for conflict with 1.6)
[INFO] | | \- (commons-codec:commons-codec:jar:1.4:compile - omitted for conflict with 1.6)
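The dependency-management fix proposed above could be sketched as a <dependencyManagement> entry in the top-level pom.xml. This is a hedged illustration, not a tested patch for this repository; version 1.4 follows the issue's own suggestion:

```xml
<dependencyManagement>
  <dependencies>
    <!-- Pin commons-codec to 1.4 so Maven's "nearest wins" resolution
         cannot shadow the methods needed by Mahout, SolrJ, and Tika. -->
    <dependency>
      <groupId>commons-codec</groupId>
      <artifactId>commons-codec</artifactId>
      <version>1.4</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

After changing the pom, the resolved version can be checked with: mvn dependency:tree -Dincludes=commons-codec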
Unable to search with queries that rely on synonyms.txt
After following the solr-qa application setup steps, I was able to run the application and get answers to my questions.
However, I wasn't able to get answers when the questions used synonyms instead of the original words. I populated the synonyms.txt file like this:
Profile,account
edit,change,Configure,setup,create,establish

Frankenstein run error
Hi,
I've downloaded source code and received the following error on attempting to run the frankenstein.sh script.
Initializing Frankenstein
Exception in thread "main" java.io.FileNotFoundException: ../../opennlp-models
    at com.tamingtext.frankenstein.Frankenstein.init(Frankenstein.java:226)
    at com.tamingtext.frankenstein.Frankenstein.main(Frankenstein.java:72)
- opennlp-models is located in the TT_HOME folder and was downloaded as instructed
- I'd look myself and see what the problem is at lines 72 and 226, but I can't seem to find the Frankenstein.java file (just the compiled .class file)
Really looking forward to digging into the book, and thanks in advance for your help.
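One thing the stack trace does reveal: the models are looked up via the relative path ../../opennlp-models, and a relative path resolves against the current working directory, not the script's location. So the error can occur even when opennlp-models sits correctly in TT_HOME, if the program is launched from the wrong directory. A minimal demonstration of that mechanism, using throwaway paths under /tmp (hypothetical, not the repo's actual layout):

```shell
# Demonstration: ../../opennlp-models only resolves if the current
# directory is exactly two levels below the directory holding it.
mkdir -p /tmp/ttdemo/a/b /tmp/ttdemo/opennlp-models
cd /tmp/ttdemo/a/b
result=$([ -d ../../opennlp-models ] && echo resolves || echo missing)
echo "$result"   # resolves
```

A first thing to try, then, is launching frankenstein.sh from the directory the script expects (or from TT_HOME itself) rather than from an arbitrary location.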
Use velocity from target/dependencies instead of apache-solr/contrib
I think that everything we need is in target/dependencies, so there should be no need to maintain a separate copy of apache-solr-velocity and friends in apache-solr/contrib/velocity as well.