germanattanasio / answer-retrieval Goto Github PK

Sample application that shows you how to create your own answer retrieval application for StackExchange, using custom features from the Retrieve and Rank service.

Home Page: https://answer-retrieval.mybluemix.net/

License: Apache License 2.0

JavaScript 4.56% Shell 2.44% Python 27.20% Jupyter Notebook 43.95% CSS 18.62% HTML 3.23%

answer-retrieval's Introduction

German Attanasio

CTO & software dev w/ expertise in distributed apps, microservices, cloud, & container orchestration.

⤷ I am a highly skilled software engineer with extensive experience building and maintaining distributed applications. With a solid software development and architecture background, I can lead teams to deliver robust and scalable systems.

⚐ Based in New York

ϟ Currently CTO @ Moveo.ai

ϟ Recently IBM Watson, IBM Research, UNICEN

Skills

answer-retrieval's People

Contributors

Stargazers

Watchers

answer-retrieval's Issues

Error with Notebook #2

In the second notebook, step 2 of step 5 tells you to do something to a file, but not how to do it.
The whole of the second notebook cannot be run. When the second step is run, it immediately comes back with:
“Directory '.' is not installable. File 'setup.py' not found.
Requirement 'retrieve_and_rank_scorer-1.0-py2-none-any.whl' looks like a filename, but the file does not exist
Processing ./retrieve_and_rank_scorer-1.0-py2-none-any.whl
Exception:
Traceback (most recent call last):
File "/Users/mwassel/miniconda2/lib/python2.7/site-packages/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/Users/mwassel/miniconda2/lib/python2.7/site-packages/pip/commands/install.py", line 310, in run
wb.build(autobuilding=True)
File "/Users/mwassel/miniconda2/lib/python2.7/site-packages/pip/wheel.py", line 750, in build
self.requirement_set.prepare_files(self.finder)
File "/Users/mwassel/miniconda2/lib/python2.7/site-packages/pip/req/req_set.py", line 370, in prepare_files
ignore_dependencies=self.ignore_dependencies))
File "/Users/mwassel/miniconda2/lib/python2.7/site-packages/pip/req/req_set.py", line 587, in _prepare_file
session=self.session, hashes=hashes)
File "/Users/mwassel/miniconda2/lib/python2.7/site-packages/pip/download.py", line 798, in unpack_url
unpack_file_url(link, location, download_dir, hashes=hashes)
File "/Users/mwassel/miniconda2/lib/python2.7/site-packages/pip/download.py", line 705, in unpack_file_url
unpack_file(from_path, location, content_type, link)
File "/Users/mwassel/miniconda2/lib/python2.7/site-packages/pip/utils/init.py", line 599, in unpack_file
flatten=not filename.endswith('.whl')
File "/Users/mwassel/miniconda2/lib/python2.7/site-packages/pip/utils/init.py", line 482, in unzip_file
zipfp = open(filename, 'rb')

IOError: [Errno 2] No such file or directory: '/Users/mwassel/Downloads/answer-retrieval-master/notebooks/retrieve_and_rank_scorer-1.0-py2-none-any.whl’”
in the output terminal that is running the notebook. After that step, all other steps come back with similar responses.

README missing instructions

Should make it clear that you must go through notebooks in order for the application to have any functionality. Without them, the app still starts, but hangs on “Retrieving and ranking answers…”

SolrError: Populating the default RR collection causes error HTTP 400: unknown field 'username'

Hello
in the AnswerRetrieval notebook I want to populate the RR Collection with the default solrDocuments.json containing stackexchange travel data. I get an error which points me to a missing attribute 'username' in schema.xml, but it is there. Below is the schema.xml and the error trace. Maybe it helps to say that before I couldn't create the RR collection via notebook, I did it in Bluemix. Indicates an auth problem, but I don't know. Thanks for any help in advance.

-----------------------------------schema.xml---------------------------

<fieldType name="watson_text_en" indexed="true" stored="true" class="com.ibm.watson.hector.plugins.fieldtype.WatsonTextField">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
</fieldType>

<!-- field type definitions. The "name" attribute is
   just a label to be used by field definitions.  The "class"
   attribute and any other attributes determine the real
   behavior of the fieldType.
     Class names starting with "solr" refer to java classes in a
   standard package such as org.apache.solr.analysis
-->

<!-- The StrField type is not analyzed, but indexed/stored verbatim.
   It supports doc values but in that case the field needs to be
   single-valued and either required or have a default value.
  -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />

<!-- boolean type: "true" or "false" -->
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>

<!-- sortMissingLast and sortMissingFirst attributes are optional attributes are
     currently supported on types that are sorted internally as strings
     and on numeric types.
     This includes "string","boolean", and, as of 3.5 (and 4.x),
     int, float, long, date, double, including the "Trie" variants.
   - If sortMissingLast="true", then a sort on this field will cause documents
     without the field to come after documents with the field,
     regardless of the requested sort order (asc or desc).
   - If sortMissingFirst="true", then a sort on this field will cause documents
     without the field to come before documents with the field,
     regardless of the requested sort order.
   - If sortMissingLast="false" and sortMissingFirst="false" (the default),
     then default lucene sorting will be used which places docs without the
     field first in an ascending sort and last in a descending sort.
-->    

<!--
  Default numeric field types. For faster range queries, consider the tint/tfloat/tlong/tdouble types.

  These fields support doc values, but they require the field to be
  single-valued and either be required or have a default value.
-->
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>

<!--
 Numeric field types that index each value at various levels of precision
 to accelerate range queries when the number of values between the range
 endpoints is large. See the javadoc for NumericRangeQuery for internal
 implementation details.

 Smaller precisionStep values (specified in bits) will lead to more tokens
 indexed per value, slightly larger index size, and faster range queries.
 A precisionStep of 0 disables indexing at different precision levels.
-->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>

<!-- The format for this date field is of the form 1995-12-31T23:59:59Z, and
     is a more restricted form of the canonical representation of dateTime
     http://www.w3.org/TR/xmlschema-2/#dateTime    
     The trailing "Z" designates UTC time and is mandatory.
     Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z
     All other components are mandatory.

     Expressions can also be used to denote calculations that should be
     performed relative to "NOW" to determine the value, ie...

           NOW/HOUR
              ... Round to the start of the current hour
           NOW-1DAY
              ... Exactly 1 day prior to now
           NOW/DAY+6MONTHS+3DAYS
              ... 6 months and 3 days in the future from the start of
                  the current day
                  
     Consult the TrieDateField javadocs for more information.

     Note: For faster range queries, consider the tdate type
  -->
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>

<!-- A Trie based date field for faster date range queries and date faceting. -->
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>


<!--Binary data type. The data should be sent/retrieved in as Base64 encoded Strings -->
<fieldType name="binary" class="solr.BinaryField"/>

<!-- The "RandomSortField" is not used to store or search any
     data.  You can declare fields of this type it in your schema
     to generate pseudo-random orderings of your docs for sorting 
     or function purposes.  The ordering is generated based on the field
     name and the version of the index. As long as the index version
     remains unchanged, and the same field name is reused,
     the ordering of the docs will be consistent.  
     If you want different psuedo-random orderings of documents,
     for the same version of the index, use a dynamicField and
     change the field name in the request.
 -->
<fieldType name="random" class="solr.RandomSortField" indexed="true" />

<!-- solr.TextField allows the specification of custom text analyzers
     specified as a tokenizer and a list of token filters. Different
     analyzers may be specified for indexing and querying.

     The optional positionIncrementGap puts space between multiple fields of
     this type on the same document, with the purpose of preventing false phrase
     matching across fields.

     For more info on customizing your analyzer chain, please see
     http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
 -->

<!-- One can also specify an existing Analyzer class that has a
     default constructor via the class attribute on the analyzer element.
     Example:
<fieldType name="text_greek" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
</fieldType>
-->

<!-- A text field that only splits on whitespace for exact matching of words -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- A general text field that has reasonable, generic
     cross-language defaults: it tokenizes with StandardTokenizer,
 removes stop words from case-insensitive "stopwords.txt"
 (empty by default), and down cases.  At query time only, it
 also applies synonyms. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>




<!-- A text field with defaults appropriate for English, plus
 aggressive word-splitting and autophrase features enabled.
 This field is just like text_en, except it adds
 WordDelimiterFilter to enable splitting and matching of
 words on case-change, alpha numeric boundaries, and
 non-alphanumeric chars.  This means certain compound word
 cases will work, for example query "wi fi" will match
 document "WiFi" or "wi-fi".
    -->
<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Less flexible matching, but less false matches.  Probably not ideal for product names,
     but may be good for SKUs.  Can insert dashes in the wrong place and still match. -->
<fieldType name="text_en_splitting_tight" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
         possible with WordDelimiterFilter in conjuncton with stemming. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Just like text_general except it reverses the characters of
 each token, to enable more efficient leading wildcard queries. -->
<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
       maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- This is an example of using the KeywordTokenizer along
     With various TokenFilterFactories to produce a sortable field
     that does not include some properties of the source text
  -->
<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer does no actual tokenizing, so the entire
         input string is preserved as a single token
      -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- The LowerCase TokenFilter does what you expect, which can be
         when you want your sorting to be case insensitive
      -->
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- The TrimFilter removes any leading or trailing whitespace -->
    <filter class="solr.TrimFilterFactory" />
    <!-- The PatternReplaceFilter gives you the flexibility to use
         Java Regular expression to replace any sequence of characters
         matching a pattern with an arbitrary replacement string, 
         which may include back references to portions of the original
         string matched by the pattern.
         
         See the Java Regular Expression documentation for more
         information on pattern and replacement string syntax.
         
         http://docs.oracle.com/javase/7/docs/api/java/util/regex/package-summary.html
      -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z])" replacement="" replace="all"
    />
  </analyzer>
</fieldType>

<!-- lowercases the entire field value, keeping it as a single token.  -->
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

<!-- since fields of this type are by default not stored or indexed,
     any data added to them will be ignored outright.  --> 
<fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />

<!-- This point type indexes the coordinates as separate fields (subFields)
  If subFieldType is defined, it references a type, and a dynamic field
  definition is created matching *___<typename>.  Alternately, if 
  subFieldSuffix is defined, that is used to create the subFields.
  Example: if subFieldType="double", then the coordinates would be
    indexed in fields myloc_0___double,myloc_1___double.
  Example: if subFieldSuffix="_d" then the coordinates would be indexed
    in fields myloc_0_d,myloc_1_d
  The subFields are an implementation detail of the fieldType, and end
  users normally should not need to know about them.
 -->
<fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>

<!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

<!-- An alternative geospatial field type new to Solr 4.  It supports multiValued and polygon shapes.
  For more information about this and other Spatial fields new to Solr 4, see:
  http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
-->
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
    geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers" />

<!-- Spatial rectangle (bounding box) field. It supports most spatial predicates, and has
 special relevancy modes: score=overlapRatio|area|area2D (local-param to the query).  DocValues is recommended for
 relevancy. -->
<fieldType name="bbox" class="solr.BBoxField"
           geo="true" distanceUnits="kilometers" numberType="_bbox_coord" />
<fieldType name="_bbox_coord" class="solr.TrieDoubleField" precisionStep="8" docValues="true" stored="false"/>

<fieldType name="currency" class="solr.CurrencyField" precisionStep="8" defaultCurrency="USD" currencyConfig="currency.xml" />

-----------------------------------error trace ---------------------------- SolrError Traceback (most recent call last) in () 40 with open(SOLR_DOCUMENTS_PATH) as data_file: 41 data = json.load(data_file) ---> 42 output = pysolr_client.add(data) 43 44 #Running command that index documents

C:\Users\IBM_ADMIN\Anaconda2\lib\site-packages\pysolr-3.6.0-py2.7.egg\pysolr.pyc in add(self, docs, boost, fieldUpdates, commit, softCommit, commitWithin, waitFlush, waitSearcher, overwrite, handler)
889 self.log.debug("Built add request of %s docs in %0.2f seconds.", len(message), end_time - start_time)
890 return self._update(m, commit=commit, softCommit=softCommit, waitFlush=waitFlush, waitSearcher=waitSearcher,
--> 891 overwrite=overwrite, handler=handler)
892
893 def delete(self, id=None, q=None, commit=True, softCommit=False, waitFlush=None, waitSearcher=None, handler='update'):

C:\Users\IBM_ADMIN\Anaconda2\lib\site-packages\pysolr-3.6.0-py2.7.egg\pysolr.pyc in _update(self, message, clean_ctrl_chars, commit, softCommit, waitFlush, waitSearcher, overwrite, handler)
476 message = sanitize(message)
477
--> 478 return self._send_request('post', path, message, {'Content-type': 'text/xml; charset=utf-8'})
479
480 def _extract_error(self, resp):

C:\Users\IBM_ADMIN\Anaconda2\lib\site-packages\pysolr-3.6.0-py2.7.egg\pysolr.pyc in _send_request(self, method, path, body, headers, files)
391 'request_body': bytes_body,
392 'request_headers': headers}})
--> 393 raise SolrError(error_message % (resp.status_code, solr_message))
394
395 return force_unicode(resp.content)

SolrError: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=3] unknown field 'username']

Notebook 1 errors

All of the code responses in the notebook were already populated upon opening the notebook
It is not clear if one has to wait for the cluster to be available to continue past the checking the status of the cluster step?
There are lots of pointers to the README. Why not make those links to the relevant section, or include the referenced information in the notebook?
Second sentence under the Upload a Solr Cluster needs double check. I don’t think that’s an example?
How does one know when Generate Training data step is completed?
No information given about the format of the ground truth file. The notebook says it should be in the README, and it is there, but under a different name. The script mentioned works very VERY slowly, there should be some note of this somewhere.
When attempting to train using the output of the bin/python/extract_stackexchange_dump.py script run on the cs.stackexchange.com dump:
"Error encountered during training: Training Data Quality Standards Not Met: detected 2996 out of total 2999 queries as having no label variety. This exceeds the maximum allowed of 2249 (75%)."

Sample App error

The Watson icon that shows in the retrieve and rank results is appears with an odd low resolution. It almost looks blurry

AttributeError: 'HTTPError' object has no attribute 'code'

This app is running on localhost currently,But if i put the any question on the search box i got the following issues,
ERROR in server: Exception : HTTPError(u'500 Server Error: Internal Server Error for url: https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/rankers/81aacex30-rank-1546/rank',)
[2017-04-26 08:17:45,550] ERROR in app: Exception on /api/ranker_select [GET]
Traceback (most recent call last):
File "/home/hduser/anaconda2/lib/python2.7/site-packages/flask/app.py", line 1982, in wsgi_app
response = self.full_dispatch_request()
File "/home/hduser/anaconda2/lib/python2.7/site-packages/flask/app.py", line 1614, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/hduser/anaconda2/lib/python2.7/site-packages/flask/app.py", line 1518, in handle_user_exception
return handler(e)
File "server.py", line 107, in handle_error
code = e.code
AttributeError: 'HTTPError' object has no attribute 'code'

Notebook 2 env file

Notebook 2 should mention the need to update the .env file with cluster id & collection name before the generation of training data. Then with the custom ranker id before the experiment step. Also need to restart the flask server after changing the .env file.

UI issue with sample app

When hitting random, if the pre-generated question is longer than the textbox user doesn’t know what was the question

Sample App error

Some gibberish answers produce results in retrieve and rank, or standard search, but not both.

Refactor server.py for Diego

Remove occurrence of VCAP environment variables for host and port.

Answer Retrieval Scripts not supported on Windows PCs

The scripts aren't running on Windows. KPMG had issues on their windows laptop.

The hyperlink is incorrect when clicking Node.js.

Prerequisites

You will need the following in order to use this SK:

A Unix-based OS (or Cygwin)
Git
Node.js     >>>>>>> when click this link, it will go to Anaconda download page.

https://www.continuum.io/downloads

unable to deploy the sample code to bluemix using devops.

When click "Deploy to Bluemix", it will be failed, there is log.

Downloading artifacts...DOWNLOAD SUCCESSFUL
Target: https://api.ng.bluemix.net
Using manifest file /home/pipeline/57acff8b-fa8f-4695-8910-f3464d0edfa2/manifest.yml

Creating app answer-retrieval-kaihong-2157 in org [email protected] / space CDL as [email protected]...
OK

Using route answer-retrieval-kaihong-2157.mybluemix.net
Binding answer-retrieval-kaihong-2157.mybluemix.net to answer-retrieval-kaihong-2157...
OK

Uploading answer-retrieval-kaihong-2157...
Uploading app files from: /home/pipeline/57acff8b-fa8f-4695-8910-f3464d0edfa2
Uploading 314.9K, 90 files


Done uploading
OK
Binding service retrieve-and-rank-service to app answer-retrieval-kaihong-2157 in org [email protected] / space CDL as [email protected]...
OK

Starting app answer-retrieval-kaihong-2157 in org [email protected] / space CDL as [email protected]...
-----> Downloaded app package (15M)
-------> Buildpack version 1.5.5
-----> Installing python-2.7.10
Downloaded [file:///var/vcap/data/dea_next/admin_buildpacks/69221686-9e15-442c-bdec-fd0fb5fc5470_ca60152b78be3f93067cd4f03d01e2db810ac2a8/dependencies/https___pivotal-buildpacks.s3.amazonaws.com_concourse-binaries_python_python-2.7.10-linux-x64.tgz]
     $ pip install -r requirements.txt
DEPRECATION: --allow-all-external has been deprecated and will be removed in the future. Due to changes in the repository protocol, it no longer has any effect.
       Processing ./custom-scorer
       Collecting requests==2.10.0 (from -r requirements.txt (line 2))
         Downloading requests-2.10.0-py2.py3-none-any.whl (506kB)
       Collecting spacy==0.101.0 (from -r requirements.txt (line 3))
         Downloading spacy-0.101.0-cp27-cp27m-manylinux1_x86_64.whl (5.7MB)
       Collecting numpy==1.11.1 (from -r requirements.txt (line 4))
         Downloading numpy-1.11.1-cp27-cp27m-manylinux1_x86_64.whl (15.3MB)
       Collecting futures>=3.0.5 (from -r requirements.txt (line 5))
         Downloading futures-3.0.5-py2-none-any.whl
       Collecting flask==0.11.1 (from -r requirements.txt (line 8))
         Downloading Flask-0.11.1-py2.py3-none-any.whl (80kB)
       Collecting python-dotenv (from -r requirements.txt (line 9))
         Downloading python_dotenv-0.6.0-py2.py3-none-any.whl
       Collecting watson-developer-cloud (from -r requirements.txt (line 10))
         Downloading watson-developer-cloud-0.19.0.tar.gz
         Downloading cf-deployment-tracker-1.0.3.tar.gz
       Collecting preshed<0.47,>=0.46.1 (from spacy==0.101.0->-r requirements.txt (line 3))
         Downloading preshed-0.46.4-cp27-cp27m-manylinux1_x86_64.whl (223kB)
       Collecting cymem<1.32,>=1.30 (from spacy==0.101.0->-r requirements.txt (line 3))
         Downloading cymem-1.31.2-cp27-cp27m-manylinux1_x86_64.whl (66kB)
       Collecting thinc<5.1.0,>=5.0.0 (from spacy==0.101.0->-r requirements.txt (line 3))
         Downloading thinc-5.0.8-cp27-cp27m-manylinux1_x86_64.whl (1.4MB)
         Downloading murmurhash-0.26.4-cp27-cp27m-manylinux1_x86_64.whl
       Collecting sputnik<0.10.0,>=0.9.2 (from spacy==0.101.0->-r requirements.txt (line 3))
         Downloading sputnik-0.9.3-py2.py3-none-any.whl
       Collecting cloudpickle (from spacy==0.101.0->-r requirements.txt (line 3))
         Downloading cloudpickle-0.2.1-py2.py3-none-any.whl
       Collecting plac (from spacy==0.101.0->-r requirements.txt (line 3))
         Downloading plac-0.9.6-py2.py3-none-any.whl
       Collecting six (from spacy==0.101.0->-r requirements.txt (line 3))
         Downloading six-1.10.0-py2.py3-none-any.whl
       Collecting click>=2.0 (from flask==0.11.1->-r requirements.txt (line 8))
         Downloading click-6.6.tar.gz (283kB)
       Collecting itsdangerous>=0.21 (from flask==0.11.1->-r requirements.txt (line 8))
         Downloading itsdangerous-0.24.tar.gz (46kB)
       Collecting Werkzeug>=0.7 (from flask==0.11.1->-r requirements.txt (line 8))
         Downloading Werkzeug-0.11.11-py2.py3-none-any.whl (306kB)
       Collecting Jinja2>=2.4 (from flask==0.11.1->-r requirements.txt (line 8))
         Downloading Jinja2-2.8-py2.py3-none-any.whl (263kB)
       Collecting pysolr<4.0,>=3.3 (from watson-developer-cloud->-r requirements.txt (line 10))
         Downloading pysolr-3.5.0-py2.py3-none-any.whl
       Collecting semver (from sputnik<0.10.0,>=0.9.2->spacy==0.101.0->-r requirements.txt (line 3))
         Downloading semver-2.6.0.tar.gz
       Collecting MarkupSafe (from Jinja2>=2.4->flask==0.11.1->-r requirements.txt (line 8))
         Downloading MarkupSafe-0.23.tar.gz
       Installing collected packages: requests, cymem, preshed, numpy, murmurhash, thinc, semver, sputnik, cloudpickle, plac, six, spacy, futures, click, itsdangerous, Werkzeug, MarkupSafe, Jinja2, flask, python-dotenv, pysolr, watson-developer-cloud, cf-deployment-tracker, retrieve-and-rank-scorer
         Running setup.py install for semver: started
           Running setup.py install for semver: finished with status 'done'
         Running setup.py install for click: started
           Running setup.py install for click: finished with status 'done'
         Running setup.py install for itsdangerous: started
           Running setup.py install for itsdangerous: finished with status 'done'
         Running setup.py install for MarkupSafe: started
           Running setup.py install for MarkupSafe: finished with status 'done'
         Running setup.py install for watson-developer-cloud: started
           Running setup.py install for watson-developer-cloud: finished with status 'done'
         Running setup.py install for cf-deployment-tracker: started
           Running setup.py install for cf-deployment-tracker: finished with status 'done'
         Running setup.py install for retrieve-and-rank-scorer: started
           Running setup.py install for retrieve-and-rank-scorer: finished with status 'done'
       Successfully installed Jinja2-2.8 MarkupSafe-0.23 Werkzeug-0.11.11 cf-deployment-tracker-1.0.3 click-6.6 cloudpickle-0.2.1 cymem-1.31.2 flask-0.11.1 futures-3.0.5 itsdangerous-0.24 murmurhash-0.26.4 numpy-1.11.1 plac-0.9.6 preshed-0.46.4 pysolr-3.5.0 python-dotenv-0.6.0 requests-2.10.0 retrieve-and-rank-scorer-0.0.1 semver-2.6.0 six-1.10.0 spacy-0.101.0 sputnik-0.9.3 thinc-5.0.8 watson-developer-cloud-0.19.0
You are using pip version 8.1.1, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
You are using pip version 8.1.1, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
-----> Uploading droplet (70M)

0 of 1 instances running, 1 starting
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 down
0 of 1 instances running, 1 down
0 of 1 instances running, 1 failing
FAILED
Start unsuccessful

TIP: use 'cf logs answer-retrieval-kaihong-2157 --recent' for more information

Finished: FAILED

typo in Custom Scorer instructions

Before you start creating custom scorers, make sure you have created and configured a Solr cluster and trained a ranker by either following the steps on the notebook "Enhanced Retrieval.ipynb" or by using some other tooling

==> this should reference the "Answer-Retrieval.ipynb" notebook

UI issue with sample app

Ending a query with "" results in an infinite loading screen, same result if there are an odd number of “ (quotes)

UI issue

There is no limit to the amount of text you can put into the textbox, if the text is too long, the app will not return any results or an error message, it will continue to show the loading icon. Sometimes, the original string was lost that produced this result and can't replicate it.

Sample App error

Entering text of any kind and then clicking random query results in a search using the text currently in the field. Only occurs on the main page

Error with Notebook #1

After attempting to build using the files from the stack exchange dump script, this error appears:
“---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
in ()
30 with open(SOLR_DOCUMENTS_PATH) as data_file:
31 data = json.load(data_file)
---> 32 output = pysolr_client.add(data)
33
34 #Running command that index documents

/Users/mwassel/miniconda2/lib/python2.7/site-packages/pysolr.pyc in add(self, docs, boost, fieldUpdates, commit, softCommit, commitWithin, waitFlush, waitSearcher,     overwrite, handler)
    863 
    864         for doc in docs:
--> 865             message.append(self._build_doc(doc, boost=boost, fieldUpdates=fieldUpdates))
    866 
    867         # This returns a bytestring. Ugh.

/Users/mwassel/miniconda2/lib/python2.7/site-packages/pysolr.pyc in _build_doc(self, doc, boost, fieldUpdates)
    788         doc_elem = ElementTree.Element('doc')
    789 
--> 790         for key, value in doc.items():
    791             if key == 'boost':
    792                 doc_elem.set('boost', force_unicode(value))

AttributeError: 'unicode' object has no attribute 'items’”
Going back to the standard, provided data

When trying to load the experiment for analysis, I get:
“---------------------------------------------------------------------------
IOError Traceback (most recent call last)
in ()
17 # Solr experiment
18 solr_experiment_path = os.path.join(experiments_directory, 'exp_solr_only.json')
---> 19 solr_experiment = au.RetrieveAndRankExperiment(experiment_file_path=solr_experiment_path)
20 solr_entries = solr_experiment.experiment_entries
21

/Users/mwassel/Downloads/answer-retrieval-master/notebooks/analysis_utils.py in __init__(self, experiment_file_path)
    141                 experiment_file_path (str): Path to the experiment file
    142         """
--> 143         obj = json.load(open(experiment_file_path, 'rt'))
    144         self.experiment_entries = obj['experiment_entries']
    145         self.base_url = obj['experiment_metadata']['url']

IOError: [Errno 2] No such file or directory: '/Users/mwassel/Downloads/answer-retrieval-master/notebooks/../data/experiments/exp_solr_only.json’”

Is that file being created in the previous step?

Can't run the rest of the notebook because of previous problem.

Running Python "server.py" Fails

After following the steps as directed in the markdown, getting to and running the Python server "python server.py" fails and displays an error.

XXX-XXX-XXX:answer-retrieval hmaal$ python server.py
Traceback (most recent call last):
File "server.py", line 26, in
from retrieve_and_rank_scorer.scorers import Scorers
File "/Users/hmaal/anaconda3/lib/python3.5/site-packages/retrieve_and_rank_scorer/scorers.py", line 77
except futures.TimeoutError, e:
^
SyntaxError: invalid syntax

Barebones:

Non-existent create method used in R&R node SDK samples

https://www.ibm.com/watson/developercloud/retrieve-and-rank/api/v1/?node#query_standard says:

// Get a Solr client for indexing and searching documents.
// See https://github.com/watson-developer-cloud/node-sdk/blob/master/services/retrieve_and_rank/v1.js
solrClient = retrieve_and_rank.create(params);

There is no method create() - the sample should probably say createSolrClient()

germanattanasio / answer-retrieval Goto Github PK

answer-retrieval's Introduction

German Attanasio

CTO & software dev w/ expertise in distributed apps, microservices, cloud, & container orchestration.

Skills

answer-retrieval's People

Contributors

Stargazers

Watchers

Forkers

answer-retrieval's Issues

Recommend Projects

Recommend Topics

Recommend Org