gransk's People

Contributors

oaeide, pcbje, sente, shura1oplot

gransk's Issues

Cannot Programmatically add files

As described in this project's README.md, I'm using the following code to add files:

import io

import gransk.api as api
import gransk.core.document as document

gransk = api.API(config_path=u'config.yml')

doc = document.get_document(u'C:\\Users\\mohit.motwani\\Desktop\\EY Projects\\testing.txt')
doc.tag = u'demo'

content= io.BytesIO(b'Data buffer')

gransk.add_file(doc, content)
gransk.stop()

Running this script fails with:

FileNotFoundError: [WinError 2] The system cannot find the file specified

This is the stack trace:

[2019-01-08 13:01:33] [INFO] gransk.plugins.storage.es_index: {'error': {'root_cause': [{'type': 'mapper_parsing_exception', 'reason': 'No handler for type [string] declared on field [entity_value]'}], 'type': 'mapper_parsing_exception', 'reason': 'Failed to parse mapping [in_doc]: No handler for type [string] declared on field [entity_value]', 'caused_by': {'type': 'mapper_parsing_exception', 'reason': 'No handler for type [string] declared on field [entity_value]'}}, 'status': 400}
[2019-01-08 13:01:33] [ERROR] MAIN: could not process C:\Users\mohit.motwani\Desktop\EY Projects\testing.txt: [WinError 2] The system cannot find the file specified
Traceback (most recent call last):
  File "C:\Users\mohit.motwani\Desktop\EY Projects\gransk\gransk\api.py", line 72, in consume
    self.produce(helper.EXTRACT_META, doc, file_object)
  File "C:\Users\mohit.motwani\Desktop\EY Projects\gransk\gransk\core\abstract_subscriber.py", line 97, in produce
    self.pipeline.produce(topic, doc, payload)
  File "C:\Users\mohit.motwani\Desktop\EY Projects\gransk\gransk\core\pipeline.py", line 100, in produce
    callback(doc, payload)
  File "C:\Users\mohit.motwani\Desktop\EY Projects\gransk\gransk\core\abstract_subscriber.py", line 61, in time_consume
    self.consume(doc, payload)
  File "C:\Users\mohit.motwani\Desktop\EY Projects\gransk\gransk\plugins\extractors\file_meta.py", line 132, in consume
    mime_type = self.__get_mime_type(doc, payload)
  File "C:\Users\mohit.motwani\Desktop\EY Projects\gransk\gransk\plugins\extractors\file_meta.py", line 106, in __get_mime_type
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
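The last frames of this traceback are inside `subprocess.Popen`, called from `__get_mime_type` in file_meta.py. When Popen raises WinError 2 on Windows, the *program* being launched could not be found; it says nothing about the document passed to `get_document`. The command built in file_meta.py is not shown in this issue, so a `file`-style MIME utility is only a plausible guess. A minimal sketch, using a hypothetical helper `run_checked` that is not part of Gransk, of resolving the executable with `shutil.which` before launching it so the error names the missing program:

```python
import shutil
import subprocess
import sys


def run_checked(cmd):
    """Run cmd (a list of strings) and return its stdout as bytes.

    Hypothetical helper: resolves cmd[0] on PATH first, so a missing
    program produces a clear message instead of a bare WinError 2.
    """
    exe = shutil.which(cmd[0])
    if exe is None:
        raise RuntimeError(
            f"program {cmd[0]!r} was not found on PATH; a FileNotFoundError "
            "from Popen refers to this program, not to the document being processed"
        )
    proc = subprocess.Popen([exe, *cmd[1:]],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, _err = proc.communicate()
    return out


if __name__ == "__main__":
    # The running interpreter is a program that certainly exists.
    print(run_checked([sys.executable, "-c", "print('ok')"]))
```

If the missing program were the `file` utility (an assumption), calling `run_checked(["file", "--mime-type", path])` would fail with a message naming `file`, which points at installing that tool or adding it to PATH rather than at the input path.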

My test.py script is in the main Gransk folder, and I'm certain the input file exists.

I have tried relative paths, absolute paths, and different files and folders, but it always fails with the same error. Can someone tell me what is wrong? I don't know what the script is supposed to return because I haven't been able to run it successfully, but I'd like to know of any possible fixes. Thank you.
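The first line of the log above is a separate problem: Elasticsearch rejected the index mapping because it declares the field type `string` on `entity_value`. Elasticsearch 5.0 replaced `string` with `text` (analyzed, full-text) and `keyword` (exact value, the old `not_analyzed`), and newer versions reject `string` with exactly this error, which suggests the Elasticsearch in use is much newer than the 2.3.5 image pulled by the Docker quick-start. Gransk's real mapping is not shown here, so the example input and the helper `upgrade_string_fields` below are hypothetical; this is a sketch of rewriting a 2.x-style mapping dict, not Gransk's code:

```python
def upgrade_string_fields(mapping):
    """Return a copy of an Elasticsearch 2.x-style mapping in which every
    legacy {"type": "string"} field is rewritten for Elasticsearch 5.x+:

      "index": "not_analyzed" -> {"type": "keyword"}
      "index": "no"           -> {"type": "text", "index": False}
      otherwise (analyzed)    -> {"type": "text"}
    """
    if isinstance(mapping, list):
        return [upgrade_string_fields(item) for item in mapping]
    if not isinstance(mapping, dict):
        return mapping  # strings, numbers, booleans are kept as-is
    # Recurse first so nested "properties" and "fields" are also upgraded.
    out = {key: upgrade_string_fields(value) for key, value in mapping.items()}
    if out.get("type") == "string":
        index = out.pop("index", "analyzed")
        if index == "not_analyzed":
            out["type"] = "keyword"
        else:
            out["type"] = "text"
            if index == "no":
                out["index"] = False
    return out
```

Because the function builds new dicts at every level, the original mapping is left untouched and can be compared with the upgraded one.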

Incorrect IPs when using Docker for Windows

Networking is handled differently in Docker for Windows. The current config.yml defines the Tika and Elasticsearch hosts as localhost, which does not work on Windows. As far as I know, the container's direct IP (172.17.x.x) needs to be set instead, preferably dynamically.
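One way to make these hosts configurable, sketched under assumptions: let an environment variable override the `localhost` default. The variable names (`GRANSK_TIKA_HOST`, `GRANSK_ELASTICSEARCH_HOST`) and the `service_url` helper are hypothetical, not existing Gransk settings; 9998 and 9200 are the usual Tika server and Elasticsearch default ports.

```python
import os


def service_url(service, port, default_host="localhost", env=None):
    """Return http://HOST:PORT for a backing service.

    HOST comes from the hypothetical variable GRANSK_<SERVICE>_HOST when it
    is set (e.g. to a container's 172.17.x.x address on Docker for Windows);
    otherwise default_host is used.
    """
    env = os.environ if env is None else env
    host = env.get(f"GRANSK_{service.upper()}_HOST") or default_host
    return f"http://{host}:{port}"
```

With nothing set, `service_url("tika", 9998)` keeps today's behaviour and returns `http://localhost:9998`. On the Docker host, `docker inspect root_elasticsearch_1` lists the container's address under `NetworkSettings`; that container name comes from the quick-start log in the Docker issue on this page.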

Issues while installing dependencies for Gransk

I followed the instructions in the README for using Gransk with the Python implementation.
I ran:
python -m setup.py download

It reports that polyglot is not recognized as a command.

So I tried to install polyglot using
pip install polyglot
and apparently it needs pycld2 and PyICU.
When I try to install PyICU (pip install pyicu), it returns an error:
Failed to build pyicu
polyglot 16.7.4 requires pycld2>0.3, which is not installed

When I try to install pycld2 (pip install pycld2), it returns an error:
polyglot 16.7.4 requires pyICU>1.8, which is not installed

I have also followed other GitHub issues on installing ICU, PyICU and pycld2, with no success.

I don't know how to proceed from here to install polyglot and start working with Gransk. Any help or guidance is appreciated.
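Because pip reports each unmet requirement separately, it can help to check which of polyglot's imports actually resolve in the current interpreter. A minimal sketch using `importlib.util.find_spec`; the only package knowledge it assumes is the mapping from pip distribution names to import names (PyICU installs a module called `icu`), and `missing_packages` is a hypothetical helper:

```python
import importlib.util

# pip distribution name -> top-level module it provides
POLYGLOT_DEPS = {
    "pycld2": "pycld2",
    "PyICU": "icu",
    "polyglot": "polyglot",
}


def missing_packages(deps=POLYGLOT_DEPS):
    """Return the distribution names whose modules cannot be found."""
    return [dist for dist, module in deps.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    print(missing_packages() or "all polyglot dependencies are importable")
```

Both pycld2 and PyICU build native extensions, so a "Failed to build" message typically points at a missing compiler or missing ICU libraries and headers rather than at pip itself.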

docker image doesn't exist - docker.io/pcbje/gransk

Following the Docker install instructions fails because the tagged image cannot be found.

This is because https://www.docker.com/pcbje/gransk no longer exists.

~ [0] # sh ./docker-quickstart.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   257  100   257    0     0    274      0 --:--:-- --:--:-- --:--:--   274
Pulling tika (logicalspark/docker-tikaserver:latest)...
latest: Pulling from logicalspark/docker-tikaserver
6bbedd9b76a4: Pull complete
fc19d60a83f1: Pull complete
de413bb911fd: Pull complete
2879a7ad3144: Pull complete
668604fde02e: Pull complete
b5e75da9a0c7: Pull complete
Digest: sha256:e8ee854773a32ccd1fd41d6df0cfacc1985e096b3c0aaac5bde97f912e55146d
Status: Downloaded newer image for logicalspark/docker-tikaserver:latest
Pulling elasticsearch (elasticsearch:2.3.5)...
2.3.5: Pulling from library/elasticsearch
386a066cd84a: Downloading [=================================================> ] 50.86 MB/51.36 MB
75ea84187083: Download complete
386a066cd84a: Pull complete
75ea84187083: Pull complete
3e2e387eb26a: Pull complete
eef540699244: Pull complete
1624a2f8d114: Pull complete
7018f4ec6e0a: Pull complete
6ca3bc2ad3b3: Pull complete
424638b495a6: Pull complete
2ff72d0b7bea: Pull complete
9d25542ccc02: Pull complete
35456bfab3fd: Pull complete
c206de1a2db8: Pull complete
3fd5839fafc9: Pull complete
e11632209e5b: Pull complete
Digest: sha256:336e82bf4a8edee630efcd112ee388fd52b1dd04b0c47300f3efa60ed67a266e
Status: Downloaded newer image for elasticsearch:2.3.5
Creating root_elasticsearch_1
Creating root_tika_1
Pulling repository docker.io/pcbje/gransk
Tag v0.1 not found in repository docker.io/pcbje/gransk
Go to: http://localhost:8084
Unable to find image 'pcbje/gransk:v0.1' locally
Pulling repository docker.io/pcbje/gransk
docker: Tag v0.1 not found in repository docker.io/pcbje/gransk.
See 'docker run --help'.

Gransk UI won't accept the added files

I'm running this command from a terminal in the main Gransk folder:

python -m gransk.boot.ui

This prints a link that opens the UI as expected. But when I add a file, the UI does not respond: the upload shows 100%, the page refreshes, and no results or dashboard appear. I have tried PST, TXT, ZIP and other files, always with the same result.

The log in the terminal is difficult to understand. Please tell me if there is a fix to this problem or how I can proceed from here. Thank you.

ImportError: cannot import name 'format_exc'

I'm trying to run the main.py file and get an ImportError:

ImportError: cannot import name 'format_exc'

After some research on Stack Overflow and similar GitHub issues, I found that this could be caused by a file name conflict in the project. Here is the traceback:

Traceback (most recent call last):
  File "C:/Users/mohit.motwani/AppData/Local/Continuum/anaconda3/Lib/site-packages/polyglot-16.7.4-py3.6.egg/polyglot/__main__.py", line 13, in <module>
    import logging
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\logging\__init__.py", line 26, in <module>
    import sys, os, time, io, traceback, warnings, weakref, collections
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\traceback.py", line 5, in <module>
    import linecache
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\linecache.py", line 11, in <module>
    import tokenize
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\Lib\site-packages\polyglot-16.7.4-py3.6.egg\polyglot\tokenize\__init__.py", line 4, in <module>
    from .base import WordTokenizer, SentenceTokenizer
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\Lib\site-packages\polyglot-16.7.4-py3.6.egg\polyglot\tokenize\base.py", line 7, in <module>
    from polyglot.base import Sequence
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\site-packages\polyglot-16.7.4-py3.6.egg\polyglot\__init__.py", line 12, in <module>
    from .base import Sequence, TokenSequence
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\site-packages\polyglot-16.7.4-py3.6.egg\polyglot\base.py", line 9, in <module>
    from concurrent.futures import ProcessPoolExecutor
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\concurrent\futures\__init__.py", line 8, in <module>
    from concurrent.futures._base import (FIRST_COMPLETED,
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\concurrent\futures\_base.py", line 8, in <module>
    import threading
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\threading.py", line 7, in <module>
    from traceback import format_exc as _format_exc
ImportError: cannot import name 'format_exc'

I can see only one traceback.py in my Anaconda folder, and I can't find any other traceback file that would confirm a name conflict. I don't know where the problem is or how to fix this ImportError. Any help will be appreciated.
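The traceback itself points at the cause. The script being run is `polyglot\__main__.py`, and when a file is run directly, Python puts that file's directory at the front of `sys.path`. The standard library's `linecache` then runs `import tokenize` and gets polyglot's own `tokenize` package instead of the standard module; that package indirectly imports `threading`, which tries to import `format_exc` from the still half-initialized `traceback` module. So the conflicting name is `tokenize`, not `traceback`. A minimal sketch of a hypothetical helper (not part of Gransk or polyglot) that reports whether a top-level import would resolve outside the standard library for a given search path:

```python
import importlib.machinery
import os
import sys
import sysconfig


def shadowing_origin(name, search_path=None):
    """Return the file a top-level `import name` would load from search_path
    (default: sys.path) if that file lies outside the standard library;
    return None if it resolves inside the standard library or is not found.
    """
    if search_path is None:
        search_path = sys.path
    spec = importlib.machinery.PathFinder.find_spec(name, search_path)
    if spec is None or spec.origin is None:
        return None
    stdlib = os.path.normcase(os.path.realpath(sysconfig.get_paths()["stdlib"]))
    origin = os.path.normcase(os.path.realpath(spec.origin))
    try:
        inside = os.path.commonpath([stdlib, origin]) == stdlib
    except ValueError:  # e.g. paths on different drives on Windows
        inside = False
    return None if inside else spec.origin
```

Called with the polyglot package directory prepended to `sys.path`, it would report polyglot's `tokenize\__init__.py`. Running the package as a module (`python -m polyglot ...`) instead of by file path should avoid the clash, since `-m` puts the current directory, not the package directory, on `sys.path`.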
