pcbje / gransk
Document processing for investigations
Home Page: https://gransk.com
License: Apache License 2.0
I'm trying to run this command in the main Gransk terminal:
python -m gransk.boot.ui
This returns a link which opens the UI as expected. But when I add a file, the UI does not respond: it shows 100%, refreshes, and displays no result or dashboard. I have tried PST, TXT, ZIP and other files, and none of them produce a response.
The log in the terminal is difficult to understand. Please tell me if there is a fix for this problem or how I can proceed from here. Thank you.
I'm trying to run the main.py file and I get an ImportError:
ImportError: cannot import name 'format_exc'
After some research on Stack Overflow and in similar GitHub issues, I found that this could be a file name conflict in the project. Here is the exception traceback:
Traceback (most recent call last):
  File "C:/Users/mohit.motwani/AppData/Local/Continuum/anaconda3/Lib/site-packages/polyglot-16.7.4-py3.6.egg/polyglot/__main__.py", line 13, in <module>
    import logging
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\logging\__init__.py", line 26, in <module>
    import sys, os, time, io, traceback, warnings, weakref, collections
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\traceback.py", line 5, in <module>
    import linecache
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\linecache.py", line 11, in <module>
    import tokenize
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\Lib\site-packages\polyglot-16.7.4-py3.6.egg\polyglot\tokenize\__init__.py", line 4, in <module>
    from .base import WordTokenizer, SentenceTokenizer
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\Lib\site-packages\polyglot-16.7.4-py3.6.egg\polyglot\tokenize\base.py", line 7, in <module>
    from polyglot.base import Sequence
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\site-packages\polyglot-16.7.4-py3.6.egg\polyglot\__init__.py", line 12, in <module>
    from .base import Sequence, TokenSequence
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\site-packages\polyglot-16.7.4-py3.6.egg\polyglot\base.py", line 9, in <module>
    from concurrent.futures import ProcessPoolExecutor
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\concurrent\futures\__init__.py", line 8, in <module>
    from concurrent.futures._base import (FIRST_COMPLETED,
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\concurrent\futures\_base.py", line 8, in <module>
    import threading
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\threading.py", line 7, in <module>
    from traceback import format_exc as _format_exc
ImportError: cannot import name 'format_exc'
I can see that there is only one traceback.py in my Anaconda folder, and I can't find any other traceback file that would confirm a name conflict. I don't know where the problem is or how to fix this ImportError. Any help will be appreciated.
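In the traceback, the `import tokenize` statement in the standard library's linecache.py resolves to polyglot\tokenize\__init__.py instead of the standard tokenize module, so the conflicting name may be tokenize rather than traceback. One possible cause, stated here as an assumption, is that the polyglot package directory sits at the front of sys.path, which happens when a file inside it is run directly as a script. The sketch below lists names in a directory that collide with standard library modules; `shadowing_names` and the fallback name set are hypothetical helpers, not part of Gransk or polyglot.

```python
import os
import sys

# sys.stdlib_module_names exists on Python 3.10+; the small fallback set is
# an illustrative assumption for older interpreters, not a complete list.
STDLIB_NAMES = set(getattr(sys, "stdlib_module_names",
                           {"tokenize", "traceback", "logging", "threading"}))


def shadowing_names(directory, stdlib_names=STDLIB_NAMES):
    """Return modules/packages in `directory` named like stdlib modules."""
    hits = []
    for entry in sorted(os.listdir(directory)):
        full = os.path.join(directory, entry)
        name, ext = os.path.splitext(entry)
        # A plain module is a .py file; a package is a folder with __init__.py.
        is_module = ext == ".py" and os.path.isfile(full)
        is_package = os.path.isfile(os.path.join(full, "__init__.py"))
        if (is_module or is_package) and name in stdlib_names:
            hits.append(name)
    return hits
```

Calling `shadowing_names(sys.path[0])` from the failing script would show whether the first search-path entry contains such a collision.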
Research, investigation and ediscovery meet hard files in the wild. You may harden your service against these: https://gitlab.com/dzmitry-lahoda/ediscovery-files/tree/master/assets
I followed the instructions in the README to use Gransk with the Python implementation.
I ran:
python -m setup.py download
It reports that polyglot is not recognized as a command.
So I tried to install polyglot using
pip install polyglot
and apparently it needs pycld2 and pyicu.
When I try to install pyicu (pip install pyicu), it returns an error:
Failed to build pyicu
polyglot 16.7.4 requires pycld2>0.3, which is not installed
When I try to install pycld2 (pip install pycld2), it returns an error:
polyglot 16.7.4 requires pyICU>1.8, which is not installed
I have also consulted other GitHub issues about installing icu, pyicu and pycld2, but with no success.
I don't know how to proceed from here to install polyglot and start working with Gransk. Any help or guidance is appreciated.
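Because each pip error names the other package as missing, it can help to check which of the modules are actually importable before retrying. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper, and the list of names assumes that PyICU is imported as `icu` while pycld2 and polyglot use their package names.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # PyICU is imported as "icu"; the other two use their package names.
    print(missing_modules(["icu", "pycld2", "polyglot"]))
```

An empty list means all three can be located by the interpreter that ran the check; it says nothing about whether they were built correctly.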
Following the install instructions using docker results in failing to find the tagged image.
This is because https://www.docker.com/pcbje/gransk no longer exists.
~ [0] # sh ./docker-quickstart.sh
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 257 100 257 0 0 274 0 --:--:-- --:--:-- --:--:-- 274
Pulling tika (logicalspark/docker-tikaserver:latest)...
latest: Pulling from logicalspark/docker-tikaserver
6bbedd9b76a4: Pull complete
fc19d60a83f1: Pull complete
de413bb911fd: Pull complete
2879a7ad3144: Pull complete
668604fde02e: Pull complete
b5e75da9a0c7: Pull complete
Digest: sha256:e8ee854773a32ccd1fd41d6df0cfacc1985e096b3c0aaac5bde97f912e55146d
Status: Downloaded newer image for logicalspark/docker-tikaserver:latest
Pulling elasticsearch (elasticsearch:2.3.5)...
2.3.5: Pulling from library/elasticsearch
386a066cd84a: Downloading [=================================================> ] 50.86 MB/51.36 MB
75ea84187083: Download complete
386a066cd84a: Pull complete
75ea84187083: Pull complete
3e2e387eb26a: Pull complete
eef540699244: Pull complete
1624a2f8d114: Pull complete
7018f4ec6e0a: Pull complete
6ca3bc2ad3b3: Pull complete
424638b495a6: Pull complete
2ff72d0b7bea: Pull complete
9d25542ccc02: Pull complete
35456bfab3fd: Pull complete
c206de1a2db8: Pull complete
3fd5839fafc9: Pull complete
e11632209e5b: Pull complete
Digest: sha256:336e82bf4a8edee630efcd112ee388fd52b1dd04b0c47300f3efa60ed67a266e
Status: Downloaded newer image for elasticsearch:2.3.5
Creating root_elasticsearch_1
Creating root_tika_1
Pulling repository docker.io/pcbje/gransk
Tag v0.1 not found in repository docker.io/pcbje/gransk
Go to: http://localhost:8084
Unable to find image 'pcbje/gransk:v0.1' locally
Pulling repository docker.io/pcbje/gransk
docker: Tag v0.1 not found in repository docker.io/pcbje/gransk.
See 'docker run --help'.
Networking is handled differently when using Docker for Windows. The current config.yml defines the Tika and Elasticsearch IPs as localhost, which does not work on Windows. To my knowledge, the direct IP (172.17.x.x) needs to be defined (preferably dynamically).
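One way to define the host dynamically is to read it from an environment variable and fall back to localhost. This is only a sketch of that idea: `service_url` and the variable names `ES_HOST` and `TIKA_HOST` are hypothetical, and how Gransk reads config.yml is not shown here, so wiring the result into the configuration is left open. The ports are the usual defaults for Elasticsearch (9200) and the Tika server (9998).

```python
import os


def service_url(env_var, port, default_host="localhost"):
    """Build an http URL, taking the host from an environment variable if set."""
    host = os.environ.get(env_var, default_host)
    return "http://{}:{}".format(host, port)


if __name__ == "__main__":
    # ES_HOST and TIKA_HOST are assumed names, not settings Gransk defines.
    print(service_url("ES_HOST", 9200))
    print(service_url("TIKA_HOST", 9998))
```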
As mentioned in the README.md of this project, I'm using the following code to add files:
import io
import gransk.api as api
import gransk.core.document as document
# Create the API object from the configuration file.
gransk = api.API(config_path=u'config.yml')
# Describe the file to process and tag it.
doc = document.get_document(u'C:\\Users\\mohit.motwani\\Desktop\\EY Projects\\testing.txt')
doc.tag = u'demo'
# The content is supplied as an in-memory byte buffer.
content = io.BytesIO(b'Data buffer')
gransk.add_file(doc, content)
gransk.stop()
Running this script returns
FileNotFoundError: [WinError 2] The system cannot find the file specified
This is the stack trace:
[2019-01-08 13:01:33] [INFO] gransk.plugins.storage.es_index: {'error': {'root_cause': [{'type': 'mapper_parsing_exception', 'reason': 'No handler for type [string] declared on field [entity_value]'}], 'type': 'mapper_parsing_exception', 'reason': 'Failed to parse mapping [in_doc]: No handler for type [string] declared on field [entity_value]', 'caused_by': {'type': 'mapper_parsing_exception', 'reason': 'No handler for type [string] declared on field [entity_value]'}}, 'status': 400}
[2019-01-08 13:01:33] [ERROR] MAIN: could not process C:\Users\mohit.motwani\Desktop\EY Projects\testing.txt: [WinError 2] The system cannot find the file specified
Traceback (most recent call last):
  File "C:\Users\mohit.motwani\Desktop\EY Projects\gransk\gransk\api.py", line 72, in consume
    self.produce(helper.EXTRACT_META, doc, file_object)
  File "C:\Users\mohit.motwani\Desktop\EY Projects\gransk\gransk\core\abstract_subscriber.py", line 97, in produce
    self.pipeline.produce(topic, doc, payload)
  File "C:\Users\mohit.motwani\Desktop\EY Projects\gransk\gransk\core\pipeline.py", line 100, in produce
    callback(doc, payload)
  File "C:\Users\mohit.motwani\Desktop\EY Projects\gransk\gransk\core\abstract_subscriber.py", line 61, in time_consume
    self.consume(doc, payload)
  File "C:\Users\mohit.motwani\Desktop\EY Projects\gransk\gransk\plugins\extractors\file_meta.py", line 132, in consume
    mime_type = self.__get_mime_type(doc, payload)
  File "C:\Users\mohit.motwani\Desktop\EY Projects\gransk\gransk\plugins\extractors\file_meta.py", line 106, in __get_mime_type
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Users\mohit.motwani\AppData\Local\Continuum\anaconda3\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
My test.py file is in the main Gransk folder, and I'm 100% sure the file exists.
I have tried relative paths, absolute paths, and different files and folders, but it always returns the same error. Can someone tell me what is wrong? I don't know what the script is supposed to return because I haven't been able to run it, but I'd like to know any possible fixes for this problem. Thank you.
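The traceback ends in subprocess.Popen inside __get_mime_type in file_meta.py, so the file that Windows cannot find may be the external program being launched rather than testing.txt. The command itself does not appear in the trace. The sketch below checks whether an executable can be found on PATH before starting it; `run_if_available` is a hypothetical helper, and `file` in the usage note is an assumed command name, not one confirmed by the trace.

```python
import shutil
import subprocess


def run_if_available(cmd):
    """Run `cmd` (a list of arguments) only if its executable is on PATH."""
    executable = shutil.which(cmd[0])
    if executable is None:
        raise RuntimeError(
            "executable {!r} not found on PATH".format(cmd[0]))
    proc = subprocess.Popen([executable] + list(cmd[1:]),
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out, err
```

For example, `shutil.which("file")` returning None on the affected machine would be consistent with the WinError 2 above, assuming that is the program being called.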