KISS
davidfrigola.github.io
Python Processing Engine. See master branch for stable version
Wrap the request in a try/except block
Return no result if a timeout or other error occurs (log a warning)
This will allow a chain processor to continue with the next element.
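A minimal sketch of this guard, assuming the HTTP call is passed in as a callable (requests.get has a compatible shape). The names safe_fetch and fetch, and the logger name, are hypothetical and not part of pype:

```python
import logging

logger = logging.getLogger("pype.request")  # hypothetical logger name


def safe_fetch(url, fetch, timeout=5.0):
    """Call fetch(url, timeout=timeout); return None if it raises.

    `fetch` is any callable with that shape, for example requests.get.
    """
    try:
        return fetch(url, timeout=timeout)
    except Exception as exc:  # timeouts, connection errors and the like
        logger.warning("Request to %s failed: %s", url, exc)
        return None
```

A caller in a chain can then skip the element when the result is None and move on to the next one.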
See sample : http://stackoverflow.com/questions/10247054/http-post-and-get-with-cookies-for-authentication-in-python
Other interesting documentation link : http://docs.python-requests.org:8000/en/latest/user/advanced/
Intended usage : session with cookie for auth sites
Same philosophy as FileProcessor: ability to store/retrieve elements to/from Redis.
Will base on a RedisDatasource implementation
If possible, it should store the whole object serialized; if that is too tricky, just store the value.
Metadata is a MUST?
Take a look at
http://docs.travis-ci.com/user/build-configuration/#Specify-branches-to-build
in order to whitelist/blacklist the branches that CI builds.
Unit tests and mocks for python tests (need to learn how to do it)
Code coverage : at least happy path for this milestone
Use the core feature (from #4) for all existing processors.
Is this the @decorator pattern in Python? Needs investigation.
Typo in L43
htmlBS = BeautifulSoup(request.get(item.getValue()).text)
must be
htmlBS = BeautifulSoup(requests.get(item.getValue()).text)
Create a sender processor to send any item to a statsd running server
Processor for log item values
Two output modes available for the first implementation: logger and stdout
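The two modes could look roughly like this. It is a sketch only: the function name, the mode strings and the logger name are assumptions, not existing pype code:

```python
import logging
import sys


def log_item_values(values, mode="logger", logger=None, out=None):
    """Write each value to a logger or to a stdout-like stream."""
    if mode == "logger":
        if logger is None:
            logger = logging.getLogger("pype.log")  # hypothetical name
        for value in values:
            logger.info("%s", value)
    elif mode == "stdout":
        if out is None:
            out = sys.stdout
        for value in values:
            out.write(f"{value}\n")
    else:
        raise ValueError(f"unknown output mode: {mode}")
```

Accepting the logger and the stream as parameters keeps the function easy to test with a fake stream.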
Add the User-Agent as a request header
Configurable (see http://docs.python-requests.org/en/latest/api/ for request and optional parameters in the get method)
Add a constants list for the available agents
Default user agent (mozilla or whatever)
Random pick?
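One possible shape for the constants list, the default and the random pick. The agent strings are illustrative placeholders and build_headers is a hypothetical helper:

```python
import random

# Illustrative values only; a real list would be maintained separately.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
]
DEFAULT_USER_AGENT = USER_AGENTS[0]


def build_headers(user_agent=None, pick_random=False):
    """Return a headers dict with an explicit, random or default agent."""
    if user_agent is None:
        if pick_random:
            user_agent = random.choice(USER_AGENTS)
        else:
            user_agent = DEFAULT_USER_AGENT
    return {"User-Agent": user_agent}
```

The returned dict can be passed as the headers argument of requests.get.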
When storing into a file, append if the file already exists.
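A small sketch of the append behaviour, assuming one value per line. store_value is a hypothetical helper name:

```python
def store_value(path, value):
    """Append value as one line; mode "a" creates the file if missing."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"{value}\n")
```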
Processor to add items to the stream.
Add items before / after / both the items list (or a single item)
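As a rough illustration of the before / after / both options, using plain lists in place of pype items (add_items is a hypothetical name):

```python
def add_items(stream, before=(), after=()):
    """Return a new list with extra items placed around the stream."""
    return list(before) + list(stream) + list(after)
```

Passing only before or only after covers the single-sided cases; a single item is passed as a one-element list.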
Coveralls not working.
Created travis config in .travis-ci.yaml file and using https://github.com/coagulant/coveralls-python configuration instructions.
See https://travis-ci.org/davidfrigola/pype/builds/14570939
$ coveralls
Submitting coverage to coveralls.io...
Coverage submitted!
Build processing error.
Increase code coverage
Use best practices instead of current approach from M-0.2
Change travis-ci yaml file for CI check of this new code coverage
Install python components in a virtual environment.
http://www.virtualenv.org/en/latest/
Alternatives?
Generate egg file
Create setup.py to generate egg file
When stable, configure it to publish to CheeseShop just for fun?
Add configuration from file feature:
Processor({CONFIG_FROM_FILE:<path_to_config_file>})
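A sketch of how such a key could be resolved. The file format is not stated above, so JSON is an assumption here, as are the key's string value and the resolve_config helper:

```python
import json

CONFIG_FROM_FILE = "config_from_file"  # hypothetical value for the key


def resolve_config(config):
    """Return config merged over the file it points to, if any."""
    path = config.get(CONFIG_FROM_FILE)
    if path is None:
        return dict(config)
    with open(path, encoding="utf-8") as handle:
        loaded = json.load(handle)  # assumes a JSON object at top level
    merged = dict(loaded)
    # Keys passed explicitly in code override values read from the file.
    merged.update({k: v for k, v in config.items() if k != CONFIG_FROM_FILE})
    return merged
```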
See https://pypi.python.org/pypi/selenium
And then for downloading purposes : http://stackoverflow.com/questions/18439851/downloading-file-using-selenium
Headless running : http://www.realpython.com/blog/python/headless-selenium-testing-with-python-and-phantomjs/#.Uy7PkN_gEUQ
It causes the exception in the str(value) for logger.debug in setMetadata method of model class
Create a condition, based on data-storage (mongodb?) to decide if an item has been processed.
Options:
Using condition, create a processor (DRYProcessor?) so pype has this feature out-of-the-box
Code samples and usage.
Wiki page for samples
Define milestone roadmap for main features
Add fixes when appropriate
Complete wiki documentation
Take a look at https://kafka.apache.org/
Investigate how it works
Push automatically at the end of each milestone. Possibly add this as a step in the milestone-closing task.
Update samples readme.
Add a description/feature demonstration for each sample file
Possibly rename to README.md?
Define Torrent processor py file with Transmission Add processor
The Transmission Add Processor must be able to add magnet links to a configured running transmission-daemon service.
Configuration needed:
Generate Tag m0.3
In Processors, move from process and processList to a single process method that takes a "stream" object
Initially this stream object can be just an array of BaseItem
Stream object can be more convenient in order to add complementary data as Context or StreamData
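A possible starting point for such a stream object. The Stream class, its field names and the example processor are all hypothetical, and plain values stand in for BaseItem:

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """Items flowing through the pype plus shared context data."""
    items: list = field(default_factory=list)
    context: dict = field(default_factory=dict)


class UpperCaseProcessor:
    """Example processor with a single process(stream) entry point."""

    def process(self, stream):
        upper = [str(item).upper() for item in stream.items]
        return Stream(items=upper, context=stream.context)
```

Starting with a thin wrapper around a list keeps the migration small while leaving room for context data later.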
Datasource for rethinkdb http://rethinkdb.com/
Provide a datasource for Cassandra
http://cassandra.apache.org/
See:
http://pycassa.github.io/pycassa/
https://github.com/datastax/python-driver
Investigate how to configure travis ci for python project.
Probably a fixed structure is needed (maven like?) or a detailed configuration file.
Can't go further without this working
Implement a processor that runs a script (linux)
HeadersProvider injected in config at processor startup.
This is related to #20
If none provided, add a default one
If provided, use it
Some util providers:
RegexCondition and ContainsTextCondition should only work on str item values.
See extra_conditions.py lines L28-29
Also, the log output on L28 should not evaluate the regex twice: store the result in a temporary variable and use that for logging.
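Both points can be sketched together as follows. This is not the code from extra_conditions.py; regex_condition and the logger name are hypothetical:

```python
import logging
import re

logger = logging.getLogger("pype.conditions")  # hypothetical logger name


def regex_condition(value, pattern):
    """Return True when a str value matches pattern; False otherwise."""
    if not isinstance(value, str):
        return False  # non-str values never satisfy the condition
    # Evaluate the regex once and reuse the result for logging.
    matched = re.search(pattern, value) is not None
    logger.debug("Regex %r on %r matched: %s", pattern, value, matched)
    return matched
```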
Add debug to all processors
Take a look at : http://stackoverflow.com/questions/18707821/how-to-know-if-my-python-tests-are-running-in-coverage-mode
Basically:
Based on text files
A processor that:
Intended to:
Get the test coverage over 60%
Currently the timestamp is appended after the full filename, i.e.:
filename.ext.timestamp
Feature will give you:
filename.timestamp.ext
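A minimal sketch of the new naming, assuming an integer Unix timestamp (the timestamp format is not specified above) and a hypothetical helper name:

```python
import os
import time


def timestamped_name(filename, timestamp=None):
    """Insert the timestamp between the file's root and its extension."""
    if timestamp is None:
        timestamp = int(time.time())
    root, ext = os.path.splitext(filename)
    return f"{root}.{timestamp}{ext}"
```

For a name without an extension, os.path.splitext returns an empty extension, so the timestamp simply ends up at the end.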
See http://www.cyberciti.biz/faq/python-execute-unix-linux-command-examples/ for a better usage of subprocess instead of os
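For illustration, a subprocess-based call might look like this (Python 3.7+ for capture_output). run_command is a hypothetical helper and the timeout value is arbitrary:

```python
import subprocess


def run_command(args, timeout=30):
    """Run args (a list, no shell) and return (exit code, stdout text)."""
    completed = subprocess.run(
        args,
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode output to str
        timeout=timeout,      # raises subprocess.TimeoutExpired if exceeded
        check=False,          # report the exit code instead of raising
    )
    return completed.returncode, completed.stdout
```

Passing the command as a list avoids shell interpolation of item values.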
Processor to start/add items within the pype
Will use external datasource (we have mongo ds implemented atm) to obtain items and add them to the pype
Add an elasticsearch datasource for DRY processor
See http://www.elasticsearch.org/blog/unleash-the-clients-ruby-python-php-perl/#python
Retrieve / store item values from/into a file
Use Lucene as datasource for processing.
Take a look at http://lucene.apache.org/pylucene/
A processor to log events and whatever we want to trace in statsD
https://github.com/etsy/statsd/
Maybe a more generic TraceProcessor with an injected client?
Create a Validator abstract class and implementations for each processor.
A default validator should be used when none is provided (a default parameter value in the constructor, e.g. init(self, config, validator=ProcessorDefaultValidatorImplementation), if such a thing exists, or something programmatically driven).
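One way this could be arranged, sketched with hypothetical class names. The None default is resolved inside the constructor, which avoids sharing a single validator instance across processors:

```python
from abc import ABC, abstractmethod


class Validator(ABC):
    """Abstract base: decides whether a processor config is usable."""

    @abstractmethod
    def validate(self, config):
        """Return True when config is acceptable."""


class DefaultValidator(Validator):
    """Fallback used when a processor receives no validator."""

    def validate(self, config):
        return isinstance(config, dict)


class Processor:
    def __init__(self, config, validator=None):
        # Fall back to the default validator when none is injected.
        self.validator = validator if validator is not None else DefaultValidator()
        if not self.validator.validate(config):
            raise ValueError("invalid processor configuration")
        self.config = config
```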
Example :
http://nvie.com/posts/a-successful-git-branching-model/
Wiki documentation haha!