voxel51 / eta
ETA: Extensible Toolkit for Analytics
Home Page: https://voxel51.com
License: Apache License 2.0
This will allow us, for example, to write a pipeline that contains multiple instances of the same module in different places.
These "custom" names would be used when setting parameters and defining the module connections in the pipeline metadata file.
We have a samples directory now that has example code that seems to be intended for developers.
We also need to create samples for every module. Right now, I am putting such examples in the same place, but it is not clear to me that this is the right thing to do. Having examples running pipelines will make using and extending eta much easier.
Also: I do not like the word samples here. These are examples.
Options to support, in order of precedence:
1. ETA_CONFIG environment variable
2. ~/.etac
3. <eta>/config.json
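The precedence above could be resolved with a small helper along these lines. This is only a sketch: locate_config and DEFAULT_CONFIG_PATH are hypothetical names, and the default path is a placeholder.

```python
import json
import os

# Placeholder for the repo-local <eta>/config.json path
DEFAULT_CONFIG_PATH = "/path/to/eta/config.json"


def locate_config():
    """Returns the path of the active ETA config, honoring the
    proposed precedence: ETA_CONFIG, then ~/.etac, then the default."""
    env_path = os.environ.get("ETA_CONFIG")
    if env_path:
        return env_path
    user_path = os.path.expanduser("~/.etac")
    if os.path.isfile(user_path):
        return user_path
    return DEFAULT_CONFIG_PATH


def load_config():
    """Loads the active ETA config as a dict."""
    with open(locate_config()) as f:
        return json.load(f)
```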
Painful to keep rerunning the detector to fix config bugs in my pipeline only to have it reprocess frames :)
I think the pipeline can check for some state/status in the output as another way of doing this, but I like the notion of a generalizing VideoFeaturizer that maintains a backing store.
eta.core.serial now imports dill. It should be added to requirements.txt, e.g.:
dill==0.2.7.1
For example: the max_size argument of the resize_videos module. One can currently work around this by symlinking the outputs to the inputs, but perhaps this is a general enough need that we should provide formal support for it.

Need to add a Serializable.write_json method. We really shouldn't be calling serial.write_json directly. Data I/O to disk should almost always be done through a "data class" that implements Serializable.
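The proposed method could look roughly like this. A minimal sketch only: serialize() and from_dict() are assumed conventions here, not the actual ETA API.

```python
import json


class Serializable(object):
    """Sketch of a base class whose subclasses can write themselves to
    disk directly, rather than callers using serial.write_json."""

    def serialize(self):
        """Returns a JSON-serializable dict representation of this object."""
        return dict(vars(self))

    def write_json(self, path):
        """The proposed convenience method: serialize directly to disk."""
        with open(path, "w") as f:
            json.dump(self.serialize(), f, indent=4)

    @classmethod
    def from_dict(cls, d):
        """Constructs an instance from a serialized dict."""
        obj = cls.__new__(cls)
        vars(obj).update(d)
        return obj
```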
Now that vgg is in the repo, we should have the install scripts install tensorflow.
On my mac, I got this after running the install script and then running embed_image.
jcorso@newbury-2 /voxel51/w/eta
$ cd examples/embed_vgg16
/voxel51/w/eta/examples/embed_vgg16
jcorso@newbury-2 /voxel51/w/eta/examples/embed_vgg16
$ python embed_image.py
Traceback (most recent call last):
File "embed_image.py", line 20, in <module>
import tensorflow as tf
ImportError: No module named tensorflow
Ah, after digging a bit deeper, this is actually a problem with the install script. It got up to the Python-install bits, but then quit (without a message) because they failed. My suspicion is that those bits did not get executed as sudo, and my Python requires sudo for installing for some reason that escapes me. (This is on a Mac.)
So, something needs to be changed/improved, even if it is just documenting how to run install_externals as sudo.
Thoughts?
And any other related meta-settings
We should generalize the eta.core.diagram module to support more full-featured reporting capabilities, such as automatic LaTeX generation and diagramming with tikz so that the diagrams look nicer.
The need to pass around sets of numbers like [1, 5, 6, 7, 10] or "1,5-7,10" is pretty general. We should upgrade the eta.core.video.FramesRanges class to provide this general functionality. It should accept strings (including "*") and lists (including []). eta.core.config.Config should also understand how to accept fields of this new type.
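The parsing itself is straightforward. A sketch of the proposed behavior (the function name and the None-means-all-frames convention are assumptions):

```python
def parse_frame_ranges(spec):
    """Parses a frames spec like "1,5-7,10", "*", [1, 5, 6, 7, 10], or []
    into a sorted list of frame numbers, or None meaning "all frames"."""
    if spec == "*":
        return None  # sentinel for "all frames"
    if isinstance(spec, list):
        return sorted(spec)
    frames = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # a range like "5-7" is inclusive on both ends
            lo, hi = part.split("-")
            frames.update(range(int(lo), int(hi) + 1))
        else:
            frames.add(int(part))
    return sorted(frames)
```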
Our public versioning could then be something more like Ubuntu's. What do you think?
eta.core.builder.PipelineBuildRequest should allow the user to optionally specify paths to write copies of the pipeline outputs. We should also add flags to eta build to control whether the intermediate outputs are retained.
BaseDataRecord has this capability, but all Serializable objects should support it.
It is unnecessary if not transparent, and wastes memory and cycles. Not critical.
There are many types of objects that we will want to store in Frame-like classes. We should have a BaseFrame class that defines all the common functionality, and then subclasses like DetectedFrame, EmbeddedFrame, TrackedFrame, etc. that are thin wrappers over BaseFrame and specify what type of objects are in the list.
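The proposed hierarchy could be sketched as follows. All names and method signatures here are illustrative assumptions, not the actual ETA API.

```python
class BaseFrame(object):
    """Sketch: common container functionality for a per-frame list of
    objects, shared by all Frame-like classes."""

    def __init__(self, frame_number, objects=None):
        self.frame_number = frame_number
        self.objects = objects or []

    def add(self, obj):
        """Adds an object to this frame."""
        self.objects.append(obj)

    def __len__(self):
        return len(self.objects)

    def __iter__(self):
        return iter(self.objects)


class DetectedFrame(BaseFrame):
    """Thin wrapper: `objects` holds detections."""


class EmbeddedFrame(BaseFrame):
    """Thin wrapper: `objects` holds embeddings."""


class TrackedFrame(BaseFrame):
    """Thin wrapper: `objects` holds tracks."""
```

Each subclass would only add type-checking or convenience accessors for its particular object type; all storage and iteration lives in BaseFrame.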
Use environment variables/config fields to choose between the following options:

# Option 1: allow GPU memory to grow as needed
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

# Option 2: cap the per-process GPU memory fraction
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

Reference: https://www.tensorflow.org/programmers_guide/using_gpu
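The environment-variable selection could be done with a small helper that builds the gpu_options settings as a plain dict. The variable names ETA_GPU_ALLOW_GROWTH and ETA_GPU_MEMORY_FRACTION are hypothetical, chosen here only for illustration.

```python
import os


def gpu_config_options():
    """Reads the (hypothetical) ETA_GPU_ALLOW_GROWTH and
    ETA_GPU_MEMORY_FRACTION environment variables and returns the
    corresponding tf.ConfigProto gpu_options settings as a dict."""
    opts = {}
    if os.environ.get("ETA_GPU_ALLOW_GROWTH", "").lower() in ("1", "true"):
        opts["allow_growth"] = True
    frac = os.environ.get("ETA_GPU_MEMORY_FRACTION")
    if frac:
        opts["per_process_gpu_memory_fraction"] = float(frac)
    return opts
```

The result could then be applied via `config = tf.ConfigProto()` followed by `setattr(config.gpu_options, key, value)` for each entry, keeping the TensorFlow dependency out of the config-parsing code.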
Exceptions raised and printed when running pipelines need to be more informative; they should include things like the working directory, module name, etc.
This failure is during the OpenCV build. The fix would be to add a sudo apt-get install cmake to the script.
We have been putting the sample data into the repository, but this will quickly bloat the repository if we add any sizable amount, making it hard to work with. We need to establish a separate data dump that can be fetched if the user wants to run the examples, etc.
It is awkward that VideoProcessor currently tries to support both clips-based and video-based processing. We should probably refactor the clips processing into a new VideoClipsProcessor class.
Also, inspect the computation graph and only run the modules necessary to generate the requested outputs.
This will allow us to define more flexible pipelines that can be used for various purposes.
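The pruning step amounts to a reverse reachability walk over the module graph. A minimal sketch, assuming the pipeline can report which module produces each data item and what each module consumes:

```python
def prune_pipeline(producers, inputs, requested_outputs):
    """Returns the set of modules needed to generate `requested_outputs`.

    producers: dict mapping each data name -> the module that produces it
    inputs: dict mapping each module -> list of data names it consumes
    """
    needed = set()
    frontier = list(requested_outputs)
    while frontier:
        data = frontier.pop()
        module = producers.get(data)  # None for external pipeline inputs
        if module and module not in needed:
            needed.add(module)
            # walk backwards through this module's inputs
            frontier.extend(inputs.get(module, []))
    return needed
```

For example, if m1 produces "b" from "a", m2 produces "c" from "b", and m3 produces "d" from "a", then requesting only "c" would run just m1 and m2.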
Probably not best practice to rely on system-wide installs.
Also: the mac parts rely on brew. Some of us use port (macports) instead of brew. How to reconcile? (Virtualenv?)
It would be useful to have an eta.core.config.Config.parse_enum() method that works like this:
class MyConfig(Config):
    def __init__(self, d):
        self.value = self.parse_enum(d, "value", Choices)
where the "enum" can be defined either as a class:
class Choices(Enum):
    A = valA
    B = valB
or a dict:
Choices = {
    "A": valA,
    "B": valB,
}
A common pattern will be to use this mechanism when the user needs to choose between one of several classes or functions to use.
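A standalone sketch of how parse_enum could handle both the class and dict forms (the actual Config integration is omitted; this is only the lookup logic):

```python
def parse_enum(d, key, choices):
    """Parses d[key] and resolves it via `choices`, which may be either a
    dict mapping names to values or a class with named attributes."""
    name = d[key]
    if isinstance(choices, dict):
        mapping = choices
    else:
        # collect the public class attributes as the allowed names
        mapping = {
            k: v for k, v in vars(choices).items() if not k.startswith("_")
        }
    if name not in mapping:
        raise ValueError(
            "Invalid value '%s' for field '%s'; choices are %s"
            % (name, key, sorted(mapping))
        )
    return mapping[name]
```

Because the resolved value can be any object, this directly supports the class-or-function selection pattern described above.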
Methods like eta.core.utils.parse_dir_pattern and eta.core.utils.parse_bounds_from_dir_pattern should be converted into builder methods of eta.core.data.DataFileSequence, which should be our one-stop shop for all file-sequence-related operations.
(I like eta.core.data.DataFileSequence --- this idea has been sorely missing)
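One way such a builder could look. Everything here is a sketch: the class shape and method names (from_files, gen_paths) are assumptions, not the actual eta.core.data API.

```python
import re


class DataFileSequence(object):
    """Sketch: a sequence of files matching a printf-style numeric
    pattern, e.g. "frame-%05d.jpg" with bounds [first, last]."""

    def __init__(self, pattern, first, last):
        self.pattern = pattern
        self.first = first
        self.last = last

    def gen_paths(self):
        """Generates the paths in the sequence, in order."""
        for idx in range(self.first, self.last + 1):
            yield self.pattern % idx

    @classmethod
    def from_files(cls, pattern, filenames):
        """Builder absorbing parse_bounds_from_dir_pattern: infers the
        index bounds from a list of existing filenames."""
        m = re.search(r"%0?\d*d", pattern)
        prefix, suffix = pattern[: m.start()], pattern[m.end():]
        indices = [
            int(f[len(prefix): len(f) - len(suffix)])
            for f in filenames
            if f.startswith(prefix) and f.endswith(suffix)
        ]
        return cls(pattern, min(indices), max(indices))
```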
One can currently do this via VideoProcessor with an appropriately chosen frames string, but this doesn't give the user full control over the output filenames, which is undesirable.
We currently build OpenCV from source during our external installs, but it is causing us pain every time we re-install ETA on a new machine (new developers, production deployments, etc). Moreover, the only customization we currently do is setting the WITH_CUDA flag.
Should we continue building OpenCV from source, or would pip install opencv-python suffice for us?
Options:
(A) support this only at the pipeline metadata level by adding a "pipelines" field that allows access to I/O of other pipelines. When a pipeline is built, a single pipeline config would be populated based on this information.
(B) support this at the pipeline config level by allowing pipeline configs to point to other pipeline configs.
I'm leaning towards (A).
If I have a pipeline with a dozen modules that all require a "frames" setting because they are all working with the same video, it would be far easier to have such a setting in the top-level config and have it be inherited. And there is less room for error.
This would be harder if there are multiple videos. But even then, there is less room for error.
(This is a thought I had while working with the pipeline bits. Up for discussion, of course, but wanted to get it down.)
__iter__
tensorflow 1.7.0 has requirement numpy>=1.13.3, but you'll have numpy 1.13.1 which is incompatible.
A new-to-ETA developer will want to get acquainted with the available functionality out of the box. A seasoned-ETA developer will want to learn what new modules or pipelines may have been added recently. A pipeline developer will need to list available modules.
ETA needs an apt-cache-like functionality to navigate the module and pipeline space.
eta.core is getting quite large, so we should split it out into more wieldy sub-packages.
This seems natural to me, as when I am developing in an ETA project, I normally have local modules and pipelines directories:
"module_dirs": ["/voxel51/j/eta/eta/modules", "./modules"],
"pipeline_dirs": ["/voxel51/j/eta/eta/pipelines", "./pipelines"],
Currently modules is just a set of executable Python code that uses the ETA codebase. It is not a package (it has no "__init__.py" file), but it is inside of eta within the repo. I'd suggest either moving it outside of the eta directory or turning it into a package.
Is there a fundamental reason why we would not want to allow modules to import other modules? It would not be possible just to "import modulename", because the actual code may be executing somewhere else.
Currently, if we use eta.core.vgg16.VGG16Featurizer without explicitly calling start() and stop(), it will silently load and destroy a huge CNN every time featurize() is called. This is never what the user really wants.
I can see why Featurizer allows this to silently happen (setup/tear-down could be cheap), but VGG16Featurizer should raise an error here. The other option is to set keep_alive=True, but then the naive user would be carrying around a CNN in memory, which also deserves an error.
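The guard could look roughly like this. A sketch only: the class shape and FeaturizerError name are assumptions made for illustration, not the actual ETA classes.

```python
class FeaturizerError(Exception):
    """Raised when an expensive featurizer is misused."""


class Featurizer(object):
    """Sketch: a featurizer that refuses to silently load/destroy an
    expensive model on every featurize() call."""

    def __init__(self):
        self._started = False

    def start(self):
        """Loads the model. Must be called before featurize()."""
        self._started = True

    def stop(self):
        """Frees the model."""
        self._started = False

    def featurize(self, data):
        if not self._started:
            raise FeaturizerError(
                "Expensive featurizers must be explicitly start()-ed "
                "before calling featurize()"
            )
        return self._featurize(data)

    def _featurize(self, data):
        raise NotImplementedError("subclasses must implement _featurize()")
```

A context-manager interface (`__enter__` calling start() and `__exit__` calling stop()) would also make the correct usage pattern hard to get wrong.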
This would not be hard to do. We could even support both simultaneously with six: https://pythonhosted.org/six/
Some resources:
https://docs.python.org/3/howto/pyporting.html
http://python-future.org/automatic_conversion.html
We need everything in eta that is written via json to be reflective. This would enhance and simplify overall functionality.
I also think we should deprecate from_json and write_json in favor of just read and write.
As discussed, this seems to be the direction we want to move in.
Request to add pipeline support for a dry-run case that removes all configs and output files, for the case that the user only wants to see the stdout.
Return the correct stream based on codec_type (video) in get_stream_info in eta.core.video. It currently assumes the first stream.
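The fix amounts to scanning the stream list for the right codec_type rather than taking index 0. A sketch, assuming the ffprobe-style list-of-dicts stream metadata that eta.core.video parses:

```python
def get_video_stream(streams):
    """Returns the first stream whose codec_type is "video" from an
    ffprobe-style list of stream dicts, rather than assuming stream 0
    (which may be an audio or data stream)."""
    for stream in streams:
        if stream.get("codec_type") == "video":
            return stream
    raise ValueError("No video stream found")
```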
The or_die functionality will be more generally useful.
Either add to install, create a second install script, or just add information to install.