
pliers's Introduction

pliers: a python package for automated feature extraction

Badges: PyPI version · pytest · Coverage Status · Documentation Status · DOI:10.1145/3097983.3098075

Pliers is a Python package for automated extraction of features from multimodal stimuli. It provides a unified, standardized interface to dozens of different feature extraction tools and services--including many state-of-the-art deep learning-based models and content analysis APIs. It's designed to let you rapidly and flexibly extract all kinds of useful information from videos, images, audio, and text.

You might benefit from pliers if you need to accomplish any of the following tasks (and many others!):

  • Identify objects or faces in a series of images
  • Transcribe the speech in an audio or video file
  • Apply sentiment analysis to text
  • Extract musical features from an audio clip
  • Apply a part-of-speech tagger to a block of text

Each of the above tasks can typically be accomplished in 2 - 3 lines of code with pliers. Combining them all--and returning a single, standardized DataFrame--might take a bit more work. Say maybe 5 or 6 lines.
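To make that claim concrete, a single-task workflow might look like the following sketch (the extractor class shown is just one example; exact class names and method signatures may differ across pliers versions):

from pliers.stimuli import ImageStim
from pliers.extractors import FaceRecognitionFaceLocationsExtractor

# Wrap the raw file in a Stim, pick an extractor for that modality, and run it.
stim = ImageStim('my_image.jpg')
ext = FaceRecognitionFaceLocationsExtractor()
result = ext.transform(stim)
df = result.to_df()  # standardized DataFrame of extracted features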

In a nutshell, pliers provides a high-level, unified interface to a large number of feature extraction tools spanning a wide range of modalities.

Documentation

The official pliers documentation on ReadTheDocs is comprehensive, and contains a quickstart, API Reference, and more.

Pliers overview (with application to naturalistic fMRI)

Pliers is a general-purpose tool; naturalistic fMRI is just one domain where it's useful.

Tutorial Video

The video above is from a tutorial that is part of a course on naturalistic data.

How to cite

If you use pliers in your work, please cite both the pliers package and the following paper:

McNamara, Q., De La Vega, A., & Yarkoni, T. (2017, August). Developing a comprehensive framework for multimodal feature extraction. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1567-1574). ACM.

pliers's People

Contributors

adelavega, adswa, andrewheusser, anibalsolon, darius522, ejolly, emdupre, hugovk, jayeeta-roy, jayeetaroy, jdkent, jsmentch, kaczmarj, mgxd, mih, peerherholz, poldrack, qmac, rbroc, rogilmore, shabtastic, snastase, tirkarthi, tsalo, tyarkoni, yarikoptic


pliers's Issues

Add new text dictionaries

FeatureX has a PredefinedDictionaryExtractor class that takes a block of text as input and returns values for each word. For example, via an affective norms database, one can get the valence and arousal of the words in one's text.

Adding new dictionaries is as simple as adding new JSON dictionaries to the dictionaries.json file bundled with the package. Any file added there can subsequently be used in the PredefinedDictionaryExtractor. Since there are potentially hundreds of usable and useful text feature dictionaries on the web, it would be great to expand the current list of supported resources.
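For illustration, usage might look something like the sketch below (the dictionary and column names are placeholders; check dictionaries.json for the real ones, and note the exact API may have changed since this issue was filed):

from pliers.stimuli import ComplexTextStim
from pliers.extractors import PredefinedDictionaryExtractor, merge_results

stim = ComplexTextStim(text='the quick brown fox jumps over the lazy dog')
# 'affect/valence' and 'affect/arousal' are illustrative dictionary/column names
ext = PredefinedDictionaryExtractor(['affect/valence', 'affect/arousal'])
results = ext.transform(stim)   # one result per word in the text
df = merge_results(results)     # single DataFrame of word-level norm values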

Require explicit permission to run a large set of queries against API extractors?

At the moment, the graph API doesn't do anything to prevent a user from trying to run a full-length movie file through an image extractor, which could result in a very large number of queries (1 per frame) to an API extractor if users aren't careful. It might be a good idea to at minimum issue a warning when a large set of queries (e.g., > 100) to an API Extractor is detected, and possibly even require the user to set an explicit flag (e.g., large_jobs=True). Alternatively, we could disallow automatic VideoToImageStim conversion in cases where the resulting video frame set is very large.
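A minimal sketch of the proposed guard, assuming a large_jobs flag and a threshold of 100 queries (both hypothetical):

import warnings

LARGE_JOB_THRESHOLD = 100

def check_query_count(n_queries, large_jobs=False):
    """Warn or refuse before hitting an API extractor with a large set of queries."""
    if n_queries <= LARGE_JOB_THRESHOLD:
        return
    if not large_jobs:
        raise ValueError(f"{n_queries} API queries requested; "
                         "pass large_jobs=True to confirm this is intentional.")
    warnings.warn(f"About to issue {n_queries} queries to a remote API extractor.")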

Tests don't run

When I try to run the nose tests, they fail with:

from .stimuli import VideoStim, AudioStim, TextStim, ImageStim
ImportError: No module named stimuli

This probably means that some modifications haven't been pushed to GitHub yet.

Extractor registry

There's no centralized tracking of Extractors at the moment, which makes it difficult to search for specific extractors, properly attribute credit, etc. We should add some tools for annotating Extractors with information like author, purpose, description, citation, tags, etc.

Add memoization of Converters

There will be a lot of overhead calling Converters repeatedly if implicit Stim conversion is required. We can address this by memoizing the conversion functions with joblib or something similar.
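A minimal sketch of what joblib-based memoization could look like, assuming conversions are deterministic for a given input file (the function here is a placeholder, not the real converter):

from joblib import Memory

memory = Memory(location='/tmp/pliers_cache', verbose=0)

@memory.cache
def extract_audio_track(video_path):
    """Placeholder for an expensive VideoStim -> AudioStim conversion.
    Repeated calls with the same path hit the on-disk cache instead of re-running."""
    # ... the real implementation would invoke the conversion tool here ...
    return video_path.replace('.mp4', '.wav')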

GoogleVisionAPIFaceExtractor unusable output when multiple faces

The flattened output structure of the extractor does not contain an indicator that allows individual features to be bound to a specific face when multiple faces are detected. For every additional face, a new set of columns is added with identical names. It seems that column order cannot currently be used to infer where the features for an additional face begin.

It looks as if a per-face column name prefix could be a solution.
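A toy sketch of that fix, prefixing each face's flattened features with a face index so column names stay unique (illustrative only, not the current behavior):

def flatten_faces(face_annotations):
    """Flatten a list of per-face dicts into one row with face-indexed columns."""
    row = {}
    for i, face in enumerate(face_annotations, start=1):
        for key, value in face.items():
            row[f'face{i}_{key}'] = value
    return row

# flatten_faces([{'joy': 0.9}, {'joy': 0.2}])
# -> {'face1_joy': 0.9, 'face2_joy': 0.2}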

Automatic Stim adapters

Consider a situation where a user wants to take a VideoStim as input and apply the STFTExtractor (i.e., short-time Fourier transform) to the audio track. Currently, an exception will be raised, because the STFTExtractor only handles AudioStim inputs. However, since most movies have an audio track, featurex should be smart enough to attempt to automatically extract an AudioStim from a VideoStim and apply the audio extractor to the result (i.e., basically building an implicit graph) before it raises an exception. This isn't a high priority, but would be nice to have at some point.
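A rough sketch of the desired fallback behavior, using a toy converter registry (all names here are hypothetical stand-ins for whatever featurex ends up exposing):

CONVERTERS = {}   # maps (from_type, to_type) -> conversion callable

def register_converter(from_type, to_type, func):
    CONVERTERS[(from_type, to_type)] = func

def apply_with_fallback(extractor, stim, input_type):
    """Apply the extractor, implicitly converting the Stim first if needed."""
    if isinstance(stim, input_type):
        return extractor(stim)
    convert = CONVERTERS.get((type(stim), input_type))
    if convert is None:
        raise TypeError(f"No converter from {type(stim).__name__} "
                        f"to {input_type.__name__}")
    return extractor(convert(stim))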

StimCollection class

At the moment the standard way to apply extractors to a Stim is via an .extract call to the Stim--e.g.,

stim = ImageStim('my_image.jpg')
extractors = [ExtractorA(), ExtractorB(), ExtractorC()]
stim.extract(extractors)

This allows multiple Extractors to be applied at once to a single Stim, but it would be useful to do multiple stims at once. Some kind of StimCollection container that implicitly loops over Stims might be worth adding. Thoughts?
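Something as simple as the following sketch might do (hypothetical; it just mirrors the existing single-Stim .extract API over a list of Stims):

class StimCollection:
    """Hypothetical container that applies a set of extractors to every member Stim."""
    def __init__(self, stims):
        self.stims = list(stims)

    def extract(self, extractors):
        # mirror the single-Stim API: each member Stim gets every extractor
        return [stim.extract(extractors) for stim in self.stims]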

Consolidated list of all optional dependencies

It's getting hard to keep track of all the optional dependencies; we should add an optional_dependencies.txt file in the package root that users can pip install -r with if they want everything.

Add 'columns' field to dictionaries in dictionaries.json

It's a bit annoying that there's no way to know what the columns are in the lookup dictionaries supported in datasets/dictionaries.json without fetching them. We should add a mandatory 'column_names' field to the JSON objects that lists all valid column names (even if all columns in the target file are valid for use). This way users can easily scan dictionaries.json (and eventually, we can dynamically generate a table inside the docs). We could even extend this eventually to include an optional 'column_descriptions' that describes each column.
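A sketch of what an entry might look like with the proposed fields (the entry name, URL, and columns below are made up for illustration):

{
    "example_affective_norms": {
        "url": "https://example.org/norms.csv",
        "format": "csv",
        "column_names": ["valence", "arousal", "dominance"],
        "column_descriptions": {
            "valence": "mean pleasantness rating (1-9)",
            "arousal": "mean arousal rating (1-9)",
            "dominance": "mean dominance rating (1-9)"
        }
    }
}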

Filters vs. extractors

I'm implementing A-weighting, which filters the audio timeseries, and I was thinking about differentiating filters and extractors. It seems almost wasteful to create an event for every frame in an audio stream, and filters seem like they'd be used to preprocess data rather than to generate timelines.

If filters are sufficiently different, they may merit another submodule along with extractors and stimuli.

Thoughts?
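One toy way to frame the distinction (all names hypothetical): a Filter returns another stimulus of the same kind, whereas an Extractor returns feature values or a timeline.

class Filter:
    """Preprocessing step: transform a stimulus and return a new one of the same type."""
    def transform(self, stim):
        raise NotImplementedError

class GainFilter(Filter):
    """Toy example standing in for a real A-weighting filter: scales audio samples."""
    def __init__(self, gain=2.0):
        self.gain = gain

    def transform(self, samples):
        return [s * self.gain for s in samples]   # new 'stimulus', no events created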

Add economy config setting at package level

Some Extractors now create intermediate files en route to generating feature values. Since some of these are movies or images of same dimension as the original Stims, we could end up consuming a lot of memory. At some point we should add an economy config variable that determines how intermediate files are handled/stored. We'll then need to go over all existing Extractors and make sure they condition properly on that setting.

Add multi-step Converters

To really unlock the potential of the graph API, we need to support implicit conversion between Stim types that involve multiple steps--e.g., VideoStim to ComplexTextStim via an extracted AudioStim. There are (at least) two ways we could go about this:

  1. Recursively try to construct valid paths from the input Stim to the output Stim, and stop as soon as one is found. E.g., suppose we pass a VideoStim to a TextExtractor. Then get_converter would search all possible paths from VideoStim to TextStim until it found VideoStim --> AudioStim --> ComplexTextStim.
  2. Manually add Converter classes for all valid paths, which explicitly call the full chain internally. E.g., we would write a new VideoToComplexTextStimConverter with a _convert method that explicitly uses a VideoToAudioConverter class, then an AudioToTextConverter.

In principle, (1) is the cleaner and more extensible approach, but it introduces completely unnecessary computation when the number of valid paths between Stims is small (as it currently is). The main disadvantage of (2) is that if we add many more Stim types, we could end up with a combinatorial explosion.

I guess for now I favor (2), and if it starts to get unwieldy, we can move to (1).
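For reference, approach (1) could amount to a short breadth-first search over registered converters, along these lines (the registry structure is hypothetical):

from collections import deque

def find_conversion_path(converters, source, target):
    """converters maps (from_type, to_type) -> converter; returns a converter chain or None."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, path = queue.popleft()
        if current is target:
            return path
        for (frm, to), conv in converters.items():
            if frm is current and to not in seen:
                seen.add(to)
                queue.append((to, path + [conv]))
    return None   # no valid path; the caller raises the usual exception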

This is a high-priority issue that we should try to get done before revamping the README, because it would be nice to be able to show a Graph example where the user only has to worry about the leaf nodes (all of which are Extractors), and doesn't have to explicitly think about the Converters.

Identify key frames in videos based on magnitude of difference between frames

Many of the APIs only work on images, but we want to process videos by passing in individual frames. To keep processing efficient (and costs low for paid services), we want to pass in as few frames as we can get away with. Rather than processing every Nth frame, we could take the diff between every two frames and identify frames where the scene changes to a significant degree. This could be a method implemented in VideoStim that could be called by any API-based extractor that loops over frames.
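A rough sketch of the idea, using a mean absolute pixel difference against the last retained frame (the threshold and the way frames are accessed are illustrative):

import numpy as np

def key_frame_indices(frames, threshold=20.0):
    """Return indices of frames that differ substantially from the last retained frame."""
    keep = [0]                                     # always keep the first frame
    prev = np.asarray(frames[0], dtype=float)
    for i, frame in enumerate(frames[1:], start=1):
        cur = np.asarray(frame, dtype=float)
        if np.abs(cur - prev).mean() > threshold:  # mean absolute pixel change
            keep.append(i)
            prev = cur                             # compare future frames to this one
    return keep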

Should implicit conversion output CollectionStimMixins?

Say we are passing an AudioStim through a LengthExtractor, which takes TextStim inputs. The implicit conversion will look for converters that go audio->text. However, most of the converters will instead have AudioStim->ComplexTextStim specified.

Should the implicit conversion also look for conversions to collection stimuli whose elements are of LengthExtractor's input type? Either way, it may be a good idea to put an element_type specification in all CollectionStimMixins.

Alternatively (and this is coincidentally what we have implemented now), we could just have converters specify AudioStim->TextStim (even though they actually output ComplexTextStim) and have the logic in transformers.py take over from there.

Multiple stims in API requests

A few of the APIs impose a request limit, with no penalty for including several stimuli in one request. It is therefore much more efficient to chunk stimuli into single API calls. Currently, each API converter/extractor is written to send one stimulus per request. This may be resolved by improving the graph module to automatically handle collections of stimuli.

Improve test coverage

We now have working continuous integration testing via travis-ci; the coveralls report is here. We're not doing too badly, but we should be able to get to 95%+ coverage without too much work. Additionally, as a secondary priority, many of the earliest tests I wrote are overly broad, and could stand to be refactored.

Improved docs: examples, tutorial and/or user guide

Currently the quickstart doc only provides the bare minimum of information about what the package does and how it runs. Pretty much any doc contributions would be great at this point. The easiest place to start might be by adding example Jupyter notebooks illustrating usage for different stimuli. A more comprehensive tutorial would also be nice. Ultimately we want to have a comprehensive user guide, but that can probably wait on #4.

Rename target to _input_type

For consistency and clarity, we should use _input_type and _output_type attributes to identify the expected types of all Stim inputs (and for Converters, the expected returned type).

Distinguish between Stim source and name

There's some ambiguity over what a Stim name means. Right now it defaults to the filename, but it's probably a good idea to separately track the source file and name. This becomes an issue mainly in the context of graphs, where we might want to propagate the initial source file to a Stim as it flows through the graph (e.g., annotated text extracted from a VideoStim should retain some indication of the original video file).

add SRT support

It would be useful to support text feature extraction from subtitle files.

Stop using opencv for image/movie loading

Currently ImageStims and VideoStims are loaded via opencv, which imposes an unnecessary (and difficult-to-install) dependency. OpenCV should only be imported when running extractors that depend on it; we should find an alternative solution for reading in stimuli. For images we could use scipy.misc.imread. Not sure about movies, but I think MoviePy might be the way to go.
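One possible OpenCV-free direction, sketched with imageio (in place of the now-deprecated scipy.misc.imread) and MoviePy; import paths and behavior may vary by library version:

import imageio
from moviepy.editor import VideoFileClip

def load_image(path):
    return imageio.imread(path)          # numpy array, no cv2 required

def iter_video_frames(path):
    clip = VideoFileClip(path)           # MoviePy handles video decoding via ffmpeg
    for frame in clip.iter_frames():
        yield frame                      # each frame is a numpy array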

include data with package

Many data files useful for extraction/annotation can be repackaged under their current license. This is particularly true of word norms (e.g., frequency, emotional valence and intensity, etc.), which can be included in the package to make text feature extraction much more useful out of the box. Key data files should be bundled with the package (or maintained in a separate submodule).

Quickstart needs update

from featurex.stims import VideoStim
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-fd7d648ab17b> in <module>()
----> 1 from featurex.stims import VideoStim

ImportError: No module named stims

Wishlist: Movie frame cropping before content labeling

While exploring the Google Vision API I found that it makes a big difference if movie frames are cropped (freed of any horizontal bars) before labeling. Without cropping they get "Screenshot" labels, but after cropping more of the actual content is tagged.

add option to retain original result dictionary in API extractors

For the Google extractors (and possibly other API extractors), we currently flatten the returned JSON object into a one-level dictionary. This makes life easy when working with pandas DFs, but users could potentially want direct access to the original result. This will require adding a new attribute to ExtractorResult, maybe called something like response, that can optionally be set when the instance is initialized.

Alternatively, we could have a generic metadata attribute on ExtractorResult that is itself a dictionary, which would allow different kinds of Extractors to set different kinds of metadata.

Add pipelines / chained extractors

A fairly common potential use case involves chaining multiple extractors--e.g., transcribing the audio track from a movie and then feeding it into a DictionaryExtractor. Currently there's no automatic way to take the results returned from one extractor and convert them into a Stim to feed into another. We should add a scikit-learn-like pipeline module that allows easy chaining of extractors.
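A minimal sketch of the idea: each step consumes the previous step's output, so converters can feed an extractor at the end (the step objects named in the comment are taken from the discussion above; the Pipeline class itself is hypothetical):

class Pipeline:
    """Chain transformers so the output of one becomes the input of the next."""
    def __init__(self, steps):
        self.steps = steps

    def transform(self, stim):
        result = stim
        for step in self.steps:
            result = step.transform(result)
        return result

# e.g., Pipeline([VideoToAudioConverter(), AudioToTextConverter(), DictionaryExtractor()])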

Switch to py.test for all testing

  • Switch to py.test
  • Simplify tests--we probably don't need the wrapper classes
  • Drop all unittest assertions in favor of just assert

Fix OpenCV dependency in Python 3

Some tests currently fail because OpenCV was difficult to install on Python 3 until recently. There now appears to be a conda installer, so we should fix the travis config to properly install OpenCV on both Python 2 and 3 (and make sure the tests pass).

add opencv to travis-ci

OpenCV (and/or its Python bindings) doesn't install properly on the travis env, so cv2-dependent tests fail.

wide format dataframes

Durations in wide-format data frames repeat if multiple values are extracted (e.g., from the Indico API). For SRT file types, the text is not provided.

add part-of-speech tagging

Wrap nltk's part-of-speech tagging and return a set of binary column features for, e.g., the universal part-of-speech tagset.
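A sketch of what the wrapper could do, using nltk's universal tagset and pandas dummy coding (output layout is illustrative; the relevant nltk data packages need to be downloaded first):

import nltk
import pandas as pd

def pos_features(text):
    """Tag each token and expand the universal POS tags into binary columns."""
    tokens = nltk.word_tokenize(text)                     # requires the 'punkt' data
    tagged = nltk.pos_tag(tokens, tagset='universal')     # requires tagger + tagset data
    df = pd.DataFrame(tagged, columns=['word', 'tag'])
    return pd.concat([df['word'], pd.get_dummies(df['tag'])], axis=1)

# pos_features('the quick brown fox jumps') -> one binary column per tag (DET, ADJ, NOUN, ...)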

Values shouldn't be nested

When multiple Transformers are applied to a single Stim, the returned Value objects are nested, such that the keys in the top-level Value.data dict are Transformer names, and the values are other Value instances (whose data attribute is a normal dictionary of values). This is counter-intuitive and kind of horrendous. The returned top-level object should probably be either a plain dict, or some new container class (e.g., ValueList).
