
skflow's Introduction


Documentation

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.

TensorFlow was originally developed by researchers and engineers working within the Machine Intelligence team at Google Brain to conduct research in machine learning and neural networks. However, the framework is versatile enough to be used in other areas as well.

TensorFlow provides stable Python and C++ APIs, as well as a non-guaranteed backward compatible API for other languages.

Keep up-to-date with release announcements and security updates by subscribing to [email protected]. See all the mailing lists.

Install

See the TensorFlow install guide for the pip package, to enable GPU support, use a Docker container, and build from source.

To install the current release, which includes support for CUDA-enabled GPU cards (Ubuntu and Windows):

$ pip install tensorflow

Other devices (DirectX and MacOS-metal) are supported using Device plugins.

A smaller CPU-only package is also available:

$ pip install tensorflow-cpu

To update TensorFlow to the latest version, add the --upgrade flag to the above commands.

Nightly binaries are available for testing using the tf-nightly and tf-nightly-cpu packages on PyPI.

Try your first TensorFlow program

$ python
>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
3
>>> hello = tf.constant('Hello, TensorFlow!')
>>> hello.numpy()
b'Hello, TensorFlow!'

For more examples, see the TensorFlow tutorials.

Contribution guidelines

If you want to contribute to TensorFlow, be sure to review the contribution guidelines. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code.

We use GitHub issues for tracking requests and bugs. Please see the TensorFlow Forum for general questions and discussion, and direct specific questions to Stack Overflow.

The TensorFlow project strives to abide by generally accepted best practices in open-source software development.

Patching guidelines

Follow these steps to patch a specific version of TensorFlow, for example, to apply fixes to bugs or security vulnerabilities:

  • Clone the TensorFlow repo and switch to the corresponding branch for your desired TensorFlow version, for example, branch r2.8 for version 2.8.
  • Apply (that is, cherry-pick) the desired changes and resolve any code conflicts.
  • Run TensorFlow tests and ensure they pass.
  • Build the TensorFlow pip package from source.
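
The branch naming in the first step can be sketched as a small helper. This is an illustration only: release_branch is a hypothetical name, and the r<major>.<minor> pattern is inferred from the single r2.8 example above, so check the repository's branch list before relying on it.

```python
def release_branch(version):
    """Map a version string such as '2.8' or '2.8.1' to a branch name like 'r2.8'.

    Assumption: release branches follow the r<major>.<minor> pattern
    suggested by the r2.8 example in the patching steps.
    """
    parts = version.split(".")
    if len(parts) < 2 or not all(part.isdigit() for part in parts[:2]):
        raise ValueError("expected a version like '2.8' or '2.8.1'")
    return "r{}.{}".format(parts[0], parts[1])


print(release_branch("2.8"))    # r2.8
print(release_branch("2.8.1"))  # r2.8
```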

Continuous build status

You can find more community-supported platforms and configurations in the TensorFlow SIG Build community builds table.

Official Builds

Build Type                    Status                            Artifacts
Linux CPU                     Status                            PyPI
Linux GPU                     Status                            PyPI
Linux XLA                     Status                            TBA
macOS                         Status                            PyPI
Windows CPU                   Status                            PyPI
Windows GPU                   Status                            PyPI
Android                       Status                            Download
Raspberry Pi 0 and 1          Status                            Py3
Raspberry Pi 2 and 3          Status                            Py3
Libtensorflow MacOS CPU       Status Temporarily Unavailable    Nightly Binary, Official GCS
Libtensorflow Linux CPU       Status Temporarily Unavailable    Nightly Binary, Official GCS
Libtensorflow Linux GPU       Status Temporarily Unavailable    Nightly Binary, Official GCS
Libtensorflow Windows CPU     Status Temporarily Unavailable    Nightly Binary, Official GCS
Libtensorflow Windows GPU     Status Temporarily Unavailable    Nightly Binary, Official GCS

Resources

Learn more about the TensorFlow community and how to contribute.

Courses

License

Apache License 2.0

skflow's People

Contributors

anprox, bnaul, cbonnett, dansbecker, dfd, dgboy2000, dustindorroh, dvbuntu, elqursh, frol, gramhagen, ilblackdragon, ivallesp, liyongsea, lopuhin, makseq, mihaimaruseac, mrry, nicolasfauchereau, okoriko, suryabhupa, terrytangyuan, thepelkus, ziky90

skflow's Issues

Model Persistence

Is there any way of storing a model? I'm trying to pickle it with pickle and dill but it does not work...

Thank you

How could I adapt the CNN model for 1-dimensional input datasets?

Hi,

I have a dataset with 24 inputs and 1 categorical output, so I am trying to adapt the example https://github.com/google/skflow/blob/master/examples/text_classification_character_cnn.py to my case.

However, in the example, I saw

byte_list = tf.reshape(skflow.ops.one_hot_matrix(X, 256), 
        [-1, MAX_DOCUMENT_LENGTH, 256, 1])

and I do not know how to adapt this to my code. Could you please help?

My data looks like:

input1 input2 ... input_n  output
2 1.2 ... -0.44 "b"
1 0.2 ... 3.2 "f"
3 1 ... 2.1 "a"

missing import

In your example for multioutput regression, it seems that the following import should be added so that the code works properly:

import skflow.ops

MP

Can't import skflow or run tutorials

Great effort by Google to simplify using TensorFlow.
I used the command
pip install git+git://github.com/google/skflow.git
to install it on Mac OS 10.11 inside a virtualenv where TF is also installed, but I can't import skflow.
I checked the virtualenv's Python site-packages and the skflow folder is there.

I will be happy if I can test it, because I'm also eager to provide a wrapper for TF and I want to join this library.

>>> import skflow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "skflow.py", line 15, in <module>
    classifier = skflow.TensorFlowDNNClassifier(hidden_units=[10, 20, 10],
AttributeError: 'module' object has no attribute 'TensorFlowDNNClassifier'
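
The second frame of the traceback, File "skflow.py", suggests that Python loaded a local script named skflow.py instead of the installed package. A minimal sketch of how to check which file an import would load, using only the standard library. The helper module_origin and the module name shadow_demo_pkg are hypothetical, chosen for illustration:

```python
import importlib.util
import os
import sys
import tempfile


def module_origin(name):
    """Return the file that `import name` would load, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Simulate a working directory containing a script that shares its name
# with a package the user wants to import.
workdir = tempfile.mkdtemp()
script = os.path.join(workdir, "shadow_demo_pkg.py")
with open(script, "w") as handle:
    handle.write("# user script that shadows a package of the same name\n")

# Like the current directory, an entry at the front of sys.path wins.
sys.path.insert(0, workdir)
importlib.invalidate_caches()

origin = module_origin("shadow_demo_pkg")
print(origin)  # path of the local script, not of an installed package
```

If the printed path points at a script in the working directory rather than at site-packages, renaming that script (and deleting any stale .pyc next to it) is the usual remedy.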

Add support for validation sets

It would be nice if skflow had some support for validation sets, to be used for early stopping and for monitoring validation set loss during training. This could be realized fairly easily by adding a fraction_validationset parameter to the TensorFlowEstimator. Within fit, the given training set could then be split into two parts.
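
The proposed split could look roughly like the sketch below. It is not skflow code: split_validation is a hypothetical helper, fraction_validationset is the parameter name suggested in this issue, and the data is held in plain Python lists to keep the example self-contained.

```python
import random


def split_validation(X, y, fraction_validationset=0.1, seed=42):
    """Shuffle once, then hold out a fraction of the samples for validation."""
    if not 0.0 <= fraction_validationset < 1.0:
        raise ValueError("fraction_validationset must be in [0, 1)")
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(indices) * fraction_validationset)
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return ((pick(X, train_idx), pick(y, train_idx)),
            (pick(X, val_idx), pick(y, val_idx)))


X = [[float(i)] for i in range(10)]
y = list(range(10))
(train_X, train_y), (val_X, val_y) = split_validation(
    X, y, fraction_validationset=0.2)
print(len(train_X), len(val_X))  # 8 2
```

The validation part would then be scored after each reporting interval, and training stopped once its loss stops improving.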

Early stopping, epoch and unsupervised learning

Hi, this is not an issue, but I wanted to clarify a few points; these may be dumb questions.
I couldn't see parameters for early stopping and epochs (maybe I missed them), but I saw steps. Is that equivalent to epochs or iterations?

Also, is unsupervised learning supported, as in clustering, word2vec, or self-taught feature learning? Or are there future plans for it?
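
On steps versus epochs, a worked example may help. It assumes that one step processes one mini-batch, which is the common convention but is not confirmed here for skflow; steps_for_epochs is a hypothetical helper, and the sample count is made up for the arithmetic.

```python
import math


def steps_for_epochs(n_samples, batch_size, epochs):
    """Number of mini-batch steps needed to pass over the data `epochs` times.

    Assumption: one step consumes exactly one mini-batch.
    """
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return steps_per_epoch * epochs


# 1000 samples in batches of 32 -> 32 steps per epoch (the last batch is partial).
print(steps_for_epochs(n_samples=1000, batch_size=32, epochs=5))  # 160
```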

Add Neural Translation example

Show an example of Neural Translation (Sequence to Sequence) model, which can showcase:

  • Work with text and embeddings
  • Streaming of inputs
  • Multi-dimensional outputs.

Turn off dropout at predict time

Currently, dropout is still applied at test time, which leads to incorrect results.

The proposed solution is to gather all dropout probability nodes and feed them the probability 0 through feed_dict when predict is run.
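
The intended behaviour can be sketched without TensorFlow. The function below is an illustration, not skflow's dropout op. It is parameterised by a keep probability, which is an assumption; a drop probability of 0, as proposed above, corresponds to a keep probability of 1.

```python
import random


def dropout(values, keep_prob, training, rng=None):
    """Inverted dropout over a list of floats.

    During training each value is kept with probability keep_prob and scaled
    by 1 / keep_prob; at predict time the input passes through unchanged.
    """
    if not training or keep_prob >= 1.0:
        return list(values)
    if keep_prob <= 0.0:
        raise ValueError("keep_prob must be in (0, 1]")
    rng = rng or random.Random()
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, keep_prob=0.5, training=True,
                    rng=random.Random(0))
predict_out = dropout(activations, keep_prob=0.5, training=False)
print(predict_out)  # [1.0, 2.0, 3.0, 4.0]
```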

python setup.py install broke other packages in OS X

When I ran python setup.py install the first time, it went well, but after I ran python setup.py install all the modules that skflow depends on were broken. The detailed output follows.

Python 2.7.11 (default, Dec 23 2015, 12:23:20) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/__init__.py", line 180, in <module>
    from . import add_newdocs
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/lib/__init__.py", line 8, in <module>
    from .type_check import *
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/lib/type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/core/__init__.py", line 58, in <module>
    from numpy.testing import Tester
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/testing/__init__.py", line 14, in <module>
    from .utils import *
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/numpy/testing/utils.py", line 15, in <module>
    from tempfile import mkdtemp
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/tempfile.py", line 32, in <module>
    import io as _io
  File "build/bdist.macosx-10.11-x86_64/egg/io/__init__.py", line 16, in <module>

  File "skflow/__init__.py", line 17, in <module>
    import tensorflow as tf
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tensorflow/__init__.py", line 23, in <module>
    from tensorflow.python import *
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 43, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 37, in <module>
    from tensorflow.core.framework.graph_pb2 import *
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/tensorflow/core/framework/graph_pb2.py", line 8, in <module>
    from google.protobuf import reflection as _reflection
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/google/protobuf/reflection.py", line 58, in <module>
    from google.protobuf.internal import python_message as message_impl
  File "/Users/tsangbosco/.pyenv/versions/2.7.11/lib/python2.7/site-packages/google/protobuf/internal/python_message.py", line 53, in <module>
    from io import BytesIO
ImportError: cannot import name BytesIO

Error while running text_classification*

Hi,

I'm trying to run the text_classification examples and I get the following error.

TypeError: Input 'values' of 'HistogramSummary' Op has type int64 that does not match expected type of float32.

The stack trace is below:

Traceback (most recent call last):
  File "text_classification.py", line 82, in <module>
    classifier.fit(X_train, y_train, logdir='/tmp/tf_examples/word_rnn')
  File "/usr/local/lib/python2.7/site-packages/skflow/estimators/base.py", line 186, in fit
    self._setup_training()
  File "/usr/local/lib/python2.7/site-packages/skflow/estimators/base.py", line 119, in _setup_training
    tf.histogram_summary("X", self._inp)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/summary_ops.py", line 39, in histogram_summary
    tag=tag, values=values, name=scope)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_summary_ops.py", line 34, in _histogram_summary
    name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 405, in apply_op
    (prefix, types_lib.as_dtype(input_arg.type).name))
TypeError: Input 'values' of 'HistogramSummary' Op has type int64 that does not match expected type of float32.

Am I doing something wrong? I installed TensorFlow and skflow as described in the Medium tutorial.

Get probabilities for ALL the classes

I'm looking at the text_classification.py.

classifier.predict(X_test) returns the class number with the highest probability. But I wonder how to get the probabilities for all the classes for each input.

Thanks in advance!
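
For context, the relation between per-class probabilities and the predicted class can be sketched in plain Python. This assumes the classifier ends in a softmax layer; the logits are made-up numbers, and nothing here is a skflow API.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three classes
probabilities = softmax(logits)

# The class reported by predict is the index of the largest probability.
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
print(predicted_class)  # 0
```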

Error raised when importing skflow with tensorflow 0.6.0 installed

TensorFlow 0.6.0 is not officially released, but importing skflow with TensorFlow 0.6.0 installed raises an error.


AttributeError Traceback (most recent call last)
in ()
----> 1 import skflow

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/skflow/__init__.py in ()
27
28 from skflow.trainer import TensorFlowTrainer
---> 29 from skflow import models, data_feeder
30 from skflow import preprocessing
31

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/skflow/models.py in ()
16 import tensorflow as tf
17
---> 18 from skflow.ops import mean_squared_error_regressor, softmax_classifier, dnn
19
20

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/skflow/ops/__init__.py in ()
16
17 from skflow.ops.conv_ops import *
---> 18 from skflow.ops.dnn_ops import *
19 from skflow.ops.embeddings_ops import *
20 from skflow.ops.losses_ops import *

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/skflow/ops/dnn_ops.py in ()
17
18 import tensorflow as tf
---> 19 from tensorflow.models.rnn import linear
20
21

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/models/rnn/linear.py in ()
24 import tensorflow as tf
25
---> 26 linear = tf.nn.linear

AttributeError: module 'tensorflow.python.ops.nn' has no attribute 'linear'

TensorFlowEstimator.restore Error: tensorflow.python.pywrap_tensorflow.StatusNotOK

Restore is still not working for the text_classification.py example. I am getting the following exception:

I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
Traceback (most recent call last):
  File "/Users/harsimranb/PycharmProjects/TensorFlowTest/TextRNN.py", line 103, in <module>
    runner.run_classification("amigo what time you close?")
  File "/Users/harsimranb/PycharmProjects/TensorFlowTest/TextRNN.py", line 56, in run_classification
    classifier = skflow.TensorFlowEstimator.restore(self.trained_model_path)
  File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/skflow/__init__.py", line 332, in restore
    estimator._restore(path)
  File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/skflow/__init__.py", line 314, in _restore
    self._saver.restore(self._session, checkpoint_path)
  File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/tensorflow/python/training/saver.py", line 891, in restore
    sess.run([self._restore_op_name], {self._filename_tensor_name: save_path})
  File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/tensorflow/python/client/session.py", line 368, in run
    results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
  File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/tensorflow/python/client/session.py", line 446, in _do_run
    six.reraise(e_type, e_value, e_traceback)
  File "/Users/harsimranb/Library/Python/2.7/lib/python/site-packages/tensorflow/python/client/session.py", line 428, in _do_run
    target_list)
tensorflow.python.pywrap_tensorflow.StatusNotOK: Internal: Unable to get element from the feed.

Related to #40

Training hidden_units of DNN with skflow

Hello all

As I know, there are several ways to optimize DNN. I think the most important parameters for DNN should be hidden_units.

Is it possible to somehow find best hidden_units automatically?
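
One simple approach is an exhaustive search over a handful of candidate layouts. The sketch below shows only the search loop. best_hidden_units is a hypothetical helper, and toy_score is a stand-in for a real score, such as validation accuracy of a classifier trained with the given layers.

```python
def best_hidden_units(candidates, score_fn):
    """Return the candidate layout with the highest score, and that score."""
    best, best_score = None, float("-inf")
    for hidden_units in candidates:
        score = score_fn(hidden_units)
        if score > best_score:
            best, best_score = hidden_units, score
    return best, best_score


def toy_score(hidden_units):
    # Placeholder only: pretends that about 60 units in total is ideal.
    # In practice, train a model with these layers and return its
    # validation score instead.
    return -abs(sum(hidden_units) - 60)


candidates = [[10], [10, 20, 10], [20, 20, 20], [50, 50]]
choice, score = best_hidden_units(candidates, toy_score)
print(choice)  # [20, 20, 20]
```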

how to install scipy, scikit and skflow (using pip) with proper dependencies to python 2.7 and how to compile skflow examples using the bazel

Very good tutorial. I installed TensorFlow successfully using pip and tried to run the RNN examples, but that folder is not installed when pip is used. I was advised to install Bazel to compile and build the RNN examples. I installed Python and NumPy in order to run the TensorFlow examples, and I am able to compile and run the RNN examples successfully.
To follow your tutorial I need to install scipy, scikit and finally skflow. I am using Python 2.7. Could you please tell us how to install scipy, scikit and skflow with proper dependencies (using pip) and, after installation, how to run the skflow examples using Bazel (the commands for compiling and running the skflow examples)?

Error while using *early_stopping*

Hello

Today I tried to use the early_stopping feature (as here), and I found that my current skflow does not recognize the keyword early_stopping_rounds.

I tried to reinstall skflow, but it would not upgrade because the versions are the same (0.0.1). I had to uninstall skflow first and then install it again.

So I think it would be good to bump the skflow version to make upgrading easier.

Loss scores are different for contiguous run of fit() for 200 steps and 4 runs of fit() for 50 steps

I am doing regression with DNN.

Final MSE for contiguous run of 200 steps: 1.45781016655
Final MSE for 4 runs with 50 steps each: 1.44524233948

Score for contiguous run:
Step #1, epoch #1, avg. loss: 27.95941
Step #21, epoch #21, avg. loss: 5.64051
Step #41, epoch #41, avg. loss: 1.78990
Step #61, epoch #61, avg. loss: 1.53639
Step #81, epoch #81, avg. loss: 1.49865
Step #101, epoch #101, avg. loss: 1.48255
Step #121, epoch #121, avg. loss: 1.47312
Step #141, epoch #141, avg. loss: 1.46747
Step #161, epoch #161, avg. loss: 1.46394
Step #181, epoch #181, avg. loss: 1.46122

Score for 4 runs 50 steps each:
Step #1, epoch #1, avg. loss: 27.95941
Step #6, epoch #6, avg. loss: 13.49244
Step #11, epoch #11, avg. loss: 4.11436
Step #16, epoch #16, avg. loss: 2.69326
Step #21, epoch #21, avg. loss: 2.26197
Step #26, epoch #26, avg. loss: 2.02976
Step #31, epoch #31, avg. loss: 1.79997
Step #36, epoch #36, avg. loss: 1.71287
Step #41, epoch #41, avg. loss: 1.61699
Step #46, epoch #46, avg. loss: 1.56702

Step #51, epoch #1, avg. loss: 1.52925
Step #56, epoch #6, avg. loss: 1.52344
Step #61, epoch #11, avg. loss: 1.51318
Step #66, epoch #16, avg. loss: 1.50661
Step #71, epoch #21, avg. loss: 1.50114
Step #76, epoch #26, avg. loss: 1.49584
Step #81, epoch #31, avg. loss: 1.49099
Step #86, epoch #36, avg. loss: 1.48698
Step #91, epoch #41, avg. loss: 1.48371
Step #96, epoch #46, avg. loss: 1.48097

Step #101, epoch #1, avg. loss: 1.47760
Step #106, epoch #6, avg. loss: 1.47609
Step #111, epoch #11, avg. loss: 1.47386
Step #116, epoch #16, avg. loss: 1.47201
Step #121, epoch #21, avg. loss: 1.47048
Step #126, epoch #26, avg. loss: 1.46914
Step #131, epoch #31, avg. loss: 1.46795
Step #136, epoch #36, avg. loss: 1.46686
Step #141, epoch #41, avg. loss: 1.46591
Step #146, epoch #46, avg. loss: 1.46506

Step #151, epoch #1, avg. loss: 1.46384
Step #156, epoch #6, avg. loss: 1.46348
Step #161, epoch #11, avg. loss: 1.46276
Step #166, epoch #16, avg. loss: 1.46212
Step #171, epoch #21, avg. loss: 1.46144
Step #176, epoch #26, avg. loss: 1.46086
Step #181, epoch #31, avg. loss: 1.46028
Step #186, epoch #36, avg. loss: 1.45976
Step #191, epoch #41, avg. loss: 1.45914
Step #196, epoch #46, avg. loss: 1.45857

Feedback on tutorial 3 - titanic embarked embedding

Thanks for the tutorial. It was very easy to follow. Like many might do, I tested my new knowledge by playing with the other categorical variables but this led to a difficult to understand stack trace.

To reproduce, just replace "Embarked" with "Pclass" in the lines where X and embarked_classes get assigned. When trying to fit the model, this will throw index out of range errors.

I eventually fixed it by increasing n_classes to unique classes plus 1. I think the error stems from the embedding wanting a row for nan (the Embarked column has missing values but Pclass does not) but I could not get my head around the code to confirm the actual source.

I am only mentioning this as it is a beginners tutorial and the most basic fix is a comment in the code to explain that n_classes needs to account for nan (assuming I am understanding the error correctly). I also wonder if n_classes should just get handled inside categorical_variable().
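
The reporter's reading of the error can be illustrated with a small encoder that reserves id 0 for missing values, so that the embedding needs one row more than the number of distinct categories. This is a sketch of that hypothesis, not the actual skflow preprocessing code; encode_categories is a hypothetical helper.

```python
import math


def encode_categories(values):
    """Map categories to integer ids, reserving id 0 for missing values."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    known = sorted({value for value in values if not is_missing(value)})
    ids = {category: index + 1 for index, category in enumerate(known)}
    encoded = [0 if is_missing(value) else ids[value] for value in values]
    # One extra class for the reserved "missing" id.
    n_classes = len(known) + 1
    return encoded, n_classes


embarked = ["S", "C", float("nan"), "Q", "S"]
encoded, n_classes = encode_categories(embarked)
print(encoded, n_classes)  # [3, 1, 0, 2, 3] 4
```

With n_classes set to the number of distinct categories only (3 here), the largest id (3) would fall outside the embedding table, which matches the index out of range symptom described above.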

Input 'Values' of HistogramSummary Op Type Mismatch

Reproducible example: (master branch with update for dask stuff)

from skflow.io import *
import skflow
from sklearn import datasets
import random

random.seed(42)
iris = datasets.load_iris()
data = pd.DataFrame(iris.data)
data = dd.from_pandas(data, npartitions=2)
labels = pd.DataFrame(iris.target)
labels = dd.from_pandas(labels, npartitions=2)
classifier = skflow.TensorFlowLinearClassifier(n_classes=3)
classifier.fit(data, labels)

gives:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-8e0c6d4b2deb> in <module>()
     11 labels = dd.from_pandas(labels, npartitions=2)
     12 classifier = skflow.TensorFlowLinearClassifier(n_classes=3)
---> 13 classifier.fit(data, labels)

/Library/Python/2.7/site-packages/skflow-0.0.1-py2.7.egg/skflow/estimators/base.pyc in fit(self, X, y, logdir)
    185         if not self.continue_training or not self._initialized:
    186             # Sets up model and trainer.
--> 187             self._setup_training()
    188             # Initialize model parameters.
    189             self._trainer.initialize(self._session)

/Library/Python/2.7/site-packages/skflow-0.0.1-py2.7.egg/skflow/estimators/base.pyc in _setup_training(self)
    118             # Add histograms for X and y if they are floats.
    119             if self._data_feeder.input_dtype in (np.float32, np.float64):
--> 120                 tf.histogram_summary("X", self._inp)
    121             if self._data_feeder.output_dtype in (np.float32, np.float64):
    122                 tf.histogram_summary("y", self._out)

/Library/Python/2.7/site-packages/tensorflow/python/ops/summary_ops.pyc in histogram_summary(tag, values, collections, name)
     37   with ops.op_scope([tag, values], name, "HistogramSummary") as scope:
     38     val = gen_summary_ops._histogram_summary(
---> 39         tag=tag, values=values, name=scope)
     40     _Collect(val, collections, [ops.GraphKeys.SUMMARIES])
     41   return val

/Library/Python/2.7/site-packages/tensorflow/python/ops/gen_summary_ops.pyc in _histogram_summary(tag, values, name)
     32   """
     33   return _op_def_lib.apply_op("HistogramSummary", tag=tag, values=values,
---> 34                               name=name)
     35
     36

/Library/Python/2.7/site-packages/tensorflow/python/ops/op_def_library.pyc in apply_op(self, op_type_name, g, name, **keywords)
    403             if input_arg.type != types_pb2.DT_INVALID:
    404               raise TypeError("%s expected type of %s." %
--> 405                               (prefix, types_lib.as_dtype(input_arg.type).name))
    406             else:
    407               raise TypeError(

TypeError: Input 'values' of 'HistogramSummary' Op has type float64 that does not match expected type of float32.

Not sure exactly what happened but I tried to add more supported types in https://github.com/tensorflow/skflow/blob/master/skflow/estimators/base.py#L119
-- still getting this error.

Anything I missed?

simple linear classification example results in AttributeError in pandas_io.py

I just installed Scikit Flow and tried to execute the simple linear classification example. Unfortunately this does not work:

/home/chris/anaconda3/lib/python3.4/site-packages/skflow/io/pandas_io.py in extract_pandas_data(data)
     27 def extract_pandas_data(data):
     28     """Extract data from pandas.DataFrame for predictors"""
---> 29     if not isinstance(data, pd.DataFrame):
     30         return data
     31 

AttributeError: 'module' object has no attribute 'DataFrame'

I'm using Python 3.4, TensorFlow 0.6, Scikit-Learn 0.17 and Pandas 0.16.2

TensorFlowLinearRegressor unable to recover weights?

I'm hoping that I just overlooked something, but I can't seem to recover the correct weights for simple linear test cases using TensorFlowLinearRegressor. Here's a gist of the full test file (with pytest) and a matching test case using scikit-learn LinearRegression. As the comment in the gist says, the scikit module recovers the weights faithfully while skflow returns odd weights.

Package details:
Python 3.5
tensorflow 0.6.0
skflow 0.0.1

I coded up a quick linear regression module myself using tensorflow as backend (using the SGD optimizer) and it was able to recover the weights using the same iterations, learning rate, etc... as default TensorFlowLinearRegressor so I'm curious what I'm doing wrong with skflow (or the test case) or if there's an issue somewhere.

Thanks for the comprehensive work on the library. Looking forward to using it a bunch.

OSX Multi threading

Is there any way of determining the number of cores in OSX? I think it calculates it automatically but in my case, I see in the system Monitor a Python process with a %CPU of 150% while other algorithms like XGBOOST mark over 300%. Could I set it manually?

Note that I'm executing the iris_custom_model algorithm with a larger dataset

Thank you
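
The number of logical cores can be read from the Python standard library, as sketched below. Whether passing that number to the estimator (a num_cores parameter appears in an estimator repr elsewhere in this issue list) raises CPU utilisation is not confirmed here; detect_cores is a hypothetical helper.

```python
import os


def detect_cores(default=1):
    """Return the number of logical CPUs, or `default` if it cannot be determined."""
    return os.cpu_count() or default


cores = detect_cores()
print(cores)
```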

How to use Doc2Vec instead of default document preprocessing of skflow

Hi,

In the text classification example with RNN, I think the document is represented using the bag-of-words method.

I want to apply the Doc2Vec method instead. With it, X_train.shape is (20000,500) and X_test.shape is (5000,500), and the values are floats.

Then I applied

def rnn_model(X, y):
    """Recurrent neural network model to predict from sequence of words
    to a class."""
    # Convert indexes of words into embeddings.
    # This creates embeddings matrix of [n_words, EMBEDDING_SIZE] and then
    # maps word indexes of the sequence into [batch_size, sequence_length,
    # EMBEDDING_SIZE].
    word_vectors = skflow.ops.categorical_variable(X, n_classes=n_words,
        embedding_size=EMBEDDING_SIZE, name='words')
    # Split into list of embedding per word, while removing doc length dim.
    # word_list results to be a list of tensors [batch_size, EMBEDDING_SIZE].
    word_list = skflow.ops.split_squeeze(1, MAX_DOCUMENT_LENGTH, word_vectors)
    # Create a Gated Recurrent Unit cell with hidden size of EMBEDDING_SIZE.
    cell = rnn_cell.GRUCell(EMBEDDING_SIZE)
    # Create an unrolled Recurrent Neural Networks to length of
    # MAX_DOCUMENT_LENGTH and passes word_list as inputs for each unit.
    _, encoding = rnn.rnn(cell, word_list, dtype=tf.float32)
    # Given encoding of RNN, take encoding of last step (e.g hidden size of the
    # neural network of last step) and pass it as features for logistic
    # regression over output classes.
    return skflow.models.logistic_regression(encoding[-1], y)

classifier = skflow.TensorFlowEstimator(model_fn=rnn_model, n_classes=15,
    steps=1000, optimizer='Adam', learning_rate=0.01, continue_training=True)

# Continuously train for 1000 steps & predict on test set.
while True:
    classifier.fit(X_train, y_train, logdir='/tmp/tf_examples/word_rnn')
    score = metrics.accuracy_score(classifier.predict(X_test), y_test)
    print('Accuracy: {0:f}'.format(score))

Then I got this error:

TypeError: DataType float32 for attr 'Tindices' not in list of allowed values: int32, int64

What is a better way to apply Doc2Vec with RNN using skflow?
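
The error message points at the embedding lookup: an embedding table is indexed by integer ids, while Doc2Vec output is already a dense float vector. A plain-Python sketch of that distinction follows; embedding_lookup here is a hypothetical stand-in, not the TensorFlow op, and the numbers are made up.

```python
def embedding_lookup(table, indices):
    """Select rows of `table` by integer id, as an embedding layer does."""
    for index in indices:
        if not isinstance(index, int):
            raise TypeError(
                "embedding indices must be integers, got {!r}".format(index))
    return [table[index] for index in indices]


table = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]]  # 3 ids, embedding size 2
word_ids = [2, 0, 1]
print(embedding_lookup(table, word_ids))  # [[0.4, 0.5], [0.0, 0.1], [0.2, 0.3]]

doc_vector = [0.12, -0.53, 0.88]  # dense floats, as Doc2Vec would produce
try:
    embedding_lookup(table, doc_vector)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)
```

Under this reading, float document vectors would skip the categorical_variable step and be fed to the following layers directly as features.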

DataFeeder to read serialized numpy arrays

Numpy arrays can be serialized to disk and it's possible to do random seeks into them.
Implementing a DataFeeder for such data format will remove requirement to have full dataset in the memory and still do random seeks for sampling of batches.

TensorFlowEstimator.restore error for text_classification.py example

I am using the text_classification.py example. I have added the following code to save and restore a model:

To save:

classifier = skflow.TensorFlowEstimator(model_fn=self.rnn_model, n_classes=15,
                                        steps=1000, optimizer='Adam', learning_rate=0.01,
                                        continue_training=True)

To restore:

classifier = skflow.TensorFlowEstimator.restore(self.trained_model_path)

On restore, I'm getting the following error:

Traceback (most recent call last):
  File "/Users/me/PycharmProjects/TensorFlowTest/TextRNN.py", line 56, in run_classification
    classifier = skflow.TensorFlowEstimator.restore(self.trained_model_path)
  File "/Users/me/Library/Python/2.7/lib/python/site-packages/skflow/__init__.py", line 317, in restore
    estimator = eval(model_def) 
  File "<string>", line 2
    model_fn=<bound method TextRNN.rnn_model of <__main__.TextRNN object at 0x10ef3f450>>,
             ^
SyntaxError: invalid syntax

Within the restore method, model_def looks like this:

TensorFlowEstimator(batch_size=32, continue_training=True, learning_rate=0.01,
          model_fn=<bound method TextRNN.rnn_model of <__main__.TextRNN object at 0x10ef3f450>>,
          n_classes=15, num_cores=4, optimizer='Adam', steps=1000,
          tf_master='', tf_random_seed=42, verbose=1)
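
The traceback shows restore calling eval on the saved string representation of the estimator. That cannot work once a parameter is a function or bound method, because its repr is not valid Python source. The sketch below reproduces only that mechanism; the model_def string is a simplified stand-in for what skflow writes.

```python
class TextRNN:
    def rnn_model(self, X, y):
        return X


# repr() of a bound method looks like "<bound method TextRNN.rnn_model of ...>",
# which the Python parser rejects.
model_def = "dict(model_fn={!r}, n_classes=15)".format(TextRNN().rnn_model)

try:
    eval(model_def)
    outcome = "evaluated"
except SyntaxError:
    outcome = "SyntaxError"

print(outcome)  # SyntaxError
```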

GridSearchCV does not work with TensorFlowDNNClassifier

Hello

I tried DNN meta-parameter optimization using GridSearchCV,

but it does not work.

Stack trace:

  File "/usr/local/lib/python2.7/site-packages/sklearn/grid_search.py", line 804, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "/usr/local/lib/python2.7/site-packages/sklearn/grid_search.py", line 553, in _fit
    for parameters in parameter_iterable
  File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 804, in __call__
    while self.dispatch_one_batch(iterator):
  File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 662, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 570, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 183, in __init__
    self.results = batch()
  File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/lib/python2.7/site-packages/skflow/__init__.py", line 163, in fit
    self._setup_training()
  File "/usr/local/lib/python2.7/site-packages/skflow/__init__.py", line 113, in _setup_training
    self._inp, self._out)
  File "/usr/local/lib/python2.7/site-packages/skflow/models.py", line 73, in dnn_estimator
    layers = dnn(X, hidden_units)
  File "/usr/local/lib/python2.7/site-packages/skflow/ops/dnn_ops.py", line 41, in dnn
    for i, n_units in enumerate(hidden_units):
TypeError: 'NoneType' object is not iterable

Problem

"hidden_units" is None when the classifier is run through GridSearchCV.

"hidden_units" is a member of TensorFlowEstimator but not a member of TensorFlowDNNClassifier:

https://github.com/google/skflow/blob/master/skflow/__init__.py#L415

As a result, GridSearchCV can't find "hidden_units".
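One plausible mechanism, sketched without scikit-learn: a clone step that rebuilds an estimator from its constructor argument names and the attributes of the same name. The get_params and clone helpers and both classifier classes are hypothetical simplifications written for illustration, not the actual scikit-learn or skflow code.

```python
import inspect


def get_params(estimator):
    """Collect constructor arguments by reading same-named attributes."""
    names = [
        name
        for name in inspect.signature(type(estimator).__init__).parameters
        if name != "self"
    ]
    # A missing attribute silently becomes None in this simplified version.
    return {name: getattr(estimator, name, None) for name in names}


def clone(estimator):
    """Rebuild an unfitted copy from the collected parameters."""
    return type(estimator)(**get_params(estimator))


class BrokenClassifier(object):
    def __init__(self, hidden_units=None, n_classes=2):
        self._layers = hidden_units  # stored under a different name
        self.n_classes = n_classes


class FixedClassifier(object):
    def __init__(self, hidden_units=None, n_classes=2):
        self.hidden_units = hidden_units  # same name as the argument
        self.n_classes = n_classes


print(get_params(clone(BrokenClassifier(hidden_units=[10, 20]))))
print(get_params(clone(FixedClassifier(hidden_units=[10, 20]))))
```

In this sketch the broken clone ends up with hidden_units set to None, which would produce the same kind of "'NoneType' object is not iterable" failure once the layers are built.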

Implement fit/predict to support iterators

Many datasets can't fit into memory, and TF doesn't actually require the whole dataset to be in memory.
Instead of loading the dataset into memory, it's possible to stream it using iterators and feed data that way.

Fit/predict functions in Estimators should be able to take X and y as iterators and read from them while processing data.
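A rough sketch of the idea, assuming X and y arrive as plain Python iterators; iter_batches is a hypothetical helper, not an existing skflow function.

```python
from itertools import islice


def iter_batches(X_iter, y_iter, batch_size):
    """Yield (X_batch, y_batch) lists without materializing the dataset."""
    pairs = zip(X_iter, y_iter)
    while True:
        chunk = list(islice(pairs, batch_size))
        if not chunk:
            return
        X_batch, y_batch = zip(*chunk)
        yield list(X_batch), list(y_batch)


# Generators stand in for a dataset streamed from disk.
X_stream = ([i, i + 1] for i in range(5))
y_stream = (i % 2 for i in range(5))

batches = list(iter_batches(X_stream, y_stream, batch_size=2))
for X_batch, y_batch in batches:
    print(X_batch, y_batch)
```

Only one batch is held in memory at a time; the last batch may be smaller than batch_size when the stream length is not a multiple of it.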

'module' object has no attribute 'rnn_cell'

I'm trying the tutorial on Medium (https://medium.com/@ilblackdragon/tensorflow-tutorial-part-2-9ffe47049c92#.608vwpu2a) and the first DNN raised this error:

Traceback (most recent call last):
  File "skflow_test.py", line 32, in <module>
    deep.fit(X_train, y_train)
  File "/Users/metjush/anaconda/lib/python2.7/site-packages/skflow/estimators/base.py", line 189, in fit
    self._setup_training()
  File "/Users/metjush/anaconda/lib/python2.7/site-packages/skflow/estimators/base.py", line 128, in _setup_training
    self._inp, self._out)
  File "/Users/metjush/anaconda/lib/python2.7/site-packages/skflow/estimators/dnn.py", line 76, in _model_fn
    models.logistic_regression)(X, y)
  File "/Users/metjush/anaconda/lib/python2.7/site-packages/skflow/models.py", line 91, in dnn_estimator
    layers = dnn(X, hidden_units)
  File "/Users/metjush/anaconda/lib/python2.7/site-packages/skflow/ops/dnn_ops.py", line 39, in dnn
    tensor_in = tf.nn.rnn_cell.linear(tensor_in, n_units, True)
AttributeError: 'module' object has no attribute 'rnn_cell'

Is this a problem with my TensorFlow installation, or is something else amiss?
I have an up-to-date installation of TF. This is what happens when I run pip install tensorflow --upgrade:

Requirement already up-to-date: tensorflow in /Users/metjush/anaconda/lib/python2.7/site-packages
Requirement already up-to-date: six>=1.10.0 in /Users/metjush/anaconda/lib/python2.7/site-packages (from tensorflow)
Requirement already up-to-date: numpy>=1.9.2 in /Users/metjush/anaconda/lib/python2.7/site-packages (from tensorflow)

This is my Python setup:

Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, Sep 15 2015, 14:29:08) 
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin

DataFeeder: Sampling without replacement

Hi,

  1. I suspect there is an error in _feed_dict_fn.
    Why aren't the already used sample indices saved? They could be repeated when _feed_dict_fn() is called again in the same epoch.
    sample = random.randint(0, self.X.shape[0] - 1)
    inp[i, :] = self.X[sample, :]

  2. Each iteration is not an epoch but only one step for one batch.
    for step in xrange(steps):
        feed_dict = feed_dict_fn()
        global_step, loss, _ = sess.run([self.global_step, self.loss, self.trainer], feed_dict=feed_dict)

I think it should be something like this:
for step in xrange(steps):
    for i in xrange(X.shape[0] // batch_size):
        feed_dict = feed_dict_fn()
        global_step, loss, _ = sess.run([self.global_step, self.loss, self.trainer], feed_dict=feed_dict)

Am I right, or is my guess incorrect?
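For illustration, sampling without replacement within an epoch could be sketched as below. The epoch_batches helper is hypothetical and only produces index batches; it is not the DataFeeder implementation.

```python
import random


def epoch_batches(n_samples, batch_size, rng):
    """Yield index batches that cover every sample exactly once."""
    indices = list(range(n_samples))
    rng.shuffle(indices)  # one permutation per epoch, so no index repeats
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


rng = random.Random(0)
seen = []
for batch in epoch_batches(n_samples=10, batch_size=4, rng=rng):
    seen.extend(batch)

print(sorted(seen) == list(range(10)))  # True
```

Calling epoch_batches once per epoch gives a fresh permutation each time, whereas drawing each index with random.randint samples with replacement.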

Plotting neural network built by skflow

Hi,

Sorry for asking so many questions.

I think plotting is always a nice feature. Is it currently possible in skflow (or can we do it through TensorFlow directly)?

Multi-threaded feed dict

Currently, feed dicts are built in the same thread as the main training loop, which slows down each training step by the time it takes to sample and process records.

A better option would be to run sampling in a separate thread that feeds a Queue, so that the main thread just takes full batches out of the queue.
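A minimal sketch of that producer/consumer layout using only the standard library. The make_batch function is a hypothetical stand-in for sampling and preprocessing, and the loop that consumes batches stands in for the training loop.

```python
import queue
import threading


def producer(make_batch, n_batches, batch_queue):
    """Build batches in a background thread and hand them to the queue."""
    for step in range(n_batches):
        batch_queue.put(make_batch(step))
    batch_queue.put(None)  # sentinel: no more batches


def make_batch(step):
    # Hypothetical stand-in for sampling and preprocessing one batch.
    return [step * 2, step * 2 + 1]


# A bounded queue keeps the producer from running far ahead of training.
batch_queue = queue.Queue(maxsize=4)
worker = threading.Thread(target=producer, args=(make_batch, 3, batch_queue))
worker.daemon = True
worker.start()

consumed = []
while True:
    batch = batch_queue.get()  # the main loop only waits for ready batches
    if batch is None:
        break
    consumed.append(batch)
worker.join()

print(consumed)  # [[0, 1], [2, 3], [4, 5]]
```

How much this helps depends on whether batch preparation releases the GIL or is I/O bound; for CPU-heavy pure-Python preprocessing a process-based producer may be the better fit.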

As a performance test, it would be interesting to compare the speed of sklearn.linear_model.LogisticRegression with that of skflow.TensorFlowLinearClassifier.

An error following TensorFlow Tutorial — Part 3

Hi, I love the idea of skflow and am trying to learn it :)
When I ran exactly the same code as in the blog post (https://medium.com/@ilblackdragon/tensorflow-tutorial-part-3-c5fc0662bc08#.jsxv1w8n9) on my local machine, I got the error below. I thought that the class variables would need to be floats instead of integers and tried that, but it didn't solve the problem. Could someone help?

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-28-62de22c2cf78> in <module>()
     15 classifier = skflow.TensorFlowEstimator(model_fn=categorical_model,
     16     n_classes=2)
---> 17 classifier.fit(X_train, y_train)
     18 
     19 print("Accuracy: {0}".format(metrics.accuracy_score(classifier.predict(X_test), y_test)))

/Users/a/anaconda/lib/python2.7/site-packages/skflow/estimators/base.pyc in fit(self, X, y, logdir)
    166         if not self.continue_training or not self._initialized:
    167             # Sets up model and trainer.
--> 168             self._setup_training()
    169             # Initialize model parameters.
    170             self._trainer.initialize(self._session)

/Users/a/anaconda/lib/python2.7/site-packages/skflow/estimators/base.pyc in _setup_training(self)
    102 
    103             # Add histograms for X and y.
--> 104             tf.histogram_summary("X", self._inp)
    105             tf.histogram_summary("y", self._out)
    106 

/Users/a/anaconda/lib/python2.7/site-packages/tensorflow/python/ops/summary_ops.pyc in histogram_summary(tag, values, collections, name)
     37   with ops.op_scope([tag, values], name, "HistogramSummary") as scope:
     38     val = gen_summary_ops._histogram_summary(
---> 39         tag=tag, values=values, name=scope)
     40     _Collect(val, collections, [ops.GraphKeys.SUMMARIES])
     41   return val

/Users/a/anaconda/lib/python2.7/site-packages/tensorflow/python/ops/gen_summary_ops.pyc in _histogram_summary(tag, values, name)
     32   """
     33   return _op_def_lib.apply_op("HistogramSummary", tag=tag, values=values,
---> 34                               name=name)
     35 
     36 

/Users/a/anaconda/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.pyc in apply_op(self, op_type_name, g, name, **keywords)
    403             if input_arg.type != types_pb2.DT_INVALID:
    404               raise TypeError("%s expected type of %s." %
--> 405                               (prefix, types_lib.as_dtype(input_arg.type).name))
    406             else:
    407               raise TypeError(

TypeError: Input 'values' of 'HistogramSummary' Op has type int64 that does not match expected type of float32.

Default random seed is 42 rather than defaulting to a random random seed

Hi,

I've been working through some of the tutorials and they often use random.seed() at the beginning. I tried playing with this value to see how it affected the output of a DNN, and it doesn't change anything.

A little digging showed that tf_random_seed in dnn.py defaults to 42, so it must be specified when TensorFlowDNNClassifier() is created if you want anything other than 42.

I found this somewhat confusing and, unless there are other reasons for setting a default random seed in dnn.py (as opposed to the user doing this in their code), I would argue that the default behaviour should leave tf_random_seed undefined, e.g. tf_random_seed=None instead of tf_random_seed=42.
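To illustrate the proposed default, here is a small sketch with a hypothetical TinyEstimator class. It uses Python's random module rather than TensorFlow, so it only shows the seeding behaviour, not skflow's code.

```python
import random


class TinyEstimator(object):
    """Hypothetical estimator; only the seeding behaviour is illustrated."""

    def __init__(self, random_seed=None):
        # With None, random.Random seeds itself from an OS or time source.
        self._rng = random.Random(random_seed)

    def initial_weights(self, n):
        return [self._rng.random() for _ in range(n)]


fixed_a = TinyEstimator(random_seed=42).initial_weights(3)
fixed_b = TinyEstimator(random_seed=42).initial_weights(3)
print(fixed_a == fixed_b)  # True: an explicit seed is reproducible

unseeded = TinyEstimator().initial_weights(3)
print(len(unseeded))  # 3 values that vary from run to run
```

With a None default, reproducibility becomes opt-in: users who want repeatable runs pass a seed explicitly.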

Thanks for making this awesome project and tutorials, I'm finding them really helpful in my exploration of machine learning!

Class weight support

Hi,

I am using skflow.ops.dnn to classify a two-class dataset (True and False). The percentage of True examples is very small, so I have an imbalanced dataset.

It seems to me that one way to resolve the issue is to use weighted classes. However, looking at the implementation of skflow.ops.dnn, I do not see how I could use weighted classes with a DNN.

Is it possible to do that with skflow, or is there another technique for dealing with imbalanced datasets in skflow?
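As a framework-independent sketch of what class weighting means: weights inversely proportional to class frequency (the same heuristic scikit-learn calls "balanced"), applied to a binary cross-entropy loss. Both helper functions are hypothetical and written in plain Python; nothing here is an existing skflow option.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_log_loss(labels, prob_true, weights):
    """Weighted mean of the binary cross-entropy, one term per example."""
    total = 0.0
    for y, p in zip(labels, prob_true):
        p_label = p if y else 1.0 - p  # probability assigned to the true label
        total += -weights[y] * math.log(p_label)
    return total / sum(weights[y] for y in labels)


labels = [False, False, False, True]
weights = balanced_class_weights(labels)
print(weights)  # the rare True class receives the larger weight
print(weighted_log_loss(labels, [0.1, 0.2, 0.1, 0.7], weights))
```

Inside a custom model function the same idea would mean multiplying each example's loss by the weight of its class before averaging.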

Thanks

Error on TensorFlowEstimator.restore (tensorflow.python.pywrap_tensorflow.StatusNotOK: Internal: Unable to get element from the feed.)

An error occurs when restoring the model with the code below.

import skflow
from sklearn import datasets, metrics

iris = datasets.load_iris()
classifier = skflow.TensorFlowDNNClassifier(hidden_units=[10, 20, 10], n_classes=3)
classifier.fit(iris.data, iris.target)
classifier.save('test_model')
new_clf = skflow.TensorFlowEstimator.restore('test_model')
score = metrics.accuracy_score(new_clf.predict(iris.data), iris.target)
print("Accuracy: %f" % score)

Traceback (most recent call last):
  File "dnn_save.py", line 8, in <module>
    new_clf = skflow.TensorFlowEstimator.restore('test_model')
  File "/usr/local/lib/python2.7/dist-packages/skflow/__init__.py", line 353, in restore
    estimator._restore(path)
  File "/usr/local/lib/python2.7/dist-packages/skflow/__init__.py", line 325, in _restore
    self._saver.restore(self._session, checkpoint_path)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 864, in restore
    sess.run([self._restore_op_name], {self._filename_tensor_name: save_path})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 345, in run
    results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 404, in _do_run
    target_list)
tensorflow.python.pywrap_tensorflow.StatusNotOK: Internal: Unable to get element from the feed.

Add Saver support

Support Saver:

  • Setting logdir for saving checkpoints.
  • Restoring model if checkpoints already exist.
  • Update examples like text_classification to use this.

Support categorical variables out-of-the-box

Currently, if a dataset has categorical variables like gender or education, it requires additional processing to get them into one-hot form. Embeddings make it possible to use IDs to look up distributed representations for categories. This should be easy to use and to combine with regular features.
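A plain-Python sketch of the two representations, with hypothetical helper names. The embedding table is randomly initialized here only to show the lookup; in a real model its rows would be trained.

```python
import random


def build_vocabulary(values):
    """Assign a stable integer ID to each distinct category."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def one_hot(category_id, n_categories):
    """Return a vector with a single 1.0 at the category's ID."""
    row = [0.0] * n_categories
    row[category_id] = 1.0
    return row


def make_embeddings(n_categories, dim, rng):
    """Build a random table with one dim-sized row per category."""
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_categories)]


education = ["phd", "bsc", "msc", "bsc"]
vocab = build_vocabulary(education)  # {'bsc': 0, 'msc': 1, 'phd': 2}
ids = [vocab[v] for v in education]  # [2, 0, 1, 0]
table = make_embeddings(len(vocab), dim=2, rng=random.Random(0))

age = [41.0, 23.0, 30.0, 25.0]
# Each row concatenates a regular feature with the looked-up embedding.
features = [[a] + table[i] for a, i in zip(age, ids)]
print(ids)
print(one_hot(ids[0], len(vocab)))  # [0.0, 0.0, 1.0]
```

The one-hot width grows with the number of categories, while the embedding width stays fixed at dim, which is the main practical reason to prefer lookups for large vocabularies.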

Support reading HDF5

HDF5 is a popular format for storing complicated datasets. It also supports random access into the stored data.
It would be awesome to have full support for it.
