dxlearn's People

Contributors

alex10151, chengaoyu, hong-xiang, jyker, threebegetsallthings, twj2417

Forkers

alex10151

dxlearn's Issues

Cleaning up deprecated code and adding more comments are necessary

More and more code has been marked as deprecated. If we run

tree -if --noreport -I  "*.pyc" | xargs cat | wc -l

on <project directory>/src/python/dxl/learn, it will report 14794 lines, and we have 211 files.

If we run cloc, we get

     197 text files.
     194 unique files.                                          
      16 files ignored.

http://cloc.sourceforge.net v 1.60  T=1.17 s (164.8 files/s, 12702.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                         189           2522           1569          10675
YAML                             3              1              0             35
-------------------------------------------------------------------------------
SUM:                           192           2523           1569          10710
-------------------------------------------------------------------------------

For comparison, running cloc on Keras at master (993a701498a4ac288b12dceb105f10b7fc60c14f) gives

     227 text files.
     227 unique files.                                          
      84 files ignored.

http://cloc.sourceforge.net v 1.60  T=0.88 s (209.2 files/s, 74465.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                         180          10524          13615          41152
YAML                             3             14             23            149
make                             1              7              0             22
-------------------------------------------------------------------------------
SUM:                           184          10545          13638          41323

For comparison with larger projects, see:
lines of large Python projects

We need to:

  • clean up deprecated code to reduce the total amount of code
  • add more comments

Decide Graph API

Substituting the implementation of a Graph should be easy (this is a goal of the library).
As a result, we chose to use dependency injection (DI) to inject sub-graphs.
The problem is that in most cases we cannot construct a sub-graph completely outside its parent graph, so we need to consider a unified interface for Graph and GraphBuilder as one possible solution.

Things become more complicated when we introduce Model, that is, a reusable Graph that supports the __call__ method and actually produces a new Graph with the same variables.

  • Unified Graph and GraphBuilder interface?
  • Unified Graph and Model behavior?
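
As a rough illustration of the injection problem, here is a minimal sketch of one possible unified interface. It assumes a hypothetical `Graph` class whose constructor accepts either already-built sub-graphs or builder callables that receive the parent; the names `Graph`, `conv_builder`, and the `graphs` argument are assumptions for this example, not dxlearn's actual API.

```python
class Graph:
    """Hypothetical graph node, for illustration only."""

    def __init__(self, name, graphs=None, parent=None):
        self.name = name
        self.parent = parent
        self.graphs = {}
        for key, sub in (graphs or {}).items():
            # A callable is treated as a builder that needs the parent;
            # anything else is assumed to be an already-built Graph.
            self.graphs[key] = sub(self) if callable(sub) else sub


def conv_builder(parent):
    """Builder for a sub-graph that can only be constructed inside a parent."""
    return Graph(f"{parent.name}/conv", parent=parent)


prebuilt = Graph("loss")
net = Graph("net", graphs={"conv": conv_builder, "loss": prebuilt})
```

Note that the `callable(sub)` test in this sketch would stop working as soon as Graph itself gains a `__call__` method, which is exactly the Graph versus Model tension raised above.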

Separate Model from train/summary, etc.?

The boundary between Model and Network is not clear. So far it has seemed preferable to define one Model and wrap it with a Network to get train, summary, and other methods.

Defining a Network class for each Model does not seem like a good idea; it might be a wrong design.

Instead, should we use some method to "automatically" generate the network for us, for example:

m = Model()
trainer = Trainer()
x = trainer.bind(m) 

Questions:

  • What is the trainable protocol for x: one additional train method, or a tensors['train_step'] entry?
  • What is the type of x: should it be Model, Graph, or even Tensor?
  • If Model, should x have the full abilities of m (something like an alias)?
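
To make the first and third questions concrete, the sketch below shows one possible answer: `bind` returns a wrapper that adds a `train` method and forwards everything else to the model, so `x` behaves like an alias of `m`. All classes here (`Model`, `Trainable`, `Trainer`) are hypothetical stand-ins with a toy one-weight model, not the library's implementation.

```python
class Model:
    """Hypothetical pure model: maps inputs to a squared-error loss."""

    def __init__(self, weight=0.0):
        self.weight = weight

    def __call__(self, x, target):
        return (self.weight * x - target) ** 2


class Trainable:
    """Wraps a model, adds train(), and forwards other attributes."""

    def __init__(self, model, learning_rate):
        self._model = model
        self._lr = learning_rate

    def __getattr__(self, item):
        # Only reached for attributes not defined on Trainable itself.
        return getattr(self._model, item)

    def __call__(self, *args):
        return self._model(*args)

    def train(self, x, target):
        # Gradient of (w * x - target) ** 2 with respect to w.
        grad = 2 * (self._model.weight * x - target) * x
        self._model.weight -= self._lr * grad
        return self._model(x, target)


class Trainer:
    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate

    def bind(self, model):
        return Trainable(model, self.learning_rate)


m = Model()
x = Trainer().bind(m)
for _ in range(50):
    loss = x.train(1.0, 2.0)
```

Here `x.weight` resolves to `m.weight`, so the wrapper shares state with the model rather than copying it.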

How to design config?

Although explicit is better than implicit, passing both a config object and individual config values may introduce conflicts:

class A:
    def __init__(self, a, b, config):
        pass

or if we use implicit:

class A(WithConfig):
    def __init__(self, a, b):
        pass

However, there may still be a problem for configs that are looked up by name:

class A(ConfigurableWithName):
    def __init__(self, name, a, b, config):
        pass

Here name becomes coupled with config.
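
One possible convention for the explicit case, shown only as a sketch: an explicitly passed argument wins, and the config object fills in whatever was left as `None`. The class `A` below and its plain-dict config are assumptions for illustration.

```python
class A:
    """Hypothetical class; explicit arguments override config entries."""

    def __init__(self, a=None, b=None, config=None):
        config = dict(config or {})
        # An explicit (non-None) argument wins; otherwise fall back to config.
        self.a = a if a is not None else config.get("a")
        self.b = b if b is not None else config.get("b")


obj = A(a=1, config={"a": 10, "b": 20})
```

This removes the ambiguity, but a value of `None` can then no longer be passed explicitly, which is one reason the question remains open.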

Better naming?

Should we use Function for normal operations on tensors, and Model for "trainable" Function?

Function should have methods:

  • inputs
  • outputs

Model should have additional methods:

  • parameters
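
The proposed split could be expressed with abstract base classes, roughly as follows. This is a sketch under the assumption that `inputs`, `outputs`, and `parameters` are exposed as attributes; `Add` and `Scale` are made-up examples.

```python
from abc import ABC, abstractmethod


class Function(ABC):
    """An operation on tensors with named inputs and outputs."""

    @property
    @abstractmethod
    def inputs(self): ...

    @property
    @abstractmethod
    def outputs(self): ...


class Model(Function):
    """A "trainable" Function: it additionally exposes parameters."""

    @property
    @abstractmethod
    def parameters(self): ...


class Add(Function):
    inputs = ("x", "y")
    outputs = ("sum",)

    def __call__(self, x, y):
        return x + y


class Scale(Model):
    inputs = ("x",)
    outputs = ("scaled",)

    def __init__(self, factor):
        self.factor = factor

    @property
    def parameters(self):
        return {"factor": self.factor}

    def __call__(self, x):
        return self.factor * x
```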

Add zeros_like OP

zeros_like(x: Tensor[a]) -> Tensor[a]

Returns a Tensor with the same backend (a), shape, and dtype as x.
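
A minimal sketch of the intended behavior, using a hypothetical `Tensor` record with a flat data list instead of a real backend (the field names are assumptions):

```python
from dataclasses import dataclass
from math import prod
from typing import List, Tuple


@dataclass(frozen=True)
class Tensor:
    """Hypothetical tensor record; `backend` plays the role of `a`."""
    data: List[float]        # flat, row-major values
    shape: Tuple[int, ...]
    dtype: type
    backend: str


def zeros_like(x: Tensor) -> Tensor:
    # Keep backend, shape, and dtype; replace every value with zero.
    zero = x.dtype(0)
    return Tensor([zero] * prod(x.shape), x.shape, x.dtype, x.backend)


t = Tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], (2, 3), float, "numpy")
z = zeros_like(t)
```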

Move data columns to dxdata?

DataColumns is designed to be an object-oriented representation of raw data, which is much closer to the data than to the compute graph, and in most cases we need to write the data-processing code in dxl.data anyway. Should we move this class to the dxl.data package?

Make `Model` pure in construction?

Remove inputs from the arguments of __init__ and forbid binding inputs via self.tensors[key] = x; all inputs should be bound when calling __call__. If we need to bind inputs partially, use the standard Python way: functools.partial.
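
A small sketch of what this would look like in use, with a hypothetical `Model` whose `__init__` takes configuration only and whose inputs are plain lists:

```python
from functools import partial


class Model:
    """Hypothetical pure model: __init__ takes configuration only."""

    def __init__(self, name, scale):
        self.name = name
        self.scale = scale

    def __call__(self, image, mask):
        # All inputs arrive here; nothing is stored on self.
        return [self.scale * v * m for v, m in zip(image, mask)]


m = Model("masked_scale", scale=2.0)
# Partially bind one input the standard Python way.
with_mask = partial(m, mask=[1, 0, 1])
result = with_mask([1.0, 2.0, 3.0])
```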

Add dropout support

This requires a global context manager, since dropout may depend on some global scalar.
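
One way such a global context could look, as a sketch only: a module-level flag stack toggled by a context manager, which `dropout` consults. The names `training`, `is_training`, and `dropout` are hypothetical, and plain lists stand in for tensors.

```python
import random
from contextlib import contextmanager

_TRAINING = [False]  # hypothetical global flag stack; bottom entry is the default


@contextmanager
def training(flag=True):
    _TRAINING.append(flag)
    try:
        yield
    finally:
        _TRAINING.pop()


def is_training():
    return _TRAINING[-1]


def dropout(values, keep_prob, rng=random):
    if not is_training():
        return list(values)
    # Inverted dropout: scale kept values so the expected value is unchanged.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]
```

Outside a `with training():` block, `dropout` is the identity, so inference code needs no extra arguments.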

Is an anonymous Cview necessary when the Model name is None?

def __init__(self, name, hidden):
    super().__init__(name)

When name is None, should we add a separate, unnamed Cview for temporary config (not linked to ConfigProxy)?

def __init__(self, name, hidden):
    config_node = Cview() if name is None else super().__init__(name)

Add automatic calling-context detection and attachment to the parent model

Currently, defining the kernel of a Model requires explicitly saving each internal model into self.graphs, using the following syntax:

x = self.graphs.get('sub', SomeModel())

If a model is designed to work only internally, we may not want to save it to self.graphs; the syntactic noise of the current implementation may also be too high.

Can we implement Model so that it links parent and child models automatically, that is:

  1. automatically record all models created during a call of kernel
  2. automatically select and apply the corresponding model when applying it to some tensors.

Should we add a context manager to the Model.kernel method, and add a global function get_current_kernel_context() that returns all context information?
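
The first point (automatic recording) could be sketched as below, assuming a global stack of models whose kernel is currently running. The `Model` class, `_KERNEL_STACK`, and the `children` list are hypothetical; this sketch does not cover the second point, so a new `Inner` is created on every call.

```python
from contextlib import contextmanager

_KERNEL_STACK = []  # hypothetical global stack of models whose kernel is running


def get_current_kernel_context():
    return _KERNEL_STACK[-1] if _KERNEL_STACK else None


class Model:
    def __init__(self, name):
        self.name = name
        self.children = []
        parent = get_current_kernel_context()
        if parent is not None:
            parent.children.append(self)  # automatic link to the parent

    @contextmanager
    def _kernel_context(self):
        _KERNEL_STACK.append(self)
        try:
            yield
        finally:
            _KERNEL_STACK.pop()

    def kernel(self, x):
        return x

    def __call__(self, x):
        with self._kernel_context():
            return self.kernel(x)


class Inner(Model):
    def kernel(self, x):
        return x + 1


class Outer(Model):
    def kernel(self, x):
        # No explicit self.graphs.get(...) bookkeeping is needed here.
        return Inner("inner")(x) * 2


outer = Outer("outer")
y = outer(3)
```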

Tensor protocol?

A TensorType protocol that provides the following properties:

  • dtype
  • ndim
  • shape (possibly with a dynamic axis, e.g. [None, 128, 128, 3])

A Tensor protocol that additionally provides the following methods:

  • slicing via __getitem__, accepting [Union[slice, int, None], ...]
  • transpose(List[int])

These would work just like the abstract base classes in collections.abc.

Notes:

slice demo

class A:
    def __getitem__(self, *args, **kwargs):
        return args, kwargs
>>> A()[:,2,3]

results:

 (((slice(None, None, None), 2, 3),), {})
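
The two protocols could be written down with typing.Protocol, roughly as follows. This is a sketch, not a settled design: the exact signatures are assumptions, and `ListMatrix` is a toy 2-D implementation used only to show structural conformance.

```python
from typing import Optional, Protocol, Sequence, Tuple, runtime_checkable


@runtime_checkable
class TensorType(Protocol):
    @property
    def dtype(self) -> type: ...

    @property
    def ndim(self) -> int: ...

    @property
    def shape(self) -> Tuple[Optional[int], ...]: ...


@runtime_checkable
class Tensor(TensorType, Protocol):
    def __getitem__(self, index): ...

    def transpose(self, perm: Sequence[int]) -> "Tensor": ...


class ListMatrix:
    """Toy 2-D tensor backed by nested lists, for illustration only."""

    dtype = float

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    @property
    def ndim(self):
        return 2

    @property
    def shape(self):
        return (len(self.rows), len(self.rows[0]))

    def __getitem__(self, index):
        i, j = index  # only integer pairs are supported in this toy class
        return self.rows[i][j]

    def transpose(self, perm):
        if list(perm) == [0, 1]:
            return ListMatrix(self.rows)
        return ListMatrix(zip(*self.rows))


m = ListMatrix([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

`ListMatrix` never inherits from `Tensor`, yet `isinstance(m, Tensor)` holds because the check is structural, which is the same idea as the `collections.abc` classes.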
