
Tensor4ML


Made by Xinyu Chen • 🌐 https://xinychen.github.io

Tensor Decomposition for Machine Learning (Tensor4ML). This article summarizes the development of tensor decomposition models and algorithms in the literature, offering comprehensive reviews and tutorials on topics ranging from matrix and tensor computations to tensor decomposition techniques across a wide range of scientific areas and applications. Since tensor decomposition is often formulated as an optimization problem, this article also provides a preliminary introduction to some classical methods for solving convex and nonconvex optimization problems. This work aims to offer valuable insights to both the machine learning and data science communities by drawing strong connections with the key concepts of tensor decomposition. To ensure reproducibility and sustainability, we provide resources such as datasets and Python implementations, primarily built on Python's NumPy library.


In a hurry? Please check out our contents as follows.

  • Introduction
    • Tensor decomposition in the past 10-100 years
    • Tensor decomposition in the past decade
  • What Are Tensors?
    • Tensors in algebra & machine learning
    • Tensors in data science
  • Foundation of Tensor Computations
    • Norms
    • Matrix trace
    • Kronecker product
    • Khatri-Rao product
    • Modal product
    • Outer product
    • Derivatives
  • Foundation of Optimization
    • Gradient descent methods
    • Power iteration
    • Alternating minimization
    • Alternating direction method of multipliers
    • Greedy methods for ℓ0-norm minimization
    • Bayesian optimization

Our Research


We conduct extensive experiments on some real-world data sets:

  • Middle-scale data sets:

    • PeMS (P) registers traffic speed time series from 228 sensors over 44 days with 288 time points per day (i.e., 5-min frequency). The tensor size is 228 x 288 x 44.
    • Guangzhou (G) contains traffic speed time series from 214 road segments in Guangzhou, China, over 61 days with 144 time points per day (i.e., 10-min frequency). The tensor size is 214 x 144 x 61.
    • Electricity (E) records hourly electricity consumption transactions of 370 clients from 2011 to 2014. We use a subset of the last five weeks of 321 clients in our experiments. The tensor size is 321 x 24 x 35.
  • Large-scale PeMS traffic speed data set registers traffic speed time series from 11160 sensors over 4/8/12 weeks (for PeMS-4W/PeMS-8W/PeMS-12W) with 288 time points per day (i.e., 5-min frequency) in California, USA. You can download this data set and place it in the folder ../datasets.

    • Data size:
      • PeMS-4W: 11160 x 288 x 28 (contains about 90 million observations).
      • PeMS-8W: 11160 x 288 x 56 (contains about 180 million observations).
    • Data path example: ../datasets/California-data-set/pems-4w.csv.
    • Open data in Python with Pandas:
import pandas as pd

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
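
The loaded matrix can then be folded into the third-order tensor described above. The sketch below uses small synthetic sizes (the real PeMS-4W sizes would be 11160 x 288 x 28); the day-major ordering of columns within each row is an assumption and should be checked against the actual CSV layout:

```python
import numpy as np

# Fold a (sensor, day * time-of-day) matrix into a
# (sensor, time-of-day, day) tensor. Assumes each row stores one
# sensor's series day by day; verify this against the real CSV.
n_sensor, n_time, n_day = 4, 6, 2
mat = np.arange(n_sensor * n_time * n_day).reshape(n_sensor, n_day * n_time)
tensor = mat.reshape(n_sensor, n_day, n_time).transpose(0, 2, 1)
print(tensor.shape)  # (4, 6, 2)
```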

mats

mats is a project in the tensor learning repository, and it aims to develop machine learning models for multivariate time series forecasting. In this project, we propose the following low-rank tensor learning models:

We write Python code in Jupyter notebooks and place them in the folder ../mats. To test our code, run the notebooks in that folder. Each notebook is independent of the others, so you can run any notebook directly.

The baseline models include:

We write Python code in Jupyter notebooks and place them in the folder ../baselines. To test our code, run the notebooks in that folder. Notebooks that reproduce algorithms on large-scale data sets are marked as Large-Scale-xx.

📖 Reproducing Literature in Python


We reproduce some tensor learning experiments from the previous literature.

| Year | Title | PDF | Authors' Code | Our Code | Status |
|------|-------|-----|---------------|----------|--------|
| 2015 | Accelerated Online Low-Rank Tensor Learning for Multivariate Spatio-Temporal Streams | ICML 2015 | Matlab code | Python code | Under development |
| 2016 | Scalable and Sound Low-Rank Tensor Learning | AISTATS 2016 | - | xx | Under development |

📖 Tutorial


We summarize some preliminaries for a better understanding of tensor learning. They are given as tutorials below.

  • Foundations of Python Numpy Programming

  • Foundations of Tensor Computations

    • Kronecker product
  • Singular Value Decomposition (SVD)
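
As a companion to the SVD tutorial, here is a minimal NumPy sketch of the economy-size SVD and the best rank-r approximation in the Frobenius norm (the Eckart-Young theorem). It is an illustration with random data, not code from the tutorial itself:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(8, 5)

# Economy-size SVD: u is (8, 5), s holds 5 singular values, vt is (5, 5).
u, s, vt = np.linalg.svd(X, full_matrices=False)

# Reconstruction from all singular values is exact (up to rounding).
assert np.allclose(X, u @ np.diag(s) @ vt)

# Best rank-2 approximation: keep the two largest singular values.
r = 2
X_r = u[:, :r] @ np.diag(s[:r]) @ vt[:r, :]
print(np.linalg.norm(X - X_r))  # approximation error
```

The approximation error equals the root sum of squares of the discarded singular values, which is why truncated SVD underlies many low-rank tensor methods.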

If you find these codes useful, please star (★) this repository.

Helpful Material


We believe these materials will be a valuable and useful resource for readers in further study and advanced research.

  • Vladimir Britanak, Patrick C. Yip, K.R. Rao (2006). Discrete Cosine and Sine Transforms: General Properties, Fast Algorithms and Integer Approximations. Academic Press. [About the book]

  • Ruye Wang (2010). Introduction to Orthogonal Transforms with Applications in Data Processing and Analysis. Cambridge University Press. [PDF]

  • J. Nathan Kutz, Steven L. Brunton, Bingni Brunton, Joshua L. Proctor (2016). Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. SIAM. [About the book]

  • Yimin Wei, Weiyang Ding (2016). Theory and Computation of Tensors: Multi-Dimensional Arrays. Academic Press.

  • Steven L. Brunton, J. Nathan Kutz (2019). Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press. [PDF] [data & code]

Quick Run


  • If you want to run the code, please
    • download (or clone) this repository,
    • open the .ipynb file using Jupyter notebook,
    • and run the code.

Citing


This repository accompanies the following paper; please cite our paper if it helps your research.

Acknowledgements


This research is supported by the Institute for Data Valorization (IVADO).

License


This work is released under the MIT license.


Issues

also 404

Remote HTTP 404: part-03/chapter-01.ipynb not found among 138 files

a bug in LRTC-TNN.ipynb

In the svt_tnn code:

def svt_tnn(mat, alpha, rho, theta):
    tau = alpha / rho
    [m, n] = mat.shape
    if 2 * m < n:
        u, s, v = np.linalg.svd(mat @ mat.T, full_matrices = 0)
        s = np.sqrt(s)
        idx = np.sum(s > tau)
        mid = np.zeros(idx)
        mid[:theta] = 1
        mid[theta:idx] = (s[theta:idx] - tau) / s[theta:idx]
        return (u[:, :idx] @ np.diag(mid)) @ (u[:, :idx].T @ mat)
    elif m > 2 * n:
        return svt_tnn(mat.T, tau, theta).T # bug: this recursive call passes only three arguments, but svt_tnn requires four
    u, s, v = np.linalg.svd(mat, full_matrices = 0)
    idx = np.sum(s > tau)
    vec = s[:idx].copy()
    vec[theta:idx] = s[theta:idx] - tau
    return u[:, :idx] @ np.diag(vec) @ v[:idx, :]

The error shows:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[18], line 7
      5 epsilon = 1e-4
      6 maxiter = 200
----> 7 x = LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)
      8 # end = time.time()
      9 # print('Running time: %d seconds'%(end - start))

Cell In[8], line 17, in LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)
     15 rho = min(rho * 1.05, 1e5)
     16 for k in range(len(dim)):
---> 17     X[k] = mat2ten(svt_tnn(ten2mat(Z - T[k] / rho, k), alpha[k], rho, int(np.ceil(theta * dim[k]))), dim, k)
     18 Z[pos_missing] = np.mean(X + T / rho, axis = 0)[pos_missing]
     19 T = T + rho * (X - np.broadcast_to(Z, np.insert(dim, 0, len(dim))))

Cell In[6], line 13, in svt_tnn(mat, alpha, rho, theta)
     11     return (u[:, :idx] @ np.diag(mid)) @ (u[:, :idx].T @ mat)
     12 elif m > 2 * n:
---> 13     return svt_tnn(mat.T, tau, theta).T
     14 u, s, v = np.linalg.svd(mat, full_matrices = 0)
     15 idx = np.sum(s > tau)

TypeError: svt_tnn() missing 1 required positional argument: 'theta'
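
A minimal fix, sketched below and not yet verified against the full notebook, is to forward all four parameters in the recursive branch so the call matches the signature svt_tnn(mat, alpha, rho, theta):

```python
import numpy as np

def svt_tnn(mat, alpha, rho, theta):
    """Singular value thresholding for the truncated nuclear norm.

    Same logic as the notebook version; only the recursive branch is
    changed to pass alpha and rho instead of the precomputed tau.
    """
    tau = alpha / rho
    m, n = mat.shape
    if 2 * m < n:
        # Work with the smaller Gram matrix when mat is very wide.
        u, s, _ = np.linalg.svd(mat @ mat.T, full_matrices=False)
        s = np.sqrt(s)
        idx = np.sum(s > tau)
        mid = np.zeros(idx)
        mid[:theta] = 1
        mid[theta:idx] = (s[theta:idx] - tau) / s[theta:idx]
        return (u[:, :idx] @ np.diag(mid)) @ (u[:, :idx].T @ mat)
    elif m > 2 * n:
        # Fix: forward all four arguments to match the signature.
        return svt_tnn(mat.T, alpha, rho, theta).T
    u, s, v = np.linalg.svd(mat, full_matrices=False)
    idx = np.sum(s > tau)
    vec = s[:idx].copy()
    vec[theta:idx] = s[theta:idx] - tau
    return u[:, :idx] @ np.diag(vec) @ v[:idx, :]
```

With this change, the tall-matrix branch (m > 2n) that previously raised the TypeError runs through the wide-matrix branch on the transpose and returns a matrix of the original shape.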

404

part1 is nowhere to be found
