Giter VIP home page Giter VIP logo

torchdata's Introduction

torchdata Logo


Version Docs Tests Coverage Style PyPI Python PyTorch Docker Roadmap
Version Documentation Tests Coverage codebeat PyPI Python PyTorch Docker Roadmap

torchdata is PyTorch oriented library focused on data processing and input pipelines in general.

It extends torch.utils.data.Dataset and equips it with functionalities known from tensorflow.data like map or cache (with some additions unavailable in aforementioned).

All of that with minimal interference (single call to super().__init__()) in original PyTorch's datasets.

Functionalities overview:

  • Use map, apply, reduce or filter
  • cache data in RAM/disk/your own method (even partially, say first 20%)
  • Full PyTorch's Dataset and IterableDataset support (including torchvision)
  • General torchdata.maps like Flatten or Select
  • Extensible interface (your own cache methods, cache modifiers, maps etc.)
  • Concrete torchdata.datasets designed for file reading and other general tasks

Quick examples

  • Create image dataset, convert it to Tensors, cache and concatenate with smoothed labels:
import torchdata
import torchvision

class Images(torchdata.Dataset): # Different inheritance
    def __init__(self, path: str):
        super().__init__() # This is the only change
        self.files = [file for file in pathlib.Path(path).glob("*")]

    def __getitem__(self, index):
        return Image.open(self.files[index])

    def __len__(self):
        return len(self.files)


images = Images("./data").map(torchvision.transforms.ToTensor()).cache()

You can concatenate above dataset with another (say labels) and iterate over them as per usual:

for data, label in images | labels:
    # Do whatever you want with your data
  • Cache first 1000 samples in memory, save the rest on disk in folder ./cache:
images = (
    ImageDataset.from_folder("./data").map(torchvision.transforms.ToTensor())
    # First 1000 samples in memory
    .cache(torchdata.modifiers.UpToIndex(1000, torchdata.cachers.Memory()))
    # Sample from 1000 to the end saved with Pickle on disk
    .cache(torchdata.modifiers.FromIndex(1000, torchdata.cachers.Pickle("./cache")))
    # You can define your own cachers, modifiers, see docs
)

To see what else you can do please check torchdata documentation

Installation

pip

Latest release:

pip install --user torchdata

Nightly:

pip install --user torchdata-nightly

Docker

CPU standalone and various versions of GPU enabled images are available at dockerhub.

For CPU quickstart, issue:

docker pull szymonmaszke/torchdata:18.04

Nightly builds are also available, just prefix tag with nightly_. If you are going for GPU image make sure you have nvidia/docker installed and it's runtime set.

Contributing

If you find any issue or you think some functionality may be useful to others and fits this library, please open new Issue or create Pull Request.

To get an overview of thins one can do to help this project, see Roadmap

torchdata's People

Contributors

imgbotapp avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.