
geotorch's People

Contributors

lezcano


geotorch's Issues

flattened_orthogonal consumes a huge amount of memory

#10 I just tried this on a WideResNet-28-10, which originally needs ~4.5 GB of GPU memory, but after using flattened_orthogonal I get:

RuntimeError: CUDA out of memory. Tried to allocate 2.47 GiB (GPU 0; 11.91 GiB total capacity; 10.13 GiB already allocated; 534.94 MiB free; 10.81 GiB reserved in total by PyTorch)

CUDA_VISIBLE_DEVICES=0 python train_geo.py --dataset cifar10

# wideresnet_geo.py
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
import geotorch
from numpy import prod

def size_flattened(size, dim):
    size = list(size)
    size_dim = size[dim]
    size[dim] = 1
    return (size_dim, prod(size))


class FlattenedStiefel(geotorch.Stiefel):
    def __init__(self, size, triv="expm"):
        super().__init__(size_flattened(size, 0), triv)
        self.size = size

    def forward(self, X):
        X = X.flatten(1)
        X = super().forward(X)
        X = X.view(self.size)
        return X

    def initialize_(self, X, check_in_manifold=True):
        X = X.flatten(1)
        X = super().initialize_(X, check_in_manifold)
        X = X.view(self.size)
        return X

    def sample(self, distribution="uniform", init_=None):
        X = super().sample(distribution, init_)
        X = X.view(self.size)
        return X


def flattened_orthogonal(module, tensor_name="weight", triv="expm"):
    return geotorch.constraints._register_manifold(module, tensor_name, FlattenedStiefel, triv)

class BasicBlock(nn.Module):
    def __init__(self, in_planes, out_planes, stride, dropRate=0.0):
        super(BasicBlock, self).__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.relu1 = nn.ReLU(inplace=True)
        self.conv1 = nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        if in_planes * 9 >= out_planes:
            flattened_orthogonal(self.conv1, "weight")
        self.bn2 = nn.BatchNorm2d(out_planes)
        self.relu2 = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_planes, out_planes, kernel_size=3, stride=1,
                               padding=1, bias=False)
        if in_planes * 9 >= out_planes:
            flattened_orthogonal(self.conv2, "weight")
        self.droprate = dropRate
        self.equalInOut = (in_planes == out_planes)
        self.convShortcut = (not self.equalInOut) and nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride,
                               padding=0, bias=False) or None
    def forward(self, x):
        if not self.equalInOut:
            x = self.relu1(self.bn1(x))
        else:
            out = self.relu1(self.bn1(x))
        out = self.relu2(self.bn2(self.conv1(out if self.equalInOut else x)))
        if self.droprate > 0:
            out = F.dropout(out, p=self.droprate, training=self.training)
        out = self.conv2(out)
        return torch.add(x if self.equalInOut else self.convShortcut(x), out)

class NetworkBlock(nn.Module):
    def __init__(self, nb_layers, in_planes, out_planes, block, stride, dropRate=0.0):
        super(NetworkBlock, self).__init__()
        self.layer = self._make_layer(block, in_planes, out_planes, nb_layers, stride, dropRate)
    def _make_layer(self, block, in_planes, out_planes, nb_layers, stride, dropRate):
        layers = []
        for i in range(int(nb_layers)):
            layers.append(block(i == 0 and in_planes or out_planes, out_planes, i == 0 and stride or 1, dropRate))
        return nn.Sequential(*layers)
    def forward(self, x):
        return self.layer(x)

class WideResNet(nn.Module):
    def __init__(self, depth, num_classes, widen_factor=1, dropRate=0.0):
        super(WideResNet, self).__init__()
        nChannels = [16, 16*widen_factor, 32*widen_factor, 64*widen_factor]
        assert((depth - 4) % 6 == 0)
        n = (depth - 4) / 6
        block = BasicBlock
        # 1st conv before any network block
        self.conv1 = nn.Conv2d(3, nChannels[0], kernel_size=3, stride=1,
                               padding=1, bias=False)
        # 1st block
        self.block1 = NetworkBlock(n, nChannels[0], nChannels[1], block, 1, dropRate)
        # 2nd block
        self.block2 = NetworkBlock(n, nChannels[1], nChannels[2], block, 2, dropRate)
        # 3rd block
        self.block3 = NetworkBlock(n, nChannels[2], nChannels[3], block, 2, dropRate)
        # global average pooling and classifier
        self.bn1 = nn.BatchNorm2d(nChannels[3])
        self.relu = nn.ReLU(inplace=True)
        self.fc = nn.Linear(nChannels[3], num_classes)
        self.nChannels = nChannels[3]

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                continue
                # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                m.bias.data.zero_()
    def forward(self, x):
        out = self.conv1(x)
        out = self.block1(out)
        out = self.block2(out)
        out = self.block3(out)
        out = self.relu(self.bn1(out))
        out = F.avg_pool2d(out, 8)
        out = out.view(-1, self.nChannels)
        return self.fc(out)
# train_geo.py
import argparse
import os
import shutil
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim
import torch.utils.data
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.autograd import Variable

from wideresnet_geo import WideResNet

# used for logging to TensorBoard
from tensorboard_logger import configure, log_value

parser = argparse.ArgumentParser(description='PyTorch WideResNet Training')
parser.add_argument('--dataset', default='cifar10', type=str,
                    help='dataset (cifar10 [default] or cifar100)')
parser.add_argument('--epochs', default=200, type=int,
                    help='number of total epochs to run')
parser.add_argument('--start-epoch', default=0, type=int,
                    help='manual epoch number (useful on restarts)')
parser.add_argument('-b', '--batch-size', default=128, type=int,
                    help='mini-batch size (default: 128)')
parser.add_argument('--lr', '--learning-rate', default=0.1, type=float,
                    help='initial learning rate')
parser.add_argument('--momentum', default=0.9, type=float, help='momentum')
parser.add_argument('--nesterov', default=True, type=bool, help='nesterov momentum')
parser.add_argument('--weight-decay', '--wd', default=5e-4, type=float,
                    help='weight decay (default: 5e-4)')
parser.add_argument('--print-freq', '-p', default=10, type=int,
                    help='print frequency (default: 10)')
parser.add_argument('--layers', default=28, type=int,
                    help='total number of layers (default: 28)')
parser.add_argument('--widen-factor', default=10, type=int,
                    help='widen factor (default: 10)')
parser.add_argument('--droprate', default=0, type=float,
                    help='dropout probability (default: 0.0)')
parser.add_argument('--no-augment', dest='augment', action='store_false',
                    help='whether to use standard augmentation (default: True)')
parser.add_argument('--resume', default='', type=str,
                    help='path to latest checkpoint (default: none)')
parser.add_argument('--name', default='WideResNet-28-10', type=str,
                    help='name of experiment')
parser.add_argument('--tensorboard',
                    help='Log progress to TensorBoard', action='store_true')
parser.set_defaults(augment=True)

best_prec1 = 0

def main():
    global args, best_prec1
    args = parser.parse_args()
    if args.tensorboard: configure("runs/%s"%(args.name))

    # Data loading code
    normalize = transforms.Normalize(mean=[x/255.0 for x in [125.3, 123.0, 113.9]],
                                     std=[x/255.0 for x in [63.0, 62.1, 66.7]])

    if args.augment:
        transform_train = transforms.Compose([
        	transforms.ToTensor(),
        	transforms.Lambda(lambda x: F.pad(x.unsqueeze(0),
        						(4,4,4,4),mode='reflect').squeeze()),
            transforms.ToPILImage(),
            transforms.RandomCrop(32),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize,
            ])
    else:
        transform_train = transforms.Compose([
            transforms.ToTensor(),
            normalize,
            ])
    transform_test = transforms.Compose([
        transforms.ToTensor(),
        normalize
        ])

    kwargs = {'num_workers': 1, 'pin_memory': True}
    assert(args.dataset == 'cifar10' or args.dataset == 'cifar100')
    train_loader = torch.utils.data.DataLoader(
        datasets.__dict__[args.dataset.upper()]('../data', train=True, download=True,
                         transform=transform_train),
        batch_size=args.batch_size, shuffle=True, **kwargs)
    val_loader = torch.utils.data.DataLoader(
        datasets.__dict__[args.dataset.upper()]('../data', train=False, transform=transform_test),
        batch_size=args.batch_size, shuffle=True, **kwargs)

    # create model
    model = WideResNet(args.layers, args.dataset == 'cifar10' and 10 or 100,
                            args.widen_factor, dropRate=args.droprate)

    # get the number of model parameters
    print('Number of model parameters: {}'.format(
        sum([p.data.nelement() for p in model.parameters()])))

    # for training on multiple GPUs.
    # Use CUDA_VISIBLE_DEVICES=0,1 to specify which GPUs to use
    model = torch.nn.DataParallel(model).cuda()
    # model = model.cuda()

    # optionally resume from a checkpoint
    if args.resume:
        if os.path.isfile(args.resume):
            print("=> loading checkpoint '{}'".format(args.resume))
            checkpoint = torch.load(args.resume)
            args.start_epoch = checkpoint['epoch']
            best_prec1 = checkpoint['best_prec1']
            model.load_state_dict(checkpoint['state_dict'])
            print("=> loaded checkpoint '{}' (epoch {})"
                  .format(args.resume, checkpoint['epoch']))
        else:
            print("=> no checkpoint found at '{}'".format(args.resume))

    cudnn.benchmark = True

    # define loss function (criterion) and optimizer
    criterion = nn.CrossEntropyLoss().cuda()
    optimizer = torch.optim.SGD(model.parameters(), args.lr,
                                momentum=args.momentum, nesterov = args.nesterov,
                                weight_decay=args.weight_decay)

    # cosine learning rate
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, len(train_loader)*args.epochs)

    for epoch in range(args.start_epoch, args.epochs):
        # train for one epoch
        train(train_loader, model, criterion, optimizer, scheduler, epoch)

        # evaluate on validation set
        prec1 = validate(val_loader, model, criterion, epoch)

        # remember best prec@1 and save checkpoint
        is_best = prec1 > best_prec1
        best_prec1 = max(prec1, best_prec1)
        save_checkpoint({
            'epoch': epoch + 1,
            'state_dict': model.state_dict(),
            'best_prec1': best_prec1,
        }, is_best)
    print('Best accuracy: ', best_prec1)

    for p in model.parameters():
        if p.dim() == 4:
            p_2d = p.data.view(p.shape[0], -1)
            print(p.shape, p_2d.mm(p_2d.t()).sub(torch.eye(p_2d.shape[0], device=p.device)).norm())

def train(train_loader, model, criterion, optimizer, scheduler, epoch):
    """Train for one epoch on the training set"""
    batch_time = AverageMeter()
    losses = AverageMeter()
    top1 = AverageMeter()

    # switch to train mode
    model.train()

    end = time.time()
    for i, (input, target) in enumerate(train_loader):
        target = target.cuda(non_blocking=True)
        input = input.cuda(non_blocking=True)

        # compute output
        output = model(input)
        loss = criterion(output, target)

        # measure accuracy and record loss
        prec1 = accuracy(output.data, target, topk=(1,))[0]
        losses.update(loss.data.item(), input.size(0))
        top1.update(prec1.item(), input.size(0))

        # compute gradient and do SGD step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()

        if i % args.print_freq == 0:
            print('Epoch: [{0}][{1}/{2}]\t'
                  'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                  'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                  'Prec@1 {top1.val:.3f} ({top1.avg:.3f})'.format(
                      epoch, i, len(train_loader), batch_time=batch_time,
                      loss=losses, top1=top1))
    # log to TensorBoard
    if args.tensorboard:
        log_value('train_loss', losses.avg, epoch)
        log_value('train_acc', top1.avg, epoch)

def validate(val_loader, model, criterion, epoch):
    """Perform validation on the validation set"""
    batch_time = AverageMeter()
    losses = AverageMeter()
    top1 = AverageMeter()

    # switch to evaluate mode
    model.eval()

    end = time.time()
    for i, (input, target) in enumerate(val_loader):
        target = target.cuda(non_blocking=True)
        input = input.cuda(non_blocking=True)

        # compute output
        with torch.no_grad():
            output = model(input)
        loss = criterion(output, target)

        # measure accuracy and record loss
        prec1 = accuracy(output.data, target, topk=(1,))[0]
        losses.update(loss.data.item(), input.size(0))
        top1.update(prec1.item(), input.size(0))

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()

        if i % args.print_freq == 0:
            print('Test: [{0}/{1}]\t'
                  'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                  'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                  'Prec@1 {top1.val:.3f} ({top1.avg:.3f})'.format(
                      i, len(val_loader), batch_time=batch_time, loss=losses,
                      top1=top1))

    print(' * Prec@1 {top1.avg:.3f}'.format(top1=top1))
    # log to TensorBoard
    if args.tensorboard:
        log_value('val_loss', losses.avg, epoch)
        log_value('val_acc', top1.avg, epoch)
    return top1.avg


def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    """Saves checkpoint to disk"""
    directory = "runs/%s/"%(args.name)
    if not os.path.exists(directory):
        os.makedirs(directory)
    filename = directory + filename
    torch.save(state, filename)
    if is_best:
        shutil.copyfile(filename, 'runs/%s/'%(args.name) + 'model_best.pth.tar')

class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

def accuracy(output, target, topk=(1,)):
    """Computes the precision@k for the specified values of k"""
    maxk = max(topk)
    batch_size = target.size(0)

    _, pred = output.topk(maxk, 1, True, True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    res = []
    for k in topk:
        correct_k = correct[:k].view(-1).float().sum(0)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res

if __name__ == '__main__':
    main()

Posing orthogonality condition on convolution weights in v0.3.0

Hi,

I am struggling to impose orthogonality constraints on the weights of convolutional layers, as was also asked in Issue 10: we unfold the weight tensor (in, out, k, k) into a matrix (in, out * k * k) on which we impose orthogonality. If I try to run the code from @lezcano's answer, I get:

line 28, in _register_manifold tensor.copy_(X) RuntimeError: The size of tensor a (3) must match the size of tensor b (20) at non-singleton dimension 3

Trying to run the other answer from @bokveizen, I get:

raise InManifoldError(X, self) geotorch.exceptions.InManifoldError: Tensor not contained in FlattenedStiefel(n=180, k=40, triv=linalg_matrix_exp, transposed). Got: tensor([[[[ ...

I tried rewriting the in_manifold check to run self.forward() first, but the code still raises a RuntimeError.

What works is rewinding to the commit from a year ago; however, I would like to reproduce the same functionality in geotorch v0.3.0.

Thanks for the help!

Code from the answer of @bokveizen in Issue 10:

from numpy import prod
import torch
import torch.nn as nn
import geotorch


def size_flattened(size, dim):
    size = list(size)
    size_dim = size[dim]
    size[dim] = 1
    return (size_dim, prod(size))


class FlattenedStiefel(geotorch.Stiefel):
    def __init__(self, size, triv="expm"):
        # We assume that you want to flatten the dimensions [1:n]
        # See the comment in forward for why we keep the dim=1 and the dim=0
        super().__init__(size_flattened(size, 0), triv)
        # size = (out, in, k, k) so we transpose it
        self.size = size
        # size = list(size)
        # size[0], size[1] = size[1], size[0]
        # self.size = tuple(size)

    def forward(self, X):
        # The weight of a CNN with params (in, out, k, k)
        # is of size (out, in, k, k), so we transpose it before flattening it
        # X = X.T
        X = X.flatten(1)
        X = super().forward(X)
        X = X.view(self.size)
        # return X.T
        return X

    def initialize_(self, X, check_in_manifold=True):
        # X = X.T
        X = X.flatten(1)
        X = super().initialize_(X, check_in_manifold)
        X = X.view(self.size)
        # return X.T
        return X

    def sample(self, distribution="uniform", init_=None):
        X = super().sample(distribution, init_)
        X = X.view(self.size)
        # return X.T
        return X


def flattened_orthogonal(module, tensor_name="weight", triv="expm"):
    return geotorch.constraints._register_manifold(module, tensor_name, FlattenedStiefel, triv)


layer = nn.Conv2d(20, 40, 3, 3)  # Make the kernels orthogonal
flattened_orthogonal(layer, "weight")
print(layer)

W = layer.weight  # W has size (40, 20, 3, 3)
# W = W.T.flatten(1)  # W has size (20, 360) with orthogonal rows
# W = W.T # W has size (360, 20) with orthogonal columns
# # Check that W.T @ W = Id
# print(torch.allclose(W.T @ W, torch.eye(40), atol=1e-4))
W = W.flatten(1)
print(torch.allclose(W @ W.T, torch.eye(40), atol=1e-4))

Using 'geotorch.orthogonal' (and similarly 'torch.nn.utils.parametrizations.orthogonal') on a linear layer gives 'grad = None' for the layer's weight

When I use 'geotorch.orthogonal' in my code, the debugger shows that 'linear.weight' becomes 'linear.parametrization' and that 'linear.parametrization.grad' is None.
I wonder if this is normal and whether it will affect the gradient and the optimization of the linear layer's parameters.

I suppose it is the use of 'geotorch' that caused the missing gradient, because when I don't include that line the layer does have a grad.
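
For reference, a minimal sketch of what one would typically observe, assuming the geotorch version in use registers the constraint through torch's native parametrization machinery (the attribute path parametrizations.weight.original is an assumption taken from that machinery, not something stated in this issue): the parametrized weight is a computed, non-leaf tensor, so gradients accumulate on the underlying unconstrained parameter instead.

import torch
import torch.nn as nn
import geotorch

linear = nn.Linear(8, 8)
geotorch.orthogonal(linear, "weight")

out = linear(torch.randn(4, 8)).sum()
out.backward()

# linear.weight is now the output of the parametrization, not a leaf tensor,
# so it does not accumulate .grad itself.
print(linear.weight.is_leaf)  # False

# The trainable leaf lives under the parametrization; its gradient is what the
# optimizer consumes (attribute path assumed from torch.nn.utils.parametrize).
orig = linear.parametrizations.weight.original
print(orig.grad is not None)  # True: gradients do flow to the underlying parameter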

Parameterization of Orthogonal Group

Hello! From my understanding, when using the matrix exponential to parameterize an orthogonal matrix, it can only realize the special orthogonal group (with determinant = 1). However, I found that geotorch.orthogonal() can realize orthogonal matrices with determinant = -1. So, I wonder if you can point me to the code/paper to understand the method better.

In general, I am also interested in continuous optimization over disconnected components. Does it mean the solution must lie in the same component as the initialization?

Thanks!

Jiahao
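
A minimal, self-contained sketch of the underlying idea (an illustration, not geotorch's actual implementation): the matrix exponential of a skew-symmetric matrix always lands in SO(n), but composing it with a fixed reflection B with det(B) = -1 parametrizes the other connected component of O(n).

import torch

def orthogonal_from(A, base):
    # A: unconstrained square matrix; base: a fixed orthogonal matrix ("base point")
    skew = A - A.T                        # skew-symmetric part
    return base @ torch.matrix_exp(skew)  # orthogonal, with det equal to det(base)

n = 5
A = torch.randn(n, n)

identity = torch.eye(n)
reflection = torch.eye(n)
reflection[0, 0] = -1.0                   # det(reflection) = -1

print(torch.linalg.det(orthogonal_from(A, identity)))    # ~ +1: the SO(n) component
print(torch.linalg.det(orthogonal_from(A, reflection)))  # ~ -1: the other component

Since the map is continuous, gradient-based optimization cannot jump between the two components; which one you end up in is fixed by the base/initialization, which also speaks to the question about optimization over disconnected components.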

Fail to apply constraints when layer is on cuda

Hi @lezcano, thanks for open-sourcing this wonderful library!

When I try to optimize an SE(3) transformation, I create it as a torch.nn.Linear layer and apply an orthogonal constraint to its 3x3 weight. Everything works well on CPU, but when I put the Linear layer on CUDA I get this error:
[Screenshot from 2022-11-20 21-06-29: the error was posted as an image]

Could you help me take a look at this? Thanks!

paper about low-rank

Hi, I am doing research based on low-rank manifolds. I have difficulty understanding the idea behind your implementation of low rank, so I wonder whether there is any paper explaining it. I know you have a paper on orthogonality, but I'm not sure whether you have one for low rank. If you do, could you please tell me?

Constraints API doesn't work when OpenCV is imported

Hi, thanks for the convenient library. I may have found an issue or unintended behaviour.

Description

The Constraints API seems to fail when cv2 is imported.

Minimal example

import torch, geotorch

class Test(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.param = torch.nn.Linear(3, 1)
        geotorch.sphere(self.param)        
        print(self.param.weight)
        
    def forward(self, x):
        pass
    
t = Test()

This code prints out tensor([[ 0.8085, -0.3293, -0.4876]], grad_fn=<MulBackward0>) as expected.

import torch, geotorch
import cv2

class Test(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.param = torch.nn.Linear(3, 1)
        geotorch.sphere(self.param)        
        print(self.param.weight)
        
    def forward(self, x):
        pass
    
t = Test()

This code prints out tensor([[nan, nan, nan]], grad_fn=<MulBackward0>).

Expected behaviour

Code works the same way regardless of whether cv2 is imported or not.

When I remove the line geotorch.sphere(...), the code works as expected, so I assume it's an issue with geotorch.

Environment

I am using Python 3.8.10. Library versions:

  • cv2 -> 4.5.4
  • torch -> 1.11.0+cu102
  • geotorch -> 0.3.0 (installed with pip install git+https://github.com/Lezcano/geotorch/)

Does the Euclidean optimizer fail when the matrix is not skew-symmetric or skew-Hermitian?

If the last two dimensions of a tensor are not square, torus_init_ cannot be applied to it.
Then the tensor cannot be filled with a skew-symmetric matrix, the exponential map fails, and the geometric optimization cannot be transformed into a Euclidean one. Is that the case?

For a non-square matrix, how can the geometric optimization be transformed into a Euclidean optimization?

Can two parametrizations be used on the same tensor?

Hello,

I am wondering whether two parametrizations can be used on the same tensor. In particular, I want to parametrize a matrix to be both positive definite and in the SL group.

In the example below, I create a class with a matrix that should be both PSD and SLN:

import torch
import torch.nn as nn
import geotorch

class PrNorm(torch.nn.Module):
    def __init__(self, nDim):
        super().__init__()
        self.B = nn.Parameter(torch.eye(nDim, requires_grad=True))
        geotorch.positive_semidefinite(self, "B")
        geotorch.sln(self, "B")

    def forward(self, x):
        quadratic = torch.einsum('i,ij,j->', x, self.B, x)
        return quadratic

nDim = 7
prnorm = PrNorm(nDim)

However, I get the following error when I initialize the class:

InManifoldError: Tensor not contained in PSSD(
  n=7
  (0): Stiefel(n=7, k=7, triv=linalg_matrix_exp)
  (1): Rn(n=7)
). Got

I think that the parametrized tensor gets re-initialized by SLN, so it is no longer PSD, leading to the error. Is there a way to do what I intend here with geotorch?
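
A hedged sketch of one way to obtain a matrix that is simultaneously symmetric positive definite and of determinant 1 without stacking two geotorch constraints: the matrix exponential of a symmetric, traceless matrix is SPD with det = exp(trace) = 1. This uses torch's native parametrize utility directly, not geotorch's PSSD/SL classes, so treat it as an illustration rather than the library's intended API.

import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class SPDDetOne(nn.Module):
    def forward(self, X):
        S = 0.5 * (X + X.T)                                              # symmetrize
        I = torch.eye(S.shape[0], dtype=S.dtype, device=S.device)
        S = S - (torch.trace(S) / S.shape[0]) * I                        # remove the trace
        return torch.matrix_exp(S)                                       # SPD with det = 1

nDim = 7
module = nn.Module()
module.B = nn.Parameter(torch.zeros(nDim, nDim))   # expm(0) = I as the starting point
parametrize.register_parametrization(module, "B", SPDDetOne())

B = module.B
print(torch.dist(B, B.T))                # ~0: symmetric
print(torch.linalg.eigvalsh(B).min())    # > 0: positive definite
print(torch.linalg.det(B))               # ~1

Because only a single parametrization is registered, there is no second registration that could re-initialize the tensor.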

Could you tell me how I can apply orthogonality to Conv layers, not on the kernels but on the flattened (C_out, C_in * k_h * k_w) 2D matrix?

As shown in the docs, when we use geotorch.orthogonal on Conv layers, it imposes orthogonality on the kernels. For example, the given example will make all 800 (20x40) of the 3x3 matrices orthogonal. However, I want to flatten the weight to a 2D (20, 40x3x3) matrix and make that orthogonal. What should I do? Thanks!

layer = nn.Conv2d(20, 40, 3, 3)  # Make the kernels orthogonal
geotorch.orthogonal(layer, "weight")

Initial values seem to be overwritten when parameterization is applied, even if the constraint was already satisfied.

I attempted to constrain a tensor using geotorch.sphere(my_model, "my_param_name", my_radius), but found that after registering the parameterization the original values of my_model.my_param_name (which were already on the sphere) had been significantly perturbed. Is this behavior expected, or is it possible that I made a mistake in my usage of the sphere function? It is very important that I start my optimization from the initial values I have pre-selected ([x,y,z] coordinates on a polyhedron), so if it is the norm that registering a constraint perturbs the initialization, that will really limit its usefulness in some cases.
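
For what it's worth, a hedged sketch of how the pre-selected values can be pushed back in after registration. It assumes a geotorch version in which assigning to the constrained attribute routes the value through the parametrization's initialization (very old versions raise "can't set attribute" on assignment, as in an issue further down this page), and it assumes the sphere constraint acts along the last dimension, with leading dimensions treated as a batch. The names model and my_param_name are illustrative.

import torch
import torch.nn as nn
import geotorch

model = nn.Module()
model.my_param_name = nn.Parameter(torch.randn(4, 3))
# pre-selected [x, y, z] points, already unit norm
pre_selected = torch.nn.functional.normalize(torch.randn(4, 3), dim=-1)

geotorch.sphere(model, "my_param_name")   # registering may resample/perturb the values
model.my_param_name = pre_selected        # push the pre-selected points back in

# Expected to be True when assignment routes through the parametrization's initialization
print(torch.allclose(model.my_param_name, pre_selected, atol=1e-5))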

SDP Constraints: Constrain the Magnitude of Eigenvalues of a Weighting Matrix

Hello, thank you for the awesome GeoTorch tools! I am wondering whether it is possible to use GeoTorch to constrain the magnitude of the eigenvalues of a (symmetric) weighting matrix W in a neural network. These constraints would take the following form (if I want to constrain them to be smaller than 1 in magnitude):

|λ_i(W)| < 1 for all i   (constraint originally attached as an image, "Eig_Const")

If so, how could these constraints be posed in the code? Thank you!
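
One way to express such an eigenvalue bound is a plain custom parametrization, sketched here with torch's parametrize utility rather than a GeoTorch class (so this is an illustration, not the library's API): symmetrize the unconstrained tensor, eigendecompose, and squash the eigenvalues with tanh so that |λ_i| < 1.

import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class SymmetricBoundedEig(nn.Module):
    def forward(self, X):
        S = 0.5 * (X + X.T)                       # symmetric
        eigvals, eigvecs = torch.linalg.eigh(S)
        eigvals = torch.tanh(eigvals)             # |lambda| < 1
        return eigvecs @ torch.diag(eigvals) @ eigvecs.T

layer = nn.Linear(16, 16, bias=False)
parametrize.register_parametrization(layer, "weight", SymmetricBoundedEig())

W = layer.weight
print(torch.linalg.eigvalsh(W).abs().max())       # strictly below 1

Note that the backward pass of the eigendecomposition can be ill-conditioned when eigenvalues are (nearly) repeated, so this is only a sketch.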

Is geotorch.orthogonal a mapping from Euclidean space onto the corresponding manifold?

Hello lezcano, I am a beginner in the field of manifolds. I am glad to see your open-source implementation of manifold structures.
I have the following questions for you:
Question 1. I have reviewed your two papers and found that both of them apply orthogonalization to linear layers in the code.
If our intermediate output value is X, is the output also on the corresponding manifold after passing through such an orthogonally constrained linear layer? That is, do we finally get a Y on the manifold?

For example, the following code:

hidden = self.recurrent_kernel(hidden)

Can I assume that geotorch.orthogonal provides a mapping that projects vectors from Euclidean space onto the corresponding manifold, X -> Y?

Question 2. I see that you apply geotorch.orthogonal in RNN networks. Can it be used for classification tasks in a regular multi-layer perceptron? If the geotorch.orthogonal in Question 1 is a projection operation, should special operations be performed on the projected values, since at that point they already live in the manifold? I see that you used a special nonlinearity layer in the ExpRNNCell class. Is its purpose to project points located on the manifold back into Euclidean space?

Code location:

return self.nonlinearity(out)

Because you are using an RNN network, you use a nonlinearity layer for the conversion. If I am using a regular multi-layer perceptron for classification tasks, how can I project the parameterized output values of geotorch.orthogonal back onto Euclidean space?

Question 3. I want to project feature vectors from Euclidean space (such as the output of a simple linear layer) onto a manifold and then calculate the distance between different output features on the manifold. May I ask whether you have any suggestions? Can we directly use some metric from Euclidean space to measure the distance between two different Y values?

Thank you very much for answering these questions, and thank you for opening up such an excellent project!
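
A small illustration of the point behind Question 1, assuming a standard geotorch setup: the orthogonality constraint lives on the weight matrix, not on the activations. The layer output y = Wx is an ordinary Euclidean vector; for a square orthogonal W with no bias it merely preserves the norm of x.

import torch
import torch.nn as nn
import geotorch

layer = nn.Linear(32, 32, bias=False)
geotorch.orthogonal(layer, "weight")

x = torch.randn(32)
y = layer(x)

W = layer.weight
print(torch.allclose(W.T @ W, torch.eye(32), atol=1e-5))  # the weight is orthogonal
print(x.norm(), y.norm())                                  # norms match, but y is still just a vector in R^n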

Upstream Constraints to PyTorch 1.9.X+

Now that mainline PyTorch has the awesome parametrization support you've created, will constraints like orthogonalization be pushed upstream, or kept separate in this repo? Will the usage change with 1.9 compared to before?

ParametrizedLinear not Recognized as Tensor by Adam

When initializing the optimizer for a single parameter:

R = nn.Linear(X.shape[1], X.shape[1])
geotorch.orthogonal(R, "weight")
optim = torch.optim.Adam([R], lr=1e-1)

I got the following error:

~/.conda/envs/torch_env/lib/python3.8/site-packages/torch/optim/adam.py in __init__(self, params, lr, betas, eps, weight_decay, amsgrad)
     46         defaults = dict(lr=lr, betas=betas, eps=eps,
     47                         weight_decay=weight_decay, amsgrad=amsgrad)
---> 48         super(Adam, self).__init__(params, defaults)
     49 
     50     def __setstate__(self, state):

~/.conda/envs/torch_env/lib/python3.8/site-packages/torch/optim/optimizer.py in __init__(self, params, defaults)
     50 
     51         for param_group in param_groups:
---> 52             self.add_param_group(param_group)
     53 
     54     def __getstate__(self):

~/.conda/envs/torch_env/lib/python3.8/site-packages/torch/optim/optimizer.py in add_param_group(self, param_group)
    228         for param in param_group['params']:
    229             if not isinstance(param, torch.Tensor):
--> 230                 raise TypeError("optimizer can only optimize Tensors, "
    231                                 "but one of the params is " + torch.typename(param))
    232             if not param.is_leaf:

TypeError: optimizer can only optimize Tensors, but one of the params is geotorch.parametrize.ParametrizedLinear
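
The traceback is the standard PyTorch complaint about handing a module to an optimizer that expects tensors; passing the module's parameters (which, once the parametrization is registered, are the underlying unconstrained tensors) avoids it. A minimal sketch, with the dimension made up for illustration:

import torch
import torch.nn as nn
import geotorch

n = 16
R = nn.Linear(n, n)
geotorch.orthogonal(R, "weight")
optim = torch.optim.Adam(R.parameters(), lr=1e-1)   # note: R.parameters(), not [R]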

"Close to" constraint?

Hello,

I'd like to implement a nuclear norm minimization approach for matrix completion using your package. This requires a constraint that |X_{ij} - S_{ij}| < eps where X is the incomplete matrix and S is the optimization variable matrix. Is this possible?

Thanks!
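
As far as I can tell this elementwise box constraint is not one of geotorch's manifolds, but it can be written as an ordinary parametrization: S = X + eps * tanh(V) on the observed entries and unconstrained elsewhere. A hedged sketch using torch's parametrize utility (the names X_obs, mask, and eps are illustrative):

import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class CloseTo(nn.Module):
    def __init__(self, X_obs, mask, eps):
        super().__init__()
        self.register_buffer("X_obs", X_obs)
        self.register_buffer("mask", mask)   # True where X_ij is observed
        self.eps = eps

    def forward(self, V):
        near = self.X_obs + self.eps * torch.tanh(V)   # stays within eps of the data
        return torch.where(self.mask, near, V)          # unconstrained elsewhere

m, n, eps = 8, 6, 0.05
X_obs = torch.randn(m, n)
mask = torch.rand(m, n) < 0.3

model = nn.Module()
model.S = nn.Parameter(torch.zeros(m, n))
parametrize.register_parametrization(model, "S", CloseTo(X_obs, mask, eps))

print(((model.S - X_obs).abs()[mask] < eps).all())   # constraint holds on observed entries

The nuclear-norm objective itself would then be added as a loss term, for example via torch.linalg.matrix_norm(model.S, ord='nuc').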

Error on initialization: Tensor not contained in PSSD

Randomly, I get the below error at initialization. It happens with PyTorch 1.8.1, and using the example code from the docs

self.M = nn.Parameter(torch.rand(size=(256, 256)))
geotorch.positive_semidefinite(self, "M")

is enough to cause a problem within a Module definition. I've not yet had the error happen after initialization and during training.

Tensor not contained in PSSD(
  n=256
  (0): Stiefel(n=256, k=256, triv=matrix_exp)
  (1): Rn(n=256)
). Got
(tensor([ 3.8247e+00,  3.7747e+00,  3.6989e+00,  3.6000e+00,  3.5833e+00,
         3.5031e+00,  3.4670e+00,  3.4062e+00,  3.4000e+00,  3.3106e+00,
         3.2411e+00,  3.1645e+00,  3.0778e+00,  3.0292e+00,  2.9826e+00,
         2.9379e+00,  2.8955e+00,  2.8642e+00,  2.8168e+00,  2.8006e+00,
         2.7573e+00,  2.7106e+00,  2.6791e+00,  2.6573e+00,  2.6322e+00,
         2.5699e+00,  2.5364e+00,  2.5296e+00,  2.4885e+00,  2.4640e+00,
         2.4263e+00,  2.4049e+00,  2.3801e+00,  2.3288e+00,  2.3193e+00,
         2.2720e+00,  2.2427e+00,  2.2351e+00,  2.2044e+00,  2.1853e+00,
         2.1702e+00,  2.1395e+00,  2.1172e+00,  2.0862e+00,  2.0769e+00,
         2.0350e+00,  2.0008e+00,  1.9624e+00,  1.9416e+00,  1.9255e+00,
         1.8966e+00,  1.8455e+00,  1.8381e+00,  1.8143e+00,  1.8043e+00,
         1.7621e+00,  1.7544e+00,  1.7423e+00,  1.7031e+00,  1.6941e+00,
         1.6761e+00,  1.6508e+00,  1.6293e+00,  1.6204e+00,  1.5964e+00,
         1.5703e+00,  1.5632e+00,  1.5491e+00,  1.5058e+00,  1.4713e+00,
         1.4555e+00,  1.4520e+00,  1.4251e+00,  1.4026e+00,  1.3831e+00,
         1.3698e+00,  1.3429e+00,  1.3210e+00,  1.3175e+00,  1.2850e+00,
         1.2790e+00,  1.2654e+00,  1.2511e+00,  1.2399e+00,  1.2246e+00,
         1.1950e+00,  1.1777e+00,  1.1633e+00,  1.1499e+00,  1.1298e+00,
         1.1138e+00,  1.1045e+00,  1.0955e+00,  1.0744e+00,  1.0565e+00,
         1.0474e+00,  1.0334e+00,  1.0240e+00,  1.0008e+00,  9.9861e-01,
         9.9170e-01,  9.6764e-01,  9.5716e-01,  9.4440e-01,  9.2202e-01,
         9.1461e-01,  9.0652e-01,  8.9787e-01,  8.8027e-01,  8.7056e-01,
         8.6487e-01,  8.3175e-01,  8.1901e-01,  8.1444e-01,  8.0160e-01,
         7.9311e-01,  7.8946e-01,  7.6787e-01,  7.5898e-01,  7.4889e-01,
         7.4063e-01,  7.2527e-01,  7.2042e-01,  6.9469e-01,  6.8660e-01,
         6.7460e-01,  6.6354e-01,  6.5471e-01,  6.4855e-01,  6.3990e-01,
         6.3743e-01,  6.3105e-01,  6.2178e-01,  6.1211e-01,  6.0192e-01,
         5.9400e-01,  5.7176e-01,  5.6133e-01,  5.4624e-01,  5.2755e-01,
         5.2498e-01,  5.1198e-01,  5.0673e-01,  5.0041e-01,  4.9502e-01,
         4.8107e-01,  4.7528e-01,  4.6593e-01,  4.6015e-01,  4.4301e-01,
         4.3815e-01,  4.3327e-01,  4.2120e-01,  4.0957e-01,  3.9892e-01,
         3.9058e-01,  3.8309e-01,  3.8147e-01,  3.7352e-01,  3.6457e-01,
         3.5520e-01,  3.5129e-01,  3.3950e-01,  3.3678e-01,  3.2362e-01,
         3.1534e-01,  3.1092e-01,  3.0481e-01,  2.9021e-01,  2.8767e-01,
         2.8236e-01,  2.7318e-01,  2.6519e-01,  2.6043e-01,  2.5385e-01,
         2.4521e-01,  2.3967e-01,  2.3143e-01,  2.2369e-01,  2.2308e-01,
         2.1247e-01,  2.0372e-01,  2.0049e-01,  1.9481e-01,  1.9068e-01,
         1.8485e-01,  1.8031e-01,  1.7281e-01,  1.7046e-01,  1.6684e-01,
         1.6048e-01,  1.5335e-01,  1.5150e-01,  1.4633e-01,  1.3898e-01,
         1.3572e-01,  1.3068e-01,  1.2853e-01,  1.2517e-01,  1.1826e-01,
         1.1534e-01,  1.1302e-01,  1.0915e-01,  1.0521e-01,  9.6949e-02,
         9.3288e-02,  8.9677e-02,  8.3769e-02,  8.2001e-02,  7.8403e-02,
         7.5017e-02,  7.3773e-02,  6.7579e-02,  6.3032e-02,  6.0493e-02,
         5.7282e-02,  5.4498e-02,  5.1430e-02,  5.0613e-02,  4.7330e-02,
         4.4593e-02,  4.0078e-02,  3.9779e-02,  3.7704e-02,  3.5867e-02,
         3.3504e-02,  3.0591e-02,  2.8566e-02,  2.7471e-02,  2.5519e-02,
         2.3299e-02,  2.1866e-02,  2.1535e-02,  1.9136e-02,  1.6720e-02,
         1.5590e-02,  1.3703e-02,  1.2581e-02,  1.1524e-02,  9.6512e-03,
         8.5254e-03,  8.0568e-03,  6.5441e-03,  5.7102e-03,  4.6747e-03,
         4.3863e-03,  3.5337e-03,  3.0264e-03,  1.3910e-03,  1.2277e-03,
         7.2367e-04,  5.6175e-04,  2.4230e-04,  1.8732e-04,  5.9763e-05,
        -1.0149e-06]), tensor([[-0.0169,  0.0209,  0.0364,  ..., -0.0517, -0.0923, -0.0353],
        [-0.1337, -0.0161, -0.0759,  ...,  0.0348, -0.0387,  0.0934],
        [ 0.0157,  0.0989,  0.0694,  ..., -0.0339,  0.0814, -0.0115],
        ...,
        [-0.0137, -0.0050, -0.0050,  ..., -0.0251, -0.0578, -0.0589],
        [-0.0382, -0.0102,  0.1440,  ..., -0.0455, -0.0192,  0.0240],
        [ 0.0857,  0.0659,  0.0028,  ...,  0.0028, -0.0217, -0.0888]]))

An example of using geotorch for output embeddings

Thank you for coming up with the approach and building this fantastic library. I am still working through the paper, but I was wondering whether you can hint if and how the library could be used for optimizing the output embeddings of an ANN when we have a constraint that those should lie on a unit sphere. In other words, can I use this library on the output vector of the last Linear layer to normalize it, or is it only usable for weights/biases?

A little bit of background: this would be particularly interesting in a number of applications that use deep metric learning. Those often use Siamese networks and are optimized in Euclidean space during training; later, during inference, cosine similarity is used. This gap is often waved off as "not problematic" in the literature, but it would be nice to experiment with making the training procedure consistent (esp. for Adam-based optimization).
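
For constraining outputs rather than weights, a plain normalization of the embedding is usually all that is needed, since geotorch's constraints act on parameters, not on activations. A minimal sketch (module and dimension names are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SphericalEmbedder(nn.Module):
    def __init__(self, in_dim, emb_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, emb_dim)

    def forward(self, x):
        z = self.fc(x)
        return F.normalize(z, p=2, dim=-1)   # every embedding has unit L2 norm

emb = SphericalEmbedder(128, 64)(torch.randn(10, 128))
print(emb.norm(dim=-1))                       # all ones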

Error when loading modules with a geotorch constraint

I am having trouble loading networks with geotorch constraints. The network saves successfully, but when loading I get errors like:

Can't get attribute 'ParametrizedLinear4551587920' on <module 'geotorch.parametrize' from '/Users/jeffreyh/Library/Python/3.7/lib/python/site-packages/geotorch/parametrize.py'>

This appears if you run the following example twice: once to save and another time to load.

## Minimal working example
import torch
import geotorch
import os
import tempfile

fname = 'test_jmh.pt'

class Module(torch.nn.Module):
    def __init__(self, constrain = True):
        super(Module, self).__init__()
        self.layer = torch.nn.Linear(10,10)
        if constrain:
            geotorch.orthogonal(self.layer, 'weight')

    def forward(self, x):
        return self.layer.forward(x)

def test_save(constrain = True):
    m = Module(constrain)
    torch.save(m,fname)
    print("saved!")

def test_load():
    m2 = torch.load(fname)
    print("loaded!")

if __name__ == '__main__':
    # The following code runs twice: once to save the file
    # the second time to try to load this file
    if not os.path.isfile(fname):
        test_save(constrain = True)
    else:
        try:
            test_load()
        except Exception as e:
            print(e)

            os.remove(fname)
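
A common workaround, sketched below under the assumption that the same constraint is re-registered before loading: save the state_dict instead of pickling the whole module, so the dynamically generated Parametrized* class never needs to be pickled.

import torch
import torch.nn as nn
import geotorch

fname = 'test_jmh_state_dict.pt'

def build():
    m = nn.Linear(10, 10)
    geotorch.orthogonal(m, 'weight')
    return m

m = build()
torch.save(m.state_dict(), fname)      # parameters/buffers only; no class gets pickled

m2 = build()                           # rebuild the module with the same constraint registered
m2.load_state_dict(torch.load(fname))
print("loaded!")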

seems to fail with large tensors

Hi,
I was testing geotorch to do some SVD.
Unfortunately, registering an orthogonal parametrization on a large embedding layer (30000x50) takes around 20 minutes and gets killed when training starts.
FYI: this is with PyTorch 1.10.1 on CPU.

Is there anything that can change this?

Thanks

Support float64?

There is probably a workaround for this, but I can't figure out how to switch from float32 to float64. The following works fine:

import torch
import geotorch
from torch import nn
n = 10
Q = nn.Linear(n, n, bias=False).float()
geotorch.orthogonal(Q, "weight")
Q(torch.ones(10).float())

But switching to double() causes an error.

import torch
import geotorch
from torch import nn
n = 10
Q = nn.Linear(n, n, bias=False).double()
geotorch.orthogonal(Q, "weight")
Q(torch.ones(10).double())

Specifically,

~/.local/lib/python3.8/site-packages/geotorch/so.py in trivialization(self, X)
     58 
     59     def trivialization(self, X):
---> 60         return self.base @ self.triv(X)
     61 
     62     def uniform_init_(self):

RuntimeError: expected scalar type Float but found Double
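
A possible workaround, offered as an assumption rather than an official fix: make float64 the default dtype before registering the constraint, so that any internal tensors the parametrization creates (such as its base) are also created in double precision.

import torch
import geotorch
from torch import nn

torch.set_default_dtype(torch.float64)

n = 10
Q = nn.Linear(n, n, bias=False)          # created in float64 under the new default
geotorch.orthogonal(Q, "weight")
print(Q(torch.ones(n, dtype=torch.float64)))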

geotorch.optim

Doesn't geotorch need to implement a geometric optimizer?
Why is the Euclidean torch.optim still used to optimize geometric parameters (such as orthogonal parameters)?

Annoying warning in PyTorch 1.12.

Hi. I am using your library with PyTorch 1.10. It works great! One of my colleagues updated to PyTorch 1.12, and the library still works, but it fills the console with warnings related to geotorch.

Variable.execution_engine.run_backward( # Calls into the C++ engine to run the backward pass /pathtomyenvironment/python3.9/site-packages/geotorch/so.py:83: UserWarning: An output with one or more elements was resized since it had shape [114, 17, 17], which does not match the required output shape [1, 114, 17, 17]. This behaviour is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize(0). (Triggered internally at /opt/conda/conda-bld/pytorch_1659484775609/work/aten/src/ATen/native/Resize.cpp:24.)

I do not know whether it is my fault, given the extra tensor dimension it is complaining about, but the reality is that everything works, and with PyTorch 1.10 no warning at all is shown.

redundant computations?

Hi @lezcano - thanks for this awesome library, and for your contributions to the parametrize and parametrizations modules of torch.nn.utils! I'd been looking for tools like these for quite some time.

I am wondering: after adding a parametrization to a parameter (call it "weight"), do repeated module.weight calls cause redundant computations of the transform op when the base parameter has not changed? Or does the transform use some type of intelligent caching mechanism that checks for changes to the base parameter?

My code has many lazy calls like module.weight and I'm wondering if I need to step back through to factor this in.
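
By default, each access to module.weight re-runs the parametrization. Torch's native parametrization machinery, which recent geotorch versions build on, provides an explicit caching context manager for exactly this pattern; whether your geotorch version routes through it is an assumption worth verifying. A sketch:

import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize
import geotorch

layer = nn.Linear(64, 64)
geotorch.orthogonal(layer, "weight")

x = torch.randn(8, 64)
with parametrize.cached():
    # Inside this block the parametrized weight is computed once and reused,
    # so repeated lazy accesses like layer.weight do not recompute the transform.
    y1 = layer(x)
    y2 = layer(x)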

Usage of geotorch in Hypernetwork

I am trying to use the orthogonal constraint in a hypernetwork. So effectively I have a Linear network that returns another module that uses a weight matrix that needs to be constrained.

import torch
import torch.nn as nn
import geotorch

class Functional(nn.Module):
    def __init__(self,
                 weights):
        ''' :param weights: Shape: (batch, out_ch, in_ch)
        '''
        super().__init__()
        self.weights = nn.Parameter(weights.squeeze())
    def forward(self, S,L):
        DD = torch.einsum('ij,jk->ik', S, self.weights)
        return torch.einsum('ij,ij,ij->', DD, DD, L)

class FunctionalModel(nn.Module):
    def __init__(self, weights):
        super(FunctionalModel, self).__init__()
        self.functional = Functional(weights)
        geotorch.orthogonal(self.functional, "weights")

    def forward(self, S, L):
        return self.functional(S, L)

The hypernetwork returns an object of the FunctionalModel class; however, when the geotorch constraint is included, the FunctionalModel.functional.weights matrix changes each time the forward method of the hypernetwork is called with the same input (the constraint is still fulfilled, though). Is there a way to work around this?
Cheers and thank you very much.

ObliqueManifold

Could you provide some guidance on how to create ObliqueManifold(n,k) that you mention in the documentation? Thanks for your help!

Custom initialization for class

I am wondering whether I can initialize the constrained objects myself (for example, I would like to initialize the orthogonal matrix as the identity mapping).

Best,

Jiahao

can we expect geojax?

Hello there, by any chance are you looking to port geotorch to JAX?
I am trying to do Tucker factorization of large sparse tensors and wanted a pluggable orthogonality constraint.

thanks
renjith

Constrained tensors are immutable

This problem is similar to #13 , but it still appears in the dev version.

Below is a small reproducible example

import torch
import torch.nn as nn
import geotorch
class TestModel(nn.Module):
    def __init__(self, inp, out):
        super(TestModel, self).__init__()
        self.linear = nn.Linear(inp, out, bias=False)
        geotorch.grassmannian(self.linear, 'weight')
    def forward(self, x):
        return self.linear(x)
    def initialize(self, W):
        self.linear.weight.data = W

model = TestModel(10, 10)
model.initialize(torch.eye(10, 10))
print(model.linear.weight)

This will output

tensor([[ 0.4076, -0.7197,  0.2221, -0.0862,  0.2790,  0.0037,  0.2593, -0.0412,
          0.2679,  0.2014],
        [ 0.3811,  0.1598, -0.4362,  0.1843,  0.4700,  0.2330,  0.3310,  0.1918,
         -0.2535, -0.3453],
        [ 0.2899,  0.0346,  0.0956, -0.3288,  0.2497, -0.1689, -0.5778,  0.4797,
         -0.3147,  0.2090],
        [-0.4819, -0.2143, -0.1475,  0.4359,  0.4765,  0.2715, -0.3822,  0.0710,
          0.1807,  0.1599],
        [ 0.0939, -0.4457,  0.0543,  0.1005, -0.2818,  0.3427, -0.3295, -0.2776,
         -0.5164, -0.3609],
        [-0.0059,  0.2738,  0.2809, -0.0741,  0.4467,  0.0081,  0.0587, -0.6476,
         -0.3588,  0.2990],
        [ 0.1530, -0.0481, -0.4165, -0.1492,  0.1589, -0.4537, -0.3715, -0.4447,
          0.3176, -0.3330],
        [ 0.0087, -0.1113,  0.1804,  0.6291,  0.0027, -0.6791,  0.0835,  0.0940,
         -0.2826, -0.0490],
        [-0.5572, -0.2248,  0.1368, -0.4524,  0.2850, -0.1927,  0.2348,  0.1440,
         -0.1911, -0.4299],
        [-0.1720, -0.2656, -0.6503, -0.1490, -0.1629, -0.1477,  0.1764, -0.0501,
         -0.3458,  0.5031]], grad_fn=<AliasBackward>)

Is it possible to update constrained tensors after constructing the model?

btw, this is such an awesome repository; all of the other components are working beautifully - I'm just testing out different initialization conditions.

EDIT : updated example to showcase this with the identity matrix.
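
A hedged sketch of the initialization route that tends to work with newer geotorch versions: assign to the constrained attribute itself rather than to .data, so the value is pushed through the parametrization's initialization. Whether plain assignment is supported, and exactly which representative gets stored, depends on the geotorch version.

import torch
import torch.nn as nn
import geotorch

linear = nn.Linear(10, 10, bias=False)
geotorch.grassmannian(linear, "weight")

with torch.no_grad():
    linear.weight = torch.eye(10)   # routed through the parametrization, not written to .data

# Note: the Grassmannian identifies all bases of the same subspace, so the stored
# representative need not literally be the identity matrix, only an orthonormal
# basis spanning the same space.
print(linear.weight)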

[feature request] "Multidimensional Shape Constraints"

I just stumbled upon this project, and I'm amazed at the simple interface you have come up with! It would be awesome to have the same type of interface for the "shape constraints" of Gupta et al. (2020). For example, let's say that I want to constrain the output of the final layer of a neural net to be monotonic or unimodal with respect to a particular slice of the input features to the net. Would that be within scope for this project?

(I don't have time to work on a PR now, but I thought I'd at least check if this is even relevant to you all.) Gupta et al. have a TensorFlow implementation: https://github.com/tensorflow/lattice

Either way, thanks for the interesting work!

I am very interested in this subject.

Hi lezcano,
Could you give me some advice if I want to study the topic of optimization on manifolds and in Euclidean space in the long term?
For now, my only shortcoming is that I have only a basic knowledge of Riemannian manifolds, fiber bundles, and topology.

Initialization error

Thanks for a great package!

I'm having trouble setting the initial value of a constrained variable. Minimum reproducible example (from the README):

>>> import torch
>>> from torch import nn
>>> import geotorch
>>>
>>> torch.__version__
'1.8.0'
>>> geotorch.__version__
'0.1.0'
>>>
>>> linear = nn.Linear(64, 64)
>>> geotorch.orthogonal(linear, "weight")
>>> linear.weight = torch.eye(64)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "$HOME/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 995, in __setattr__
    object.__setattr__(self, name, value)
AttributeError: can't set attribute

In case it matters, I am using Python 3.9.2.
