
gossipy's People

Contributors

makgyver, mohamedlegh, purushoath02


gossipy's Issues

Issue: Getting the code to utilize the Google Colab GPU

Hi, I'm Parsa, a master's student at the University of Semnan.
I am trying to get gossipy (particularly main_onoszko_2021.py) to run in Google Colab. It runs successfully, but training takes a long time and the GPU shows 0% utilization, even though CUDA is available to PyTorch and the GPU is enabled in Colab. Let me know if you want to take a look at my code; I will email it to you.
By the way, I have tried workarounds like numba's @jit(target_backend='cuda'). If you can put me on the right path, or tell me how you managed to get it to use the GPU, I would be grateful.
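
For what it's worth, here is the minimal sanity check I would run first (plain PyTorch, nothing gossipy-specific) to confirm that the runtime itself sees the GPU:

import torch

# Confirm that PyTorch sees the Colab GPU before blaming the library.
print(torch.cuda.is_available())      # should print True on a GPU runtime
print(torch.cuda.get_device_name(0))  # e.g. "Tesla T4"

# Tensors and models only use the GPU if they are explicitly moved there.
x = torch.randn(1000, 1000, device="cuda")
print(x.device)                       # should print "cuda:0"

If this prints True but the simulation still shows 0% utilization, my guess is that the models or tensors are never actually moved to the device with .to(...).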

Are you open to contributions?

Hello,
I'm a PhD student at Sorbonne University and I've started to look at gossip learning. I've just discovered your simulator and I think I will use it for my work.
If I add new protocols or algorithms to the simulator, are you open to me adding them to the project as pull requests?

Thanks,
Mohamed Amine LEGHERABA

Same code for Linear Regression and Logistic Regression?

Hello,
I'm new to machine learning, but it seems to me that the implementation of Logistic Regression is the same as Linear Regression; it is simply a linear transformation: https://github.com/makgyver/gossipy/blob/main/gossipy/model/nn.py#L166

I would have expected a sigmoid transformation, with the torch.sigmoid function, for example as in this tutorial: https://towardsdatascience.com/logistic-regression-with-pytorch-3c8bbea594be

Is it a bug, or did I miss something?
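
For comparison, here is a minimal sketch (hypothetical class, not the gossipy implementation) of what I had in mind:

import torch
import torch.nn as nn

class LogisticRegressionSketch(nn.Module):
    """Linear layer followed by a sigmoid, squashing outputs into (0, 1)."""
    def __init__(self, input_dim: int):
        super().__init__()
        self.linear = nn.Linear(input_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Without this torch.sigmoid, the module is plain linear regression.
        return torch.sigmoid(self.linear(x))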

Thanks,
Mohamed

Issue when the network is composed of one node

I know that it doesn't make sense to create a network with one node, but I did it to compare results with a centralized solution.
This is the code I used (adapted from main_ormandi.py):

from gossipy import set_seed
from gossipy.core import AntiEntropyProtocol, CreateModelMode, StaticP2PNetwork, UniformDelay
from gossipy.node import GossipNode
from gossipy.model.handler import PegasosHandler
from gossipy.model.nn import AdaLine
from gossipy.data import load_classification_dataset, DataDispatcher
from gossipy.data.handler import ClassificationDataHandler
from gossipy.simul import GossipSimulator, SimulationReport
from gossipy.utils import plot_evaluation

set_seed(42)
X, y = load_classification_dataset("spambase", as_tensor=True)
y = 2*y - 1 #convert 0/1 labels to -1/1

data_handler = ClassificationDataHandler(X, y, test_size=.1)
data_dispatcher = DataDispatcher(data_handler, 1, eval_on_user=False, auto_assign=True)
topology = StaticP2PNetwork(1, None)
model_handler = PegasosHandler(net=AdaLine(data_handler.size(1)),
                               learning_rate=.01,
                               create_model_mode=CreateModelMode.MERGE_UPDATE)

# For loop to repeat the simulation
nodes = GossipNode.generate(data_dispatcher=data_dispatcher,
                            p2p_net=topology,
                            model_proto=model_handler,
                            round_len=100,
                            sync=False)

simulator = GossipSimulator(
    nodes=nodes,
    data_dispatcher=data_dispatcher,
    delta=100,
    protocol=AntiEntropyProtocol.PUSH,
    delay=UniformDelay(0,10),
    online_prob=.2, # Approximates the average online rate of the STUNner's smartphone traces
    drop_prob=.1,   # Simulates the possibility of message dropping
    sampling_eval=.1
)

report = SimulationReport()
simulator.add_receiver(report)
simulator.init_nodes(seed=42)
simulator.start(n_rounds=100)

plot_evaluation([[ev for _, ev in report.get_evaluation(False)]], "Overall test results")

And the results I get back are the following:

# INFO     # Sent messages: 95                                                                                   simul.py:248
# INFO     # Failed messages: 79                                                                                 simul.py:249
# INFO     Total size: 5415                                                                                      simul.py:250
# INFO     accuracy: 0.62                                                                                        utils.py:181
# INFO     precision: 0.62                                                                                       utils.py:181
# INFO     recall: 0.62                                                                                          utils.py:181
# INFO     f1_score: 0.61                                                                                        utils.py:181
# INFO     auc: 0.66                                                                                             utils.py:181

In the algorithm from the paper ormandi2013, a node waits to receive a model before updating it on its local data, so with a single node the model never learns. This is a bit sad but expected. The part that is very strange to me is the number of "failed" messages: I would have supposed that all messages would fail, since there are no other nodes in the network. (For what it's worth, 79/95 ≈ 0.83 is close to 1 - online_prob * (1 - drop_prob) = 0.82, so perhaps the failures simply reflect those probabilities rather than the topology.)
What do you think? Is this expected or a bug?

TPU support for gossipy

I recently attempted to bring TPU support to gossipy. However, the TPU run shows an estimated 4 hours and 30 minutes to complete my simulation, significantly slower than the GPU training time of approximately 45 minutes. I was wondering if you could make it work:
I have read that TPUs are supposed to be significantly faster, but my attempt does not reflect that!
Here is how I changed things.
First, installing torch_xla:
!pip install torch_xla https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-2.0-cp310-cp310-linux_x86_64.whl
import torch_xla.core.xla_model as xm

import torch

class GlobalSettings(metaclass=Singleton):
    """Global settings for the library."""

    _device = 'cpu'

    def auto_device(self) -> torch.device:
        """Set the device to TPU if available, otherwise CUDA if available, otherwise CPU.

        Returns
        -------
        torch.device
            The device.
        """
        # torch_xla exposes no `xla_device_exists` helper; probing
        # xm.xla_device() and catching the failure is one way to detect
        # whether a TPU is attached.
        try:
            self._device = xm.xla_device()
        except RuntimeError:
            if torch.cuda.is_available():
                self._device = torch.device('cuda')
            else:
                self._device = torch.device('cpu')
        return self._device

    def set_device(self, device_name: str) -> torch.device:
        """Set the device.

        Parameters
        ----------
        device_name : str
            The name of the device to set (possible values are 'auto', 'cuda',
            'cpu', and 'tpu'). When device_name is 'auto', the TPU is used if
            available, then 'cuda', otherwise 'cpu'.

        Returns
        -------
        torch.device
            The device.
        """
        if device_name == "auto":
            return GlobalSettings().auto_device()
        elif device_name == "tpu":
            # xm.xla_device() raises if no XLA device is available.
            self._device = xm.xla_device()
        else:
            self._device = torch.device(device_name)
        return self._device

    def get_device(self):
        """Get the device.

        Returns
        -------
        torch.device
            The device.
        """
        return self._device
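
A hypothetical usage of the patched settings (GlobalSettings is the singleton I modified above):

settings = GlobalSettings()
device = settings.set_device("tpu")  # or "auto" to fall back to cuda/cpu
print(device)                        # e.g. xla:0 when a TPU runtime is attached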

(Screenshots attached: CaptureTPU shows the estimated TPU run time, CaptureGPU the GPU run time.)

Questions about the implementation of an algorithm

Hi,
First, thank you for your good work.
I have a question regarding the class TorchModelPartition implemented here:

import math
from typing import Dict, Optional, Tuple

import torch
from torch import LongTensor
from torch.nn import ParameterList

from gossipy.model import TorchModel

class TorchModelPartition:
    def __init__(self, net_proto: TorchModel, n_parts: int):
        """Class that manages the partitioning of a pytorch model.

        TorchModelPartition handles how to partition a pytorch model as well as the merge of
        partitioned models. The partitioning is deterministic. It divides the parameters of the
        model in ``n_parts`` parts of equal size starting from the first layer and going to the
        last layer.

        The created partition is stored in the ``partitions`` attribute which is a dictionary
        containing the indices of the parameters to be sampled in each layer.

        Parameters
        ----------
        net_proto : TorchModel
            The prototype of the model to be partitioned.
        n_parts : int
            The number of partitions to be created.

        Notes
        -----
        Partitioning is only supported for neural networks with at most 3D layers.
        """
        self._check(net_proto)
        self.str_arch = str(net_proto)
        self.n_parts = min(n_parts, net_proto.get_size())
        self.partitions = self._partition(net_proto, self.n_parts)

    def _check(self, net: TorchModel) -> None:
        plist = ParameterList(net.parameters())
        for t in plist:
            if t.dim() > 3:
                raise TypeError("Partitioning is only supported for neural "\
                                "networks with at most 3D layers.")

    def _partition(self,
                   net: TorchModel,
                   n: int) -> Dict[int, Dict[int, Optional[Tuple[LongTensor, ...]]]]:
        plist = ParameterList(net.parameters())
        parts = {i : {j : None for j in range(len(plist))} for i in range(n)}
        net_size = net.get_size()
        mu = math.floor(net_size / n)
        rem = net_size % n
        ni, ti = 0, 0
        diff = mu + (rem > 0)
        shift = [0, 0, 0]
        ids = [[], [], []]
        while ti < len(plist):
            tensor = plist[ti]
            sizes = tuple(tensor.shape)
            cover = min(sizes[0] - shift[0], diff)
            diff -= cover
            ids[0].extend(range(shift[0], shift[0]+cover))
            if tensor.dim() >= 2: ids[1].extend([shift[1]] * cover)
            if tensor.dim() >= 3: ids[2].extend([shift[2]] * cover)
            shift[0] = (shift[0] + cover) % sizes[0]
            if not shift[0] and tensor.dim() >= 2: shift[1] = (shift[1] + 1) % sizes[1]
            if not shift[1] and tensor.dim() >= 3: shift[2] = (shift[2] + 1) % sizes[2]
            if tensor.dim() == 1:
                if diff == 0 or shift[0] == 0:
                    parts[ni][ti] = (torch.LongTensor(ids[0]),)
                    ids = [[], [], []]
            elif tensor.dim() == 2:
                if diff == 0 or shift[1] == 0:
                    parts[ni][ti] = (torch.LongTensor(ids[0]),
                                     torch.LongTensor(ids[1]))
                    ids = [[], [], []]
            else: #if tensor.dim() == 3:
                if diff == 0 or shift[2] == 0:
                    parts[ni][ti] = (torch.LongTensor(ids[0]),
                                     torch.LongTensor(ids[1]),
                                     torch.LongTensor(ids[2]))
                    ids = [[], [], []]
            if shift[0] == 0:
                if tensor.dim() == 1: ti += 1
                else:
                    if shift[1] == 0:
                        if tensor.dim() == 2: ti += 1
                        elif shift[2] == 0: ti += 1
            if diff == 0:
                ni += 1
                diff = mu
                if ni < rem: diff += 1
        return parts

    def merge(self, id_part: int,
              net1: TorchModel,
              net2: TorchModel,
              weights: Optional[Tuple[int, int]]=None) -> None:
        """Merges the partition with id ``id_part`` of two models.

        Parameters
        ----------
        id_part : int
            The index of the partition to be merged.
        net1 : TorchModel
            The first model to be merged.
        net2 : TorchModel
            The second model to be merged.
        weights : Optional[Tuple[int, int]], default=None
            This tuple represents the relative weights of the two models to be merged.
            If None, the weights are assumed to be equal, thus the merge is the average of the
            parameters.
        """
        assert str(net1) == self.str_arch, "net1 is not compatible."
        assert str(net2) == self.str_arch, "net2 is not compatible."
        id_part = id_part % self.n_parts
        plist1 = ParameterList(net1.parameters())
        plist2 = ParameterList(net2.parameters())
        w = weights if (weights is not None and weights != (0,0)) else (1,1)
        mul1, mul2 = w[0] / sum(w), w[1] / sum(w)
        with torch.no_grad():
            for i in range(len(plist1)):
                t_ids = self.partitions[id_part][i]
                if t_ids is not None:
                    plist1[i][t_ids] = mul1 * plist1[i][t_ids] + mul2 * plist2[i][t_ids]
The method _check verifies that no tensor has more than 3 dimensions, and _partition assumes that tensors have at most 3 dimensions (since _check is called first). Why this limitation? It looks like the code of _partition could be extended to support 4-dimensional tensors, but maybe I am missing something.
This limitation currently prevents us from using, for example, the LeNet-5 implemented here: https://github.com/lychengrex/LeNet-5-Implementation-Using-Pytorch/blob/master/LeNet-5%20Implementation%20Using%20Pytorch.ipynb
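
As a quick illustration (standard torch.nn, outside gossipy), the very first convolution of LeNet-5 already has a 4-dimensional weight tensor, so _check rejects it:

import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)  # LeNet-5's first conv layer
print(conv.weight.dim())  # 4 -> _check raises TypeError for dim() > 3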

Sincerely,
