
Comments (11)

AyeshaNirma commented on July 28, 2024

The question remains unanswered: how are your numbers comparable to those of the approaches you compare against? Which approach, besides yours, accesses ground-truth information at inference time? Please address this question; I am asking it for the third time.

Regarding [28], you do not compare against it. No other trajectory-forecasting method you cite uses this evaluation protocol.

Your and your colleagues' own TrajNet challenge does not allow this kind of evaluation.

Here is a quick test: take a training set, cluster it into 10 representative trajectories, and build a linear model (not even a regressor). At test time, assign each individual the "best" of those trajectories. You will see how well Social GAN performs relative to a linear motion model, and you will have your answer as to whether 20 is a small number or not.
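The commenter's quick test can be sketched in a few lines. Below is a minimal numpy version on synthetic toy trajectories (the data, shapes, and the "bank of 10" stand-in for a clustering step are all illustrative assumptions, not the paper's setup); it only demonstrates that an oracle best-of-K pick against ground truth always looks at least as good as any single fixed guess:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for real tracks: 2-D trajectories of 12 timesteps.
train = rng.normal(size=(200, 12, 2)).cumsum(axis=1)
test = rng.normal(size=(50, 12, 2)).cumsum(axis=1)

# Crude stand-in for the clustering step: freeze 10 training
# trajectories as the bank of candidate "predictions".
bank = train[:10]  # shape (10, 12, 2)

def oracle_ade(gt, candidates):
    """Best-of-K ADE: error of the candidate closest to this ground truth."""
    errs = np.linalg.norm(candidates - gt[None], axis=-1).mean(axis=-1)  # (K,)
    return errs.min()

best_of_10 = np.mean([oracle_ade(gt, bank) for gt in test])
single = np.mean([np.linalg.norm(bank[0] - gt, axis=-1).mean() for gt in test])
print(f"oracle best-of-10 ADE: {best_of_10:.3f} vs one fixed guess: {single:.3f}")
```

By construction the oracle pick can never be worse than any individual candidate, which is the commenter's point about evaluating with ground-truth access.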

from sgan.

luca-rossi commented on July 28, 2024

@AyeshaNirma Thanks for pointing out the issues, I'm facing the same problems now. Did you find a better way to evaluate the model or a different solution to the problem of trajectory generation and evaluation?


xieshuaix commented on July 28, 2024

This paper has some novelty, but publishing test-time metrics conditioned on ground truth (which cannot be obtained in production) raises serious questions about reproducibility and effectiveness in production. It also raises an ironic issue: if the model needs ground truth to perform well, why can't we just use the ground truth instead?

It seems the whole multi-modal trajectory prediction community has adopted this practice since this paper. I would appreciate it if any conference reviewer or committee member could offer insight into how they think about this.

Glad that someone is willing to step up with a comparison to a classic time-series constant-velocity baseline: #44 (comment)


trungmanhhuynh commented on July 28, 2024

@agrimgupta92 Thanks for the effort of building Social GAN; it's great work in any case, and at least it brought up some interesting ideas. However, I somewhat agree with @AyeshaNirma: although the problem is multi-modal, when we compare against other methods we should produce only one sample and see whether accuracy increases.

Alternatively, we need a "multi-modal" evaluation tool, something like many acceptable ground-truth trajectories; then the results would make more sense (at least to me :)).
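One way to read this suggestion concretely is to move the oracle minimum from the prediction side to the label side: each model still emits a single trajectory at test time, but is scored against whichever of several annotated acceptable futures it comes closest to. A hypothetical numpy sketch (the `acceptable` set would have to come from extra annotation, which the benchmarks discussed here do not provide):

```python
import numpy as np

def multimodal_ade(pred, acceptable):
    """Score ONE prediction against a set of annotated acceptable futures.
    The min is taken over labels, not over model samples, so every model
    is still restricted to a single guess."""
    # pred: (T, 2); acceptable: (M, T, 2)
    errs = np.linalg.norm(acceptable - pred[None], axis=-1).mean(axis=-1)  # (M,)
    return errs.min()

# Two acceptable futures; a prediction matching either one scores zero.
acceptable = np.stack([np.zeros((8, 2)), np.ones((8, 2))])
score = multimodal_ade(np.ones((8, 2)), acceptable)  # matches the second future
```

This keeps the comparison fair across single-output and multi-modal models, at the cost of requiring multiple ground-truth annotations per scene.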


ashar-ali commented on July 28, 2024

For the second part, they mention that a synthetic dataset, generated using a simulation, was used in the original Social LSTM paper. The Social GAN paper, however, does not use any synthetic data and trains on real data from scratch.


agrimgupta92 commented on July 28, 2024

It's unfair to say that this work is outperformed by a vanilla LSTM (see Table 1 in the paper for the discussion below).

LSTM vs. SGAN 1V-1: this is essentially LSTM vs. an LSTM encoder-decoder trained with an adversarial loss. We also discuss this in the paper: SGAN 1V-1 performs worse than LSTM because each predicted sample can be any of the multiple possible future trajectories. The conditional output generated by the model represents one of many plausible futures, which may differ from the ground-truth trajectory.

LSTM vs. SGAN 1V-20: simply sampling more from the model without the variety loss does not ensure good sample variety. Again, as discussed in the paper, GANs suffer from mode collapse, where the generator resorts to producing a handful of samples that the discriminator assigns high probability.

Both issues you raise are addressed in the paper.
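For reference, the variety loss under discussion is a best-of-k L2 loss. A numpy sketch of the idea (the actual repo implements it on PyTorch tensors during training; shapes here are illustrative):

```python
import numpy as np

def variety_loss(samples, gt):
    """Best-of-k L2 loss: only the sample closest to the ground truth is
    penalized, leaving the generator free to spread its remaining
    samples over other plausible modes."""
    # samples: (k, T, 2) generated futures; gt: (T, 2) observed future.
    per_sample = np.linalg.norm(samples - gt[None], axis=-1).sum(axis=-1)  # (k,)
    return per_sample.min()

rng = np.random.default_rng(0)
gt = rng.normal(size=(8, 2))
loss = variety_loss(rng.normal(size=(20, 8, 2)), gt)
```

Because only the closest sample receives gradient, a generator can cover several futures while still driving this loss down, which is the diversity argument made above.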


AyeshaNirma commented on July 28, 2024

@ashar-ali I have read the paper in some detail, and I am well aware of that sentence, which makes this even less justified. The author of Social LSTM [1] is a co-author of this work, and synthetic data is generated free of cost (automatically). **The authors of Social GAN generated synthetic data themselves to demonstrate the effectiveness of pooling in Social GAN** ("We would like to highlight that even though these scenarios were created synthetically, we used models trained on real world data."), yet they did not use it to compare with Social LSTM? There is absolutely no point in not reproducing the results. I mean, what is going on here?

In this work, Social LSTM is inferior to a trivial vanilla LSTM. Just go through Social LSTM [1] and look at the margin by which it outperforms a plain LSTM. @agrimgupta92, could you kindly touch on this aspect? If the results reported in Social LSTM are wrong, I am sure you discussed this with Alahi?

@agrimgupta92 Not really: the original vanilla LSTM baseline (Alahi et al.) sampled the next point from a Gaussian, whereas in your work the vanilla LSTM directly regresses the point, or at least that is the impression one gets from reading the paper. Now you give your method more samples to choose from, and this advantage is not provided to the vanilla LSTM? Could you perhaps comment? Reading your response, one gets the impression that one might stumble upon good parameters by luck, but the use of adversarial training in this case is not really motivated, in my humble opinion.

Please take this as a fruitful discussion from a student of the field. Looking forward to your in-depth response, especially regarding the Social LSTM results.

[1] Alahi, Alexandre, et al. "Social LSTM: Human trajectory prediction in crowded spaces." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.


agrimgupta92 commented on July 28, 2024

The quoted line is taken out of context (it is possible that, on reading the paper, it is unclear what we meant by "synthetic" there). That line refers specifically to the trajectories in Figure 5 of the paper. Those scenarios do not require a human-trajectory simulator based on social forces (it is trivial to construct the inputs shown in Figure 5); the figure shows common "social" scenarios which our model learns. In contrast, the Social LSTM paper states they used a synthetic dataset for training, generated by a simulation implementing the social forces model.

Our LSTM baseline does not output Gaussian parameters, so we cannot sample from it multiple times. Suppose, hypothetically, we implemented an LSTM baseline as in the Social LSTM paper. Even if it output 20 samples, around 68% of them would lie within one standard deviation of the mean. Our method, by contrast, is decoupled from the "mean" trajectory: training with the variety loss ensures that the 20 samples drawn cover the possible scenarios given the input trajectory. This is strictly better than what you could achieve by sampling multiple trajectories from an LSTM (as in the Social LSTM paper).

Finally, this repo is about Social GAN; I welcome critique and comments on the code and the paper. However, if you want clarifications regarding Social LSTM, I am not the best person to provide them. You should contact the first authors of that paper.


AyeshaNirma commented on July 28, 2024

It's strange, to say the least.

  1. You are using ground truth at test time (during the prediction interval), the cardinal sin of trajectory forecasting: you use it to select the "best" trajectory. No other approach (LSTM, Social LSTM, LTA, SF, etc.) uses ground truth during the prediction interval. It is a completely unfair comparison that gives your approach an unfair advantage.
    So my argument is the following: why predict 20 trajectories with SGAN, and not 20,000 with a random regressor, using the L2 distance to ground truth at each time step to pick the one with the lowest error? You would end up with either the ground-truth trajectory or an extremely "social" trajectory, as your colleagues would say; Figure 4 of your own paper illustrates this, and still it is hard for you to understand why someone is complaining about an unfair comparison. I do not want to be overly harsh, and I give you the benefit of the doubt; perhaps there is something I missed or misunderstood, although I checked the code as well. There are a few eager students waiting for your response, too.
  2. Regarding the LSTM with the Gaussian: keep in mind that during inference you feed the predicted point back into the LSTM, and this dictates the next Gaussian distribution. So if at one time step you sample a point deviating from the overall direction of the trajectory, you end up with an extremely high error, the classical drifting effect in machine learning. To give an estimate (quantifying your hypothesis): you can bring the error down by roughly **10-15% if you run the bivariate-Gaussian LSTM on your datasets (Zara, UCY, etc.) 20 times, replicating what you do with your proposed approach, and take the best run. We mean this literally, having done the experiment; to clarify, it is the simple, trivial vanilla LSTM**. Refer to [1] as well.
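The exact 10-15% figure aside, the general effect described here is easy to reproduce in a toy simulation: even an idealized Gaussian sampler, with no autoregressive drift at all, looks markedly better once an oracle picks the best of 20 samples against ground truth. The noise scale and sizes below are arbitrary illustrative choices, not measurements from any of the datasets discussed:

```python
import numpy as np

rng = np.random.default_rng(0)
trials, k, T = 500, 20, 12

# Idealized stand-in for a bivariate-Gaussian LSTM decoder: ground truth
# plus i.i.d. per-step noise (the autoregressive drift described above
# would only widen the gap; scale 0.5 is an arbitrary choice).
gt = rng.normal(size=(trials, T, 2)).cumsum(axis=1)
samples = gt[:, None] + rng.normal(scale=0.5, size=(trials, k, T, 2))

ade = np.linalg.norm(samples - gt[:, None], axis=-1).mean(axis=-1)  # (trials, k)
single = ade[:, 0].mean()        # one sample per track, as the baselines report
best20 = ade.min(axis=1).mean()  # oracle best-of-20 against ground truth
print(f"single-sample ADE {single:.3f} vs oracle best-of-20 ADE {best20:.3f}")
```

The gap between the two numbers comes entirely from the oracle selection step, not from any change to the underlying predictor.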

In the early 2000s there was a (single-object) tracker, I do not remember its name exactly, that did something similar: running the experiment 100 times, taking the best run, and comparing it against methods run a single time.

  3. I did not take the lines out of context; I explicitly mentioned that you used synthetic data to demonstrate the pooling approach (which, by the end of your paper, did not perform better than no pooling). My point was, and still is: if you can use synthetic data, or say you created a synthetic situation (it is hard to know how one should interpret that), one could kindly ask you to go a step further, generate some synthetic training data, and make sure the other approaches perform to their ability. You use ground-truth data at test time for your own approach; providing synthetic training data would arguably have been less tedious.

Regarding the Social LSTM numbers, I understand you cannot comment. Actually, we were wondering whether you could share what the authors said when you asked them, since they are (or were) at Stanford; in any case, I understand you cannot comment.

[1] Hug, Ronny, et al. "On the reliability of LSTM-MDL models for pedestrian trajectory prediction."


agrimgupta92 commented on July 28, 2024

The focus of the paper is to explain why it is beneficial to move away from predicting only the ground-truth trajectory. With that context, let me address the evaluation concern you raised. Due to the inherent uncertainty in predicting the future, it may be unreasonable to expect the model to match the ground-truth trajectory with a single sample. We therefore draw a small number of samples from the model and choose the best fit to the ground truth; [28] used a similar evaluation strategy. In practice, any system consuming our model's predictions would not know the future ahead of time, and hence would be better off drawing many samples and planning according to the distribution of possible futures; this evaluation caters to such usage. Moreover, even when we select the best trajectory from the samples, the goal is to show that the sample set is diverse enough that, with high likelihood, it contains a trajectory close to the ground truth, as quantified by low ADE.
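The protocol described in this reply is what the subsequent literature commonly calls minADE/minFDE over K samples. A minimal sketch of the metric (shapes and data are illustrative):

```python
import numpy as np

def min_ade_fde(samples, gt):
    """Best-of-K scoring: a K-sample set is judged by its single member
    closest to the ground truth, averaged over time (ADE) or taken
    at the final step (FDE)."""
    # samples: (K, T, 2); gt: (T, 2)
    disp = np.linalg.norm(samples - gt[None], axis=-1)  # (K, T)
    return disp.mean(axis=1).min(), disp[:, -1].min()   # (minADE, minFDE)

# One perfect sample among K drives both metrics to zero,
# no matter how far the other samples stray.
gt = np.arange(16, dtype=float).reshape(8, 2)
samples = np.stack([gt + 5.0, gt, gt - 3.0])
```

The last two lines illustrate the commenters' objection: the metric rewards having any sample near the ground truth, not the quality of the sample a downstream system would actually have to pick blind.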


MeimingWang commented on July 28, 2024

@agrimgupta92 The work in this paper is well done. @AyeshaNirma Good question! I share the confusion about why the authors use the prediction nearest to the ground-truth trajectory out of so many prediction results. It is like a multiple-choice question with only one correct answer: other models choose a single answer, but you predict many potential answers (calling them "socially acceptable"), and when judged right or wrong, you pick the one closest to the real answer among all your potential answers and then claim you made the right choice. This is definitely unfair to the other models. I hope the authors can tackle this problem in evaluation.

