
Comments (6)

mcaceresb avatar mcaceresb commented on July 21, 2024

I do not think this is specific to total, and the issue is not quite what you describe: it affects every command. If gegen is called without a by prefix, then the expression is computed without by. If you want to compute the expression by a set of variables, you need to use the by prefix.

I cannot change the default behavior of gegen. Computations are done in C and I cannot parse Stata's syntax there. However, I can try to print a warning here and in the documentation? The whole point of gtools is that the data needn't be sorted; I didn't realize that egen computed stuff after the sort even when by is not a prefix.

FYI this works:

bys id: gegen gtot = total(cat!=cat[_n-1])

PS: Sorts are not stable by default, so the order of rows within each id is not guaranteed, and that order will affect the result. You ought to do something like

bys id (subid): gegen gtot = total(cat!=cat[_n-1])
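A minimal sketch of the command above on a toy data set. The data values are made up for illustration (the original example is not reproduced in this thread); only the variable names id, subid and cat come from the discussion, and gtools is assumed to be installed.

```stata
* Hypothetical toy data: names follow the thread, values are invented.
clear
input str1 id byte subid str3 cat
"1" 1 "one"
"1" 2 "two"
"1" 3 "two"
"2" 1 "one"
"2" 2 "one"
"2" 3 "two"
end

* Sort by id, then by subid within id. The parentheses mean subid is used
* only for ordering, not for grouping, so cat[_n-1] always refers to a
* well-defined previous row inside each id.
bysort id (subid): gegen gtot = total(cat != cat[_n-1])

list, sepby(id)
```

Without (subid), rows that tie on id may come out in a different order from one sort to the next, and the count of changes in cat can differ with them.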

from stata-gtools.

adamreir avatar adamreir commented on July 21, 2024

I see, makes sense that this problem is not specific to gegen total.

If I've understood gtools correctly, its improvement over egen disappears when gegen is combined with the by prefix (and not the by() option), right?

I.e.

bys id: gegen gtot = total(cat!=cat[_n-1])

Isn't necessarily any faster than using egen?

gsort id
by id: egen gtot = total(cat!=cat[_n-1])

I agree: If it's not easily fixed, then a warning might help other users to be aware of this potential problem.

Never mind the mistake in the toy example. I guess the problem here is what happens when gtools tries to call the _n==0 observation when using by().

(Btw: Thanx for a very good package. gtools gives me several hours of extra coding time each week).


mcaceresb avatar mcaceresb commented on July 21, 2024

Right; the by prefix eliminates much of the speed gains. It may still be faster in some cases, but it could also be slower in others.
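Whether the by prefix version is faster or slower on a given data set is easiest to check by timing both. The following is only a rough sketch: the data are simulated, the sizes and the variable u are arbitrary choices for illustration, and a single run is not a reliable benchmark. It assumes gtools is installed.

```stata
* Simulated data; the sizes are arbitrary assumptions.
clear
set seed 12345
set obs 1000000
gen long   id    = ceil(runiform() * 10000)
gen double subid = runiform()
gen byte   cat   = ceil(runiform() * 3)
gen double u     = runiform()

timer clear

* Shuffle first so that the command has to pay for its own sort.
sort u
timer on 1
bysort id (subid): gegen gtot = total(cat != cat[_n-1])
timer off 1

* Shuffle again so that egen does not benefit from the previous sort.
sort u
timer on 2
bysort id (subid): egen tot = total(cat != cat[_n-1])
timer off 2

* Timer 1 is gegen with the by prefix, timer 2 is egen with the by prefix.
timer list
```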

The issue is that gtools computes the expression over the whole data, whereas egen computes it by group; it is not really about the 0th observation. If cat were all the same, then gtools would give yet another answer, though egen would not change.


adamreir avatar adamreir commented on July 21, 2024

Hm, haven't read enough about gtools to understand exactly what's going on here. But yeah, if I replace the toy example above with

(...)
replace cat="one" //if id!="1"
(...)

I get a data set where cat is always equal to "one". Then gtools produces

id  cat  gtot  tot
1   one  1     1
1   one  1     1
1   one  1     1
2   one  0     1
2   one  0     1
2   one  0     1
3   one  0     1
3   one  0     1
3   one  0     1

Which is another surprising result.
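The table above can be reproduced without gtools, which may make the difference easier to see. This is a sketch under the assumption that gegen with by() evaluates the expression over the whole data set before grouping; the helper variables changed_all and changed_by and the names gtot_like and tot_like are made up for illustration.

```stata
* Toy data matching the table: three ids, cat always equal to "one".
clear
input str1 id str3 cat
"1" "one"
"1" "one"
"1" "one"
"2" "one"
"2" "one"
"2" "one"
"3" "one"
"3" "one"
"3" "one"
end

* Expression evaluated over the whole data set: only the very first row
* has no predecessor (cat[0] is the empty string), so only it is flagged.
gen byte changed_all = cat != cat[_n-1]
egen gtot_like = total(changed_all), by(id)

* Expression evaluated within each id: the first row of every group has
* no predecessor, so every group gets exactly one flagged row.
bysort id: gen byte changed_by = cat != cat[_n-1]
egen tot_like = total(changed_by), by(id)

list id cat gtot_like tot_like, sepby(id)
```

Here gtot_like matches the gtot column and tot_like matches the tot column.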

So, what's the moral here? Subscripting with gtools should be used with caution (or not at all)?


mcaceresb avatar mcaceresb commented on July 21, 2024

@adamreir The lesson is this:

  1. gtools internals are in C and cannot parse Stata syntax. If you are creating variables inside of gegen, they are being created before gtools internals are called.
  2. Therefore you should think of variables created inside gegen functions as equivalent to using gen, because that is what it is doing.
    • If you call by ...: gegen then the variable creation will be equivalent to by ...: gen
    • If you call gegen then the variable creation will be equivalent to simply calling gen.
  3. After variable creation, gtools is called and the function is invoked correctly by group.
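The three points above can be checked with a short script. This is a sketch, not a statement about gtools internals beyond what is described here: the data values and the helper names flag_all, flag_by, g1_manual and g2_manual are hypothetical, gtools is assumed to be installed, and the assert lines should pass only if the description above holds and gegen leaves the row order unchanged.

```stata
* Hypothetical toy data: names follow the thread, values are invented.
clear
input str1 id byte subid str3 cat
"1" 1 "one"
"1" 2 "two"
"1" 3 "two"
"2" 1 "two"
"2" 2 "one"
"2" 3 "one"
end

* (a) by() option: the expression is built as with plain gen,
*     then the total is taken by group.
gegen g1 = total(cat != cat[_n-1]), by(id)
gen byte flag_all = cat != cat[_n-1]
gegen g1_manual = total(flag_all), by(id)
assert g1 == g1_manual

* (b) by prefix: the expression is built as with by ...: gen,
*     then the total is taken by group.
bysort id (subid): gegen g2 = total(cat != cat[_n-1])
bysort id (subid): gen byte flag_by = cat != cat[_n-1]
gegen g2_manual = total(flag_by), by(id)
assert g2 == g2_manual

list id subid cat g1 g2, sepby(id)
```

Building the flag explicitly, as in the manual versions, makes it visible which rows _n-1 refers to, at the cost of an extra variable.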


adamreir avatar adamreir commented on July 21, 2024

Aha, now I understand what's going on! Will keep this in mind.

Thanx a lot!

