Currently we need to do ugly things to make circular references: <div class="snipp

redesign constructors to allow uninitialized fields,about julialang/julia

Comments (31)

StefanKarpinski commented on April 28, 2024

Wouldn't it be possible to generate code to check for uninitialized references only after an "inner" constructor returns? Some checks could of course be eliminated since the compiler can reason that they must be set (like this.num and this.den above), but in cases where complicated logic occurs, explicit checks could be done just before the implied return of this.

Of course, the trouble with that would be when you have something like this:

type MyType
  self::MyType
  MyType() = init(this)
end

Here the init function gets a partially initialized MyType object, so its accesses would need to be checked, implying that in general, all accesses would need to be checked.

But what if we required a special type for uninitialized objects? So that init would need to be declared as init(x::Uninitialized{MyType}) instead of just init(x::MyType). Then the compiler knows when compiling the code for init that it needs to generate uninitialized field checks for the x argument, and only init needs to pay the cost, instead of every piece of code everywhere paying the cost. It also uses the type system to prevent uninitialized objects from getting passed where they don't belong. It is very rare to actually want to pass an uninitialized object to an external function for any reason. This allows it, but requires explicit support.

The uninitialized types would always be immediate shadow parents of their initialized concrete counterparts. So the declaration that ConcreteT <: AbstractT would actually imply a type hierarchy like this:

ConcreteT <: Uninitialized{ConcreteT} <: AbstractT

That's a little tricky, but does seem to solve the problem.

from julia.

JeffBezanson commented on April 28, 2024

This is clever, but I don't think we should do it. This type would have to be hacked in as a special case in various little places in the compiler and it could get ugly. It would also introduce (1) a case where the type of a value changes, currently impossible, and (2) a concrete type that has subtypes (Uninitialized is concrete because there can be a value of that type).
It would also break checks like isa(x,ConcreteT) during init(). I'm also not convinced that you have to initialize all fields within the constructor. There might be cases where it's convenient for an object to hang around for a while with uninitialized fields.
Performance is probably not a big deal, since you're testing a word you had to load anyway. Not as bad as a type check where you need an extra load. And not as bad as arrays which are more likely to involve intense looping. We're already doing the null checks there.
Maybe we can use the SIGSEGV handler to turn null accesses into exceptions!

from julia.

StefanKarpinski commented on April 28, 2024

Ok, these are all fair points. It's a cute idea though. It does feel a little too clever, admittedly — and a little too much like how Ruby handles the object specific parent class, which is confusing but clever. And it's a good point about testing a word that's already loaded not being likely to be all that expensive.

I like the idea of using a SIGSEGV handler to trap null exceptions. If all fields default to all zero bits, that would make bits types default to zero, which seems sensible, and reference fields would default to NULL, which would allow us to do the SIGSEGV trapping business.

from julia.

JeffBezanson commented on April 28, 2024

Oh darn, I forgot: using SEGV would not give us the behavior we have now, which is to throw an error as soon as a null reference arises, not when it is used. For example, assuming o.f is uninitialized:

a = 2
a = o.f   # currently we'd give an error here
...
a.b   # with SEGV behavior we'd get an error here

In general, you'd get the error at some random point in the future. Probably not ideal. On the other hand, when accessing an uninitialized element of an unboxed array you don't get any error, so perhaps this is "consistent".

from julia.

StefanKarpinski commented on April 28, 2024

If we're going to allow uninitialized references to hang out as long as one feels like letting them hang out, then this strikes me as the most consistent behavior, honestly. It kind of sucks though because then you can have uninitialized values potentially propagating all over the place from a single accidentally uninitialized field that gets used, but I feel like that's kind of the corollary of allowing them to persist beyond object construction at all.

This is one of those decisions that it's really hard to predict which choice will in retrospect have been the right one. It feels very much like we could easily go one way and later really regret it as it becomes clear that the choice was either too lax or too strict. If we go lax, we can always get stricter, but we'd end up potentially breaking people's code. If we go strict, we can always relax things later without breaking anything. But implementing strict behavior is probably harder.

from julia.

JeffBezanson commented on April 28, 2024

It's a tough one. But the lax behavior is the one that everybody has done before, and that the designers of Algol said they regret IIRC. A good rule of thumb is to raise errors as soon as possible (i.e. closer to the root cause). I just tried a cell array access microbenchmark, and the null checks seem to cost about 2%.

from julia.

StefanKarpinski commented on April 28, 2024

Honestly, 2% is not bad. Nobody really cares if something is 2% slower (except in benchmarks competitions). These days, no one cares that Hadoop is absurdly inefficient, because it can scale, which is far more important. So maybe we should go strict and fix the error that the Algol folks feel that they made.

Then the question is becomes how strict to be. Is letting null references exist but checking for them on usage and throwing an exception the right trade off? Can you come up with an example of a case where letting an object be constructed with uninitialized references is really handy?

If we allow uninitialized references to persist beyond construction time, we will need to provide intrinsics to check if a field is initialized or not, if for no other reason, than that we would need that in order to display objects with uninitialized references without barfing. If there are a lot of other situations where checks like that would have to be made, it would seem like a fairly compelling argument for not allowing uninitialized fields to escape constructors. Think about how much of a mistake allowing the null value in Java is — it causes no end of corner cases and headaches. We do not want to find ourselves in that position when null is replaced with "uninitialized".

from julia.

StefanKarpinski commented on April 28, 2024

This also interacts with the immutable issue: obviously you never want to construct an immutable object with uninitialized fields, because then it will be permanently uninitialized.

One idea would be to disallow the overriding of default inner constructor for immutable types — then you can only construct immutable objects by giving all field values to the default inner constructor. On the other hand, if inner constructors were never allowed to be return objects with uninitialized fields, then you could safely allow the definition of non-default inner constructors for immutable objects. However, that implies that the this object is mutable and then becomes immutable after the inner constructor returns. That's very weird and seems to create both a mutable and immutable version of every immutable type. I guess I prefer the former approach — disallow inner constructors for immutable types.

from julia.

StefanKarpinski commented on April 28, 2024

Actually, looking at the Rational example above, this is more of an issue than I originally realized since Rational is a perfect candidate for immutability. So what is the type of this in the inner constructor? If it's Rational then it should be immutable, but it's clearly mutable.

from julia.

JeffBezanson commented on April 28, 2024

Exactly; the only way to get rid of "null check hell" is to raise an error as soon as a null reference is accessed.

Checking at the end of a constructor is neither here nor there since you can still call functions during the constructor.

I have thought about the ability to check whether a reference is uninitialized, and I have hesitated. A good example is HashTable, which uses an Array to store the keys and values. Any value can be a hash key, so unused spots must be uninitialized. I use a bitvector to keep track of which spots are used. Sure it would be more efficient to have isassigned(a, i). But then whether something is uninitialized becomes part of your program's behavior, and the next thing you want is the ability to clear a reference back to uninitialized. I guess that's kind of a hybrid approach where you still need null checks, but when you don't do null checks errors are thrown sooner.

What I'm shooting for is to have no null pointers, period. In fact we currently have this with our struct types. But this comes at a cost of too little flexibility in constructing objects. My new proposed contract with the user is that there still may not be any null pointers, but you're on your own to make that happen since the language can no longer guarantee it. From that perspective it might make sense to check at the end of a constructor, but there is no analogous thing you can do for Arrays.

I don't care what happens when printing objects. Printing something as null is kind of dishonest because you can't do anything with that knowledge; accessing the element is still an error. The contract is that an uninitialized object is not allowed to exist, whatever happens is your own fault, but we only check at well-defined points. This is like how you can have prohibited appliances in freshman dorm rooms at Harvard as long as you hide them during vacations when room inspections take place.

from julia.

JeffBezanson commented on April 28, 2024

For immutable objects a.x = y means to make a copy of a with a different value for field x. I guess that copy would have to be a special builtin operation that allows null references. So during the Rational constructor you might actually make 3 Rational objects, but in practice this would be optimized away.

from julia.

StefanKarpinski commented on April 28, 2024

My objection to not being able to print or otherwise check for uninitialized fields is that without that ability, you'll have to completely guess where you have an initialized value. If they exist and you can't check for them, then you're really just sticking your head in the sand and pretending they're not there. And you can always just trap UndefRefErrors and then do something else in the catch block. How is that different from having an isassigned predicate? It's effectively the same but less efficient. Basically, if accessing an undefined field is an error, it does affect the program's behavior, regardless.

from julia.

StefanKarpinski commented on April 28, 2024

I actually really dislike having a.x = y create a new object and replace a with it. I'm not sure what the benefit of that is. It seems to me that it would be much better to just disallow changing the fields of immutable objects altogether.

from julia.

JeffBezanson commented on April 28, 2024

It's true that you can implement isassigned by catching the exception. That's probably enough to justify the predicate, so I will add it. On another level, I feel like you ought to know when things have been initialized. For example, in matlab people use exist('a') to see if they have set one of their local variables yet, and that always seemed really stupid to me.

A construct for "changing" a field of an immutable object is pretty common; even haskell has it. I feel like people will want to do things like assign the fields of a complex number, and do a[i].im=0. Otherwise you have to write a[i] = complex(a[i].re, 0). I don't much care myself but I think users will want this. We also need it for constructors; otherwise constructors for immutable types will have to use a different mechanism.

from julia.

JeffBezanson commented on April 28, 2024

I'm having trouble with the constructor for RopeString. It uses unbounded mutual recursion among multiple constructor definitions. As far as I can tell the only way to do that with this change is to have the inner RopeString constructor do what new did, and do the recursion among outer generic function definitions for RopeString. But then we have lost abstraction, as the constructor that accepts all fields has become publicly visible.
The problem stems from the automatic constructing and returning of this. To be able to do what RopeString does we need to put constructing back under user control. We could make new take no arguments and return an uninitialized object. ?

from julia.

JeffBezanson commented on April 28, 2024

Another possibility: reassign this and hope the unused constructed object is optimized away:

RopeString(h::RopeString, t::RopeString) =
    depth(h.tail) + depth(t) < depth(h.head) ?
        this = RopeString(h.head, RopeString(h.tail, t)) :
        (this.head = h;
         this.tail = t;
         ...)

Or a keyword argument?

RopeString(h.head, RopeString(h.tail, t), this=this)

though that one is a little leaky.

from julia.

StefanKarpinski commented on April 28, 2024

I'm suspicious of hoping that the compiler is going to eliminate the creation of temporary objects. Even if it does manage to do so, imo, it puts too much distance between what's conceptually going on and what is actually going on.

If a constructor can call another constructor to do its work, then it seems to me like these are the options:

don't create a this object to be implicitly passed into the constructor and provide constructors with a way of creating uninitialized objects, e.g. a new thunk
explicitly pass this into constructors, thereby allowing one constructor to pass the buck to another constructor by passing its this instance to it
automatically share this between recursive constructor calls so that the recursive call is operating on the same this as the original call

I'm not really sure which is best :-|

from julia.

JeffBezanson commented on April 28, 2024

Here's a weird one... have this.x=y in a constructor be special syntax that first checks whether this is assigned, and if not assigns it to a new instance. So nothing is allocated until you touch this. Clever because type inference is already able to remove undefined-var checks, but not able to remove object allocations. A bit strange but I'm starting to like this idea.

from julia.

StefanKarpinski commented on April 28, 2024

Can you explain this bit:

Clever because type inference is already able to remove undefined-var checks, but not able to remove object allocations.

Other than that it seems reasonable enough, but I'm a bit hesitant to add magical behaviors...

from julia.

JeffBezanson commented on April 28, 2024

this.x=y would be converted to this?=new(); this.x=y where ?= is a mythical operator that only evaluates the expression and does the assignment if this has not been set yet. Type inference can prove when a variable has been set, so most of the ?= expressions can be removed. For example:

if foo()
    this.x = 0
end
this.y = 1
this.z = 2
this.w = 3

The first one can use this=new(), because we know this has not been set yet. Only the second one requires a check, and the rest need no checks.

So we can have recursive constructors with no redundant allocations and allow uninitialized fields with very little overhead.

As per your recent email on hash tables, ?= could even become a full-fledged language feature. We could support it for any assignable thing. People might even like it for local variables, because sometimes you have complex logic that might end up setting some variable, and then later want something like "set it to this other thing if it hasn't been set yet". I've seen this in matlab code. ?= would let you avoid an extra flag. Since we already go to the trouble of checking for undefined variables, we might as well expose this capability to the user.

from julia.

StefanKarpinski commented on April 28, 2024

So this is the basic plan:

allow undefined values
allow checking for undefined values
allow conditional setting of undefined values via ?=
throw an exception immediately on any use of an undefined value

This seems like a pragmatic combination. The last point is really important because it prevents "null check hell" and prevents undefined values from silently propagating through a program, thereby also avoiding many null checks, since anything that's been used cannot possibly be null.

from julia.

JeffBezanson commented on April 28, 2024

A couple weeks ago Stephan B. observed that you might have a constructor like:

MyType() = return this

So we actually need to convert every occurrence of this to (this?=new(...); this). Might seem extreme, but perfectly doable.

Is the design settled? It seems a bit crazy but I feel like we really want recursive constructors. It's a clever feature. In other languages you have to do things like Class::Create(...) when you need this ability.

from julia.

StefanKarpinski commented on April 28, 2024

This whole thing still feels fishy to me, but I don't have anything better :-(

from julia.

StefanKarpinski commented on April 28, 2024

You know, I'm thinking that having a new() function that creates an uninitialized object feels better than this. It feels really wrong the way this is special here. Providing a new() function is also is the least magical solution, which is nice — it's perfectly understandable and explainable. We could allow optional arguments to new() which could be used to assign fields if they're given. That's frequently a lot clearer and more concise.

from julia.

StefanKarpinski commented on April 28, 2024

Using this approach, the default constructor would just be this:

type Foo
  bar
  baz

  Foo(bar,baz) = new(bar,baz)
end

This would be provided by default, of course, so the above definition would be unnecessary. Here are various examples to try this on and see how it feels...

type MyType
  self::MyType
  MyType() = (this = new(); this.self = this)
end

A little verbose but pretty clear.

type Rational{T<:Int} <: Real
     num::T
     den::T

     Rational(n::Int, d::Int) = (g = gcd(n,d); new(div(n,g),div(d,g)))
 end

 Rational{T}(n::T, d::T) = Rational{T}(n,d)
 Rational(n::Int, d::Int) = Rational(promote(n,d)...)
 Rational(n::Int) = Rational(n,one(n))

Clean, clear, short. I like it.

function RopeString(h::RopeString, t::RopeString) =
    depth(h.tail) + depth(t) < depth(h.head) ?
        RopeString(h.head, RopeString(h.tail, t)) :
        new(h, t, max(h.depth, t.depth)+1, length(h)+length(t))

Also simple, clear and concise.

from julia.

StefanKarpinski commented on April 28, 2024

This would also jive well with immutable objects — you create them in their final form using new() and then can't change them. This would mean that immutable objects cannot be self-referential, which is actually rather nice.

from julia.

JeffBezanson commented on April 28, 2024

It's a good point that new() is actually pretty nice and concise when it works. And it's perfect for immutable objects---just a matter of making the arguments to new required instead of optional. Elegant. Yes, this is the better solution.

My only remaining problem is that I want to rename the clone function to new. clone is a terrible name, since it doesn't clone but makes new uninitialized objects. Can we think of another name, or some other way to get around this conflict?

from julia.

StefanKarpinski commented on April 28, 2024

For posterity, it should be noted that half of this discussion took place in this email thread.

from julia.

StefanKarpinski commented on April 28, 2024

My only remaining problem is that I want to rename the clone function to new. clone is a terrible name, since it doesn't clone but makes new uninitialized objects. Can we think of another name, or some other way to get around this conflict?

I agree that clone is a lousy name for this but I don't really think new is much better for that purpose — and new is clearly a good name for the function that makes new objects. I always do a double-take and have to remember what clone actually does every time I see it. I've always felt that it was weird and slightly fishy that arrays have their own special construction function. It feels like it's begging for a cleaner and/or more general solution. How about calling it make_new or make or new_like or something like that?

from julia.

JeffBezanson commented on April 28, 2024

It's not supposed to be a special constructor for arrays. It's part of the mutable container interface. It lets you write generic container manipulations like

function replace(c, a, b)
  n = make_new(c)
  for k = keys(c)
    n[k] = (c[k]==a ? b : c[k])
  end
  n
end

How about create or alloc?

from julia.

JeffBezanson commented on April 28, 2024

branch merged in commit 587f36b. fixed.

from julia.

redesign constructors to allow uninitialized fields about julia HOT 31 CLOSED

Comments (31)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent