Currently there are two protocols: one for "iterator functions" and one for threads. Threads are fine, I think; it's the iterator function protocol that's.. hmmm.
It's borrowed from Lua, and like many things in Lua it's designed to be lightweight and not impose a lot of structure. It does a good job of letting you iterate over simple list-like things without much overhead (you don't have to allocate a closure just to walk an array), but the problem with it is that it's.. not very intuitive. Pretty much everyone who's used Croc on any large scale has asked me just how the hell to write an iterator. Even I have to really think about it when I write one.
Iterator functions also have one major shortcoming: iteration ends as soon as the iterator function returns null as the first index. This works fine for tables, which is what Lua designed it for, but it limits general-purpose use of iterator functions. What if you want to iterate over the rows of a database query, where the first column is nullable? What if you have a map container that DOES allow null as a key? What if you zip over two arrays, the first of which contains nulls? In all of these cases, you have to either put a dummy index at the beginning (which makes the interface less intuitive and increases cognitive load), or you have to use thread iterators (which are overkill for a lot of use cases).
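To make the null problem concrete, here is a rough model of the protocol in Python rather than Croc. Everything in it is an assumption made for illustration: `foreach`, `array_step` and `zip_iter` are hypothetical names, and `None` stands in for null. It is a sketch of the idea, not how the interpreter actually does it.

```python
# Model of a Lua-style "iterator function" protocol (hypothetical names).
# The loop holds only three things: a step function, the object being
# iterated, and the index from the previous step. None stands in for null.

def foreach(step, obj, index=None):
    """Drive the protocol and collect the (index, value) pairs it produces."""
    results = []
    while True:
        index, value = step(obj, index)
        if index is None:      # in-band signal: a null first index ends the loop
            break
        results.append((index, value))
    return results


def array_step(arr, index):
    """Stateless step for list-like objects: nothing is allocated per loop."""
    nxt = 0 if index is None else index + 1
    if nxt >= len(arr):
        return None, None
    return nxt, arr[nxt]


def zip_iter(a, b):
    """Return (step, obj, initial index) for walking two lists in lockstep.

    The position lives in a closure, so the loop variables are the
    elements themselves, and the first list's element is the "index".
    """
    pos = -1

    def step(_obj, _index):
        nonlocal pos
        pos += 1
        if pos >= min(len(a), len(b)):
            return None, None
        return a[pos], b[pos]

    return step, None, None
```

With this model, `foreach(array_step, ["a", "b", "c"])` visits all three elements, but `foreach(*zip_iter([1, None, 3], ["x", "y", "z"]))` stops after the first pair, because the null in the first list is indistinguishable from "iteration is over".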
Thread iterators are really simple to write and use, and use out-of-band signaling to end the loop (when the thread dies), but they're pretty heavyweight. They're great for when you want to do a lot of processing and abstract it all behind a foreach loop, but you wouldn't want to invoke one to iterate over an array or table. So there's still a definite use case for lightweight iteration, it's just.. how do you redesign it to be more straightforward?
An important question is: how bad is it, really, to have to allocate something on the heap to perform an iteration? I don't want to discount a solution JUST because it might force you to allocate something at the beginning of each foreach loop. Of the built-in and stdlib types, really the only ones that DON'T need to allocate any state are the ones that are suuuuuuper simple list-like objects: arrays, memblocks, StringBuffers, Vectors, Regexps. Just about everything else (including tables and even strings, due to the difference between codepoint and byte offsets) requires at least a little more state than what the iterator function protocol gives you -- which is just "the object being iterated" and "the index of the last iteration". So it seems that allocating state to do iteration will be the rule rather than the exception, and maybe that isn't such a big problem.
So let's see what other languages have done to solve this.
Iterator objects
Like in Java. It's a small object, usually with only a couple of methods like "hasNext" and "next", and maybe a "remove". While this interface is simple, it's a little more heavyweight than I'd like. You have to make a class, separate from the object it belongs to (or perhaps use a table?), and this class has a LOT of boilerplate. You might even have to create two different classes, one for a "removable" iterator and one for a read-only one, because forcing EVERY iteration to allow removing is inefficient. One iterator object consists of the class instance AND a namespace to hold its fields, and a namespace has a blob of memory; that's three memory allocations for one iteration. Ehhhhh, don't think so.
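For a sense of the boilerplate, here is a minimal Python sketch of the iterator-object approach. The class and method names (`ArrayIterator`, `RemovableArrayIterator`, `has_next`, `collect`) are made up for illustration, and Python's object model is not Croc's, so the allocation counts above don't carry over. Only the shape of the code does.

```python
class ArrayIterator:
    """Hypothetical Java-style iterator: a separate object holding the cursor."""

    def __init__(self, items):
        self._items = items
        self._pos = 0

    def has_next(self):
        # Out-of-band end-of-iteration check, so null elements are harmless.
        return self._pos < len(self._items)

    def next(self):
        value = self._items[self._pos]
        self._pos += 1
        return value


class RemovableArrayIterator(ArrayIterator):
    """A second class that exists only to support removal while iterating."""

    def remove(self):
        # Remove the element most recently returned by next().
        self._pos -= 1
        del self._items[self._pos]


def collect(it):
    """What a foreach loop would do with such an object."""
    out = []
    while it.has_next():
        out.append(it.next())
    return out
```

On the plus side, `collect(ArrayIterator([1, None, 3]))` sees all three elements, since the end of the loop is signaled by `has_next` rather than by a null value.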
Inversion of control
Like D. The foreach loop's body becomes a closure that's passed to the object's opApply method. It's a clever solution and makes iterators easy to write, but has a major problem unique to Croc: seamlessly returning multiple values from inside the loop becomes very tricky. You have to somehow save an arbitrary number of results, return THROUGH the opApply metamethod, and then return the results that you saved. I can't think of any way to do this short of introducing extra opcodes/complexity in the interpreter and/or allocating all the return values on the heap, somehow. Also, debugging/stepping through inside-out foreach loops is, in my experience, rather annoying B| Not to mention the extra entries in exception tracebacks due to it.
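The return-through problem is easier to see in code. This is a Python sketch of the inversion-of-control shape, assuming a hypothetical `op_apply` method and a callback convention where a truthy result means "stop". It only illustrates the stash-and-re-return dance; it says nothing about how D or Croc would actually lower the loop.

```python
class Numbers:
    """Hypothetical container exposing an opApply-style method."""

    def __init__(self, items):
        self.items = items

    def op_apply(self, body):
        # The container drives the loop; the loop body is just a callback.
        for i, v in enumerate(self.items):
            if body(i, v):     # a truthy result means "stop iterating"
                return True
        return False


def find_first_negative(numbers):
    """A loop whose body wants to return (i, v) for the first negative v."""
    saved = []                 # heap-allocated slot for the loop's return values

    def body(i, v):
        if v < 0:
            saved.append((i, v))   # stash the results...
            return True            # ...and ask op_apply to unwind
        return False

    if numbers.op_apply(body):
        return saved[0]            # re-return the stashed values
    return None
```

Here `find_first_negative(Numbers([3, 5, -2, 7, -9]))` gives `(2, -2)`, but only because the values took a detour through `saved`. That detour is the part that would need extra interpreter support or a heap allocation.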
Generator functions
Like in Squirrel. These are a form of restricted thread, and like threads, they allow you to yield some values while saving the state of the generator, and then the generator can be resumed to get the next values. Unlike a thread, a generator can only yield from the top-level function; this means that the stack space that it needs to store between yields is constant and can be allocated in a single blob along with a generator function object. This also means that a new type would have to be introduced. Generator functions give you a lot of the ease of using threads (or inversion of control), but without the overhead; in fact, if generators were available, many thread iterators could probably just be turned into generators instead. The downside of generator functions is increased complexity in the interpreter. Now there's ANOTHER callable type to deal with; the interpreter has to keep track of whether or not it's inside a generator; things work differently inside generators.. blah blah blah. It's a good bit of work, but it IS an attractive solution.
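Python's generators happen to have the same restriction (a `yield` only suspends the function it appears in), so they make a reasonable stand-in for what this could look like. The sketch below is Python, not Squirrel or Croc, and `zip_gen` and `table_gen` are hypothetical names.

```python
def zip_gen(a, b):
    """Generator version of a two-list zip; all state lives in this one frame."""
    for pos in range(min(len(a), len(b))):
        yield a[pos], b[pos]   # suspend here; locals survive until resumed


def table_gen(table):
    """Generator over a mapping, yielding each key with its value."""
    for key in table:
        yield key, table[key]
```

Because the end of iteration is signaled by the generator finishing rather than by a yielded value, `list(zip_gen([1, None, 3], ["x", "y", "z"]))` produces all three pairs, null included.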
So generators look nice. There's also a nice parallel between them and threads; all iteration is handled by threadlike objects. Neat.
Maybe! Possibly! Probably!