Lua-MapReduce is a MapReduce framework implemented in Lua, using the luamongo driver and MongoDB as storage. It follows the iterative MapReduce paradigm for training statistical machine learning models.
Currently, the partitioning of map keys is done sequentially at the server. It is a merge algorithm over several GridFS files. This procedure is a bottleneck in the current implementation. A parallelized sort/merge algorithm would improve performance.
The performance of the N-list merge implementation could be increased by keeping the current best value of each of the N lists in a heap. This replaces the linear scan for the minimum over N candidates with an O(log N) heap operation per merged element, which is a large improvement when N becomes large (hundreds).
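The heap-based merge described above can be sketched as follows. This is a minimal illustration in Python, not the framework's Lua code: the function name `merge_sorted_runs` is hypothetical, each run is assumed to be an already sorted sequence of `(key, value)` pairs, and reading the runs from GridFS is left out.

```python
import heapq


def merge_sorted_runs(runs):
    """Merge N sorted iterables of (key, value) pairs into one sorted stream.

    Assumption: every run is already sorted by key. The heap holds at most
    one entry per run (its current head), so each pop/push costs O(log N).
    """
    iterators = [iter(run) for run in runs]
    heap = []
    for idx, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            key, value = first
            # idx breaks ties between equal keys, so values are never compared.
            heap.append((key, idx, value))
    heapq.heapify(heap)

    while heap:
        key, idx, value = heapq.heappop(heap)
        yield key, value
        # Refill the heap from the run that just supplied the minimum.
        nxt = next(iterators[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], idx, nxt[1]))


# Hypothetical example: three small sorted runs standing in for GridFS files.
runs = [
    [("a", 1), ("c", 1)],
    [("b", 1), ("c", 2)],
    [("a", 3)],
]
merged = list(merge_sorted_runs(runs))
```

Because the heap never holds more than N entries, memory use stays proportional to the number of runs rather than to the total data size. Python's standard library offers the same behaviour through `heapq.merge`; the explicit version is shown only to make the heap handling visible.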
Extending this tool to the iterative map-reduce paradigm would make it well suited to implementing machine learning algorithms. An easy coupling between April-ANN (or Torch 7) and this tool would then be possible, so that large artificial neural networks could be trained on very large datasets.