This is my first project in a Cloudera Quickstart Container. It is a low-level approach to getting multiple large (100 GB) files and combining them in HDFS. It uses multiprocessing and runs quickly. At most, it uses 7-8 GB of memory.
coleferg / cloudera-python-test
Moving large files into HDFS in Python using multiprocessing
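
The description above does not show how the files are moved, so here is a minimal sketch of one way to do it, not the repository's actual code. It assumes each worker process copies one source file in fixed-size chunks, which keeps memory use near `workers * CHUNK_BYTES` regardless of file size. The names `stage_file`, `combine_parts`, `ingest`, and `CHUNK_BYTES` are hypothetical, and local paths stand in for HDFS so the sketch runs anywhere.

```python
import multiprocessing as mp
import os
import shutil

# Assumed buffer size: each worker holds at most this much file data in memory.
CHUNK_BYTES = 64 * 1024 * 1024  # 64 MiB


def stage_file(job):
    """Copy one source file to its part file in fixed-size chunks."""
    src, part_path = job
    with open(src, "rb") as fin, open(part_path, "wb") as fout:
        # copyfileobj reads and writes CHUNK_BYTES at a time, never the whole file.
        shutil.copyfileobj(fin, fout, CHUNK_BYTES)
    return part_path


def combine_parts(part_paths, dest_path):
    """Concatenate the part files into dest_path in the order given."""
    # Assumption: a local file stands in for HDFS here. In the container, the
    # same byte stream could instead be piped to `hdfs dfs -put - <path>`,
    # which reads its input from stdin.
    with open(dest_path, "wb") as fout:
        for part in part_paths:
            with open(part, "rb") as fin:
                shutil.copyfileobj(fin, fout, CHUNK_BYTES)
    return dest_path


def ingest(sources, work_dir, dest_path, workers=4):
    """Stage all sources in parallel, then combine them into one file."""
    jobs = [
        (src, os.path.join(work_dir, f"part-{i:05d}"))
        for i, src in enumerate(sources)
    ]
    with mp.Pool(processes=workers) as pool:
        # Pool.map returns results in input order, so parts keep source order.
        parts = pool.map(stage_file, jobs)
    return combine_parts(parts, dest_path)
```

To try it, call `ingest` from under an `if __name__ == "__main__":` guard with a list of source paths, an existing scratch directory, and a destination path. With the assumed defaults of 4 workers and 64 MiB chunks, buffered file data stays around 256 MiB; both values are tunable.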