
ganga-assignment's Introduction

Ganga Project - CERN-HSF - GSoC 2019

This repository contains the code and instructions for the Ganga Project assignment for Google Summer of Code 2019, covering the tasks listed in the given file.

Required Modules

  1. ganga
  2. pdfminer
  3. PyPDF2
  4. Jupyter
  5. memory-profiler

The assignment had a Task Statement and a Memory Management Statement; both are discussed separately in detail below.

Task

The first task was to execute a simple Hello World job in the Ganga shell; its output can be found in Ganga_Hello_World.ipynb. The Jupyter notebook can be opened in Colab using the link at the top of the notebook.

In the next task, the given PDF file needs to be separated into individual pages. A Ganga job should then count the number of occurrences of the word "the" in the given PDF file, with the count for each individual page performed by a subjob. Finally, a merger needs to be written which takes the count from each subjob, adds up the values, and writes the total to a file.

For this, two helper files, execute.sh and adder.py, were written and are explained below:

  1. execute.sh

This file contains bash commands which convert each individual PDF page into a text file and count the number of occurrences of "the" in that file.

  2. adder.py

This file contains a CustomMerger function which adds up all the counts and writes the total to an output file.
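The two helpers described above can be approximated in Python as follows. This is a minimal sketch, not the code from execute.sh or adder.py: `count_word` is a hypothetical name, the whole-word, case-insensitive matching rule is an assumption, and the `mergefiles(file_list, output_file)` signature reflects my understanding of what Ganga's CustomMerger looks for in the supplied module. The exact contract, including any expected return value, should be checked against the Ganga documentation.

```python
import os
import re
import tempfile


def count_word(text, word="the"):
    """Count whole-word, case-insensitive occurrences of `word` (assumed rule)."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def mergefiles(file_list, output_file):
    """Sum the integer stored in each input file and write the total.

    Assumption: every subjob output file holds a single integer count.
    """
    total = 0
    for path in file_list:
        with open(path) as handle:
            content = handle.read().strip()
        if content:  # skip empty outputs instead of failing on int("")
            total += int(content)
    with open(output_file, "w") as out:
        out.write(f"{total}\n")


# Small local demonstration with two made-up "pages" (no Ganga involved).
page_texts = ["The first page of the file.", "Another page, then the end."]
with tempfile.TemporaryDirectory() as workdir:
    count_files = []
    for index, text in enumerate(page_texts):
        path = os.path.join(workdir, f"count_{index}.txt")
        with open(path, "w") as handle:
            handle.write(str(count_word(text)))
        count_files.append(path)
    merged_path = os.path.join(workdir, "merged.txt")
    mergefiles(count_files, merged_path)
    with open(merged_path) as handle:
        merged_total = int(handle.read())
print(merged_total)
```

The word-boundary pattern keeps words such as "then" or "Another" from being counted, which a plain substring search would include.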

The Ganga_File_Split.ipynb notebook contains the commands and code for:

  1. Installing and importing the needed modules
  2. Getting the required files
  3. Splitting the PDF file into individual pages
  4. Commands to execute in the Ganga shell

Note: I tried placing the code in a single Python file, but during execution the merger failed because the job was still in the submitted state. Adding a time delay did not help. Hence, the commands need to be entered manually in the Ganga shell.
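One possible alternative to a fixed time delay is to poll the job status until it reaches a final state. The sketch below only illustrates that idea and has not been verified against this repository: `wait_for_job` and `FakeJob` are hypothetical names, the job is assumed to expose a `status` string (as Ganga jobs do), and the approach only helps if Ganga's monitoring is actually updating that status in the background, which may not be the case in a non-interactive script.

```python
import time

# Ganga job states treated here as final (assumption for this sketch).
TERMINAL_STATES = {"completed", "failed", "killed"}


def wait_for_job(job, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll `job.status` until it is final; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_seconds
    status = job.status
    while status not in TERMINAL_STATES:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {status!r}")
        time.sleep(poll_seconds)
        status = job.status
    return status


class FakeJob:
    """Stand-in for a real job: reports a fixed sequence of states."""

    def __init__(self):
        self._states = iter(["submitted", "submitted", "running", "completed"])

    @property
    def status(self):
        # Once the sequence is exhausted, keep reporting "completed".
        return next(self._states, "completed")


final_status = wait_for_job(FakeJob(), poll_seconds=0.01)
print(final_status)
```

With a real job, any step that depends on the merged output would run only after `wait_for_job` returns "completed".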

The file stdout in the current directory will contain the needed sum.

Memory Management

For Memory Management, 4 tasks were given, out of which 3 were performed with all the requirements fulfilled. Please find the description of the performed experiments below:

  1. There are two folders: Deep Copy and Shallow Copy.
  2. In the Deep Copy folder, there are two Python files:
    • deepcopy_delay-1.py executes the first task of performing a deep copy of previous simple objects and monitors the memory usage.
    • deep-release_reference-2.py executes the second task of releasing the references to the created objects one by one and observes the memory usage.
  3. In the Shallow Copy folder, there is one Python file.
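The two deep-copy experiments listed above could be approximated as follows. This is a rough sketch rather than the contents of the scripts in the Deep Copy folder: it uses the standard-library `tracemalloc` module in place of memory-profiler so that it runs without extra packages, and `make_object` with its sizes is a hypothetical stand-in for the "simple objects" of the assignment.

```python
import copy
import gc
import tracemalloc


def make_object(size=10_000):
    """Build a nested object so that a deep copy has sub-objects to duplicate."""
    return {"values": [[i, i + 1] for i in range(size)], "label": "simple object"}


tracemalloc.start()
original = make_object()

# Experiment 1: deep-copy the object several times and record traced memory.
copies = []
usage_after_copy = []
for _ in range(3):
    copies.append(copy.deepcopy(original))
    current, _peak = tracemalloc.get_traced_memory()
    usage_after_copy.append(current)

# Experiment 2: release the references one by one and record memory again.
usage_after_release = []
while copies:
    copies.pop()   # drop the only reference to one deep copy
    gc.collect()   # not strictly needed without reference cycles
    current, _peak = tracemalloc.get_traced_memory()
    usage_after_release.append(current)

tracemalloc.stop()

print("after each copy   :", usage_after_copy)
print("after each release:", usage_after_release)
```

Traced memory should rise with every deep copy and fall again as each reference is released. The absolute numbers depend on the Python version and platform.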

Note

I looked into implementing the algorithm for using shallow copy to mimic deep copy (as described by Ulrik sir in the email), and arrived at the idea described below:

Shallow copy creates a new object, but for the sub-objects within it, it holds only references to those of the original object. This can be shown below. To make shallow copy mimic deep copy, we have to make shallow copies of the available sub-objects as well.
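The idea can be sketched as a recursive function that shallow-copies an object and then replaces each sub-object with its own shallow copy. This is an illustration of the approach only, not the code in the Shallow Copy folder: `layered_copy` is a hypothetical name, only lists and dicts are handled, and unlike `copy.deepcopy` it does not handle reference cycles or preserve shared references.

```python
import copy


def layered_copy(obj):
    """Shallow-copy `obj`, then do the same for every list/dict inside it."""
    duplicate = copy.copy(obj)
    if isinstance(duplicate, list):
        for index, item in enumerate(duplicate):
            duplicate[index] = layered_copy(item)
    elif isinstance(duplicate, dict):
        # Iterate over a snapshot so the dict is not modified while iterating.
        for key, value in list(duplicate.items()):
            duplicate[key] = layered_copy(value)
    return duplicate


original = {"pages": [[1, 2], [3, 4]], "name": "doc"}
shallow = copy.copy(original)
layered = layered_copy(original)

# Mutate a nested list in the original and see which copies are affected.
original["pages"][0].append(99)

print(shallow["pages"][0])   # shares the inner list with the original
print(layered["pages"][0])   # has its own inner list
```

The plain shallow copy sees the appended value because it shares the inner list with the original, while the layered copy is unaffected.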

Results

  1. Deep-Copy of Objects

(image: deep-copy-1)

  2. Release Reference - Deep Copy

(image: deep-copy-2)

  3. Shallow Copy

(image: shallow-copy-3)
