league-of-nations-archives-digitization-challenge-starter-kit

This is a starter kit for the League of Nations archives digitization challenge on crowdAI.

Problem Statement

This challenge is an image classification problem. The training set contains 4692 images, each labelled as either English or French. The test set contains 14216 images, and for each of them you are supposed to predict the class the image belongs to.

Dataset

The datasets are available in the Dataset section of the challenge page. Following the links there gives you two files:

  • train.tar.gz
  • test.tar.gz

train.tar.gz expands into a folder containing two subfolders, of the form :

.
└── train
    ├── en (contains *.jpg images)
    └── fr (contains *.jpg images)

The folders en and fr contain the .jpg images belonging to the respective class. For the rest of this starter kit, you are encouraged to download both files, extract them, and place them in the data/ directory so that the directory structure looks like:

.
└── data
    ├── test_images  (contains *.jpg images)
    └── train 
        ├── en (contains *.jpg images)
        └── fr (contains *.jpg images)
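
Once the files are in place, a quick check of the layout can catch extraction mistakes early. The following is a minimal sketch, assuming the data/ layout shown above; the helper names `list_training_files` and `list_test_files` are hypothetical and not part of the starter kit.

```python
import glob
import os

# Class folder names as given in the directory layout above.
CLASSES = ("en", "fr")


def list_training_files(data_dir="data"):
    """Return sorted (path, label) pairs for the images in train/en and train/fr."""
    pairs = []
    for label in CLASSES:
        pattern = os.path.join(data_dir, "train", label, "*.jpg")
        for path in sorted(glob.glob(pattern)):
            pairs.append((path, label))
    return pairs


def list_test_files(data_dir="data"):
    """Return the sorted paths of the unlabelled test images."""
    return sorted(glob.glob(os.path.join(data_dir, "test_images", "*.jpg")))


if __name__ == "__main__":
    train = list_training_files()
    for label in CLASSES:
        print(label, sum(1 for _, lab in train if lab == label))
    print("test", len(list_test_files()))
```

If everything was extracted as described, the two training counts should add up to 4692 and the test count should be 14216.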

Prediction file format

The predictions should be a valid CSV file with 14216 rows (one for each of the images in the test set), and the following headers :

filename, prob_en, prob_fr

where :

  • filename : filename of a single test file
  • prob_en : the confidence, in [0,1], that this image belongs to the class English
  • prob_fr : the confidence, in [0,1], that this image belongs to the class French

The sum of prob_en and prob_fr for a single row should not be greater than 1.
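
Before uploading, it can help to check a prediction file against the format described above. The sketch below is one possible local check, not the grader's actual validation. The function name `validate_predictions`, the tolerance `tol`, and the choice to accept row sums of up to 1 (plus that tolerance) are all assumptions.

```python
import csv

EXPECTED_HEADER = ["filename", "prob_en", "prob_fr"]


def validate_predictions(csv_path, expected_rows=14216, tol=1e-6):
    """Return a list of problems found in a prediction file (empty if none)."""
    problems = []
    with open(csv_path, newline="") as fp:
        reader = csv.reader(fp)
        header = [h.strip() for h in next(reader, [])]
        if header != EXPECTED_HEADER:
            problems.append("unexpected header: {}".format(header))
        n_rows = 0
        for row in reader:
            if not row:
                continue  # ignore blank lines
            n_rows += 1
            if len(row) != 3:
                problems.append("row {}: expected 3 columns".format(n_rows))
                continue
            try:
                p_en, p_fr = float(row[1]), float(row[2])
            except ValueError:
                problems.append("row {}: non-numeric probability".format(n_rows))
                continue
            if not (0.0 <= p_en <= 1.0 and 0.0 <= p_fr <= 1.0):
                problems.append("row {}: probability outside [0,1]".format(n_rows))
            # Assumption: a row sum of up to 1 (plus a small tolerance) is acceptable.
            if p_en + p_fr > 1.0 + tol:
                problems.append("row {}: probabilities sum to more than 1".format(n_rows))
    if n_rows != expected_rows:
        problems.append("expected {} rows, found {}".format(expected_rows, n_rows))
    return problems
```

Calling `validate_predictions("random_prediction.csv")` returns an empty list when the file has the expected header, 14216 data rows, and probabilities that pass the checks above.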

Random prediction

You can use the script below to generate a sample submission, which is saved as random_prediction.csv.

#!/usr/bin/env python

import numpy as np
import os
import glob

def softmax(x):
    """Compute softmax values for a 1-D array of scores."""
    # Subtract the maximum before exponentiating, for numerical stability.
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)


LINES = []
LINES.append("filename,prob_en,prob_fr")
for _file_path in glob.glob("data/test_images/*.jpg"):
    probs = softmax(np.random.rand(2))
    LINES.append("{},{},{}".format(
        os.path.basename(_file_path),
        probs[0],
        probs[1]
    ))

# The context manager closes the file even if the write fails.
with open("random_prediction.csv", "w") as fp:
    fp.write("\n".join(LINES))
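
To see what the softmax step in the script does to the random scores, here is a small illustration. It copies the `softmax` function from the script above. The fixed seed and the variable names are only there for the illustration.

```python
import numpy as np


def softmax(x):
    """Compute softmax values for a 1-D array of scores."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)


# Equal scores give equal probabilities.
print(softmax(np.array([1.0, 1.0])))  # [0.5 0.5]

# Scores from np.random.rand lie in [0, 1), so the most lopsided case is
# close to scores of 0 and 1.
print(softmax(np.array([0.0, 1.0])))  # approximately [0.269 0.731]

# Any pair of random scores yields two probabilities that sum to 1.
rng = np.random.default_rng(0)  # fixed seed, for a repeatable illustration
probs = softmax(rng.random(2))
print(probs, probs.sum())
```

Because the scores are drawn from [0, 1), every probability written by the random script falls roughly between 0.27 and 0.73, and each row sums to 1 up to floating-point rounding.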

Submission

To submit on crowdAI, go to the challenge page and click on Create Submission.

Then upload the file by clicking on Browse file at the bottom of the screen.

Finally, your submission is either accepted, or an error is shown.

Best of Luck

Author

Sharada Mohanty
