hayj's Projects

404detector

This tool recognizes 404 error pages from their HTML content.

annotator

This tool lets users label data one item at a time through a tkinter UI. You only need to provide a set of label types and a set of data to label. All labels are stored in either a pickle file or a MongoDB collection.

authfilt

This repository gathers functions and classes for applying a highly scalable and efficient author filtering process to any corpus.

bash

Some useful bash functions

basics

Library of Java tools including basics

databasetools

The MongoCollection class simplifies the configuration of a MongoDB collection by providing an interface that handles authentication, index management, data conversion, and pretty-printing of collections. It can work like a Python dict if you give it at least one index.

datastructuretools

This repository provides some useful Python data structures, notably SerializableDict.

datatools

This project gathers useful modules for URL parsing, CSV reading, HTML parsing, etc.

deepstyle

DeepStyle provides pretrained models that project text into a stylometric space. The underlying project consists of a new representation learning method and a definition of writing style based on distributional properties. This repository contains the datasets, pretrained models, and other resources used to train and test the models.

domainduplicate

This tool detects duplicate web pages within a domain to control the crawling process. For example, it prevents the crawler from fetching captcha pages or "refuse" pages.

honeypotdetector

This tool uses Selenium to recognize honeypot URLs, helping a crawler avoid being detected as a bot.

hyperopt

Distributed Asynchronous Hyperparameter Optimization in Python

lbpextract

A tool for converting La Banque Postale PDF account statements into CSV files that can be opened in a spreadsheet.

ma-fsa

This is a Java implementation of a minimal acyclic finite-state automaton algorithm, based on the paper "Incremental Construction of Minimal Acyclic Finite-State Automata".

markdown2html

Converts a Markdown file to an HTML file with a given CSS style (or a default one).

newstools

This tool is useful for detecting news URLs. It also aggregates several libraries that scrape news web pages (title, content...).

nlptools

Provides useful NLP tools to get word embeddings, preprocess text data...

py4j

Py4J enables Python programs to dynamically access arbitrary Java objects
