This project is forked from osiastossou/projettd-ac.

Project TD-AC: Efficient Data Partitioning based Truth Discovery

Author 1 : Osias Noël Nicodème Finagnon TOSSOU (African Institute for Mathematical Sciences at Mbour, Senegal)
Author 2 : Mouhamadou Lamine Ba (Université Alioune Diop de Bambey at Bambey, Senegal)
Note :

For confidentiality reasons, some of the real datasets we used are not available online; the others are publicly available real datasets, which we obtained from other articles. The synthetic data are generated by an algorithm described in the paper and implemented in Python in the DataSynthetiqueGenerator.py file. Three synthetic datasets, whose configurations are given in the paper, are provided in the data folder (DS1, DS2, DS3).

Abstract :

This paper presents TD-AC, an effective algorithm for the truth discovery problem when the attributes of the data are structurally correlated. We build our procedure on an abstract representation of the truth in the data, the k-means clustering technique, and the silhouette measure to automatically find an optimal (or near-optimal) partitioning of the input data that maximizes the accuracy of any base truth discovery process. Intensive experiments conducted on synthetic and real datasets show that TD-AC outperforms existing partitioning approaches with a more reasonable running time. On synthetic datasets, it improves the accuracy of standard truth discovery algorithms by at least 6% and at most 16%. On the other types of datasets, it also improves accuracy significantly when the data coverage rate is high.
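To make the partitioning idea in the abstract more concrete, here is a minimal Python sketch of choosing an attribute partition with k-means and the silhouette measure. It is not the code from this repository: the function names, the deterministic farthest-point seeding, and the per-attribute feature vectors are all assumptions made for illustration, and the "abstract representation of the truth" defined in the paper is replaced here by placeholder numeric vectors.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return -1.0  # silhouette is undefined for a single cluster
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # convention: a singleton contributes 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in idx) / len(idx)
            for lab, idx in clusters.items()
            if lab != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def partition_attributes(vectors):
    """vectors maps an attribute name to a numeric tuple (hypothetical features).
    Tries k = 2 .. n-1 and keeps the clustering with the best mean silhouette."""
    names = list(vectors)
    points = [tuple(vectors[n]) for n in names]
    if len(points) < 3:
        raise ValueError("need at least 3 attributes to compare partitions")
    best_score, best_labels = -2.0, None
    for k in range(2, len(points)):
        labels = kmeans(points, k)
        score = mean_silhouette(points, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    groups = {}
    for name, lab in zip(names, best_labels):
        groups.setdefault(lab, []).append(name)
    return sorted(sorted(g) for g in groups.values()), best_score


if __name__ == "__main__":
    # Made-up feature vectors for six attributes forming two correlated groups.
    example = {
        "a1": (1.0, 0.9, 0.1, 0.0), "a2": (0.9, 1.0, 0.0, 0.1), "a3": (1.0, 1.0, 0.1, 0.1),
        "a4": (0.0, 0.1, 1.0, 0.9), "a5": (0.1, 0.0, 0.9, 1.0), "a6": (0.1, 0.1, 1.0, 1.0),
    }
    groups, score = partition_attributes(example)
    print(groups, round(score, 3))
```

In a pipeline of this kind, each returned group of attributes would then be handed to the base truth discovery algorithm separately. That last step is omitted here because it depends on the base algorithm chosen.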

Code Run description :

.....
