
phd's Introduction

My PhD Papers and Presentations

Thesis: Enabling Self-Service Data Provisioning Through Semantic Enrichment of Enterprise Data

Enterprises use a wide range of heterogeneous information systems in their business activities, such as Enterprise Resource Planning (ERP), Customer Relationship Management (CRM) and Supply Chain Management (SCM) systems. In addition to the large amounts of heterogeneous data produced by these systems, external data is an important resource that can be leveraged to make quick, rational business decisions. Classic Business Intelligence (BI) tools, and even the newer agile visualization tools, build much of their appeal on attractive and distinctive visualizations. Yet preparing data for those visualizations remains by far the most challenging task in most BI projects, large and small. Self-service data provisioning aims to tackle this problem by providing the end user with intuitive dataset discovery, data acquisition and integration techniques.

The goal of this thesis is to provide a framework that enables self-service data provisioning in the enterprise. This framework empowers business users to search, inspect and reuse data through semantically enriched datasets profiles.

Publicly available datasets contain knowledge from various domains: encyclopedic, government, geographic, entertainment and so on. The increasing diversity of these datasets makes it difficult to annotate them with a fixed set of pre-defined tags. Moreover, manually entered tags are subjective and may not capture a dataset's essence and breadth. We propose a mechanism that automatically attaches meta-information to data objects by leveraging knowledge bases such as DBpedia and Freebase, which facilitates data search and acquisition for business users.
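The idea of knowledge-base-driven tagging can be sketched as follows. This is a minimal, hypothetical illustration: the in-memory `KNOWLEDGE_BASE` dictionary and the `annotate` helper stand in for real DBpedia/Freebase lookups and are not the thesis implementation.

```python
# Minimal sketch of automatic semantic tagging: match dataset cell values
# against a knowledge base and attach the categories of matched entities.
# The toy KNOWLEDGE_BASE stands in for DBpedia/Freebase entity lookups.

KNOWLEDGE_BASE = {
    "Paris":   {"categories": ["City", "Capital", "Geography"]},
    "France":  {"categories": ["Country", "Geography"]},
    "Renault": {"categories": ["Company", "Automotive"]},
}

def annotate(values):
    """Return the set of semantic tags inferred from a list of cell values."""
    tags = set()
    for value in values:
        entity = KNOWLEDGE_BASE.get(value)
        if entity:
            tags.update(entity["categories"])
    return tags

print(sorted(annotate(["Paris", "France", "Unknown"])))
# ['Capital', 'City', 'Country', 'Geography']
```

A real pipeline would replace the dictionary lookup with an entity-resolution call against the knowledge base and aggregate the returned types over a sample of the dataset's values.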

In many knowledge bases, data entities are described with numerous properties. However, not all properties have the same importance: some act as keys for instance matching, while others are typically chosen to quickly summarize the key facts attached to an entity. Business users may want to enrich their reports with these data entities. To facilitate this, we propose a mechanism to select which properties should be used when adding extra columns to an existing dataset or annotating instances with semantic tags.
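One simple signal for property selection is coverage: how often a property appears across entity descriptions. The sketch below ranks properties by frequency and keeps the top-k as summary candidates; the entity records and the `k` threshold are illustrative assumptions, not the mechanism proposed in the thesis.

```python
# Sketch of property selection: rank properties by how often they appear
# across entity descriptions, then keep the top-k as "summary" properties.
from collections import Counter

def top_properties(entities, k=2):
    """Return the k most frequently used property names."""
    counts = Counter(prop for entity in entities for prop in entity)
    return [prop for prop, _ in counts.most_common(k)]

entities = [
    {"label": "Paris", "population": 2148000, "mayor": "A. Hidalgo"},
    {"label": "Lyon", "population": 513000},
    {"label": "Nice"},
]
print(top_properties(entities, k=2))  # ['label', 'population']
```

In practice one would combine frequency with other signals (e.g. how discriminative a property is for instance matching) rather than rely on coverage alone.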

Linked Open Data (LOD) has emerged as one of the largest collections of interlinked datasets on the web. To benefit from this mine of data, one needs access to descriptive information (metadata) about each dataset. This metadata enables dataset discovery, understanding, integration and maintenance. Data portals, which are the access points to these datasets, offer metadata represented in different, heterogeneous models. We first propose a harmonized dataset model, based on a systematic literature survey, that provides complete metadata coverage and enables data discovery, exploration and reuse by business users. Second, rich metadata is currently limited to a few data portals, where it is usually provided manually and is therefore often incomplete and of inconsistent quality. We propose a scalable automatic approach for extracting, validating, correcting and generating descriptive linked dataset profiles. This approach applies several techniques to check the validity of the provided metadata and to generate descriptive and statistical information for a particular dataset or for an entire data portal.
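The validation step can be illustrated with a toy check of a metadata record against a set of required fields. The field names below are hypothetical stand-ins; the actual harmonized model in the thesis defines its own fields.

```python
# Sketch of dataset-profile validation: check a metadata record against a
# harmonized set of required fields and report what is missing or empty.
# REQUIRED_FIELDS is illustrative, not the thesis's harmonized model.

REQUIRED_FIELDS = ["title", "description", "license", "sparql_endpoint"]

def validate_profile(metadata):
    """Return the required fields that are missing or empty."""
    return [f for f in REQUIRED_FIELDS if not metadata.get(f)]

profile = {"title": "DBpedia", "description": "", "license": "CC-BY"}
print(validate_profile(profile))  # ['description', 'sparql_endpoint']
```

A full profiler would additionally try to correct invalid values (e.g. normalizing license URIs) and compute statistical descriptors of the dataset's content.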

Traditional data quality is a thoroughly researched field with several benchmarks and frameworks to grasp its dimensions. Ensuring data quality in Linked Open Data is much more complex: the data consists of structured information supported by models, ontologies and vocabularies, and includes queryable endpoints and links. We propose an objective assessment framework for Linked Data quality based on quality metrics that can be measured automatically. We further present an extensible quality measurement tool implementing this framework that helps data owners rate the quality of their datasets and get hints on possible improvements, and helps data consumers choose their data sources from a ranked set.
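The notion of automatically measurable quality metrics can be sketched as functions returning a score in [0, 1] that are combined into a weighted overall score. The `completeness` metric and the weights below are illustrative assumptions, not the framework's actual metrics.

```python
# Sketch of an objective quality score: each metric is a function returning
# a value in [0, 1]; the overall score is a weighted average of metrics.

def completeness(triples, expected_props):
    """Fraction of expected properties actually present in the triples."""
    present = {p for (_, p, _) in triples}
    return len(present & set(expected_props)) / len(expected_props)

def quality_score(metrics_with_weights):
    """Weighted average of (metric value, weight) pairs."""
    total = sum(w for _, w in metrics_with_weights)
    return sum(v * w for v, w in metrics_with_weights) / total

triples = [("ex:Paris", "rdfs:label", "Paris"),
           ("ex:Paris", "dbo:country", "ex:France")]
c = completeness(triples, ["rdfs:label", "dbo:country", "dbo:population"])
print(round(quality_score([(c, 2), (1.0, 1)]), 2))  # 0.78
```

Keeping each metric as an independent [0, 1]-valued function is what makes such a framework extensible: new metrics can be plugged into the weighted aggregation without touching the others.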

Finally, the Internet has created a paradigm shift in how we consume and disseminate information. Data nowadays is spread over heterogeneous silos of archived and live data, and people willingly share data on social media by posting news, views, presentations, pictures and videos. We propose a service that brings relevant live and archived information to the business user. The key advantage is instantaneous access to complementary information without the need to search for it: information appears when it is relevant, enabling the user to focus on what is really important.


Papers

Posters

phd's People

Contributors

ahmadassaf


phd's Issues

Invite the PhD Committee

Thesis manuscript:

  • If the PhD student wants to write and present his/her thesis in English, he/she must request authorization from the Director of the PhD school.
  • If he/she chooses to write the thesis in English, he/she must add to the final manuscript a 25-page French text presenting the main ideas of the thesis, the results, the conclusions and the scientific links between the different chapters. This text may be written after the defense.

Calendar:

  • 2 months before the defense:
    • the jury proposition (see the enclosed document; the title of the thesis must be written in FRENCH, and THIS TITLE WILL BE THE OFFICIAL ONE ON ALL THE DEFENSE'S DOCUMENTS)
    • the first two versions of the thesis (one full PDF and one without the bibliography chapter)
      I will send these 3 documents to Télécom ParisTech, and they will give their agreement for the thesis to be defended.
  • 8 weeks before the defense: we will have to send the manuscript to the jury members. The reviewers will have 4 weeks to prepare their reports and give their agreement for the defense.
  • 15 days before the defense: if the reviewers' reports are positive, we will send the thesis announcement, with 2 abstracts (1 in French and 1 in English).
  • The thesis defense envelope will be given to the advisor on the day of the defense. Once the defense is over, the PhD student will have 2 months to complete the administrative closing procedures.

Jury rules:

Jury proposition: the jury must comprise at most 8 members. Among these 8, there must be the advisor(s), 2 reviewers, and the other jury members.

  • Conditions to be a reviewer:
    • they must not be from EDITE, Télécom ParisTech or EURECOM, and must have no links with the PhD student, such as co-publications;
    • they must hold an HDR or be of an equivalent level.
  • If the advisor is not a Professor, a Professor from Télécom ParisTech or EURECOM must be included among the examiners.
  • Half of the jury must be Professors.
  • Half of the jury must not be from EDITE, Télécom ParisTech or EURECOM.
  • The jury president must be a Professor.

Tentative Committee

Re-work the table of contents generation of the Thesis manuscript

When opening the Thesis.pdf file and enabling the bookmark panel on the left, one can see the various chapters, but after Part 2 all subsequent chapters, including the final conclusions, all appendices and the Bibliography, are incorrectly nested under Part 2.

Appendix A: Add a description explaining the proposed HDL model and examples

Appendix A is of little use in its current form. First, there should be a text introducing the model and its JSON serialization. Beyond the 5-page listing, this text should describe what each part captured in the model means and how it should be used. Next, concrete examples instantiating the model should be provided.
