
alfresco-java-content-model's Introduction

Alfresco Reporting Integration Solution Design Reference

Documents in the Alfresco content repository are crawled and indexed by the Google Search Appliance (GSA). Alfresco document URLs are exposed through a crawl webscript in Aidoc, which lets the GSA replicate the company folder structure under which documents are indexed: Year/ClassCode/Document Number. The GSA is currently configured to crawl the Alfresco document repository continuously, every day, at specified times outside the scheduled backup window. However, letting the GSA crawl continuously has the following drawbacks:

• Unnecessary load on the Alfresco server: documents in Alfresco are added, modified or deleted infrequently, so a constantly running crawl process is not needed.

• When the Alfresco server is down, the GSA crawler receives HTTP response code 404 ("Document not found"), assumes the document has been removed from the Alfresco repository, and attempts to remove it from its index.

• Crawling is time-consuming: a crawl of the whole Alfresco document collection often takes approximately 3-4 days.

Requirements

The proposed change to Alfresco’s integration with the GSA is to use an asynchronous XML feed that is triggered automatically when something changes in Aidoc. This uses the Feeds facility provided by the GSA and allows the GSA continuous crawler to be switched off. Feeds are particularly useful for:

• Documents that can be crawled but are best recrawled at times other than those set by the automatic crawl scheduler that runs on the search appliance.

• Documents that can be crawled but that the crawler cannot discover during a new crawl, because no links on the web site lead to them.

• Documents that can be crawled but are uploaded much more quickly using feeds because of web server or network problems.
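As an illustration of the XML feed described in this section, the sketch below builds a feed containing one record for one Alfresco document URL. It is not taken from the project: the class and method names, the `aidoc` datasource name, the `alfresco.example.com` host and the choice of the `metadata-and-url` feed type are all assumptions made for the example. The element and attribute names follow the GSA feed format as commonly documented and should be checked against the Feeds guide for the appliance version in use.

```java
class GsaFeedSketch {

    /** Escapes characters that are unsafe inside XML text and attribute values. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /**
     * Builds a feed with a single record.
     * deleted == true asks the GSA to drop the URL from its index,
     * otherwise the URL is added or refreshed.
     */
    static String buildFeed(String datasource, String documentUrl,
                            String mimeType, boolean deleted) {
        String action = deleted ? "delete" : "add";
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<!DOCTYPE gsafeed PUBLIC \"-//Google//DTD GSA Feeds//EN\" \"\">\n"
             + "<gsafeed>\n"
             + "  <header>\n"
             + "    <datasource>" + escapeXml(datasource) + "</datasource>\n"
             // Assumed feed type: the GSA fetches the content from the URL itself.
             + "    <feedtype>metadata-and-url</feedtype>\n"
             + "  </header>\n"
             + "  <group>\n"
             + "    <record url=\"" + escapeXml(documentUrl) + "\""
             + " mimetype=\"" + escapeXml(mimeType) + "\""
             + " action=\"" + action + "\"/>\n"
             + "  </group>\n"
             + "</gsafeed>\n";
    }

    public static void main(String[] args) {
        // Hypothetical URL following the Year/ClassCode/Document Number structure.
        String url = "http://alfresco.example.com/crawl/2011/ABC/12345";
        System.out.println(buildFeed("aidoc", url, "application/pdf", false));
    }
}
```

A delete feed differs only in the `action` attribute, so a single builder can serve both the "document changed" and "document removed" events.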

How it works

The first step is to implement an action that generates an XML feed from the properties of an Alfresco document node and sends it to the GSA as a multipart HTTP POST. The Alfresco content repository has a mechanism for triggering actions, called behaviours. A behaviour is bound to a policy, which specifies the Alfresco event that triggers the action to which the behaviour is attached.

The second stage is to collect the response from the GSA and verify that the feed has been sent successfully and that the document has been indexed in, or removed from, the GSA. The result is recorded as a new property on the Asset node in Alfresco.

Because there is a time lag between the GSA accepting the HTTP POST and completing the indexing of a document, a third stage is required: another action, run by the Alfresco task scheduler, searches the repository and verifies that the document has been indexed successfully in the GSA. If indexing has failed, this action attempts to resend the feed.
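The three stages can be sketched in plain Java, independent of the Alfresco API. Everything here is an assumption made for illustration: the `FeedStatus` values stand in for the property recorded on the node, the in-memory map stands in for the repository search, and the two predicates stand in for "ask the GSA whether this URL is indexed" and "resend the feed". The form field names (`feedtype`, `datasource`, `data`) and the `Success` response text reflect the GSA feed protocol as commonly documented and should be verified against the appliance in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

class FeedWorkflowSketch {

    /** Hypothetical values for the status property recorded on the node. */
    enum FeedStatus { SENT, INDEXED, SEND_FAILED }

    /** Stage 1: build the multipart/form-data body for the feed POST. */
    static String multipartBody(String boundary, String datasource,
                                String feedType, String feedXml) {
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("feedtype", feedType);
        parts.put("datasource", datasource);
        parts.put("data", feedXml);
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> part : parts.entrySet()) {
            body.append("--").append(boundary).append("\r\n")
                .append("Content-Disposition: form-data; name=\"")
                .append(part.getKey()).append("\"\r\n\r\n")
                .append(part.getValue()).append("\r\n");
        }
        // The closing delimiter carries two trailing hyphens.
        return body.append("--").append(boundary).append("--\r\n").toString();
    }

    /** Stage 2: turn the GSA response into a status to store on the node. */
    static FeedStatus statusFromResponse(int httpCode, String responseBody) {
        boolean accepted = httpCode == 200
                && responseBody != null
                && responseBody.trim().equals("Success");
        return accepted ? FeedStatus.SENT : FeedStatus.SEND_FAILED;
    }

    /**
     * Stage 3: scheduled check. Documents confirmed in the index become INDEXED;
     * anything else is resent and its status updated from the resend result.
     */
    static void verifyPending(Map<String, FeedStatus> statusByUrl,
                              Predicate<String> isIndexed,
                              Predicate<String> resend) {
        for (Map.Entry<String, FeedStatus> entry : statusByUrl.entrySet()) {
            if (entry.getValue() == FeedStatus.INDEXED) {
                continue; // already verified on an earlier run
            }
            if (entry.getValue() == FeedStatus.SENT && isIndexed.test(entry.getKey())) {
                entry.setValue(FeedStatus.INDEXED);
            } else {
                boolean accepted = resend.test(entry.getKey());
                entry.setValue(accepted ? FeedStatus.SENT : FeedStatus.SEND_FAILED);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, FeedStatus> statuses = new LinkedHashMap<>();
        statuses.put("http://alfresco.example.com/crawl/2011/ABC/1", statusFromResponse(200, "Success"));
        statuses.put("http://alfresco.example.com/crawl/2011/ABC/2", statusFromResponse(503, ""));
        // Stand-ins: pretend the GSA has indexed document 1 and accepts every resend.
        verifyPending(statuses, url -> url.endsWith("/1"), url -> true);
        System.out.println(statuses);
    }
}
```

Keeping acceptance (`SENT`) separate from confirmation (`INDEXED`) mirrors the time lag described above: a successful POST only means the GSA took the feed, not that the document is searchable yet.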

alfresco-java-content-model's People

Contributors

dawudr
