amgiri-1996 / zangshasha
This project measures the similarity between two sentences in any language, whether a natural language or a computer language. The first step is to parse each sentence into a tree structure. We then apply Zhang and Shasha's algorithm to find the tree distance between the two parsed trees. We modified the algorithm, because the pure version was not appropriate here: we divide each tree into a forest according to part of speech, calculate the tree distance, and assign a different weight to each part-of-speech structure.

For example, compare "Ram is playing." with "Sita is playing.": changing the noun does not alter the meaning of the sentence. Now compare "Ram is playing." with "Ram is eating.": changing the verb changes the meaning completely. Different parts of a sentence therefore need different weights when computing similarity. We used regression techniques to find these weights.
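The weighting idea can be illustrated with a minimal sketch. This is not the project's code. It uses a plain memoized recursion for ordered tree edit distance (the same distance that Zhang and Shasha's algorithm computes, without its keyroot optimization), and it charges each edit according to the part of speech of the node involved. The parse trees are written by hand, the weights in `WEIGHTS` are made-up placeholders rather than the regression-fitted values, and the relabel cost (the larger of the two node weights) is an assumption chosen for illustration.

```python
from functools import lru_cache

# Hypothetical part-of-speech weights (assumption). The project fits its
# own values by regression; these are placeholders for illustration only.
WEIGHTS = {"VERB": 3.0, "NOUN": 0.5}
DEFAULT_WEIGHT = 1.0


def weight(pos):
    """Edit cost attached to a node with the given part-of-speech tag."""
    return WEIGHTS.get(pos, DEFAULT_WEIGHT)


def node(pos, word="", *children):
    """Build an immutable (hashable) tree node: (pos, word, children)."""
    return (pos, word, tuple(children))


def forest_cost(forest):
    """Cost of deleting (or inserting) every node in a forest."""
    return sum(weight(pos) + forest_cost(kids) for pos, _, kids in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Weighted edit distance between two ordered forests (tuples of trees)."""
    if not f1:
        return forest_cost(f2)
    if not f2:
        return forest_cost(f1)
    pos1, word1, kids1 = f1[-1]
    pos2, word2, kids2 = f2[-1]
    # Delete the rightmost root of f1; its children join the forest.
    delete = forest_distance(f1[:-1] + kids1, f2) + weight(pos1)
    # Insert the rightmost root of f2.
    insert = forest_distance(f1, f2[:-1] + kids2) + weight(pos2)
    # Map the two rightmost roots onto each other. Relabeling is free when
    # tag and word agree; otherwise it costs the larger weight (assumption).
    same = pos1 == pos2 and word1 == word2
    relabel = 0.0 if same else max(weight(pos1), weight(pos2))
    match = (forest_distance(f1[:-1], f2[:-1])
             + forest_distance(kids1, kids2)
             + relabel)
    return min(delete, insert, match)


def tree_distance(t1, t2):
    """Weighted edit distance between two trees."""
    return forest_distance((t1,), (t2,))


def sentence(subject, verb):
    """Hand-written flat parse of '<subject> is <verb>' (hypothetical)."""
    return node("S", "",
                node("NOUN", subject),
                node("AUX", "is"),
                node("VERB", verb))


ram_playing = sentence("Ram", "playing")
sita_playing = sentence("Sita", "playing")
ram_eating = sentence("Ram", "eating")

print(tree_distance(ram_playing, sita_playing))  # 0.5 (noun changed)
print(tree_distance(ram_playing, ram_eating))    # 3.0 (verb changed)
```

With these placeholder weights, swapping the noun yields a much smaller distance than swapping the verb, which matches the intuition in the example sentences. A real pipeline would replace the hand-written trees with parser output and the placeholder weights with fitted ones.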