
arramooz's Introduction

Arramooz

Arabic Dictionary for Morphological analysis


Developers:

  • Taha Zerrouki: http://tahadz.com (taha dot zerrouki at gmail dot com)
  • Data collected manually by Mohamed Kebdani, Morocco (med.kebdani gmail.com)

Features value
Authors Authors.md
Release 0.3
License GPL
Tracker linuxscout/arramooz/Issues
Website http://arramooz.sourceforge.net
Source Github
Download sourceforge
Feedbacks Comments
Accounts @Twitter @Sourceforge

Description

Arramooz Alwaseet is an open source Arabic dictionary for morphological analysis. It can help natural language processing developers. It is generated from the raw data of Ayaspell (an Arabic spellchecker), which were collected manually.

This dictionary consists of three parts:

  • stop words
  • verbs
  • nouns

If you cite it in academic work, please use this citation:

T. Zerrouki, Arramooz Alwaseet: Arabic Dictionary for Morphological Analysis, http://arramooz.sourceforge.net/ https://github.com/linuxscout/arramooz

or in BibTeX format:

@misc{zerrouki2011arramooz,
  title={Arramooz Alwaseet : Arabic Dictionary for Morphological analysis},
  author={Zerrouki, Taha},
  url={http://arramooz.sourceforge.net/},
  year={2011}
}

API

The Python API is available as arramooz-pysqlite.

File formats

The dictionary files are available as:

  • Text format (tab-separated)
  • SQL database
  • XML files
  • StarDict files
  • Python + SQLite library
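
As a rough illustration of consuming the tab-separated text format, the sketch below reads entries with Python's standard csv module. It assumes, hypothetically, that the file starts with a header row and that lines beginning with "#" are comments; the column names unvocalized and stamped are borrowed from the project's issue tracker, and the sample data stands in for a real release file.

```python
import csv
import io

def read_entries(lines):
    """Yield one dict per dictionary entry from tab-separated lines.

    Assumption (not confirmed by the project docs): the first non-comment
    line is a header row, and lines starting with '#' are comments.
    """
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row

# Tiny in-memory sample standing in for a real release file.
sample = io.StringIO(
    "#comment line\n"
    "unvocalized\tstamped\n"
    "معتاد\tمعتد\n"
)
entries = list(read_entries(sample))
print(entries[0]["unvocalized"])
```

For a file on disk, the same function can be fed an open file handle, for example one obtained with open(path, encoding="utf-8", newline=""), where path is wherever the release file was unpacked.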

Build the dictionary in multiple formats

The source files are in the data folder as OpenDocument spreadsheet files. Build the dictionary with

make

which generates XML, SQL and text files, and packages them in the releases folder.

To make Hunspell files only

make spell

To make StarDict files only

make stardict

NOTE: you must use stardict-editor to compile releases/stardict/arramooz.sdic in Babylon format.

To change the version, update the $VERSION variable in the Makefile.

To clean releases use:

make clean

To modify or update the data, open the files in data/ with LibreOffice Calc, clean the releases, and run make.

Stopwords

The Stop words list is developed in an independent project (see http://arabicstopwords.sourceforge.ne)

Data Structure

The data structures of the various formats (csv, sql, xml) are described in DataStructures.md:

  • nouns and verbs are described in datastructures.md
  • stop words are explained in the separate Arabic Stopwords project
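
To give a feel for working with the SQL format, here is a minimal sketch using Python's built-in sqlite3 module. The table names (nouns, verbs) and the columns unvocalized and stamped are assumptions based on column names mentioned in the project's issues, not on DataStructures.md, which remains the authoritative description. The in-memory database only stands in for a real release file.

```python
import sqlite3

def lookup(conn, table, word):
    """Return all rows of `table` whose `unvocalized` column equals `word`."""
    # Hypothetical table names; check DataStructures.md for the real schema.
    if table not in ("nouns", "verbs"):
        raise ValueError("unknown table: %s" % table)
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    cur = conn.execute(
        "SELECT * FROM %s WHERE unvocalized = ?" % table, (word,)
    )
    return [dict(row) for row in cur.fetchall()]

# In-memory stand-in for a released database file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE verbs (unvocalized TEXT, stamped TEXT)")
conn.execute("INSERT INTO verbs VALUES (?, ?)", ("بيت", "بت"))
rows = lookup(conn, "verbs", "بيت")
print(rows)
```

The table name is checked against a fixed list because SQL placeholders cannot be used for identifiers; the search word itself is passed as a bound parameter.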

Script Files:

1- generate the abstract dictionary from the raw manual dictionary:

python2 $SCRIPT/verbs/gen_verb_dict.py -f $DATA_DIR/verbs/verb_dic_data-net.csv > $OUTPUT/verbs.aya.dic

2- generate the file format (xml, csv, sql) of dictionary from verbs.aya.dic

python2 $SCRIPT/verbs/gen_verb_dict_format.py -o xml -f $OUTPUT/verbs.aya.dic > $OUTPUT/verbs.xml
  • [scripts/verbs]

    1- verbdict_functions.py : functions to handle verbs dict used in the generation process

    2- verbs/gen_verb_dict.py: generate the abstract dictionary from the raw manual dictionary

    3- verbs/gen_verb_dict_format.py: generate the file format (xml, csv, sql) of dictionary from verbs.aya.dic

  • [scripts/nouns]

    1- noundict_functions.py : functions to handle nouns dict used in the generation process

    2- nouns/gen_noun_dict.py: generate the file format (xml, csv, sql) of dictionary

  • [requirement]

    1- libqutrub

    2- pyarabic

Data Files:

These files, located in arramooz\verbs\data, are used to create the Ayaspell dictionary for spellchecking:

File Description
verb_dic_data-net.csv Raw data compiled manually by Mohamed Kebdani.
ar_verb_normalized.dict A list of arabic verbs, from Qutrub project.
triverbtable.py A list of trilateral verbs, used by Qutrub.
verbs.aya.dic The verb dictionary in abstract format.

arramooz's People

Contributors

linuxscout, mapmeld, munzirtaha, muotaz, sohaibafifi


arramooz's Issues

Notes running on Mac / OSX and Python3

My current code is a bit of a hack, so I don't know if you would like this as a PR. But here are my notes for running it on my current machine:

  • Edited Makefile to use /Applications/LibreOffice.app/Contents/MacOS/soffice (in place of the libreoffice CLI)
  • Replaced "python2" in Makefile with "python3"
  • Removed "#!/usr/bin/python2", "print ", "has_key", ".decode()" in scripts/*
  • Replaced "unicode()" with "str()" in scripts/nouns/csvdict.py
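
A minimal sketch of what these Python 2 to Python 3 replacements typically look like. The function and the sample entry are hypothetical illustrations, not code taken from scripts/*.

```python
# Python 2 idiom (old scripts)        -> Python 3 equivalent
# print "text"                        -> print("text")
# d.has_key(k)                        -> k in d
# unicode(x)                          -> str(x)
# line.decode("utf8") on read text    -> open the file with encoding="utf-8"

def describe(entry, key):
    """Return the value stored under `key` as text, or '' if it is absent."""
    if key in entry:            # replaces entry.has_key(key)
        return str(entry[key])  # replaces unicode(entry[key])
    return ""

entry = {"unvocalized": "بيت"}  # hypothetical dictionary row
print(describe(entry, "unvocalized"))  # print is a function in Python 3
```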

Usage

This seems interesting. I need some help with it, especially on how to use it for a productive purpose. I think I need something like this to generate forms of verbs and nouns.

I'rab (grammatical parsing)

Peace be upon you, and the mercy and blessings of God.

Using this project, can we parse sentences grammatically (i'rab)?

Many questions, since I do not know Arabic (trying to compile the best online Arabic dictionary for non-Arabic speakers)

First of all, thank you very much for such a great project. It means a lot to the community of non-Arabic speakers and engineers.

General questions

1: Can you tell me what each of these word types is in English?

  • "اسم فاعل"
  • "اسم مفعول"
  • "جامد"
  • "منسوب"
  • "مصدر"
  • "صيغة مبالغة"
  • "صفة مشبهة"
  • "صفة"
  • "اسم تفضيل"

Which of the following does each one match?

  • noun
  • adjective
  • verb
  • adverb
  • interjection
  • prefix
  • pronoun
  • suffix
  • conjunction
  • preposition
  • abbreviation
  • particle
  • phrase
  • auxiliary verb
  • number
  • idiom
  • private name
  • article
  • past participle
  • gerund
  • verbal adjective
  • predicative verb
  • letter
  • noun phrase
  • onomatopoeia
  • verbal adverb
  • person name

2: Are all roots in the verbs database three letters long? I ask because I have all conjugations of all three-letter roots.

3: What do these categories mean in English?

  • "فاعل"
  • "(فا.من حَصَد)"
  • "(فا.من شَغَلَ)"
  • "(فا.من نَاسَبَ)"
  • "مفعول"
  • "اسم أداة"
  • "اسم"
  • "اِسْمٌ"
  • "اسْم"
  • "اسم هيئة"
  • "مبالغة"
  • "اسم المرة"
  • "اسم نوع"
  • "(النَّوْعُ مِنْ فَرَشَ)"
  • "(النَّوْعُ مِنْ قَامَ)"
  • "(النَّوْعُ مِنْ هَانَ)"
  • "تصغير"
  • "صفة"
  • "طرف زمان"
  • "منسوب"
  • "اسم مكان"
  • "اسم آلة"
  • "أُنْثَى الحِرْباءِ."
  • "اِسْمٌ مِنْ أسْمَاءِ جَهَنَّمَ"
  • "(مَنْسُوبٌ إِلَى هُوَ)"
  • "(صِيغَةُ فَعِل)"
  • "(فَا. من وَدَى)"
  • "مصدر صناعي"
  • "مصدر"
  • "اسم المصدر"
  • "اسم مصدر"
  • "مصدر ميمي"
  • "صيغة"
  • "صفة/صيغة"
  • "صفة مشبهة"
  • "(صفة مشبهة - ربــاعي)"
  • "اسم تفضيل"

Also, since I don't know Arabic, I am having some problems.

Here are all my questions about nouns, one by one.

1: What does original mean? I mean, what is the difference between these two:

unvocalized: معتاد and original: اِعْتَادَ

2: What does stamped mean?

I mean, what is the difference between these two:
unvocalized: معتاد and stamped: معتد

3: What does wazn mean?

I mean, what is the difference between these two:
unvocalized: منزوي and stamped: مُنْفَعِلٌ

4: What does mankous mean?

I mean, given
unvocalized: بَارِي how do we obtain mankous?

5: To make a word feminine (feminable), we add the letter ة to the end of the word, right?

For example, to make بَارِي feminine, we write بَارِية. Is this approach correct?

6: What does the defined column mean?

7: Are these genders correct?

مذكر : male
"" : no gender
مؤنث : female
مشترك : can be male or female : both?

8: number :

مفرد : single
جمع تكسير : broken plural?

9: I need an explanation of some cases involving number, single and broken plural.

case 1:

number is مفرد, which means single
single is not empty
for example: unvocalized: أكلة , single: رِعْيٌ , brokenplural : +ات [لا يجوز جمع مذكر سالم]
I need an explanation of case 1.
If the word is already single, why is the single column not empty?
If the word is single, how do I generate its plural form from the brokenplural description? It contains the + and ] characters.

case 2:

number is مفرد, which means single
single is empty
for example: unvocalized: شاذ , single: '' , brokenplural : ون;ات;وُحْدَانٌ
So in this case the unvocalized form is single, right? And to generate the plural forms, I append each of the broken plurals to the end of the unvocalized form, right?
For example, the plural forms of شاذ are شاذون and شاذات and شاذوُحْدَانٌ
If this approach is correct, is there any difference in meaning between those three plural forms?

case 3:

number is جمع تكسير, which means broken plural
single is empty
broken_plural is empty
In this case, does this mean the word is already a plural form and has no single form?
For example, vocalized is: ضأن so does this word have no singular form?
Or does it mean it is both plural and singular?

case 4:

number is جمع تكسير, which means broken plural
single is empty
broken_plural is كَاسِبَة
In this case, does this mean the unvocalized form of the word is singular?
If so, how do I generate the plural form? Just by appending the broken plural to the end of the word?
For example, the singular form is كواسب and the plural form is كواسبكَاسِبَة
Is this correct?

case 5:

number is مفرد
single is empty
broken_plural is empty
So I am guessing that this word has a regular plural.
In this case, how do I generate its plural form?
For example, unvocalized is مخبر so how do I generate its plural form?

10: What does the dualable column mean?

How do I make a word dual?

11: What does the mamnou3_sarf column mean?

How do I apply mamnou3_sarf to a word?

12: What does the relative column mean?

How do I make the relative form of a word?

13: What does w_suffix mean?

How do I apply w_suffix to a word?

14: What does hm_suffix mean?

How do I apply hm_suffix to a word?

15: What does kal_prefix mean?

How do I apply kal_prefix to a word?

16: What does ha_prefix mean?

How do I apply ha_prefix to a word?

17: What does k_prefix mean?

How do I apply k_prefix to a word?

18: What does annex mean?

How do I annex a word?

Here are all my questions about verbs, one by one.

1: What does stamped mean?

For example, unvocalized is بيت and stamped is بت
What is the difference?

2: What does each of the following columns mean, and how do I obtain those forms?

transitive, double_trans, think_trans, unthink_trans, reflexive_trans

3: For verb conjugations I will use this site: http://acon.baykal.be/index.php

Can I obtain all the conjugations that you have listed as past, future, imperative, passive, future_moode?

4: What does confirmed mean?

5: What does future_type mean?

For example, unvocalized is شهب and stamped is فتحة
What is the difference?

Final question: Can I somehow obtain adverbs and adjectives from your database?

Issue about building dictionary

Hi linuxscout,

I want to build a dictionary for Hunspell, but I ran into an issue when I tried to build the dictionary with make, following README.md.
[screenshot]

When I run make spell, I get this issue:
[screenshot]

Do you have any suggestions?

Thanks

LFS Objects

Hello.
I have made a fork in order to push some data to the repository and then open a pull request against the original repo.
The data is quite large, so I'm using LFS. But as you know, I can't push an LFS object to a public fork unless either (1) the original repo already has LFS objects or (2) I have push access to the original repo.
I suggest enabling LFS in your repo (by uploading a test object larger than 25 MB).

So what do you think?
The files I want to publish are an extended version of the nouns table.
Thanks.
PS: Can we talk in Arabic?

Enrich verb list

Verb list to check in order to enrich the arramooz verb list obtained from ayaspell:

  • data extracted from the noun list of the arramooz project
  • verbVocalized
    سَنِيَ
    آرَبَ
    أَبَّدَ
    أَبِدَ
    أَبَّنَ
    أَزَّ
    أَصْغَرَ
    أَعَلَّ
    أَمِرَ
    أَمَنَ
    أَنْطَحَ
    اِتَّأَدَ
    اِسْتَبْخَرَ
    اِسْتَنْوَقَ
    مَخَطَ
    بَحَتَ
    بَدَهَ
    بَرَصَ
    بَصَّ
    بَكَّتَ
    بَكِمَ
    بَلَهَ
    بَهَى
    تَدَانَى
    تَرَفَلَّ
    تَقَى
    تَكَبَّلَ
    تَكَبَّلَ
    تَكَمَّشَ
    تَكَمَّشَ
    جَافَ
    جَمِلَ
    حَاصَ
    حَاصَ
    حَصِلَ
    حَصَنَ
    حَفَظَ
    خَامَ
    خَرَّفَ
    دَنَسَ
    رَخَى
    رَصَّعَ
    رَغَبَ
    رَمَضَ
    زَهَدَ
    سَعِرَ
    سَهَدَ
    شَأَى
    شَاتَى
    شَجِيَ
    شَعِرَ
    شَقَرَ
    شَقَى
    شَهَّدَ
    طَبَقَ
    طَرِدَ
    عَصِبَ
    عَطَشَ
    عَلِنَ
    غَبَّ
    غَبِرَ
    غَبِرَ
    غَبِشَ
    غَبَطَ
    غَبَنَ
    غَبِنَ
    غَلَّلَ
    غَلِمَ
    غَمُرَ
    غَمِرَ
    غَمِصَ
    غَمَطَ
    غَمِطَ
    غَمْغَمَ
    غَنَّ
    غَوَّى
    غَيِدَ
    فَرِغَ
    فَسُدَ
    قَحَلَ
    كَذِبَ
    كَرِهَ،كَرُهَ
    كَرِهَ،كَرُهَ
    كَمِلَ
    لَابَ
    لَحَقَ
    مَازَ
    مَذَرَ
    مَعْمَعَ
    مَهُنَ
    نَبَلَ
    نَبَلَ
    نَبُلَ
    نَبُلَ
    نَحَفَ
    نَخَا
    نَذَلَ
    نَهَّدَ
    هَذَا
    هَرَمَ
    هَلَعَ
    وَثَقَ
    وَثَقَ
    وَذَمَ
    وَرَّطَ
    وَشَكَ
    وَعُقَ
    وَنِيَ
