
genderizer's People

Contributors

muatik


genderizer's Issues

Some import errors

You need to pip install naiveBayesClassifier, and then you get this:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-431-c7439d1d9405> in <module>()
----> 1 from genderizer.genderizer import Genderizer

/Users/swyx/anaconda/lib/python2.7/site-packages/genderizer/genderizer.py in <module>()
      8 
      9 from namesCollection import NamesCollection
---> 10 from cachedModel import CachedModel
     11 
     12 

/Users/swyx/anaconda/lib/python2.7/site-packages/genderizer/cachedModel.py in <module>()
----> 1 import memcache
      2 import cPickle
      3 import os
      4 
      5 class CachedModel(object):

ImportError: No module named memcache
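The traceback shows that cachedModel.py imports `memcache` as soon as the module loads. That module comes from the python-memcached distribution (`pip install python-memcached`), so installing it should clear this particular error, although actual caching presumably still needs a reachable memcached server. Another option is to make the dependency optional. The sketch below assumes that CachedModel only needs `get`/`set` from its cache client; `DictCache` and `make_cache` are hypothetical names, not part of genderizer.

```python
try:
    import memcache  # provided by the python-memcached distribution
except ImportError:
    memcache = None  # fall back to an in-process cache below

class DictCache(object):
    """Hypothetical in-process stand-in exposing get/set like a memcache client."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Like memcache, return None for a missing key instead of raising
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value
        return True

def make_cache(servers=("127.0.0.1:11211",)):
    """Use a real memcache client when the module is installed, else DictCache."""
    if memcache is not None:
        return memcache.Client(list(servers))
    return DictCache()
```

The in-process fallback loses the cache whenever the interpreter exits, which is acceptable for a library that can rebuild its model on demand.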

Genderizer in Python 3.5.1 on Windows

Hi Mustafa, thanks for taking a look at this. Here is my bug report.

Genderizer has some Python 3 compatibility issues (I am using Python 3.5.1 on Windows). In namesCollection.py, line 48:

    items[firstName] = dict(items[firstName].items() + item.items())

produces this error:

    TypeError: unsupported operand type(s) for +: 'dict_items' and 'dict_items'

I think I fixed it with this code, but I'm new to Python and not sure whether it correctly merges the dictionary to account for the genders of names in different languages:

    tempItems = dict(items[firstName])
    tempItems.update(dict(item.items()))
    items[firstName] = dict(tempItems)
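The error occurs because in Python 3 `dict.items()` returns a view object, and views do not support `+`. In Python 2 the expression built one list of pairs, and `dict()` kept the last value for any repeated key, so entries from `item` won. Copying and then calling `update()` has the same semantics, which suggests the fix above is correct. A minimal sketch of the merge, using hypothetical per-language keys (the real structure of `items[firstName]` is not shown in this report):

```python
def merge_name_entries(existing, new):
    """Return a new dict containing the keys of both arguments.

    On a key conflict the value from `new` wins, matching the Python 2 idiom
    dict(existing.items() + new.items()). Neither argument is modified.
    """
    merged = dict(existing)
    merged.update(new)
    return merged

# Hypothetical entries keyed by language code
print(merge_name_entries({"en": "male"}, {"tr": "female"}))
```

On Python 3.5 and later, `{**existing, **new}` is an equivalent one-liner.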

Also, cachedModel.py imports cPickle, which no longer exists in Python 3 (its functionality was merged into pickle). I had to change the import to pickle and change every cPickle. reference in the code to pickle.
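Instead of renaming every reference, an import fallback works on both Python 2 and 3 and lets the rest of the module use a single name. A minimal sketch; the cached `model` object is a hypothetical stand-in for whatever CachedModel actually stores:

```python
try:
    import cPickle as pickle  # Python 2: C-accelerated implementation
except ImportError:
    import pickle  # Python 3: pickle uses its C accelerator automatically

model = {"names": {"paula": "female"}, "version": 1}  # hypothetical cached object

# Protocol 2 can be read by both Python 2 and Python 3
blob = pickle.dumps(model, protocol=2)
restored = pickle.loads(blob)
print(restored == model)
```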

I think the last change was to how namesCollection.py builds the OS path to collectionSourceFile.
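The report does not say what the original path code looked like. One portable approach is to anchor data files on the module's own location with `os.path.join`, so that the lookup works regardless of the current working directory or the platform's path separator. `resolve_data_file` and the file name `names.csv` are hypothetical:

```python
import os

def resolve_data_file(module_file, filename):
    """Return an absolute path to `filename` located next to `module_file`.

    Anchoring on the module's directory (not the current working directory)
    and joining with os.path.join keeps this correct on Windows and POSIX.
    """
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, filename)

# Inside namesCollection.py this would be called as:
#     resolve_data_file(__file__, "names.csv")  # "names.csv" is hypothetical
print(resolve_data_file(os.path.join("pkg", "namesCollection.py"), "names.csv"))
```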

With these changes I got it working over the weekend. However, it fails when it can't find a name, such as the Indian name Bharath. It would be nice if the code labeled Bharath as "undetermined" or something similar.

Here is the error when I try to detect Bharath:

    Traceback (most recent call last):
      File ".\GenderDetect.py", line 12, in <module>
        print(name + " " + Genderizer.detect(firstName = name))
    TypeError: Can't convert 'NoneType' object to str implicitly
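The traceback indicates that `Genderizer.detect` returned `None` for Bharath, and concatenating `None` to a string raised the TypeError. Until the library itself returns a label, a caller-side wrapper can substitute one. This is a sketch: `detect_or_default` is a hypothetical helper, and the dictionary lookup only stands in for the real detector.

```python
def detect_or_default(detect, name, default="undetermined"):
    """Call `detect(name)` and substitute `default` when it returns None."""
    result = detect(name)
    return default if result is None else result

# Hypothetical stand-in for Genderizer.detect: a plain dictionary lookup
known = {"paul": "male", "paula": "female"}

for name in ["paul", "paula", "Bharath"]:
    label = detect_or_default(lambda n: known.get(n.lower()), name)
    print("{} {}".format(name, label))
```

With the real library the detector argument would be `lambda n: Genderizer.detect(firstName=n)`.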

Finally, for some reason I had to add the genderizer package directory to sys.path in my test code so that Python could find the modules that genderizer imports.
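A likely reason: the first issue's traceback shows genderizer.py using Python 2 implicit relative imports (`from namesCollection import NamesCollection`). Python 3 dropped implicit relative imports, so such a statement only succeeds when the package directory itself is on sys.path, which matches the workaround. Explicit relative imports (`from .namesCollection import ...`) would remove the need for it. The sketch below builds a throwaway package (the name `demo_pkg` is hypothetical) to show the explicit form working with only the package's parent directory on sys.path:

```python
import importlib
import os
import sys
import tempfile

def build_demo_package(root):
    """Write a throwaway package that uses a Python 3 explicit relative import."""
    pkg_dir = os.path.join(root, "demo_pkg")  # hypothetical package name
    os.makedirs(pkg_dir)
    files = {
        "__init__.py": "",
        "namesCollection.py": "class NamesCollection(object):\n    pass\n",
        # Explicit relative import. The Python 2 form
        # 'from namesCollection import NamesCollection' would raise ImportError here.
        "genderizer.py": "from .namesCollection import NamesCollection\n",
    }
    for filename, source in files.items():
        with open(os.path.join(pkg_dir, filename), "w") as handle:
            handle.write(source)

root = tempfile.mkdtemp()
build_demo_package(root)
sys.path.insert(0, root)       # only the package's parent directory is on sys.path
importlib.invalidate_caches()  # make sure the import system sees the new files
module = importlib.import_module("demo_pkg.genderizer")
print(module.NamesCollection.__name__)
```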

Here is my test code:

```python
import sys

# Raw string so the backslashes in the Windows path are not read as escape sequences
sys.path.append(r'C:\Program Files (x86)\Python35-32\Lib\site-packages\genderizer')

print(sys.path)

from genderizer.genderizer import Genderizer

firstNames = ["kathy", "paul", "frank", "Mauricio", "paula", "sonia", "masha", "stephen", "stephanie", "braden", "brandon", "John", "Joan", "Liz", "Elizabeth", "April", "Julius", "Julie", "Bill", "Bharath"]

for name in firstNames:
    print(name + " " + Genderizer.detect(firstName=name))
```

Thanks,
Matias

Training set for English is needed

We need a large number of tweets written in English to use as a training set. Each tweet must also be pre-classified as female or male.

Using a rule-based approach, we can collect tweets sent by female users: given a list of female first names, we collect tweets from users whose first names match one of those names. We already have a first-names database, so this is feasible.
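The rule described above can be sketched as follows. The shapes of the inputs are assumptions: tweets as `(display_name, text)` pairs, and the first-names database as a mapping from lowercase first name to `"female"` or `"male"`. `label_tweets` is a hypothetical helper, not part of genderizer.

```python
def label_tweets(tweets, names_db):
    """Label tweets by matching the author's first name against a names database.

    tweets:   iterable of (display_name, text) pairs
    names_db: dict mapping a lowercase first name to "female" or "male"
    Returns (text, gender) pairs; authors with unknown names are skipped.
    """
    labeled = []
    for display_name, text in tweets:
        parts = display_name.split()
        if not parts:
            continue  # no usable name
        gender = names_db.get(parts[0].lower())
        if gender in ("female", "male"):
            labeled.append((text, gender))
    return labeled

# Hypothetical sample data
names_db = {"paula": "female", "john": "male"}
tweets = [("Paula Smith", "hello"), ("John Doe", "hi"), ("xyz", "yo"), ("", "empty")]
print(label_tweets(tweets, names_db))
```

Names that are common for both genders would add noise to the training set, so in practice they are probably best left out of `names_db`.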

ZeroDivisionError

If genderizer is given a text argument like the one below:

Tavla oynamayı bilmeyen erkek gitsin örgü örsün amk  Bu akşam konuşamadık 😦  Yaz geldi siz hala kış masalı şarkısını paylasıyorsunuz yeter yani  Erkek adam form yiyip formda kalmaz  Uyku diye meslek olsa mesaiye bile kalırım  Şimdi gitme vakti geldi burası son durak  Doğan güneş Bandırma 😊  Radu feat Sefir & Farabi - Elveda: http://t.co/OKNGfoR3xV @YouTube aracılığıyla  Ben hala küçük bir çocuk  http://t.co/AwoTqKAqGK  
RT @saptroloji: BALIK: Balık burcuyla başa çıkabileceğinizi sanıyorsanız yanılıyorsunuz. Boşuna uğraşmayın.  Bundan sonra tarih derslerini kaçırmak yok  Tek yapacağın şey ilgilenmekti aslında  Gördüm yine kötü oldum  Let The games begin 😄  Canımız bira isterse sipariş ederiz kapımıza gelir hizmette sınır yok 😀  
Senin için mücadele etmeyen biri, doğru kişi olamaz  ? http://t.co/n9wh7t7af0  Biri bizim kafaları kargoyla bursa yollayabilr mi ?  Bizi biliyon, boktan insanlara alışıyoz 
detect raises an error like this:

    classifierScoreLogF = probablities['female'] / sum(probablities.values())
    ZeroDivisionError: float division by zero
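One plausible cause, assuming the classifier multiplies many small per-word probabilities: with a long input like the concatenated tweets above, every class probability can underflow to 0.0, so their sum is zero. A common remedy is to keep scores in log space and normalize with the log-sum-exp trick. This is a sketch of that technique, not genderizer's actual code; `normalized_scores` and the sample scores are hypothetical.

```python
import math

def normalized_scores(log_scores):
    """Turn per-class log-probabilities into probabilities that sum to 1.

    Log-sum-exp trick: subtracting the maximum before exponentiating makes the
    largest term exp(0) == 1, so the denominator is at least 1 and never 0.
    """
    if not log_scores:
        raise ValueError("need at least one class")
    peak = max(log_scores.values())
    shifted = {label: math.exp(score - peak) for label, score in log_scores.items()}
    total = sum(shifted.values())
    return {label: value / total for label, value in shifted.items()}

log_scores = {"female": -2000.0, "male": -2001.0}  # hypothetical scores for a long text

# The naive approach underflows: exp(-2000) is 0.0 in double precision
naive_total = sum(math.exp(score) for score in log_scores.values())
print(naive_total)
print(normalized_scores(log_scores))
```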
