bsolomon1124 / demoji
Accurately find/replace/remove emojis in text strings
Home Page: https://pypi.org/project/demoji/
License: Apache License 2.0
When calling demoji.last_downloaded_timestamp(),
it returns a timezone-aware object that carries the system's local time but is labeled as UTC,
so the value is incorrect as a UTC time.
To Reproduce
>>> demoji.last_downloaded_timestamp()
datetime.datetime(2020, 10, 26, 11, 21, 57, 347238, tzinfo=<demoji.UTC object at 0x7fab3f849eb0>)
>>> datetime.utcnow()
datetime.datetime(2020, 10, 26, 7, 58, 53, 767124)
>>> datetime.now()
datetime.datetime(2020, 10, 26, 11, 28, 54, 955153)
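The mismatch in the output above is what you would see if a local wall-clock reading were simply tagged as UTC. The snippet below is only a sketch of that pattern using the standard library. It is an assumption about how such a value can arise, not a reading of demoji's source, and the function names `utc_now` and `mislabeled_local_now` are hypothetical.

```python
from datetime import datetime, timezone


def utc_now():
    """Return the current moment as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


def mislabeled_local_now():
    """Illustrate the suspected pattern: local wall-clock time tagged as UTC."""
    # datetime.now() is naive local time; replace() only attaches a label,
    # it does not convert the clock reading.
    return datetime.now().replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    print("correct UTC:     ", utc_now())
    print("mislabeled local:", mislabeled_local_now())
```

On a machine whose local zone is not UTC, the two printed values differ by the local UTC offset, which matches the gap between the reported timestamp and datetime.utcnow() in the report.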
Desktop
Describe the bug
demoji.download_codes() has no built-in timeout and hangs for a very long time if something goes wrong.
I had this line at the start of my app inside a try/except, but it does not raise an exception. It just hangs, which causes an nginx timeout in my web app, and the loop continues.
To Reproduce
Steps to reproduce the behavior:
import demoji
try:
    demoji.download_codes()
except Exception:
    pass
Expected behavior
It should either download or time out and raise an exception, but since the default timeout in requests is None, it hangs indefinitely.
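Until a timeout exists in the library, one possible workaround is to put a deadline around the call from the outside. The sketch below uses only the standard library. `call_with_timeout` is a hypothetical helper, not part of demoji, and it has a known limit: the worker thread cannot be killed, so a hung download keeps running in the background after the caller gets its exception.

```python
import concurrent.futures
import time


def call_with_timeout(func, seconds):
    """Run func() in a worker thread; raise TimeoutError after `seconds`."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func)
    try:
        return future.result(timeout=seconds)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"call did not finish within {seconds} s") from None
    finally:
        # Do not block on the worker; it may still be running.
        executor.shutdown(wait=False)


if __name__ == "__main__":
    # Stand-in for a slow call such as demoji.download_codes.
    try:
        call_with_timeout(lambda: time.sleep(0.5), 0.05)
    except TimeoutError as exc:
        print("gave up:", exc)
```

In the app from the report, the startup line would become something like call_with_timeout(demoji.download_codes, 30) inside the existing try/except, so startup fails fast instead of waiting for nginx to give up.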
Hello. Thanks for your app.
It would be a good idea to replace the download endpoint's URL with a permanent one. You are working with 12.0, but 13.1 is available. So how about using "unicode.org/Public/emoji/latest/emoji-test.txt"?
Describe the bug
The connection to unicode.org is refused when trying to download the emoji codes, and a connection-refused exception is raised.
It seems that unicode.org moved its site to https://home.unicode.org,
and https://home.unicode.org/Public/emoji/12.0/emoji-test.txt no longer exists.
To Reproduce
Steps to reproduce the behavior:
$ python3 -m timeit 'import json; json.load(open("/Users/brad/.demoji/codes.json"))'
100 loops, best of 5: 2.1 msec per loop
$ python3 -m timeit 'import ujson; ujson.load(open("/Users/brad/.demoji/codes.json"))'
200 loops, best of 5: 1.31 msec per loop
demoji version 1.1.0
python 3.10.6
Does not work with this emoji:
🫶🏻
It gets translated to
🫶light skin tone
It is difficult for me to provide reusable software if the emojis are not already bundled in the demoji package. I understand and acknowledge that the list of emojis could be routinely updated, but this is not a reason to not bundle the list in the package itself. For instance, I believe Python comes bundled with Unicode data. You can still offer users a way to update the emoji list, and otherwise fall back to what's in the package.
Why does the tutorial only give an example of demoji.findall()?
What about the other major function, replace()?
Is your feature request related to a problem? Please describe.
Yes. AWS Lambda provides the application with ephemeral storage space only in the /tmp directory. When using demoji on AWS Lambda, we get a permission error when caching the emoji data.
Describe the solution you'd like
Ability to specify a directory for caching the emoji data, possibly as an argument to the download_codes() function.
Describe alternatives you've considered
I do not see any other alternative
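To make the request concrete, here is a rough sketch of how a cache directory could be chosen so that read-only environments such as Lambda fall back to the temp directory. Everything here is hypothetical: the `DEMOJI_CACHE_DIR` variable and the `resolve_cache_dir` helper do not exist in demoji, and the only detail borrowed from the library is the `~/.demoji` location seen in other reports.

```python
import os
import tempfile
from pathlib import Path


def resolve_cache_dir(env_var="DEMOJI_CACHE_DIR"):
    """Return the first writable candidate directory (hypothetical helper)."""
    candidates = []
    override = os.environ.get(env_var)  # assumed variable name
    if override:
        candidates.append(Path(override))
    candidates.append(Path(os.path.expanduser("~")) / ".demoji")
    candidates.append(Path(tempfile.gettempdir()) / "demoji")

    for directory in candidates:
        try:
            directory.mkdir(parents=True, exist_ok=True)
        except OSError:
            continue  # e.g. read-only home directory
        if os.access(directory, os.W_OK):
            return directory
    raise RuntimeError("no writable cache directory found")


if __name__ == "__main__":
    print(resolve_cache_dir())
```

An explicit argument on the download function, as suggested above, would work just as well. The environment variable is only one way to avoid touching call sites.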
I want to upload this library to the Arch user repository, and to do so it would be practical to have git tags for the releases, or at least for the most current release and future releases. Could you add that?
I.e. run these git commands
git tag -a -m 'Release v0.1.5' v0.1.5 f28ffba
git push --follow-tags
Is there a way to load an existing codes.json file that I put in some directory instead of downloading it every time, so that I can use the library in an offline environment? Thanks.
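For reference, reading such a file yourself takes only a few lines. This sketch assumes nothing about the file beyond it being a JSON object, and `load_codes` is a hypothetical helper rather than a demoji function. Whether the library can be pointed at the loaded data is exactly the open question here.

```python
import json
from pathlib import Path


def load_codes(path):
    """Load a previously downloaded codes file from an explicit location."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return data
```

Usage would look like load_codes("/some/dir/codes.json"), where the path is whatever directory the file was copied to.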
Useful library.
How can I remove all emojis from text?
Is there any solution for counting emojis in a given text?
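One way to count, sketched under the assumption that you already have an emoji-to-name mapping of the kind demoji.findall() returns: build a regex from the keys, longest first so that multi-codepoint sequences win over their parts, and tally the matches. `count_emojis` is a hypothetical helper, not a demoji function.

```python
import re
from collections import Counter


def count_emojis(text, emoji_names):
    """Count occurrences of each key of `emoji_names` in `text`."""
    if not emoji_names:
        return Counter()
    # Longest keys first, so a full sequence is matched before its prefix.
    keys = sorted(emoji_names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return Counter(match.group(0) for match in pattern.finditer(text))


if __name__ == "__main__":
    names = {"\U0001f525": "fire", "\u26bd": "soccer ball"}
    counts = count_emojis("\U0001f525\u26bd\u26bd goal \U0001f525", names)
    print(sum(counts.values()), dict(counts))
```

The total is sum(counts.values()), and the per-emoji breakdown is the Counter itself.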
Emoji
Describe the bug
There is a small typo:
>>> pprint(seq.encode('unicode-escape')) # Python 3
(b"I bet you didn't know that \\U0001f64b, \\U0001f64b\\u200d\\u2642\\ufe0f,"
b' and \\U0001f64b\\u200d\\u2640\\ufe0f are three different emojis.\\n')
I suppose it should be print.
Describe the bug
The replace() function leaves the Unicode variation selector-16 (\xef\xb8\x8f) behind when replacing the Repeat Button emoji (🔁️).
To Reproduce
import demoji
sample_var = '🔁️ sample text'
print(sample_var.encode('utf-8'))
>>> b'\xf0\x9f\x94\x81\xef\xb8\x8f sample text'
sample_var = demoji.replace(sample_var)
print(sample_var.encode('utf-8'))
>>> b'\xef\xb8\x8f sample text'
Expected behavior
String without the \xef\xb8\x8f sequence:
>>> b' sample text'
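As a stopgap, the leftover selector can be stripped after the call. This is only a workaround sketch, not a fix in the library, and `strip_variation_selectors` is a hypothetical helper. It removes every U+FE0F in the string, which is fine once the emojis are gone but would also change the presentation of any emoji you chose to keep.

```python
VARIATION_SELECTOR_16 = "\ufe0f"  # UTF-8 bytes: \xef\xb8\x8f


def strip_variation_selectors(text):
    """Remove every U+FE0F (variation selector-16) from text."""
    return text.replace(VARIATION_SELECTOR_16, "")


if __name__ == "__main__":
    # The string left behind in the report above.
    leftover = b"\xef\xb8\x8f sample text".decode("utf-8")
    print(strip_variation_selectors(leftover).encode("utf-8"))  # b' sample text'
```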
Is your feature request related to a problem? Please describe.
I am trying to replace each emoji found with the description given by demoji.findall(tweet):
{
"🔥": "fire",
"🌋": "volcano",
"👨🏽\u200d⚖️": "man judge: medium skin tone",
"🎅🏾": "Santa Claus: medium-dark skin tone",
"🇲🇽": "flag: Mexico",
"👹": "ogre",
"🤡": "clown face",
"🇳🇮": "flag: Nicaragua",
"🚣🏼": "person rowing boat: medium-light skin tone",
"🐂": "ox",
}
Describe the solution you'd like
I have a string with emoji:
Rooney ! Oh dear, oh dear ! Fucking dreadful 🙈⚽️⚽️
I want to make it like this:
Rooney ! Oh dear, oh dear ! Fucking dreadful 'see-no-evil monkey' 'soccer ball' 'soccer ball'
Describe alternatives you have considered
emo_dict = demoji.findall(emoticonString)
for key in emo_dict.keys():
    emoticonString = emoticonString.replace(key, emo_dict[key])
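The loop above replaces keys in dictionary order, so a short emoji can be substituted inside a longer sequence before the full sequence is tried. A possible variation, sketched here with the standard library only, does a single regex pass with the longest keys first and wraps each name in quotes. `replace_with_quoted_names` is a hypothetical helper, and its mapping argument is assumed to look like the dictionary demoji.findall() returns.

```python
import re


def replace_with_quoted_names(text, emoji_names):
    """Replace each key of `emoji_names` in `text` with its quoted name."""
    if not emoji_names:
        return text
    # Longest keys first, so full sequences win over their parts.
    keys = sorted(emoji_names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    spaced = pattern.sub(lambda m: " '" + emoji_names[m.group(0)] + "' ", text)
    # Collapse the padding added above (this also collapses other whitespace runs).
    return " ".join(spaced.split())


if __name__ == "__main__":
    names = {"\U0001f648": "see-no-evil monkey", "\u26bd\ufe0f": "soccer ball"}
    print(replace_with_quoted_names("Oh dear ! \U0001f648\u26bd\ufe0f\u26bd\ufe0f", names))
```

The padding and whitespace collapse are there only to get the space-separated form shown in the desired output. Drop them if the original spacing matters.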
using the replace_with_desc function on this string: "๐ง๐ปโโค๏ธโ๐โ๐ง๐ผ ๐ง๐ปโโค๏ธโ๐โ๐ง๐ฝ ๐ง๐ปโโค๏ธโ๐โ๐ง๐พ ๐ง๐ปโโค๏ธโ๐โ๐ง๐ฟ ๐ง๐ผโโค๏ธโ๐โ๐ง๐ป ๐ง๐ผโโค๏ธโ๐โ๐ง๐ฝ ๐ง๐ผโโค๏ธโ๐โ๐ง๐พ ๐ง๐ผโโค๏ธโ๐โ๐ง๐ฟ ๐ง๐ฝโโค๏ธโ๐โ๐ง๐ป ๐ง๐ฝโโค๏ธโ๐โ๐ง๐ผ ๐ง๐ฝโโค๏ธโ๐โ๐ง๐พ ๐ง๐ฝโโค๏ธโ๐โ๐ง๐ฟ ๐ง๐พโโค๏ธโ๐โ๐ง๐ป ๐ง๐พโโค๏ธโ๐โ๐ง๐ผ ๐ง๐พโโค๏ธโ๐โ๐ง๐ฝ ๐ง๐พโโค๏ธโ๐โ๐ง๐ฟ ๐ง๐ฟโโค๏ธโ๐โ๐ง๐ป ๐ง๐ฟโโค๏ธโ๐โ๐ง๐ผ ๐ง๐ฟโโค๏ธโ๐โ"
produces:
"๐ง๐ป\u200d:red heart:\u200d:kiss mark:\u200d๐ง๐ผ ๐ person, person, light skin tone, medium skin tone: ๐ person, person, light skin tone, medium-dark skin tone: ๐ง๐ป\u200d:red heart:\u200d:kiss mark:\u200d:person: dark skin tone: ๐ง๐ผ\u200d:red heart:\u200d:kiss mark:\u200d๐ง๐ป ๐ง๐ผ\u200d:red heart:\u200d:kiss mark:\u200d๐ง๐ฝ ๐ง๐ผ\u200d:red heart:\u200d:kiss mark:\u200d๐ง๐พ ๐ง๐ผ\u200d:red heart:\u200d:kiss mark:\u200d:person: dark skin tone: ๐ง๐ฝ\u200d:red heart:\u200d:kiss mark:\u200d๐ง๐ป ๐ person, person, medium skin tone, medium-light skin tone: ๐ง๐ฝ\u200d:red heart:\u200d:kiss mark:\u200d๐ง๐พ ๐ง๐ฝ\u200d:red heart:\u200d:kiss mark:\u200d:person: dark skin tone: ๐ง๐พ\u200d:red heart:\u200d:kiss mark:\u200d๐ง๐ป ๐ง๐พ\u200d:red heart:\u200d:kiss mark:\u200d๐ง๐ผ ๐ person, person, medium-dark skin tone, medium skin tone: ๐ง๐พ\u200d:red heart:\u200d:kiss mark:\u200d:person: dark skin tone: :person: dark skin tone:\u200d:red heart:\u200d:kiss mark:\u200d๐ง๐ป :person: dark skin tone:\u200d:red heart:\u200d:kiss mark:\u200d๐ง๐ผ :person: dark skin tone:\u200d:red heart:\u200d:kiss mark:\u200d"
emojis remain in the string
This package is advertised as being able to remove emojis, yet it lacks a basic remove function. I understand that there is a replace function, but that's not the same. It is misleading to advertise a package as being capable of removing emojis without providing a basic function to do so!