mozilla / puente Goto Github PK

View Code? Open in Web Editor NEW

13.0 13.0 5.0 186 KB

UNMAINTAINED: Django/Jinja2 l10n extract/merge commands and things (Tower replacement)

License: BSD 3-Clause "New" or "Revised" License

Makefile 1.69% Python 98.31%

puente's People

Contributors

Stargazers

Watchers

Forkers

washort vericant thor jwhitlock mozilla-github-standards

puente's Issues

get puente hooked up to readthedocs

Every time we land stuff, it should update the docs on readthedocs.

ditch the "gettext marks all msgids as safe" thing

Tower and thus Puente have some code to mark all translated strings from gettext calls as safe and thus the values don't get escaped when used in a Jinja2 template.

Jinja2 defaults to everything being unsafe except parts which are explicitly marked as safe. Several things:

This is a much better default for security reasons.
In practice we should be avoiding putting HTML into strings as much as possible.
Thus in practice, we shouldn't have that many strings that need to be safe.

This issue covers ditching that code, putting a big backwards-compatability note in the changelog and writing up documentation walking developers through how they should be doing gettext in their templates in regards to |safe.

fix exit codes in extract command

The extract command is inconsistent in its error handling and exit codes.

FIX IT!

convert Puente's merge to be like pybabel's update?

Puente has a merge command and pybabel doesn't. It's possible pybabel's update command is the same thing, but I don't know offhand.

This issue covers looking into that. Are they doing the same thing? If so, then we should convert Puente's merge to update and make it behave like pybabel's update. If not, then what's the pybabel equivalent?

flesh out template usage docs

The documentation for using gettext, ngettext, pgettext and npgettext in templates is pretty lacking. Amongst other things, we wanted to flesh out the various situations regarding escaping, variable interpolation, etc.

This issue covers that work.

look into removing compendia code

Tower has a bunch of stuff for compendiums which I never ripped out of the extract/merge commands for Puente.

Does anyone use that? As far as I can tell, the answer is "no".

Does Django have something like that? As far as I can tell, the answer is still "no".

If those two things are really "no", then we should nix it.

add db strings extraction command?

SUMO has some infrastructure for extracting strings from the database and putting them in a Python file which then gets passed over for strings extraction.

Input is going to need that, too.

Can we add that to Puente?

add details to .pot file header

We should add the rest of the details to the .pot file header with sensible defaults:

project: PROJECT
version: 0.1
msgid_bugs_address: bugzilla.org url
copyright holder: Mozilla Foundation

fix issues with testing and jinja templates?

When using Jinja2 templates, assert method from Django's SimpleTestCase that relate to rendering don't work namely these:

assertTemplateUsed
assertTemplateNotUsed

Also, the response object has no context, so you can't test whether things were correctly in the context.

I wrote about it a bit in my blog post here:

http://bluesock.org/~willkg/blog/mozilla/input_django_1_8_upgrade.html#checking-templates-rendered-and-context-in-tests-with-jinja-templates

I did some monkeypatch stuff in Fjord to deal with this. @robhudson is hitting the same problem in MDN. Maybe it makes sense to fix this issue in puente, while we/others figure out how to fix it upstream in Jinja2, django-jinja, Django, etc.

CODE_OF_CONDUCT.md file missing

As of January 1 2019, Mozilla requires that all GitHub projects include this CODE_OF_CONDUCT.md file in the project root. The file has two parts:

Required Text - All text under the headings Community Participation Guidelines and How to Report, are required, and should not be altered.
Optional Text - The Project Specific Etiquette heading provides a space to speak more specifically about ways people can work effectively and inclusively together. Some examples of those can be found on the Firefox Debugger project, and Common Voice. (The optional part is commented out in the raw template file, and will not be visible until you modify and uncomment that part.)

If you have any questions about this file, or Code of Conduct policies and procedures, please see Mozilla-GitHub-Standards or email [email protected].

(Message COC001)

locale/compendia is unused by merge

The merge command say that the locale/compendia directory will be used by ./manage.py merge:

puente/puente/management/commands/merge.py

Lines 21 to 26 in 7c04aad

  During merging (or initializing), the command will also look in 

  `locale/compendia` for a locale-specific compendium of 

  translations (serving as a translation memory of sorts). The 

  compendium file must be called `${locale}.compendium`, 

  e.g. `es_ES.compendium` for Spanish. The translations in the 

  compendium will be used by gettext for fuzzy matching.

During some recent tests (mdn/kuma#5302), compendia in this folder were not used to initialize a new .po file from the template .pot file. I suspect this functionality was removed, but the docstring wasn't cleaned up. I was able to write commands to get the desired behavior.

tests for merge command

We need more tests for the merge command. First thing would be to look at the Tower code and see what's there. Second thing would be to add tests for things that aren't tested.

If someone wants to handle the first thing (which should be more straight-forward), I can do the analysis for the second thing.

implement compile command

We should implement a compile command or document how to use the babel one.

http://babel.pocoo.org/docs/cmdline/#compile

add django gettext keywords

upgettext
upgettext_lazy
pgettext
pgettext_lazy
etc

Add all the ones that we're currently missing.

endgame: switch to babel extract, init, update, compile

Instead of converging on django's makemessages, we might want to converge instead on Babel's extract, init, update and compile. If we did this, then the amount of code we maintain drops by a lot.

We could do this now, but there are a few caveats:

First, Puente has its own PuenteI18nExtension which fixes trans blocks to collapse whitespace and fixes gettext to output markup as safe. For extraction purposes, only the trans block fix is important. However, the Jinja2 Babel extractor does this goofy thing where it automatically adds Jinja2's i18n extension which overrides ours before doing extraction. Puente has an extract command that monkeypatches that first, then does the work. If we use Babel, we'll have to use our own extractor that does that, too.

We can fix this with our own extractor. Also, this problem goes away if when Jinja2 supports trimmed trans blocks.

Second, Puente pulls the Jinja2 extensions from the settings file and uses those to set up the environment for strings extraction. If we switch to Babel, we have to put those extensions in a babel.cfg file which means we end up with two places we have to list the extensions.

I'm not sure what to about this one. We can live with it, but it's problematic. Maybe if we write our own extractor then it could pull the extensions and populate itself?

Third, if you're using vendor/ then you have to do all this stuff to make sure the pybabel script can access the libs being used. Further, if you have babel from source (i.e. a git submodule), then it's not installed and doesn't have the cdlr data.

This goes away when we stop doing the vendor thing.

coverage is described in `make help` but has no implementation

https://github.com/mozilla/puente/blob/master/Makefile#L9

Merge command leaves domain POT file open

Tests raise this warning:

tests/test_merge.py::TestMergecommand::test_basic
  /home/travis/build/mozilla/puente/puente/commands.py:201: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/pytest-of-travis/pytest-0/test_basic0/locale/templates/LC_MESSAGES/django.pot' mode='r' encoding='UTF-8'>
    domain_pot_file = open(domain_pot)

My reading of the code is that when the locale is en_US, the file is left open:

puente/puente/commands.py

Lines 201 to 223 in 4e22c22

 domain_pot_file = open(domain_pot) 

 if locale == 'en_US': 

 enmerged = TemporaryFile('w+t') 

 p2 = Popen(['msgen', '-'], stdin=domain_pot_file, 

 stdout=enmerged) 

 p2.communicate() 

 mergeme = enmerged 

 else: 

 mergeme = domain_pot_file 

 mergeme.seek(0) 

 command = [ 

 'msgmerge', 

 '--update', 

 '--width=200', 

 '--backup=%s' % ('simple' if backup else 'off'), 

 domain_po, 

 '-' 

 ] 

 p3 = Popen(command, stdin=mergeme) 

 p3.communicate() 

 mergeme.close()

It may be possible to eliminate the warning by closing domain_pot_file before leaving the if clause. A larger change would be to modify the code to use a context manager.

switch to mozilla/ org

At some point P, we'll decide that this experiment is good and we should move forward. At that point, we should move this to the mozilla github org.

We should do that before doing any releases.

silent=False may cause the rainforest to die

I had the best of intentions when setting silent=False automatically. However, I misread the Jinja2 code. I thought it'd just print out the errors and move on. I didn't realize it'd print out the error and then set itself on fire and die an agonizing death. That's totally not what we want.

Given that, we should nix it automatically getting added.

document extract and merge commands

Add documentation and examples for using the extract and merge commands.

test puente with django 1.7 and jingo 0.8.1

jingo 0.8.1 support django 1.7+. Given that, we should test puente with it and update the documentation where appropriate.

Wiki changes

FYI: The following changes were made to this repository's wiki:

defacing spam has been removed
the wiki has been disabled, as it was not used

These were made as the result of a recent automated defacement of publically writeable wikis.

drop whitespace collapsing

Pretty sure that we can drop collapsing whitespace in trans blocks in Jinja2 templates. We should verify that, update the Jinja2 requirement accordingly, and drop the hacky handling for it in Puente.

tell the webdev folks about tower -> puente

tests for extract command

We need more tests for the extract command. First thing would be to look at the Tower code and see what's there. Second thing would be to add tests for things that aren't tested.

If someone wants to handle the first thing (which should be more straight-forward), I can do the analysis for the second thing.

documentation

Need to write up documentation on the following topics:

installation (pretty easy)
configuration
usage
switching to puente from tower
hacking
1. install for hacking
2. running the tests
3. submitting fixes
project goals and what things need to be done so we can set fuego to this puente

double-check copyright and licensing on all files

This is essentially a derivative of Tower and Fjord.

We need to double-check the license and copyright information through the files of the project.

add tests for ngettext

Gah! We have no tests for ngettext!

endgame: look into merging extract and merge commands into a makemessages command a la django

Django has a makemessages command that does both extract and merge. Why did Tower separate the two steps? Can we merge the two commands into one thus getting us closer to vanilla Django?

puente needs a logo

We need a logo. Something that really evokes the underlying reason for existence for this project.

As soon as we come up with an idea, I'll draw it freehand in gimp.

update AUTHORS

A bunch of this code is derived from Tower. Ergo we should include that AUTHORS list with this one.

get puente working with python 3.3, 3.4 and 3.5

Need to add tox environments for python 3.3, 3.4 and 3.5 and then test Puente out with them all and fix the issues.

ditch STANDALONE_DOMAINS

We inherited STANDALONE_DOMAINS from Tower. The code is still in the extract and merge commands, but the setting isn't documented in the docs and the default is to merge all the extracted strings into one .pot file regardless of what DOMAIN_METHODS is which I don't think is a good default. A better default would be to have one .pot file for each domain.

I checked Input and MDN and they're not using this feature. SUMO uses it, but doesn't need to--they could trivially tweak their configuration so they didn't need it anymore. I don't know of anyone else using it and can't think of a compelling reason to keep it around.

Given that, I think we should ditch the STANDALONE_DOMAINS code from the extract and merge commands. Instead, each domain listed in DOMAIN_METHODS should get its own .pot file.

pgettext for templates

Templates have _ (alias for gettext), gettext and ngettext available. These are handled by Jinja2 contextfunctions which call the underlying installed gettext/ngettext callables, get strings back and then wrap in a Markup.

https://github.com/mitsuhiko/jinja2/blob/master/jinja2/ext.py#L135

Yay! That's great!

However, there's no pgettext and no way that I can see to pass in a msgctxt.

This issue covers figuring out what to do about that. At the moment, I'm leaning towards adding it as a global to our PuenteI18nExtension for the short-term.

Long-term, we should submit a PR for this issue: pallets/jinja#441

specify patterns for extracting strings from installed dependencies

In the tower days, the mozilla sites all used vendor/ where they would have all the dependencies on the file system. This made it easy to specify extraction patterns so that strings from those dependencies would be included in the .pot file.

Nowadays, mozilla sites aren't doing vendor/. Instead, we've got requirements files and during deploys the dependencies are installed into a virtual environment.

The problem this creates is that there's no way to specify extraction for strings in a dependency. There's no way to know where that dependency's files are at the time the settings module is being loaded.

One possibility to figure this out is to do something like this:

from django.utils.functional import lazy
from django.utils.module_loading import import_string
import os


def get_pattern(module_name, pattern):
    module = import_string(module_name)
    path = os.path.dirname(module.__file__)
    return path + pattern

get_path_lazy = lazy(get_path, str)

...

PUENTE = {
    ...
    'DOMAIN_METHODS': {
        'django': [
            ('**.py', 'python'),
            ('fjord/**/jinja2/**.html', 'jinja2'),
            ('fjord/**/templates/**.html', 'django'),

            (get_path_lazy('django', '*.py'), 'python'),
            (get_path_lazy('allauth', '**.html'), 'django')
        ],
        ...
    }
}

Assuming that works (I haven't tested it), then is this something Puente should internalize? We can't let people import Puente things in their settings file--that's a world of hurt waiting to happen. We could provide a three-item tuple for specifying patterns. So then the above might be:

PUENTE = {
    ...
    'DOMAIN_METHODS': {
        'django': [
            ('**.py', 'python'),
            ('fjord/**/jinja2/**.html', 'jinja2'),
            ('fjord/**/templates/**.html', 'django'),

            ('django', '*.py', 'python'),
            ('allauth', '**.html', 'django')
        ],
        ...
    }
}

Having three items would tip Puente off into knowing to import the module and use that as a root path for the pattern. Plus it's clear in the configuration what that pattern applies to.

Thoughts on other or better ways to solve this?

get puente working with django 1.7 and jingo

It'd be much easier for migrating if Puente worked with Django 1.7 and jingo. We should set up a tox environment for that and see how well it works.

Incompatible with Django 1.10 due to option_list being deprecated

In Django 1.8 option_list was depracted in favour of implementing add_arguments, and as such with Django's deprecation policy, the former way was removed in 1.10.

The recommended fix is to implement add_argument(self, parser) in the management modules.

Use Django's default prefix for translator comments

The prefix is "Translators:" for Django: https://docs.djangoproject.com/en/1.8/topics/i18n/translation/#comments-for-translators

puente may not work with jingo

@robhudson is having problems with puente and mdn.

First off, he needs to set extensions to ['jinja2.ext.i18n', 'puente.ext.PuenteI18nExtension', ...] otherwise Jingo doesn't install null gettext stuff and then gettext isn't in the context for puente to find. Here's our discussion about that:

https://github.com/mozilla/kuma/pull/3613/files#r43560442

Including both extensions seems to solve that, but there's a new problem where it's erroring out with:

19:32:21 web.1        | [02/Nov/2015 11:32:21] "GET / HTTP/1.1" 301 0
19:32:23 web.1        | Traceback (most recent call last):
19:32:23 web.1        |   File "/usr/lib/python2.7/wsgiref/handlers.py", line 85, in run
19:32:23 web.1        |     self.result = application(self.environ, self.start_response)
19:32:23 web.1        |   File "/home/vagrant/src/vendor/src/django/django/contrib/staticfiles/handlers.py", line 64, in __call__
19:32:23 web.1        |     return self.application(environ, start_response)
19:32:23 web.1        |   File "/home/vagrant/src/vendor/src/django/django/core/handlers/wsgi.py", line 187, in __call__
19:32:23 web.1        |     response = self.get_response(request)
19:32:23 web.1        |   File "/home/vagrant/src/vendor/src/django/django/core/handlers/base.py", line 199, in get_response
19:32:23 web.1        |     response = self.handle_uncaught_exception(request, resolver, sys.exc_info())
19:32:23 web.1        |   File "/home/vagrant/src/vendor/src/django/django/core/handlers/base.py", line 111, in get_response
19:32:23 web.1        |     response = wrapped_callback(request, *callback_args, **callback_kwargs)
19:32:23 web.1        |   File "/home/vagrant/src/vendor/src/django/django/db/transaction.py", line 394, in inner
19:32:23 web.1        |     return func(*args, **kwargs)
19:32:23 web.1        |   File "/home/vagrant/src/kuma/landing/views.py", line 31, in home
19:32:23 web.1        |     return render(request, 'landing/homepage.html', context)
19:32:23 web.1        |   File "/home/vagrant/src/vendor/src/django/django/shortcuts.py", line 50, in render
19:32:23 web.1        |     return HttpResponse(loader.render_to_string(*args, **kwargs),
19:32:23 web.1        |   File "/home/vagrant/src/vendor/src/django/django/template/loader.py", line 178, in render_to_string
19:32:23 web.1        |     return t.render(context_instance)
19:32:23 web.1        |   File "/home/vagrant/src/vendor/src/jingo/jingo/__init__.py", line 206, in render
19:32:23 web.1        |     return super(Template, self).render(context_dict)
19:32:23 web.1        |   File "/home/vagrant/env/local/lib/python2.7/site-packages/jinja2/environment.py", line 989, in render
19:32:23 web.1        |     return self.environment.handle_exception(exc_info, True)
19:32:23 web.1        |   File "/home/vagrant/env/local/lib/python2.7/site-packages/jinja2/environment.py", line 754, in handle_exception
19:32:23 web.1        |     reraise(exc_type, exc_value, tb)
19:32:23 web.1        |   File "/home/vagrant/src/kuma/landing/templates/landing/homepage.html", line 1, in top-level template code
19:32:23 web.1        |     {% extends "base.html" %}
19:32:23 web.1        |   File "/home/vagrant/src/templates/base.html", line 12, in top-level template code
19:32:23 web.1        |     <title>{% block title %}{{ _('Mozilla Developer Network') }}{% endblock %}</title>
19:32:23 web.1        |   File "/home/vagrant/src/templates/base.html", line 12, in block "title"
19:32:23 web.1        |     <title>{% block title %}{{ _('Mozilla Developer Network') }}{% endblock %}</title>
19:32:23 web.1        |   File "/home/vagrant/src/vendor/src/puente/puente/ext.py", line 11, in _gettext_alias
19:32:23 web.1        |     context.resolve('gettext')(context, text, *args, **kwargs)
19:32:23 web.1        | TypeError: <lambda>() takes exactly 1 argument (2 given)

The lambda could be the null gettext. I'm not sure what else it could be.

I have a few thoughts:

it's possible that Tower required Jingo, but Jingo also requires Tower
it's possible that we need to implement something more in Puente that Jingo needs from Tower
it's possible that Puente will never work with Jingo

This issue covers figuring all this out, implementing whatever needs implementing and updating the documentation accordingly.

make extract behave more like pybabel extract

Supporting most of the options should be pretty easy. Some of them we're going to want values in settings so they don't have to be specified on the command line.

However, there's one big difference between Puente extract and pybabel extract that I'm not sure what to do with. Puente takes an output directory and a domain, list of domains or 'all' and then generates all those .pot files and sticks them in the directory. pybabel takes a mapping file and an output file and extracts everything specified in the mapping file and dumps it in the output file. Those are two pretty different behaviors that don't work together.

support msgctxt in jinja2 templates: how?

Jinja2 trans tags don't support context. However, I'm not sure if gettext in Jinja2 templates can support context.

This issue covers figuring out whether they can or can't. If they can, we should add tests for it. If they can't, we should figure out what to do about that and possibly write up an issue in the Jinja2 issue tracker to add it.

get puente connected to travis-ci

We should be running tox for all environments on every pull request etc on Travis.

Sumo can't squash all our domains into one, we should talk

If you have a bunch of domains and can't squash them into django.po(t)
and javascript.po(t), then we should talk--open up an issue.

Sumo has several tools that extract strings. For example, there are tools in BuddyUp that generate a .pot file directly. We then use the tower merge command to merge those into the .po files. Tower's extract command doesn't do anything.

Options I can see:

Make puente support multiple domains, like tower did, despite making it further away from Django-core.
Modify BuddyUp to do something like the DB strings trick. It would generate a file that puente already knows how to extract from and put it somewhere puente will notice.
Write a extract tool that is compatible with puente that can extract from BuddyUp code, use puente's extract command, and ignore the one written for BuddyUp.
Ignore BuddyUp and never speak of this again.

add tests for context extraction

Tower added special code to handle msgctxt. Since then both Django and Babel added support for msgctxt, so I removed the code from Tower that did the magic things.

In Django, this is covered by the pgettext routines:

https://docs.djangoproject.com/en/1.8/topics/i18n/translation/#contextual-markers

This issue covers adding tests for msgctxt and making sure that Puente handles this correctly. If it turns out it doesn't, then we need to add code to handle it.

on readthedocs, star button is busted

There's a "Star" button in the upper left hand corner of the Puente docs on readthedocs. It's busted, though. We probably have to add something to docs/conf.py to fix it.

Add option to not create merge backup

According to https://www.gnu.org/software/gettext/manual/html_node/msgmerge-Invocation.html that should be --backup=off to the msgmerge tool.

Add width option to merge command

When I do a extract/merge it creates a lot of churn in git b/c the line wrapping widths vary from Pontoon to our merge command. It'd be great if there were an option to override puente's default of 200 with one that matches Pontoon's value of 78.

./manage.py merge tries to merge LC_MESSAGES

In a run of ./manage.py merge in the kuma project:

Merging javascript.po for ka
............ done.
Merging javascript.po for ko
.............. done.
 Can not find (/home/vagrant/src/locale/LC_MESSAGES/LC_MESSAGES/javascript.po).  Creating...
msginit: cannot create output file "/home/vagrant/src/locale/LC_MESSAGES/LC_MESSAGES/javascript.po": No such file or directory
Merging javascript.po for LC_MESSAGES
msgmerge: error while opening "/home/vagrant/src/locale/LC_MESSAGES/LC_MESSAGES/javascript.po" for reading: No such file or directory
Merging javascript.po for ln
............. done.
Merging javascript.po for mg
............ done.

Merge should detect a locale name of LC_MESSAGES and skip merging.

add babel extraction for django templates

Currently, none of the extractors work for Django templates. You can sort of use the Jinja2 extractor, but if an error is thrown when parsing (and it's likely this happens), then parsing silently fails and no strings are extracted.

Maybe add a dependency for django-babel https://pypi.python.org/pypi/django-babel which adds a Babel extractor for Django templates.

This issue covers looking into this further and then figuring out the best fix.

add support for django 2.0

Some projects are still using Puente. We should make sure it supports all currently supported versions of Django.

	During merging (or initializing), the command will also look in
	`locale/compendia` for a locale-specific compendium of
	translations (serving as a translation memory of sorts). The
	compendium file must be called `${locale}.compendium`,
	e.g. `es_ES.compendium` for Spanish. The translations in the
	compendium will be used by gettext for fuzzy matching.

	domain_pot_file = open(domain_pot)

	if locale == 'en_US':
	enmerged = TemporaryFile('w+t')
	p2 = Popen(['msgen', '-'], stdin=domain_pot_file,
	stdout=enmerged)
	p2.communicate()
	mergeme = enmerged
	else:
	mergeme = domain_pot_file

	mergeme.seek(0)
	command = [
	'msgmerge',
	'--update',
	'--width=200',
	'--backup=%s' % ('simple' if backup else 'off'),
	domain_po,
	'-'
	]
	p3 = Popen(command, stdin=mergeme)
	p3.communicate()
	mergeme.close()