
translation-memory-tools's Introduction


Introduction

This is the toolset used at Softcatalà to build the translation memories for all the projects that we know exist in the Catalan language and have their translations openly available. This project acts as a data pipeline, and the project https://github.com/Softcatala/translation-memory-tools-webservice then provides an API on top of the data.

You can see it online at https://www.softcatala.org/recursos/memories/

The toolset contains the following components, each with its own responsibility:

Builder (fetch and build memories)

  • Download and unpack the files from source repositories
  • Convert from the different translation formats (ts, strings, etc) to PO
  • Create a translation memory for each project in PO and TMX formats
  • Produce a single translation memory file that contains all the projects
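
For orientation, this pipeline can be approximated with off-the-shelf tools. The sketch below is illustrative only (file names are hypothetical and this is not the actual builder.py code); it uses translate-toolkit's ts2po and po2tmx plus gettext's msgcat:

# Illustrative builder pipeline sketch; not the real builder.py code.
import subprocess

# Convert a downloaded Qt Linguist file to PO (translate-toolkit).
subprocess.run(["ts2po", "-i", "project.ts", "-o", "project.po"], check=True)

# Merge per-file catalogs into one memory per project (gettext).
subprocess.run(["msgcat", "project.po", "extra.po", "-o", "project-tm.po"], check=True)

# Export the project memory to TMX (translate-toolkit).
subprocess.run(["po2tmx", "-l", "ca", "-i", "project-tm.po", "-o", "project-tm.tmx"], check=True)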

Terminology (terminology extraction)

  • Analyzes the PO files and creates a report with the most common terminology across the projects
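
A minimal sketch of the idea, using polib and collections.Counter (illustrative; the real terminology component is more elaborate, and the file names are hypothetical):

# Illustrative terminology count across several project PO files.
from collections import Counter
import polib

counter = Counter()
for path in ["gnome.po", "libreoffice.po"]:  # hypothetical file names
    for entry in polib.pofile(path).translated_entries():
        for word in entry.msgid.lower().split():
            counter[word] += 1

print(counter.most_common(20))  # most common source-language terms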

Quality (feedback on how to improve translations)

  • Runs Pology and LanguageTool and generates HTML reports on translation quality
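
As a rough illustration (the actual report generation differs), a quality pass could invoke Pology's posieve and the public LanguageTool HTTP API:

# Illustrative quality checks; not the actual reports pipeline.
import subprocess
import requests

# Run a Pology rule check over a PO file.
subprocess.run(["posieve", "check-rules", "project.po"], check=True)

# Ask LanguageTool to review a translated string.
response = requests.post(
    "https://api.languagetool.org/v2/check",
    data={"text": "Aquesta es una frase", "language": "ca"},
)
for match in response.json()["matches"]:
    print(match["message"])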

Installation

Setting up before execution

To download the translations of some of the projects you need credentials for those systems, for example API keys.

builder.py expects the credentials to be defined in the following locations:

  • At cfg/credentials in the different YAML files: zanata.yaml for Zanata, weblate.yaml for Weblate and crowdin.yaml for Crowdin. The *-sample files provide examples of how these files should be structured (see the reading sketch after this list).
  • For Transifex, the credentials should be at ~/.transifexrc, since this is where the Transifex CLI tool expects them.
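
As a rough reading sketch (the actual keys are whatever the corresponding *-sample file defines, not what is shown here):

# Hypothetical credentials loading; check weblate.yaml-sample for the real keys.
import yaml

with open("cfg/credentials/weblate.yaml") as f:
    credentials = yaml.safe_load(f)

api_key = credentials["api-key"]  # assumed key name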

All these projects require you to have the right credentials, and often to be a member of the Catalan project, to be able to download the translations.

If you are building a local Docker image, place your Transifex credentials file at cfg/credentials/transifexrc; it will be copied to the right location in the Docker image. Remember that the Docker build context cannot access your ~ directory.

Running the builder code locally

This section focuses on helping you run the builder component locally, in case you want to quickly test new project configurations. For any other use case, we recommend using Docker.

Debian:

sudo apt-get update -y && sudo apt-get install python3-dev libhunspell-dev libyaml-dev gettext zip mercurial bzr ruby git curl wget g++ subversion bzip2 python2-dev -y
curl https://raw.githubusercontent.com/transifex/cli/master/install.sh | bash && sudo mv ./tx /usr/bin/
sudo gem install i18n-translators-tools
pip3 install -r requirements.txt

macOS:

brew install python3 breezy hunspell libyaml gettext zip mercurial ruby git curl wget gcc subversion bzip2
curl https://raw.githubusercontent.com/transifex/cli/master/install.sh | bash
sudo gem install i18n-translators-tools
pip3 install -r requirements.txt

For example, to download only the Abiword project:

cd src
./builder.py -p Abiword

Running the system locally using Docker

This requires that you have docker, docker-compose and make installed on your system.

First download the data for the projects and generate the data quality reports:

make docker-run-builder

Downloading all the projects can take up to a day, which is not acceptable for a development cycle. In docker/local.yml, the DEV_SMALL_SET variable forces the builder to download only a small subset of projects. This subset does not require any specific credentials to be defined in order to download it.
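
As an illustration of the mechanism only (the real logic lives in the builder and docker/local.yml), such a switch might gate the project list like this:

# Illustrative only; not the actual builder code.
import os

ALL_PROJECTS = ["Abiword", "GNOME", "LibreOffice"]  # hypothetical list
SMALL_SET = ["Abiword"]

projects = SMALL_SET if os.environ.get("DEV_SMALL_SET") else ALL_PROJECTS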

The output files are copied to the local web-docker directory to make it easy for you to explore the results.

Contributing

If you are looking at how to contribute to the project, see HOW-TO.md.

Contact Information

Jordi Mas: [email protected]

translation-memory-tools's People

Contributors

bellaperez, davidcanovas, ecron, ereza, gforcada, jaumeortola, jmaspons, jmontane, jordibrus, jordimas, jordis, julen, marcriera, miniangel, pereorga, rbnval, rbuj, toniher, txemaq, unho, xaloc33, xavivars, xispa


translation-memory-tools's Issues

Split some pages

For example, it might be a good idea to put the TM search and the TM list on separate pages.

jQuery is missing in HTML files

Although all SC-related HTML files include some jQuery-dependent scripts (like the cookiechecker), they do not include the jQuery library itself, which generates an error when the pages are displayed in the browser.

Make sure which Python this works on

In the README it says "python 2.7 or higher".

But I have serious doubts about whether this code can work on Python 3 (I am thinking of things like print "some string", where print is treated as a keyword instead of a function).

Also, I am curious why this is not supposed to work on Python 2.6 (maybe it is because of format?).
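
For reference, the incompatibilities in question:

# Python 2 only: print as a statement (a syntax error on Python 3).
# print "some string"

# Python 3: print is a function.
print("some string")

# str.format() exists on 2.6, but auto-numbered fields need 2.7+:
"{0} {1}".format("a", "b")  # works on 2.6
"{} {}".format("a", "b")    # 2.7 and later only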

Choose coding style for shell scripting

It is not clear which coding style is used for the various shell scripts present in this repository. Having some clear guidelines, such as how many characters to use for indentation and whether then appears on the same line as if, would help create consistent code that is easier for humans to read.

Rearchitect the tool for a version 2.0

Background

This ticket collects all the architecture improvements needed to fully support the new set of requirements and to address the limitations that we have learned about up to September 2014.

Downloading, converting and building translation memories

  • Decouple the download process #63
  • Create a configuration subdirectory where every project has its own json file #62

Solution: Consider moving to TMX as the native format for the tool. Some considerations:

  • The limitation here is that there are no out-of-the-box tools for merging TMX catalogs (as msgcat does for PO).
  • Probably we will need to build something and consider contributing it to the Translate Toolkit.
  • Many tools that convert from other formats (TS, strings, etc.) convert to PO (e.g. ts2po), not to TMX. We need to think about whether we are OK converting from these formats to PO and then to TMX, or whether we need native converters.
  • This will require rewriting the index creator, the terminology analysis and other tools, since they all rely on PO files as the source format.

Limitation: Conversion from other formats to PO is limited. The problems observed are:

  • Currently we use the file extensions to identify the formats. In the case of INI or strings files, you sometimes need to be more specific, since these can have different variations.

Solution: By default, as today, we have converters associated with extensions. However, we should also have some kind of pattern matching in projects.json where you can specify per project which converters to use.
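
A hypothetical sketch of that dispatch (all names here are invented for illustration):

# Hypothetical converter dispatch: extension defaults plus per-project
# pattern overrides read from projects.json.
import fnmatch

DEFAULT_CONVERTERS = {".ts": "ts2po", ".strings": "prop2po", ".ini": "ini2po"}

def pick_converter(filename, project_overrides):
    # A per-project pattern wins over the extension default.
    for pattern, converter in project_overrides.items():
        if fnmatch.fnmatch(filename, pattern):
            return converter
    for extension, converter in DEFAULT_CONVERTERS.items():
        if filename.endswith(extension):
            return converter
    return None

print(pick_converter("app.ts", {}))                       # -> ts2po
print(pick_converter("ca.ini", {"ca.ini": "mozlang2po"}))  # override wins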

Web application

Limitation: Currently the whole Softcatalà application is tightly coupled with the backends.
Solution: The Softcatalà application, and any other front ends, should be independent applications, maintained by different teams, that use APIs to interact with the system. On GitHub, we should have a simple agnostic web application to show how the APIs work (instead of the Softcatalà one). We should provide 3 APIs:

  • API to search the text index (#28)
  • API to access the translation memory downloads created (date, file, etc)
  • API to access the terminology items created (glossaries)

Limitation: The web application is written using CGI
Solution: Write the application using MVC (#24)

Text Search engine

Potential limitation: We are currently using Whoosh as the full-text search engine. We are not sure how this will scale if we add, for example, 50 more languages and 50 more projects.
Solution: See (#24)
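
For context, a minimal Whoosh query looks like this (the field and directory names are assumptions, not the project's actual schema):

# Minimal Whoosh search sketch; "source"/"target" field names are assumed.
from whoosh.index import open_dir
from whoosh.qparser import QueryParser

ix = open_dir("indexdir")  # hypothetical index directory
with ix.searcher() as searcher:
    query = QueryParser("source", ix.schema).parse("window")
    for hit in searcher.search(query, limit=20):
        print(hit["source"], hit["target"])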

Integration with Translation Memory servers

The vision here is not to implement a Translation Memory server. Implement #23 to integrate with Amagama (https://github.com/translate/amagama).

Have versioning for TMs

This means keeping TMs for old releases, like GNOME 2.28 or GNOME 2.30, available for checking how the translations evolve.

These old TMs shouldn't be queried in the default search.

[TRACKER] Add docs

  • Simplify the root README (2 or 3 paragraphs intro, and move the rest to the docs/ directory) (#51)
  • Include this project docs in Read the Docs (#52)
  • Remove all READMEs (or other notes files) in the repository except the root README (#57)
  • Add docs about how to deploy (#53)
  • Add docs about how to import translations (#54)
  • Add docs about how to help in development (#55)
  • Add coding style for Python (#37)
  • Add coding style for CSS (#38)
  • Add coding style for JavaScript (#39)
  • Add coding style for HTML (#40)
  • Add coding style for shell scripts (#8)

Add license

I can't believe this has no license or copying file.

[TRACKER] Allow generating TMs for the different projects at different frequencies

This is necessary because some projects are updated only a few times over long periods, and some other projects are simply dead. Also, not all projects use a translation format that can be easily converted to PO or TMX, at least not without human intervention.

Acceptable frequencies can be:

  • never (this TM was created manually and is not meant to be recreated automatically)
  • yearly
  • monthly
  • weekly (most important FLOSS projects)

In the future it must be possible to specify different frequencies per language.

This issue has to be split into several smaller ones, i.e. this is a tracker issue.

Add live form to filter returned search results by target translations

This is for providing a way to quickly find all the ways a given English word is translated into a target language.

For example, if we want to know all the ways the word "window" is translated into Galician, we can start by specifying "xanela", then "fiestra"..., hiding the results that match while counting the occurrences per project. This is useful when discussing terminology.

This can be achieved with a bit of JavaScript on a special search page that returns all the results for the specified query.

[TRACKER] Reorganize/cleanup code

  • Translate comments in Catalan to English (#30)
  • Add license (#35)
  • Softcatalà has a grave accent (so à not á) (#68)
  • Add docs (#14)
  • Include building docs on Travis (#69)
  • Include run of integration tests on Travis (#70)
  • Remove specific references in code to jmas home (and stuff like this) (#56)
  • Move images to img/ directory (#9)
  • Move CSS files to css/ directory (#10)
  • Move templates to templates/ directory (#11)
  • Move Softcatalà specifics out of repo (#27)
  • Move all tests code to tests/ directory (#31)
  • Perform a cleaning sweep on all the code to ensure we have valid HTML5 (#48)
  • Perform a cleaning sweep on all the code to ensure we have valid CSS (#49)
  • Perform a cleaning sweep on all the code to ensure we have Python following agreed coding style (#50)
  • Get rid of globals on Python code (#60)
  • Move embedded HTML in Python code to standalone templates (#41)
  • Make sure which Python this works on (#43)
  • Remove all PO and TMX files that are not used for testing purposes (#44)
  • Move the commands to a bin/ directory (not having a bunch of .py files in src/)
  • Make it clear which one is the original files directory, the PO directory, the TMX directory... for the importing process.
  • Discuss maybe separating the translations retrieval (including the TMX and PO compendia creation) and the web query server (including index creation and code for the server) into two different repositories.

Decouple web from API

This means providing search results using only a JSON API. The web page holding the search form can then retrieve the results from the API and append them to the search page itself. That way the search page can be served as static HTML, and we already have an API capable of feeding CAT tools.

pushState support is essential: it must be possible to share specific search URLs with other people.
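
A sketch of what consuming such an API could look like (the endpoint and response shape are invented for illustration, not the actual webservice API):

# Purely illustrative client; endpoint and JSON shape are assumptions.
import requests

response = requests.get(
    "https://example.org/api/search",  # hypothetical endpoint
    params={"source": "window", "project": "GNOME"},
)
for result in response.json():
    print(result)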

Depends on #23

Allow assigning a quality level to each TM

In order to:

  • Prioritize the returned results
  • Allow querying by default only the TMs with higher quality (in some special scenarios it is still necessary to query all the existing TMs)
