The rstxml2db script converts RST XML files to DocBook XML.
To use the program without pip and a virtual environment, run the following command after cloning this repository:
$ PYTHONPATH=src python3 -m rstxml2db -h
To install rstxml2db in a Python virtual environment, use the following steps:
Clone this repository:
$ git clone http://github.com/openSUSE/rstxml2docbook.git
$ cd rstxml2docbook
Create a Python 3 environment and activate it:
$ python3 -m venv .env
$ source .env/bin/activate
Update the pip and setuptools modules:
$ pip install -U pip setuptools
Install the package:
$ ./setup.py develop
If you need to install it from GitHub directly, use this URL:
git+https://github.com/openSUSE/rstxml2docbook.git@develop
After the installation in your Python virtual environment, two executable scripts are available: rstxml2db and rstxml2docbook. Both are identical; the second name exists only for convenience.
The script performs the following steps:
- Read the intermediate XML files from a previous Sphinx conversion step (see sec.build.xml.files).
- Resolve any references to external files and create a single XML tree in memory.
- Transform the tree with XSLT into DocBook and, if requested, split it into several smaller files.
- Write the result to stdout or save it into one or more files, depending on whether splitting mode is activated.
Usually, you first create the intermediate XML files (using the XML builder with the -b option):
$ sphinx-build -b xml -d .../build/html.doctree src/ xml/
The src/ directory contains all of your RST files, whereas the xml/ directory is the output directory.
Each RST file generates a corresponding XML file.
After you have created the intermediate XML files, run the rstxml2db script. The script reads in all XML files and creates DocBook files, for example:
$ rstxml2db xml/index.xml
By default, the previous step uses the index.xml file and generates several DocBook files, all located in the out/ directory.
If you need a single DocBook file, use the -ns option to write the resulting DocBook file to stdout.
The workflow for converting RST XML files into DocBook involves these steps:
- Load the index.xml file.
- Resolve all external references to other files; create one single RST XML tree.
- If --legalnotice is used, add the legalnotice file into bookinfo.
- If --conventions is used, replace the first chapter with preface content.
- Clean up XML:
  - Remove IDs with no corresponding <xref/>.
  - Convert absolute column widths into relative values.
  - Add a processing instruction in <screen> if the maximum number of characters inside the screen exceeds a certain value.
- Output the tree, either by saving it or by printing it to stdout.
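The first two clean-up steps can be illustrated with a minimal sketch that uses only the Python standard library. It is not the script's actual implementation: the function names are hypothetical, and reading "relative value" as the proportional `N*` notation of DocBook table columns is an assumption.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:id attribute under this qualified name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(element):
    """Return the tag name without its namespace part."""
    return element.tag.rsplit("}", 1)[-1]


def remove_unused_ids(root):
    """Drop every xml:id that no <xref linkend="..."/> points to.

    Returns the removed IDs in document order (hypothetical helper).
    """
    used = {el.get("linkend") for el in root.iter() if local_name(el) == "xref"}
    removed = []
    for el in root.iter():
        ident = el.get(XML_ID)
        if ident is not None and ident not in used:
            del el.attrib[XML_ID]
            removed.append(ident)
    return removed


def relative_colwidths(root):
    """Turn plain numeric colwidth values into proportional ones ("30" -> "30*").

    Assumption: this is what "relative value" means; the real script may differ.
    """
    for el in root.iter():
        if local_name(el) == "colspec":
            width = el.get("colwidth", "")
            if width.isdigit():
                el.set("colwidth", width + "*")
```

Both helpers modify the tree in place, so they can be applied one after the other to the same DocBook tree before it is written out.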
The transformation from separate RST XML files into a single RST XML tree mainly relies on the element list_item[@classes='toctree-l1']. Anything that is referenced is used as a file for inclusion. Everything else is copied as is.
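The inclusion idea can be sketched roughly as follows. This is an illustration, not the script's code: the names `resolve` and `inline_toctree` are hypothetical, the documents are passed in as a dictionary of XML strings instead of being read from disk, and it is assumed that each toctree entry contains a `reference` element whose `refuri` attribute names the target document. Circular references are not handled.

```python
import xml.etree.ElementTree as ET


def resolve(docname, sources):
    """Parse one document and pull in every document its toctree points to.

    `sources` maps a document name to its RST XML text (an assumption made
    to keep the sketch self-contained; the real script reads files).
    """
    root = ET.fromstring(sources[docname])
    inline_toctree(root, sources)
    return root


def inline_toctree(parent, sources):
    """Replace toctree-l1 list items below `parent` with the referenced trees."""
    for index, child in enumerate(list(parent)):
        if child.tag == "list_item" and child.get("classes") == "toctree-l1":
            ref = child.find(".//reference[@refuri]")
            if ref is not None and ref.get("refuri") in sources:
                # The referenced document is included in place of the entry.
                parent[index] = resolve(ref.get("refuri"), sources)
                continue
        # Everything else is kept as is; only descend to look for more entries.
        inline_toctree(child, sources)
```

Calling `resolve("index", sources)` returns one tree in which every resolvable toctree entry has been replaced by the root of the referenced document.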
The transformation from the single RST XML tree into DocBook 5 uses the rstxml2db.xsl stylesheet.
The conversion internally creates a single RST XML tree. This tree contains all the information that is needed.
For example, the following features work:
- Internal referencing from one section to another (element reference[@internal='True'])
- Internal references to a glossary entry (element reference[@internal='True'], but with @refuri containing a # character)
- External referencing to a remote site (element reference[@refuri])
- Different, nested sections are correctly converted into the DocBook structures (book, chapter, section etc.)
- Admonition elements
- Tables and figures
- Lists like bullet_list, definition_list, and enumerated_list
- Glossary entries
- Inline elements like strong, literal_emphasis
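The three kinds of references listed above can be told apart by their attributes alone. The following sketch shows that distinction in Python; the function name `classify_reference` and the returned labels are hypothetical, and the actual stylesheet may apply further conditions.

```python
import xml.etree.ElementTree as ET


def classify_reference(ref):
    """Classify a <reference> element from the RST XML tree.

    The rules mirror the list above; names and labels are illustrative only.
    """
    refuri = ref.get("refuri", "")
    if ref.get("internal") == "True":
        # Internal references with a "#" in @refuri point to glossary entries.
        return "glossary" if "#" in refuri else "section"
    if refuri:
        return "external"
    return "unknown"
```

For instance, a `reference` element that carries only a `refuri` attribute with a remote address is classified as `external`.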
The following issues are still problematic:
- Double IDs: When the RST sources contain the same title more than once, the RST XML builder generates identical IDs. I consider this a bug.
- Invalid structures: RST allows structures that are not valid in DocBook. For example, adding more paragraphs after the last section leads to validation errors in DocBook. The script currently does not detect these structural issues. You need to adapt the structure manually.
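Until the script detects these issues itself, a small check on the intermediate XML can point out where manual work is needed. The sketch below is not part of rstxml2db: both function names are hypothetical, and it assumes that IDs are stored in the space-separated `ids` attribute and that sections and paragraphs use the element names `section` and `paragraph`.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def duplicate_ids(root):
    """Return all IDs that occur more than once in the tree, sorted."""
    counts = Counter(
        ident
        for element in root.iter()
        for ident in element.get("ids", "").split()
    )
    return sorted(ident for ident, count in counts.items() if count > 1)


def paragraphs_after_sections(root):
    """Return paragraphs that follow a section on the same level."""
    problems = []
    for parent in root.iter():
        seen_section = False
        for child in parent:
            if child.tag == "section":
                seen_section = True
            elif seen_section and child.tag == "paragraph":
                problems.append(child)
    return problems
```

Running both functions on a parsed XML file before the conversion gives a list of IDs to rename and paragraphs to move.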