hunterhacker / jdom


Java manipulation of XML made easy

License: Other

Java 99.14% XSLT 0.03% HTML 0.83%

jdom's Introduction

Introduction to the JDOM project

Please see the JDOM web site at http://jdom.org/ and GitHub repository at https://github.com/hunterhacker/jdom/

Quick-Start for JDOM

See the GitHub wiki for a Primer on using JDOM:

https://github.com/hunterhacker/jdom/wiki/JDOM2-A-Primer

Also see the web site http://jdom.org/downloads/docs.html. It has links to numerous articles and books covering JDOM.

Installing the build tools

The JDOM build system is based on Apache Ant. Ant is a small but very handy tool that uses a build file written in XML (build.xml) as building instructions. For more information refer to "http://ant.apache.org".

The only thing that you have to make sure of is that the "JAVA_HOME" environment variable is set to match the top level directory containing the JVM you want to use. For example:

C:\> set JAVA_HOME=C:\jdk1.6

or on Mac:

% setenv JAVA_HOME /System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home
  (csh)
> JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home; export JAVA_HOME
  (ksh, bash)

or on Unix:

% setenv JAVA_HOME /usr/local/java
  (csh)
> JAVA_HOME=/usr/java; export JAVA_HOME
  (ksh, bash)

That's it!

Building instructions

If you do not have the full source code it can be cloned from GitHub. The JDOM project at https://github.com/hunterhacker/jdom has the instructions and source URL to make the git clone easy.

You will need to have Apache Ant 1.8.2 or later, and you will need Java JDK 1.6 or later.

Ok, let's build the code. First, make sure your current working directory is where the build.xml file is located. Then run "ant".

If everything is right and all the required packages are visible, this action will generate a file called "jdom-2.x-20yy.mm.dd.HH.MM.zip" in the "./build/package" directory. This is the same 'zip' file that is distributed as the official JDOM distribution.

The name of the zip file (and the jar names inside the zip) is controlled by the two ant properties 'name' and 'version'. The package is called "${name}-${version}.zip". The 'official' JDOM Build process is done by creating a file 'build.properties' in the 'top' folder of the JDOM code, and it contains the single line (or whatever the appropriate version is):

version=2.0.0

If your favourite Java IDE happens to be Eclipse, you can run the 'eclipse' ant target, and that will configure your Eclipse project to have all the right 'source' folders, and 'Referenced Libraries'. After running the 'ant eclipse' target, you should refresh your Eclipse project, and you should have a project with no errors or warnings.

Build targets

The build system is not only responsible for compiling JDOM into a jar file, but is also responsible for creating the HTML documentation in the form of javadocs.

These are the meaningful targets for this build file:

package [default] -> generates ./build/package/jdom*.zip
compile -> compiles the source code
javadoc -> generates the API documentation in ./build/javadocs
junit -> runs the JUnit tests
coverage -> generates test coverage metrics
eclipse -> generates an Eclipse project (source folders, jars, etc)
clean -> restores the distribution to its original and clean state
maven -> generates the package, and makes a 'bundle' for maven-central

To learn the details of what each target does, read the build.xml file. It is quite understandable.

Bug Reports

Bug reports go to the jdom-interest list at jdom.org. But BEFORE YOU POST make sure you've tested against the LATEST code available from GitHub (or the daily snapshot). Odds are good your bug has already been fixed. If it hasn't been fixed in the latest version, then when posting BE SURE TO SAY which code version you tested against. For example, "GitHub from October 3rd". Also be sure to include enough information to reproduce the bug and full exception stack traces. You might also want to read the FAQ at http://jdom.org to find out if your problem is not really a bug and just a common misunderstanding about how XML or JDOM works.

Searching for Information

The JDOM mailing lists are archived and easily searched at http://jdom.markmail.org.


jdom's Issues

SAXBuilder.fileToURL method is not consistent with File.toURI().toURL()

fileToURL was seemingly written because the URL / URI classes were only introduced with Java 1.4.

There are some subtleties about URLs that this method does not accommodate. For example, it appears that for files, the 'authority' section of the file URL should (to be strictly compliant) be absent entirely (null), not "" (empty string), although most (all) processes that use file:/// type URLs will process the URL correctly.

For example, the file URL for the file /myfile should be: file:/myfile ... but, JDOM produces file:///myfile

The reference for this appears to have been http://www.ietf.org/rfc/rfc2396.txt section 3, but this has been obsoleted by http://www.faqs.org/rfcs/rfc3986.html

RFC3986 clarifies the authority section of the URI in section 3.

The bottom line, in reading the RFC, is that file: URL's never have an authority section (because they can only refer to the local machine), and certainly, the authority host is not "". As a result, the correct URL is file:/path/to/file.

File.toURI().toURL() produces the correct results, and it seems logical to simply replace the SAXBuilder.fileToURL() with the standard 'Java' way of doing things, which will be better at following the changing standards.

I suggest just replacing the fileToURL method entirely.
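
For comparison, here is a minimal sketch of the standard-library route proposed above. It uses only java.io.File, java.net.URI and java.net.URL; the class name FileUrlDemo and the helper toUrl are made up for illustration and are not JDOM code. The exact string depends on the platform (a drive letter appears on Windows), but the authority of the URI should be null in the ordinary local-file case.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical demo class; not part of JDOM.
final class FileUrlDemo {

    // Converts a File to a URL the way the issue suggests: File -> URI -> URL.
    static URL toUrl(File file) {
        try {
            return file.toURI().toURL();
        } catch (MalformedURLException e) {
            // Not expected for a file: URI produced by File.toURI().
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        File file = new File("/myfile");
        URI uri = file.toURI();
        System.out.println("URI:       " + uri);
        System.out.println("authority: " + uri.getAuthority());
        System.out.println("URL:       " + toUrl(file));
    }
}
```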

Text.append("") deletes current value

Text text = new Text("value");
text.append("");
System.out.println("We have value '" + text.getValue() + "'.");  // prints "We have value ''."

JDOM should implement a 'proxy' EntityResolver that handles w3.org lookups

http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/

w3.org 'constant' DTD's can maybe be stored as resources, and other DTD's can be 'cached' in some store for re-use.

This will not only be good netizenship, but can speed up a number of operations.

Perhaps it can be done as simply as an optionally available EntityResolver (although the implementation will not be as simple...).

Update - more detail on this issue:

As I am going through the junit tests, I am now at the point of testing the SAX and DOM builders. The issue I am having is that I am doing a lot of my work on the train as I commute.... and I don't have a network connection.

This is a problem because the validating parsers need to get some DTD's and XML Schemas from the web... (if they are web-referenced resources).

This is an age-old problem, but I can't think of a great solution. The ideal would be to run junit tests without having to have a network connection at all.

Of course, I could just use input documents that only reference local resources... (and I have) but, in the spirit of JDOM, is there an option for making this process easy in a general sense?

This is further compounded by there being some restrictions on some documents too, like the w3.org 'ban' on default Java user-agents: http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/

My experimentation indicates that w3.org has put a blanket 'tarpit' of 30 seconds on any connection, regardless of what User-agent you use. This is 'significant'.

Typical solutions to this problem are things like OASIS catalogs, etc. but that feels heavy-weight... or, is it?

So, what options are there? Any ideas?

I think the following are key issues (and OASIS does not solve them all):

access to local copies of unavailable resources (no network connection?).
general performance improvements by caching entities that have an appropriate 'expires' timeout... no network access for 'cached' resources.
improved 'internet-friendliness' reducing unnecessary bandwidth to places like w3.org
reduce the amount of 'expertise' a JDOM user needs to do 'the right thing'.

Can JDOM be easily configured to become a good netizen? Should it be done by default?
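
One possible shape for the 'optionally available EntityResolver' idea is sketched below. It is only an assumption about how such a resolver could look: the class name LocalCopyEntityResolver and its register method are hypothetical, the store is a plain in-memory map, and nothing here handles 'expires' timeouts or fetching. It relies only on the org.xml.sax interfaces that ship with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver; the class and method names are illustrative only.
final class LocalCopyEntityResolver implements EntityResolver {

    // System ID (for example a DTD URL) -> locally stored bytes of that entity.
    private final Map<String, byte[]> localCopies = new ConcurrentHashMap<>();

    // Registers a local copy so later lookups never touch the network.
    void register(String systemId, String content) {
        localCopies.put(systemId, content.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        byte[] bytes = systemId == null ? null : localCopies.get(systemId);
        if (bytes == null) {
            // Returning null tells the parser to fall back to its default
            // behaviour, which normally means opening the system ID itself.
            return null;
        }
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

A resolver like this could be handed to SAXBuilder.setEntityResolver(...) or to any SAX XMLReader; whether it should be installed by default is exactly the open question of this issue.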

Should have an official build in Maven Central

People have asked for this. Someone else put in a 1.1.1 build. We should do it for 1.1.2.

Tatu Saloranta has some experience.

JDOMSource should set InputSource.getSystemID if based on a Document with a BaseURI

That way the Transform resolvers can work, if necessary.

Bug with SAXBuilder's setExpandEntities() (David Cheng-Ping Wang)

From David:

I'm not sure if this is a bug or a feature, but I thought I would
report it anyway... I have attached (also reproduced below) a simple
example that illustrates the problem. I have tested this with Java
1.6EE, and JDOM's Jan 9th, 2009 nightly build as well as the standard
1.1 release.

In this example, I am trying to prevent the expansion of the entity

"&minus;"

in an XHTML document that is being read in and then
immediately written out. I create an instance of SAXBuilder,
setExpandEntities(false), then call the build() method on an input
XHTML doc. For simplicity, I then use an instance of XMLOutputter to
print the parsed document to standard out (Even though I don't think
it's necessary for standard out, I also make sure the encoding is
consistent between the Format and the OutputStream and that it is a
common "US-ASCII" format).

The original XHTML document uses the entity:

&minus;

But, the resulting XHTML printed to standard out shows:

&minus;&#x2212;

Apparently, setting "setExpandEntities(false)" had the effect of
duplicating the character. I would expect that setting expand
entities to 'false' would simply leave the "&minus;", without
duplicating it in US-ASCII formatting.

This isn't a big problem because if the default value, 'true', is
used for entity expansion, the resulting output will simply contain

"&#x2212;"

instead of duplicating the character. Even though the
original entity encoding has changed, the resulting output will still
behave/appear the same as the original, which is probably what's
normally required.

======= INPUT XHTML DOCUMENT START =======
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl"
href="http://www.w3.org/Math/XSL/pmathml.xsl"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/2000/REC-xhtml1-20000126/DTD/xhtml1-strict.dtd">
<html>
  <head>
  </head>
  <body>
    <p>&minus;</p>
  </body>
</html>
======= INPUT XHTML DOCUMENT END =======


======= TEST JAVA CODE START =======
import java.io.File;
import java.io.OutputStreamWriter;

import org.jdom.Document;
import org.jdom.input.SAXBuilder;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

public class Test {
        public static void main(String[] args) throws Exception{
                File fileInput = new File("testEntity.xml");
                Document doc;

                SAXBuilder b = new SAXBuilder();
                b.setIgnoringElementContentWhitespace(true);
                b.setExpandEntities(false);
                doc = b.build(fileInput);
                doc.getDocType().setInternalSubset(null);

                XMLOutputter outputter = new XMLOutputter();
                Format format = Format.getPrettyFormat();
                format.setEncoding("US-ASCII");
                outputter.setFormat(format);

                outputter.output(doc,
                        new OutputStreamWriter(System.out, format.getEncoding()));
        }
}
======= TEST JAVA CODE END =====

======= TEST ENTITIES DOCUMENT START =======
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="http://www.w3.org/Math/XSL/pmathml.xsl"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/2000/REC-xhtml1-20000126/DTD/xhtml1-strict.dtd">
<html>
  <head>
  </head>
  <body>
    <p>&minus;</p>
  </body>
</html>
======= TEST ENTITIES DOCUMENT END =======

It is possible to add root element before the DocType in a Document

This should not be possible, but is.

Document doc = new Document();
doc.setDocType(new DocType("doctype"));
doc.addContent(0, new Element("root"));

The validator documentCanContain() should be checking for >= not just >
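
The difference between the two comparisons can be shown with a small model. This is a sketch under the assumption that the check compares the DocType's index with the requested insertion index; DocTypeOrderCheck and its methods are hypothetical stand-ins, not the JDOM implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the document's content check; not JDOM code.
final class DocTypeOrderCheck {

    // 'docTypeIndex' is the DocType's position, or -1 when there is none.
    static boolean strictCheckAllows(int docTypeIndex, int index) {
        // '>=' rejects an insert at the DocType's own position, because the
        // insert would shift the DocType to the right of the new element.
        return !(docTypeIndex >= index);
    }

    static boolean lenientCheckAllows(int docTypeIndex, int index) {
        // '>' lets index == docTypeIndex through, which is the reported bug.
        return !(docTypeIndex > index);
    }

    // Shows what the lenient check permits: the element ends up first.
    static List<String> insertRootAtZero() {
        List<String> content = new ArrayList<>();
        content.add("DocType");
        content.add(0, "Element(root)");
        return content;
    }
}
```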

Element should have convenience methods for Namespaces in scope

There are at least 4 (8 if you count more generously) places in JDOM code (and numerous places in 'user' code) where the 'in-scope' namespace set is calculated:

The Verifier class needs to access the stack to check for Namespace collisions.
The (SAX/DOM/XML)Outputter classes each use a special NamespaceStack class to manage namespaces.
The DOM and SAX builders both query the 'in-scope' namespaces to check for namespace prefixes.
The XPath/Jaxen API uses two separate processes: one to calculate the 'namespace axis', the second to automatically add Namespace references to the XPath select context.

Each of these places uses its own mechanism for 'walking' the document tree to ascertain which Namespaces have been declared: it checks the Element's namespace, then the namespaces of the Element's Attributes, then any additional namespace declarations, and finally walks up the Element's ancestry to check for further declarations.

This 'routine' yet complicated task should be centralized and formalized in such a way that there is consistency in the internal JDOM code, and additionally, the JDOM users can re-use the same reliable functionality instead of rebuilding their own mechanism each time it is needed.

My early analysis indicates that two new methods would be most useful. One method returns all Namespaces that are in-scope for an Element. The second method returns all Namespaces that are introduced by this Element. A third method that returns all in-scope Namespaces declared in ancestor elements may also be useful, but it would be possible to calculate it from the results of the other two methods anyway.

The results of these methods should be either a Set, or a Map (keyed by the prefix). The members will be dynamically calculated because tracking the namespaces at each element would be prohibitive. This would also allow useful things like:

Set inherited = element.getInScopeNamespaces();
inherited.removeAll(element.getIntroducedNamespaces());

It would be even better if these sets had a defined order so that you can get consistency when doing simple things like jUnit baselines, and, for example, XMLOutputter could always output all 'introduced' Namespaces for the element in prefix-order (perhaps always after the actual attributes of the element which would be output in the insert order of the Attributes).

Making Namespace Comparable (in order of the prefix, followed by the URI) would be a logical extension, and using a SortedSet as the result would be useful.

The implementation could be done at the Content level as well. Attributes, Text, ProcessingInstructions, etc. could all simply defer to their parent Element for the in-scope Namespaces (or return an empty set if there is no parent element/document) . The Document class (if it ever implements Content) would always return an empty collection.
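
A rough sketch of the two proposed methods follows, using a deliberately tiny element model rather than JDOM's classes. ScopedElement, declare, inScope and introduced are hypothetical names; the model folds the element's own namespace, attribute namespaces and additional declarations into one prefix-to-URI map, and it returns a SortedMap keyed by prefix to get the stable ordering discussed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical minimal element model; JDOM's Element is much richer.
final class ScopedElement {
    final ScopedElement parent;
    // prefix -> URI for namespaces declared or used on this element.
    final Map<String, String> declared = new LinkedHashMap<>();

    ScopedElement(ScopedElement parent) {
        this.parent = parent;
    }

    ScopedElement declare(String prefix, String uri) {
        declared.put(prefix, uri);
        return this;
    }

    // All namespaces visible here, keyed and sorted by prefix.
    // The nearest declaration of a prefix wins over an ancestor's.
    SortedMap<String, String> inScope() {
        SortedMap<String, String> scope =
                parent == null ? new TreeMap<String, String>() : parent.inScope();
        scope.putAll(declared);
        return scope;
    }

    // Namespaces this element adds or changes relative to its parent.
    SortedMap<String, String> introduced() {
        SortedMap<String, String> inherited =
                parent == null ? new TreeMap<String, String>() : parent.inScope();
        SortedMap<String, String> result = new TreeMap<String, String>();
        for (Map.Entry<String, String> e : declared.entrySet()) {
            if (!e.getValue().equals(inherited.get(e.getKey()))) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

Because the result is recomputed on every call, nothing has to be tracked per element, which matches the 'dynamically calculated' approach described above at the cost of a walk up the ancestry each time.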

SAXOutputter does not fire startPrefixMapping() events for Attribute Namespaces.

In the 'simple' document:

Element root = new Element("root");
root.setAttribute("att", "val", Namespace.getNamespace("pfx", "mynamespace"));
Document doc = new Document(root);

JDOM will have a logical structure:

<root xmlns:pfx="mynamespace" pfx:att="val" />

But, the namespace is not 'fired' as part of the startPrefixMapping() event in SAXOutputter....
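
As a point of reference, the sketch below records the prefix mappings that the JDK's own namespace-aware SAX parser reports for the equivalent XML text. It does not involve SAXOutputter at all; PrefixMappingProbe is a hypothetical helper that only shows which event a consumer would normally expect to see.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; uses only the SAX classes bundled with the JDK.
final class PrefixMappingProbe {

    // Parses the XML text and returns every prefix mapping the parser
    // reports, formatted as "prefix=uri".
    static List<String> mappings(String xml) {
        final List<String> seen = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startPrefixMapping(String prefix, String uri) {
                            seen.add(prefix + "=" + uri);
                        }
                    });
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```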

Element.setNamespace() can produce Namespace Conflicts

Namespace Conflicts happen when a Namespace is introduced that has the same prefix, but different URI as an existing Namespace.

For example, the following is illegal:

        Namespace nsa = Namespace.getNamespace("pfx","URIA");
        Namespace nsb = Namespace.getNamespace("pfx","URIB");

        Element emt = new Element("emt", nsa);
        emt.addNamespaceDeclaration(nsb); // should fail, and it *does*

The reverse order should fail too, but it does not:

        Namespace nsa = Namespace.getNamespace("pfx","URIA");
        Namespace nsb = Namespace.getNamespace("pfx","URIB");

        Element emt = new Element("emt");
        emt.addNamespaceDeclaration(nsb);
        emt.setNamespace(nsa);  // should fail, but does not.

Centralize all JDOM Constants, perhaps including other useful XML values.

There are a number of constants used in the JDOM code, for example JDOM_OBJECT_MODEL_URI, XPATH_CLASS_PROPERTY, etc.

Additionally, there are lots of places where feature and property names/URI's are hard-coded, and should instead be replaced with centralized constants.

In the same vein, there are a lot of constants used for different XML processing purposes, from simple things like in Namespace where the string value "http://www.w3.org/XML/1998/namespace" appears 6 times.

Not only would centralizing these constants make for a more reliable code base, but it would also be a useful service to JDOM users who would be less likely to have typo-based errors.

Possible constants include SAX and DOM property and feature names, w3.org constants, etc.
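
A centralized holder could look roughly like the sketch below. The class name XmlNames and the selection of constants are assumptions for illustration; the string values themselves are the standard XML namespace URI (the same value the JDK exposes as javax.xml.XMLConstants.XML_NS_URI) and standard SAX2 feature and property names.

```java
// Hypothetical holder class; the name and the selection are illustrative.
final class XmlNames {
    private XmlNames() { }

    // Namespace URI bound to the reserved "xml" prefix.
    static final String XML_NAMESPACE_URI =
            "http://www.w3.org/XML/1998/namespace";

    // Standard SAX2 feature names.
    static final String SAX_FEATURE_NAMESPACES =
            "http://xml.org/sax/features/namespaces";
    static final String SAX_FEATURE_NAMESPACE_PREFIXES =
            "http://xml.org/sax/features/namespace-prefixes";
    static final String SAX_FEATURE_VALIDATION =
            "http://xml.org/sax/features/validation";

    // Standard SAX2 extension property for the lexical handler.
    static final String SAX_PROPERTY_LEXICAL_HANDLER =
            "http://xml.org/sax/properties/lexical-handler";
}
```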

JDOMResult.getDocument() should return null when content is not legal.

The code should do what the code's comment says it should do....

        try {
          JDOMFactory f = this.getFactory();
          if (f == null) { f = new DefaultJDOMFactory(); }

          doc = f.document(null);
          doc.setContent((List)result);

          result = doc;
        }
        catch (RuntimeException ex1) {
          // Some of the result nodes are not valid children of a
          // Document node. => return null.
        }

but, in the catch block, it does not set the 'doc' to null !!!

JDOMSource.JDOMInputSource.setByteStream() should fail

setCharacterStream() throws an UnsupportedOperationException.
setByteStream() should too.
What about setEncoding(), setPublicID(), and setSystemID()?
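
The requested behaviour can be sketched with a plain org.xml.sax.InputSource subclass. ReadOnlyInputSource is a hypothetical class written for this example and is not the inner class from JDOMSource; it simply treats both stream setters the same way and leaves the other setters untouched, since the issue leaves those open.

```java
import java.io.InputStream;
import java.io.Reader;

import org.xml.sax.InputSource;

// Hypothetical read-only InputSource; JDOMSource's inner class differs.
final class ReadOnlyInputSource extends InputSource {

    ReadOnlyInputSource(String systemId) {
        super(systemId);
    }

    @Override
    public void setCharacterStream(Reader characterStream) {
        throw new UnsupportedOperationException("character stream is fixed");
    }

    @Override
    public void setByteStream(InputStream byteStream) {
        // Mirrors setCharacterStream() so both stream setters are consistent.
        throw new UnsupportedOperationException("byte stream is fixed");
    }
}
```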

RFE: Modify XMLOutputter to allow smart subclasses (Paul Libbrecht)

http://markmail.org/message/4nfda3qfi36lc5w5

Hello,

please find at:
http://www.activemath.org/~paul/tmp/DTDaware
a contribution to JDOM in the form of a patched XMLOutputter to allow
subclasses to stop output of some attributes and namespace
declarations as well as a DTDAwareXMLOutputter subclass which uses
Mark Wutka's DTDparser (I used version 1.23) to decide not to output
attributes or namespace decls if they are implicit in the DTD. This
feature has been a key to maintenance of a clean authorable XML.

I haven't put licenses yet... want me to?
Feel free to apply any license there.
thanks in advance

paul

Here's the XMLOutputter.java diff

--- XMLOutputter.java   2010-10-28 14:05:31.000000000 -0700
+++ /tmp/XMLOutputter.java  2011-07-31 17:50:46.000000000 -0700
@@ -115,7 +115,7 @@
 public class XMLOutputter implements Cloneable {

     private static final String CVS_ID =
-      "@(#) $RCSfile: XMLOutputter.java,v $ $Revision: 1.117 $ $Date: 2009/07/23 05:54:23 $ $Name:  $";
+      "@(#) $RCSfile: XMLOutputter.java,v $ $Revision: 1.117 $ $Date: 2009/07/23 05:54:23 $ $Name: jdom_1_1_1 $";

     // For normal output
     private Format userFormat = Format.getRawFormat();
@@ -1099,9 +1099,10 @@
      * declarations.
      *
      * @param ns <code>Namespace</code> to print definition of
+     * @param elt <code>Element</code> in which this namespace is output
      * @param out <code>Writer</code> to use.
      */
-    private void printNamespace(Writer out, Namespace ns,
+    private void printNamespace(Writer out, Namespace ns, Element elt,
                                 NamespaceStack namespaces)
                      throws IOException {
         String prefix = ns.getPrefix();
@@ -1111,6 +1112,9 @@
         if (uri.equals(namespaces.getURI(prefix))) {
             return;
         }
+        if(!shouldOutputNamespace(ns,elt,namespaces)) {
+            return;
+        }

         out.write(" xmlns");
         if (!prefix.equals("")) {
@@ -1123,6 +1127,19 @@
         namespaces.push(ns);
     }

+    protected boolean shouldOutputNamespace(Namespace ns, Element element, NamespaceStack namespaces) {
+        // Add namespace decl only if it's not the XML namespace and it's
+        // not the NO_NAMESPACE with the prefix "" not yet mapped
+        // (we do output xmlns="" if the "" prefix was already used and we
+        // need to reclaim it for the NO_NAMESPACE)
+        if (ns == Namespace.XML_NAMESPACE) {
+            return false;
+        } else if ( ((ns == Namespace.NO_NAMESPACE) &&
+               (namespaces.getURI("") == null))) {
+            return false;
+        } else
+            return true;
+    }
     /**
      * This will handle printing of a <code>{@link Attribute}</code> list.
      *
@@ -1141,36 +1158,33 @@
         for (int i = 0; i < attributes.size(); i++) {
             Attribute attribute = (Attribute) attributes.get(i);
             Namespace ns = attribute.getNamespace();
-            if ((ns != Namespace.NO_NAMESPACE) &&
-                (ns != Namespace.XML_NAMESPACE)) {
-                    printNamespace(out, ns, namespaces);
+            if (shouldOutputNamespace(ns,parent,namespaces)
+                    && ns != Namespace.NO_NAMESPACE && ns != Namespace.XML_NAMESPACE) {
+                    printNamespace(out, ns, parent, namespaces);
             }

-            out.write(" ");
-            printQualifiedName(out, attribute);
-            out.write("=");
+            if(shouldOutputAttribute(attribute,parent,namespaces)) {
+                out.write(" ");
+                printQualifiedName(out, attribute);
+                out.write("=");

-            out.write("\"");
-            out.write(escapeAttributeEntities(attribute.getValue()));
-            out.write("\"");
+                out.write("\"");
+                out.write(escapeAttributeEntities(attribute.getValue()));
+                out.write("\"");
+            }
         }
     }

+    protected boolean shouldOutputAttribute(Attribute attribute, Element parent, NamespaceStack namespaces) {
+        return true;
+    }
+
     private void printElementNamespace(Writer out, Element element,
                                        NamespaceStack namespaces)
                              throws IOException {
-        // Add namespace decl only if it's not the XML namespace and it's
-        // not the NO_NAMESPACE with the prefix "" not yet mapped
-        // (we do output xmlns="" if the "" prefix was already used and we
-        // need to reclaim it for the NO_NAMESPACE)
         Namespace ns = element.getNamespace();
-        if (ns == Namespace.XML_NAMESPACE) {
-            return;
-        }
-        if ( !((ns == Namespace.NO_NAMESPACE) &&
-               (namespaces.getURI("") == null))) {
-            printNamespace(out, ns, namespaces);
-        }
+        if(shouldOutputNamespace(ns,element,namespaces))
+            printNamespace(out, ns, element, namespaces);
     }

     private void printAdditionalNamespaces(Writer out, Element element,
@@ -1180,7 +1194,7 @@
         if (list != null) {
             for (int i = 0; i < list.size(); i++) {
                 Namespace additional = (Namespace)list.get(i);
-                printNamespace(out, additional, namespaces);
+                printNamespace(out, additional, element, namespaces);
             }
         }
     }

Here's DTDAwareXMLOutputter.java

package org.jdom.output;

import com.wutka.dtd.DTD;
import com.wutka.dtd.DTDElement;
import com.wutka.dtd.DTDAttribute;
import org.jdom.Element;
import org.jdom.Namespace;
import org.jdom.Attribute;

/** A subclass of {@link XMLOutputter} to avoid printing some attributes and namespace declarations
 * whose values is already the default as specified by the DTD. This is a key ingredient to provide
 * a much more readable output but may break re-parsing if not output with the appropriate
 * {@link org.jdom.DocType}.
 *
 * @author Paul Libbrecht <[email protected]>
 */
public class DTDAwareXMLOutputter extends XMLOutputter {

    public DTDAwareXMLOutputter() {
        super();
    }

    public DTDAwareXMLOutputter(DTD dtd) {
        super();
        this.setDtd(dtd);
    }

    public DTDAwareXMLOutputter(Format format) {
        super(format);
    }

    public DTDAwareXMLOutputter(XMLOutputter that) {
        super(that);
    }

    protected DTD dtd;

    public DTD getDtd() {
        return dtd;
    }

    public void setDtd(DTD dtd) {
        this.dtd = dtd;
    }



    protected boolean shouldOutputNamespace(Namespace ns, Element element, NamespaceStack namespaces) {
        if(!super.shouldOutputNamespace(ns,element,namespaces)) return false;
        if(dtd == null) return true;
        DTDElement eltDecl = null;
        eltDecl = (DTDElement) dtd.elements.get(element.getName());
         if(eltDecl!=null) {
             String nsAttName;
             String prefix = ns.getPrefix();
             if(prefix!=null && prefix.length()>0) {
                 nsAttName = "xmlns:".concat(prefix);
             } else {
                 nsAttName = "xmlns";
             }
            DTDAttribute nsDecl = eltDecl.getAttribute(nsAttName);
            if(nsDecl != null && ns.getURI().equals(nsDecl.getDefaultValue())) {
                return false;
            }
         }
        return true;
    }

    protected boolean shouldOutputAttribute(Attribute attribute, Element parent, NamespaceStack namespaces) {
        if(false == super.shouldOutputAttribute(attribute, parent, namespaces)) return false;
        // PL: check if attribute is in default value, then don't output it
        DTDElement eltDecl = null;
        if(dtd!=null) {
            eltDecl = (DTDElement) dtd.elements.get(parent.getName());
            if(eltDecl!=null) {
                DTDAttribute attDecl = eltDecl
                        .getAttribute(attribute.getQualifiedName());
                if(attDecl!=null) {
                    String defaultValue = attDecl.getDefaultValue();
                    if(defaultValue!=null && defaultValue.equals(attribute.getValue()))
                        return false;
                }
            }
        }
        return true;
    }
}

XMLOutputter needs outputElementContentString(...) method

The various output methods come in three flavours:
void output(xx, OutputStream)
void output(xx, Writer)
String outputString(xx)

Except the 'different' methods:
void outputElementContent(Element, OutputStream)
void outputElementContent(Element, Writer)

This method should also have a 'symmetrical' method:
String outputElementContentString(Element)

Update and improve transform and XSLTransformer processes

So, I think I will tackle the last remaining chunk of code (transforms), and come back to the XPath later.... (Wednesday?)

If you're looking at transform, it would be nice to see whether the interface to Saxon can be improved. At present, I think Saxon is being presented with a SAXSource and SAXResult so the JDOM source tree is reconstructed as a Saxon tree. Since Saxon is able to transform JDOM input directly, this is pretty inefficient.

Using Saxon to deliver XPath 2.0 access, as an alternative to Jaxen, would also be quite feasible.

(Also note: the Javadoc for org.jdom.transform.XSLTransformer is looking very dated.)

Michael Kay
Saxonica

Remove unnecessary classes in the default package

Brad Cox wrote:

Jason, FYI the reason I stopped using JDOM after years of satisfied use was inability to make it coexist with OSGI. As I recall, the problem was trivial; 2-3 classes in the default namespace which gives OSGI fits. Been awhile since I've looked tho.

DescendantIterator is broken after Iterator.remove()

With a 'simple' document:

Document doc = new Document(new Element("root").addContent(new Element("child")));
Iterator it = doc.getDescendants();
it.hasNext(); 
it.next();  // gets the root element
it.remove(); // removes the root element, the iterator should now be empty!

it.hasNext(); //returns true, but should not!!!
it.next(); // returns the child Element, but, it should not, because child is no longer a descendant of doc.

remove() should either be unsupported, or 'fixed' to stay 'live' with the document tree.

List implementations throw IllegalAddException instead of NullPointerException

JDOM List implementations (ContentList, FilterList, AttributeList) throw IllegalAddException (an IllegalArgumentException) instead of NullPointerException when the input value is null.

http://download.oracle.com/javase/6/docs/api/java/util/List.html#add%28E%29 (also addAll, etc).

In general, need to check all exceptions thrown match the interface.
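
For illustration, a list that does not permit null elements and follows the java.util.List contract on that point might be sketched as below. NonNullList is a hypothetical class built on AbstractList; it is not one of the JDOM list implementations.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical list that does not permit null elements.
final class NonNullList<E> extends AbstractList<E> {
    private final List<E> items = new ArrayList<>();

    @Override
    public E get(int index) {
        return items.get(index);
    }

    @Override
    public int size() {
        return items.size();
    }

    @Override
    public void add(int index, E element) {
        // NullPointerException is what the List contract names for a null
        // element in a list that does not permit nulls.
        Objects.requireNonNull(element, "element");
        items.add(index, element);
    }
}
```

AbstractList routes add(E) and addAll(...) through add(int, E), so the single check covers those entry points as well.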

org.jdom.xpath.JaxenXPath is not public

As a result you cannot specify JaxenXPath as the 'factory' for XPath.setXPathClass(Class)

This is a bigger problem than it would at first seem, because if you change the factory to some alternative class, you can't then change it back to JaxenXPath... in the entire JVM!

Also, you cannot inherit from it, subclass it, etc.

This all adds up to make testing a problem too.

No symmetrical SAXBuilder.getFastReconfigure.

SAXBuilder.setFastReconfigure has no get/is method.

JaxenXPath has a useless equals() method.

It does some tests, but then falls back to Object.equals().

Can two JaxenXPath instances ever be equals()?

For reference, here's the code.

   public boolean equals(Object o) {
      if (o instanceof JaxenXPath) {
         JaxenXPath x = (JaxenXPath)o;

         return (super.equals(o) &&
                 xPath.toString().equals(x.xPath.toString()));
      }
      return false;
   }
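
Assuming, as stated above, that super.equals(o) ends up in Object.equals(), the quoted method can only return true when o is the very same instance, so the string comparison never decides anything. A value-based alternative is sketched below with a hypothetical stand-in class (ExpressionKey is not the JDOM class); it compares the expression text and keeps hashCode() consistent.

```java
import java.util.Objects;

// Hypothetical stand-in for an XPath wrapper; not the JDOM class.
final class ExpressionKey {
    private final String expression;

    ExpressionKey(String expression) {
        this.expression = Objects.requireNonNull(expression, "expression");
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ExpressionKey)) {
            return false;
        }
        // Compare by expression text only; no call to Object.equals().
        return expression.equals(((ExpressionKey) o).expression);
    }

    @Override
    public int hashCode() {
        // Keep hashCode consistent with equals.
        return expression.hashCode();
    }
}
```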

Namespace URI's can be any white-space

Namespace URI's can be any sequence of white-space, and any such URI is 'trimmed' to just "". This means that any code will get the NO_NAMESPACE namespace for any white-space sequence.

The actual white-space characters are not actually legal URI anyway, and these white-space URI's should be rejected as IllegalNameException.

Namespace URI verification is very 'light-weight' as it is. It would fail for empty namespace URIs, but the trimming side-steps that test.

Additionally, the trim() is an operation done every time getNamespace() is called, and it is unnecessary 99.9999% (rough guess) of the time, which is a waste of cycles.
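
A stricter check could be sketched as follows. NamespaceUriCheck and requireNoWhitespace are hypothetical names, and IllegalArgumentException stands in for JDOM's IllegalNameException so that the example needs nothing beyond the JDK. The sketch rejects any URI containing white-space instead of trimming it, and leaves the empty string alone.

```java
// Hypothetical check; IllegalArgumentException stands in for the
// IllegalNameException mentioned in the issue.
final class NamespaceUriCheck {
    private NamespaceUriCheck() { }

    // Returns the URI unchanged, or throws if it contains any white-space.
    static String requireNoWhitespace(String uri) {
        for (int i = 0; i < uri.length(); i++) {
            if (Character.isWhitespace(uri.charAt(i))) {
                throw new IllegalArgumentException(
                        "Namespace URI contains white-space at index " + i);
            }
        }
        return uri;
    }
}
```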

org.jdom2.xpath.XPath has a poor model for setting the 'factory'

This package should have a proper 'factory' pattern.

The current problem is that it is impossible to have a reliable mechanism for setting the 'backing' XPath factory. If one area of code were to ever need a 'custom' XPath 'engine' then they could set the engine with the XPath.setXPathClass(), but this would impact all areas of the program, not just the part that needed the change.

Further, these other areas would be unable to check what factory they have, and, worse again, if issue #41 is fixed, that other area of code can be set to use the JaxenXPath factory again, but that will 'break' the original area that wanted the custom factory anyway.

A proper factory pattern needs to be used.

RFE: Create an HTMLOutputter

Idea: Create an HTMLOutputter to handle the HTML specific aspects (closing tags, escaped characters like é, etc).

Serialization is inconsistent and incomplete.

The serialization of all classes in JDOM needs to be audited and corrected, as well as tested.

JDOMException 'compatibility layer' is not compatible.

JDOMException adds support for 'caused by' type logic that was absent in the earlier versions of Java.
This is no longer necessary, and this logic should be stripped. Current issues are:
JDOMException adds the 'caused by' message to the getMessage(). This is not consistent with the current Java practice.
JDOMException's printStackTrace results in the caused-by exception being dumped multiple times, once by Java's exception handling, and the other time by JDOMException's own 'compatibility' layer.
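
For comparison, a minimal sketch of an exception that relies only on the standard Throwable cause chaining (available since Java 1.4) might look like the following. WrappedException is a hypothetical name, not the revised JDOMException.

```java
// Hypothetical sketch; not the JDOM class.
final class WrappedException extends Exception {
    private static final long serialVersionUID = 1L;

    // Delegates cause handling to Throwable. getMessage() then returns only
    // this exception's own message, and printStackTrace() prints the cause
    // once, in the usual "Caused by:" section.
    WrappedException(String message, Throwable cause) {
        super(message, cause);
    }

    public static void main(String[] args) {
        Exception e = new WrappedException("outer", new IllegalStateException("inner"));
        System.out.println(e.getMessage());            // outer
        System.out.println(e.getCause().getMessage()); // inner
    }
}
```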

Document.toString() throws IllegalStateException when no root element

see title.

Document doc = new Document();
doc.toString();

FilterListIterator does not behave well when add() or remove() is called after previous()

Backward iterating does not work correctly when adding or removing content after previous. The nextIndex() / previousIndex() notion gets messed up.

Base URI isn't output (Larry Levin)

I've been using JDom for quite some time with no problems but just hit a snag when trying to create an OWL ontology using it. The issue is that while I set a baseURI for the document, it isn't showing up when the document is written using the XMLOutputter. The code is as follows:

Namespace rdfNs = Namespace.getNamespace("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
Element root = new Element("RDF", rdfNs);
Document doc = new Document(root);
doc.setBaseURI("http://foo.bar/owlTest");

After building the document, I write it as follows:

Format myFormat = Format.getPrettyFormat();
XMLOutputter outputter = new XMLOutputter(myFormat);
FileOutputStream fo =  new FileOutputStream("out.txt");
outputter.output(doc, fo);
fo.flush();
fo.close();

The problem is that the xml base declaration xml:base="http://foo.bar/owlTest" must explicitly appear in the output but doesn't. Any help would be greatly appreciated.

Thanks
Larry Levin

RFE: Fast reconfiguration can be optimized (Scott Emmons)

Greetings Jdom group,

A couple of months ago I submitted an improvement (which made it into
1.1.1) to improve performance in cases where lots (in our cases
literally hundreds to thousands) of SAXBuilder.build() calls are made.
There is an issue with this patch under a circumstance where the fast
reconfig path won't happen. This is a "safe" bug in that it doesn't
functionally change anything, the faster codepath just isn't executed
(so it behaves as if setFastReconfigure() was not called).

The problem has to do with using a single boolean to guard both
lexical reporting configurations - which won't really work as intended
as there are three potential states here. In the patch below, this is
resolved in a much more universal and extensible way by having a map
for the features and a reusable method to handle the rest.

Note I tried to avoid more recent java language constructs and
features, but as written below it probably relies on autoboxing.

--- orig/src/java/org/jdom/input/SAXBuilder.java   2009-07-22
23:26:26.000000000 -0700
+++ src/java/org/jdom/input/SAXBuilder.java     2009-08-18
10:12:55.000000000 -0700
@@ -136,12 +136,9 @@

    /** Whether to use fast parser reconfiguration */
    private boolean fastReconfigure = false;
-
-    /** Whether to try lexical reporting in fast parser reconfiguration */
-    private boolean skipNextLexicalReportingConfig = false;
-
-    /** Whether to to try entity expansion in fast parser reconfiguration */
-    private boolean skipNextEntityExpandConfig = false;
+
+    /** Map for results of properties eligible for fast reconfiguration */
+    private HashMap skipNextConfig = new HashMap(3); //<String,Boolean>

    /**
     * Whether parser reuse is allowed.
@@ -691,68 +688,63 @@
             parser.setErrorHandler(new BuilderErrorHandler());
        }

-        // If fastReconfigure is enabled and we failed in the previous attempt
-        // in configuring lexical reporting, then we skip this step.  This
-        // saves the work of repeated exception handling on each parse.
-        if (!skipNextLexicalReportingConfig) {
-            boolean success = false;
-
-            try {
-                 parser.setProperty("http://xml.org/sax/handlers/LexicalHandler",
-                                   contentHandler);
-                success = true;
-            } catch (SAXNotSupportedException e) {
-                // No lexical reporting available
-            } catch (SAXNotRecognizedException e) {
-                // No lexical reporting available
-            }
-
-            // Some parsers use alternate property for lexical handling (grr...)
-            if (!success) {
-                try {
-                    parser.setProperty("http://xml.org/sax/properties/lexical-handler",
-                                       contentHandler);
-                    success = true;
-                } catch (SAXNotSupportedException e) {
-                    // No lexical reporting available
-                } catch (SAXNotRecognizedException e) {
-                    // No lexical reporting available
-                }
-            }
-
-            // If unable to configure this property and fastReconfigure is
-            // enabled, then setup to avoid this code path entirely next time.
-            if (!success && fastReconfigure) {
-                skipNextLexicalReportingConfig = true;
-            }
+        // Set lexical reporting properties on parser, if this fails then try
+        // an alternate property that some parsers user
+        if (fastSetProperty(parser,
+                "http://xml.org/sax/handlers/LexicalHandler",
+                contentHandler) == false) {
+
+            fastSetProperty(parser,
+                    "http://xml.org/sax/properties/lexical-handler",
+                    contentHandler);
+        }
+
+        // Try setting the DeclHandler if entity expansion is off
+        if (!expand) {
+            fastSetProperty(parser,
+                    "http://xml.org/sax/properties/declaration-handler",
+                    contentHandler);
        }

-        // If fastReconfigure is enabled and we failed in the previous attempt
-        // in configuring entity expansion, then skip this step.  This
-        // saves the work of repeated exception handling on each parse.
-        if (!skipNextEntityExpandConfig) {
-            boolean success = false;
-
-            // Try setting the DeclHandler if entity expansion is off
-            if (!expand) {
-                try {
-                    parser.setProperty("http://xml.org/sax/properties/declaration-handler",
-                                       contentHandler);
-                    success = true;
-                } catch (SAXNotSupportedException e) {
-                    // No lexical reporting available
-                } catch (SAXNotRecognizedException e) {
-                    // No lexical reporting available
-                }
-            }
-
-            /* If unable to configure this property and fastReconfigure is
-             * enabled, then setup to avoid this code path entirely next time.
-             */
-            if (!success && fastReconfigure) {
-                skipNextEntityExpandConfig = true;
-            }
+    }
+
+
+    /**
+     * Fast set property. Attempts to set a property on the parser. If the
+     * property setting fails (due to an exception from the parser), then next
+     * time we attempt to configure this parser instance, we just skip this
+     * because we already know it will fail.
+     * @param parser Parser to configure
+     * @param property Property to set
+     * @param contentHandler ContentHandler to set
+     * @return true if the property was set, false if it was not
+     */
+    private boolean fastSetProperty(XMLReader parser, String property, SAXHandler contentHandler) {
+        boolean success=false;
+        Boolean haveCached=null;
+        if (fastReconfigure) {
+            haveCached=(Boolean)skipNextConfig.get(property);
+        }
+
+        // If we already have cached that we should skip this, then just skip it
+        if (fastReconfigure && haveCached!=null && haveCached.booleanValue()==true) {
+            return false;
+        }
+
+        try {
+            parser.setProperty(property, contentHandler);
+            success=true;
+        } catch (SAXNotSupportedException e) {
+            // Property not available
+        } catch (SAXNotRecognizedException e) {
+            // Property not available
+        }
+
+        // If we've never seen this property before, then
+        if (fastReconfigure && haveCached==null) {
+            skipNextConfig.put(property, new Boolean(!success));
        }
+        return success;
    }

    private void setFeaturesAndProperties(XMLReader parser,

ProcessingInstruction Map-data constructor should 'clone' the input map.

Currently the ProcessingInstruction keeps a reference to the 'data' Map if the Map constructor is used, or the setData(Map) method is called. This allows for broken conditions where the user may manipulate the Map directly, but the internal rawData field in the class will not be maintained correctly.

Because it is convenient for the JUnit tests of JDOM2 to use a LinkedHashMap (for repeatability/consistency), the JDOM2 code has been revised to copy the map on input. This may break some compatibility though... discuss.
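
The copy-on-input idea can be sketched as follows. PiData is a hypothetical stand-in for the relevant part of ProcessingInstruction, and the choice of LinkedHashMap for the internal copy is an assumption made here to keep iteration order stable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the JDOM class.
final class PiData {
    private final Map<String, String> data;

    // Copies the caller's map, so later changes to the caller's map cannot
    // leave this object's state out of step with it.
    PiData(Map<String, String> input) {
        this.data = new LinkedHashMap<String, String>(input);
    }

    String get(String name) {
        return data.get(name);
    }

    int size() {
        return data.size();
    }

    public static void main(String[] args) {
        Map<String, String> input = new LinkedHashMap<String, String>();
        input.put("href", "style.xsl");
        PiData pi = new PiData(input);
        input.put("type", "text/xsl"); // mutate the caller's map afterwards
        System.out.println(pi.size()); // 1: the copy is unaffected
    }
}
```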

Schema validation can miss namespaces of default attributes (Thomas Scheffler)

On Wed, Jul 20, 2011 at 8:23 AM, Thomas Scheffler wrote:
Hi,

if I parse a valid MODS document with XML Schema validation, JDOM changes
attributes because it does not handle schema default values correctly (it
ignores the namespace).

Here is a short piece of code to demonstrate this:

SAXBuilder builder = new SAXBuilder(true);
builder.setFeature("http://xml.org/sax/features/namespaces", true);
builder.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
builder.setFeature("http://apache.org/xml/features/validation/schema",
true);

Document document = builder.build(new
URL("http://academiccommons.columbia.edu/download/fedora_content/show_pretty/ac:111060/CONTENT/ac111060_description.xml"));
XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
xout.output(document, System.out);

Here is a result fragment:

Edwards Stephen A. author Columbia University. Computer Science

If you look at the original document you can see that @type of name is
"personal". The "simple" comes from the xlink XML-Schema that was included
by the MODS-Schema. Therefore the result fragment should look like this:

Edwards Stephen A. author Columbia University. Computer Science

If I use DOM from Java this is done correctly (but a bit ugly as it does not
use the namespace prefix already defined).

Could someone just fix this, please?

TestSAXBuilder - test_TCU__InternalSubset fails

6a45db9

See above comment on the commit. Can't reinstate this test because it is broken. Need to investigate and add the @Test annotation when corrected.

SAXBuilder fails to handle certain exception cases with JAXP Properties

SAXBuilder uses JAXPParserFactory to build an XMLReader. JAXPParserFactory can set some properties on the reader, which are set as constants on JAXPParserFactory.

When you use the JAXP mechanism and also specify these specific properties, the properties fail to be set, and throw SAX*Exception.

The JAXPParserFactory then catches these exceptions and wraps them in JDOMException.

The SAXBuilder code anticipates JDOMExceptions, and re-throws them if they happen.

Unfortunately, the actual code uses reflection to process the JAXPParserFactory, and reflection intervenes in the process, wrapping the JDOMException in an InvocationTargetException. SAXBuilder handles this differently: it assumes that JAXP failed, and quietly reverts to the other constant mechanism.

Thus, the bottom line is that XMLReader property problems are treated as a failed JAXP mechanism, and are lost.
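
The wrapping behaviour described above is standard for java.lang.reflect.Method.invoke(). A minimal sketch of unwrapping the cause is shown below; ReflectiveCall and its create() method are hypothetical, and IllegalStateException stands in for the JDOMException thrown by the factory.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical sketch; not the SAXBuilder code.
final class ReflectiveCall {

    // Stand-in for a factory method that fails while setting a property.
    static Object create() {
        throw new IllegalStateException("property not supported");
    }

    // Invokes create() reflectively. Returns null on success, otherwise the
    // exception that explains the failure. An exception thrown by the target
    // method arrives wrapped in InvocationTargetException, so the real
    // problem is in getCause().
    static Throwable callAndUnwrap() {
        try {
            Method m = ReflectiveCall.class.getDeclaredMethod("create");
            m.invoke(null);
            return null;
        } catch (InvocationTargetException e) {
            return e.getCause();
        } catch (Exception e) {
            // NoSuchMethodException or IllegalAccessException: reflection itself failed
            return e;
        }
    }

    public static void main(String[] args) {
        System.out.println(callAndUnwrap());
    }
}
```

A caller can then inspect the unwrapped cause and re-throw it if it is the expected exception type, instead of treating every reflective failure as "mechanism not available".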

Need to set Document root element before Document.getContent() works.

Why is the following condition required? Seems 'arbitrary'.

    public List getContent() {
        if (!hasRootElement())
            throw new IllegalStateException("Root element not set");
        return content;
    }

StringIndexOutOfBoundsException on Android 1.5 (S. Seide, David Keyes)

http://markmail.org/thread/ikginiuqdsdb2wbg

Hello,

using jdom 1.1.1 on android 1.5 gives a
StringIndexOutOfBoundsException. Android's default XML-Parser ExPat
returns for Attributes without namespace an empty qname:

<?xml version="1.0" encoding="UTF-8"?>
<config version="2009-07-23" summary="main AVF configuration file">
</config>

atts.getLocalName(i) - "version"
atts.getQName(i) - ""

This seems to be the expected behavior, since
http://www.saxproject.org/apidoc/org/xml/sax/Attributes.html#getQName(int)
says getQName returns an empty string if no qualified name is available:

getQName
public java.lang.String getQName(int index)
Look up an attribute's XML qualified (prefixed) name by index.

Parameters:
index - The attribute index (zero-based).
Returns:
The XML qualified name, or the empty string if none is available, or
null if the index is out of range.

StackTrace:

07-28 14:54:10.354: ERROR/Global(725): java.lang.StringIndexOutOfBoundsException
07-28 14:54:10.354: ERROR/Global(725): at java.lang.String.substring(String.java:1571)
07-28 14:54:10.354: ERROR/Global(725): at org.jdom.input.SAXHandler.startElement(SAXHandler.java:568)
07-28 14:54:10.354: ERROR/Global(725): at org.apache.harmony.xml.ExpatParser.startElement(ExpatParser.java:145)
07-28 14:54:10.354: ERROR/Global(725): at org.apache.harmony.xml.ExpatParser.append(Native Method)
07-28 14:54:10.354: ERROR/Global(725): at org.apache.harmony.xml.ExpatParser.parseFragment(ExpatParser.java:506)
07-28 14:54:10.354: ERROR/Global(725): at org.apache.harmony.xml.ExpatParser.parseDocument(ExpatParser.java:467)
07-28 14:54:10.354: ERROR/Global(725): at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:329)
07-28 14:54:10.354: ERROR/Global(725): at org.apache.harmony.xml.ExpatReader.parse(ExpatReader.java:286)
07-28 14:54:10.354: ERROR/Global(725): at org.jdom.input.SAXBuilder.build(SAXBuilder.java:518)
07-28 14:54:10.354: ERROR/Global(725): at org.jdom.input.SAXBuilder.build(SAXBuilder.java:865)
07-28 14:54:10.354: ERROR/Global(725): at com.tlabs.avf.AVFGlobal.addInputStreams(AVFGlobal.java:538)
07-28 14:54:10.354: ERROR/Global(725): at com.tlabs.avf.AVFGlobal.initialize(AVFGlobal.java:630)
07-28 14:54:10.354: ERROR/Global(725): at com.tlabs.avf.AVFMain.init(AVFMain.java:122)
07-28 14:54:10.354: ERROR/Global(725): at com.tlabs.avf.test.AVFMainTest.onCreate(AVFMainTest.java:78)
07-28 14:54:10.354: ERROR/Global(725): at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1123)
07-28 14:54:10.354: ERROR/Global(725): at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2231)
07-28 14:54:10.354: ERROR/Global(725): at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2284)
07-28 14:54:10.354: ERROR/Global(725): at android.app.ActivityThread.access$1800(ActivityThread.java:112)
07-28 14:54:10.354: ERROR/Global(725): at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1692)
07-28 14:54:10.354: ERROR/Global(725): at android.os.Handler.dispatchMessage(Handler.java:99)
07-28 14:54:10.354: ERROR/Global(725): at android.os.Looper.loop(Looper.java:123)
07-28 14:54:10.354: ERROR/Global(725): at android.app.ActivityThread.main(ActivityThread.java:3948)
07-28 14:54:10.354: ERROR/Global(725): at java.lang.reflect.Method.invokeNative(Native Method)
07-28 14:54:10.354: ERROR/Global(725): at java.lang.reflect.Method.invoke(Method.java:521)
07-28 14:54:10.354: ERROR/Global(725): at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:782)
07-28 14:54:10.354: ERROR/Global(725): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:540)
07-28 14:54:10.354: ERROR/Global(725): at dalvik.system.NativeStart.main(Native Method)

patch attached - with this simple check added, my xml document is parsed
successfully on android 1.5.

regards,
S. Seide

diff -ur src/java/org/jdom/input/SAXHandler.java src_patch/java/org/jdom/input/SAXHandler.java
--- src/java/org/jdom/input/SAXHandler.java 2009-07-28 17:46:40.703125000 +0200
+++ src_patch/java/org/jdom/input/SAXHandler.java   2009-07-28 17:31:06.546875000 +0200
@@ -564,7 +564,7 @@
            // patch from Mattias Jiderhamn
            if ("".equals(attLocalName) && attQName.indexOf(":") == -1) {
                attribute = factory.attribute(attQName, atts.getValue(i), attType);
-            } else if (!attQName.equals(attLocalName)) {
+            } else if (!attQName.equals(attLocalName) && attQName.length()>0) {
                String attPrefix = attQName.substring(0, attQName.indexOf(":"));
                Namespace attNs = Namespace.getNamespace(attPrefix,
                                                         atts.getURI(i));
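
The failure mode can be reproduced without a parser: for an empty qualified name, indexOf(":") returns -1 and substring(0, -1) throws. The helper below is a hypothetical sketch (QNamePrefix and prefixOf are not JDOM names) of a prefix extraction that tolerates an empty or unprefixed name.

```java
// Hypothetical sketch; not the SAXHandler code.
final class QNamePrefix {

    // Returns the prefix of a qualified name, or "" when the name is empty or
    // has no prefix. Checking the colon position first avoids calling
    // substring() with a negative end index.
    static String prefixOf(String qName) {
        int colon = qName.indexOf(':');
        return colon > 0 ? qName.substring(0, colon) : "";
    }

    public static void main(String[] args) {
        System.out.println("[" + prefixOf("xlink:type") + "]"); // [xlink]
        System.out.println("[" + prefixOf("version") + "]");    // []
        System.out.println("[" + prefixOf("") + "]");           // []
    }
}
```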

Investigate removing most of the org.jdom2.adapters package

The org.jdom2.adapters package is hard to test, because it requires loading up all the various parsers (crimson, oracle v1, oracle v2, etc.) and that needs some discussion...

The adapters were from a time before JAXP. Is there any reason we'd want to keep them around?

-jh-

XMLOutputter can produce wrong result with TextMode.TRIM_FULL_WHITE

    Element root = new Element("root");
    root.addContent(new Text(" "));
    root.addContent(new Text("x"));
    root.addContent(new Text(" "));
    Format mf = Format.getRawFormat();
    mf.setTextMode(TextMode.TRIM_FULL_WHITE);
    XMLOutputter xout = new XMLOutputter(mf);
    String output = xout.outputString(root);
    assertEquals("<root> x </root>", output);

but the actual output is not:

"<root> x </root>"

it is:

"<root>x</root>"

which is wrong...

For reference, it should be identical to the output of (this works...):

    Element root = new Element("root");
    root.addContent(new Text(" x "));
    Format mf = Format.getRawFormat();
    mf.setTextMode(TextMode.TRIM_FULL_WHITE);
    XMLOutputter xout = new XMLOutputter(mf);
    String output = xout.outputString(root);
    assertEquals("<root> x </root>", output);

Element.getNamespace(String prefix) does not check attributes.

Element emt = new Element("tag");
emt.setAttribute("att", "val", Namespace.getNamespace("pfx", "nsuri"));
// This should print "nsuri", but instead throws NullPointerException
System.out.println(emt.getNamespace("pfx").getURI());

RFE: Support StAX (Tatu Saloranta)

http://markmail.org/message/pg5kkbvrdy32o6f5

Looks like Tatu Saloranta did much of the work needed:

http://docs.codehaus.org/display/WSTX/StaxMisc

ContentList.set() method changes modCount but should not

ConcurrentModificationException is supposed to reflect a 'structural' change to the List. From the ArrayList javadoc:

The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException.

The set() methods on List and List iterators are not considered to be 'Structural Modifications' because they do not change the number of elements in the list.

Thus, the set() methods should not impact the 'modCount' of the ContentList.
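
The reference behaviour can be observed with java.util.ArrayList itself. In the sketch below (SetVersusAdd is a hypothetical name), an iterator keeps working after a set() on the list but fails fast after an add().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Illustrates ArrayList's fail-fast rules; not ContentList code.
final class SetVersusAdd {

    // Starts iterating, modifies the list directly, then continues iterating.
    // Returns true if the iterator is still usable after the modification.
    static boolean iteratorSurvives(boolean structural) {
        List<String> list = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        Iterator<String> it = list.iterator();
        it.next();
        if (structural) {
            list.add("d");     // changes the size: structural modification
        } else {
            list.set(0, "z");  // replaces an element: not structural
        }
        try {
            it.next();
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(iteratorSurvives(false)); // true
        System.out.println(iteratorSurvives(true));  // false
    }
}
```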

DOMOutputter can add Document content in the wrong order

If you use DOMOutputter when the Document has a doctype, the underlying DOM implementation/adaptor may create a root element on that Document.
If the JDOM Document also has other content (comments or PI's), then that additional content will be added after the DOM document's element.

For example the JDOM document:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root>
<!--This is a document--><?jdomtest?><root />

will be output as the DOM document (note the location of the root element)

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root>
<root /><!--This is a document--><?jdomtest?>

org.jdom2.output.Format implements Cloneable, but clone() is not public

This is against convention.... is this intended?

RFE: In-memory validation

Ideally support Schema, DTD, and Relax NG.

Filter interface should declare negate(), and() and or().

Currently the and(), negate() and or() methods are defined on the AbstractFilter class only. Since these methods are not on the Filter interface, it makes it hard to access them. The not-so-obvious example of an issue is that it is not possible to do things like build a Filter that selects all elements except those with the name 'notthis' and 'notme'. It would be logical to have the structure:

ElementFilter notme = new ElementFilter("notme");
ElementFilter notthis = new ElementFilter("notthis");

Filter orfilter = notme.or(notthis);
Filter neitherfilter = orfilter.negate();

AndFilter and OrFilter have broken equals() and hashCode() contract

OrFilter and AndFilter should not break hashCode() and equals() contract.

For example:

ElementFilter efa = new ElementFilter("a");
ElementFilter efb = new ElementFilter("b");

OrFilter orfa = efa.or(efb);
OrFilter orfb = efb.or(efa);

// the following succeeds
assertTrue(orfa.equals(orfb));
// the following fails.
assertTrue(orfa.hashCode() == orfb.hashCode());

Re-Enable JUnit tests in AbstractTestFilter once fixed.
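
Since equals() already treats the two operand orders as equal, one way to restore the contract is to combine the operand hash codes with a commutative operation. The sketch below uses a hypothetical UnorderedPair class with plain objects standing in for the child filters; it is an illustration of the idea, not the JDOM fix.

```java
// Hypothetical sketch; not the OrFilter/AndFilter code.
final class UnorderedPair {
    private final Object left;
    private final Object right;

    UnorderedPair(Object left, Object right) {
        this.left = left;
        this.right = right;
    }

    // Equal when the operands match in either order.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof UnorderedPair)) {
            return false;
        }
        UnorderedPair p = (UnorderedPair) o;
        return (left.equals(p.left) && right.equals(p.right))
                || (left.equals(p.right) && right.equals(p.left));
    }

    // Addition is commutative, so swapping the operands gives the same hash,
    // which keeps hashCode() consistent with the order-insensitive equals().
    @Override
    public int hashCode() {
        return left.hashCode() + right.hashCode();
    }

    public static void main(String[] args) {
        UnorderedPair ab = new UnorderedPair("a", "b");
        UnorderedPair ba = new UnorderedPair("b", "a");
        System.out.println(ab.equals(ba));                  // true
        System.out.println(ab.hashCode() == ba.hashCode()); // true
    }
}
```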