Wednesday, January 29, 2014

DOM4J: Remove duplicate XML elements from an XML document

Today I will explain how to handle duplicate elements in an XML document.

I assume you are already a little familiar with the DOM4J API, an open-source Java library for XML (it is an independent project, not part of the Apache Foundation).

The DOM4J API is basically for working with XML. Sometimes you face the problem of getting a list of distinct elements from a given XML document. One way to achieve this is to iterate over the whole document and build the required list yourself, but the same thing can be achieved with an API that DOM4J already provides.

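You need a recent copy of dom4j.jar, which you can easily download from SourceForge. DOM4J relies on the Jaxen library to evaluate XPath expressions, so jaxen.jar needs to be on the classpath as well.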

Here I will show a practical example of this.

//XML Document
<?xml version="1.0" encoding="UTF-8"?>
<document>
<element>
<id>1</id>
</element>
<element>
<id>1</id>
</element>
<element>
<id>2</id>
</element>
<element>
<id>3</id>
</element>
<element>
<id>2</id>
</element>
<element>
<id>6</id>
</element>
<element>
<id>5</id>
</element>
<element>
<id>3</id>
</element>
<element>
<id>4</id>
</element>
<element>
<id>4</id>
</element>
</document>

Now create a class file and add the code snippet below to your class:
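// Imports needed at the top of the class file
import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Node;

String xmlDoc = ""; // Add code which reads the XML input file from your file location

// Create an org.dom4j.Document object from the XML content
// (parseText throws org.dom4j.DocumentException if the XML is malformed)
Document document = DocumentHelper.parseText(xmlDoc);

// Build the distinct list of elements from the parsed XML document
List<Node> nodes = document.selectNodes("//document/element", "id", true);

/*
 * Argument 1: XPath that selects the candidate elements: "//document/element"
 * Argument 2: XPath, relative to each selected node, whose value is used to
 *             compare (and sort) the nodes: "id"
 * Argument 3: boolean flag; true removes nodes whose comparison value is a duplicate
 */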
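
For comparison, the manual approach mentioned earlier (iterating over the document and collecting the distinct elements yourself) could look roughly like the sketch below. It is only an illustration: it uses the JDK's built-in DOM parser instead of DOM4J, and the class name DistinctElements and the method distinctById are names made up for this example. It keeps the first element seen for each id, in document order, and does not sort the result.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of DOM4J or the JDK
class DistinctElements {

    // Returns the <element> nodes whose <id> text has not been seen before,
    // keeping the first occurrence of each id in document order.
    static List<Element> distinctById(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // All descendants of the root that are named "element"
        NodeList elements = doc.getDocumentElement().getElementsByTagName("element");

        Set<String> seenIds = new HashSet<>();
        List<Element> distinct = new ArrayList<>();
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            NodeList ids = element.getElementsByTagName("id");
            if (ids.getLength() == 0) {
                continue; // skip elements that have no <id> child
            }
            String id = ids.item(0).getTextContent().trim();
            // Set.add returns false when the id was already present
            if (seenIds.add(id)) {
                distinct.add(element);
            }
        }
        return distinct;
    }

    public static void main(String[] args) throws Exception {
        // Same id sequence as the sample document above
        int[] ids = {1, 1, 2, 3, 2, 6, 5, 3, 4, 4};
        StringBuilder xml = new StringBuilder("<document>");
        for (int id : ids) {
            xml.append("<element><id>").append(id).append("</id></element>");
        }
        xml.append("</document>");

        // Prints 1, 2, 3, 6, 5, 4 (one id per line)
        for (Element element : distinctById(xml.toString())) {
            System.out.println(
                    element.getElementsByTagName("id").item(0).getTextContent());
        }
    }
}
```

The DOM4J call does the same job in a single line, which is the main reason to prefer it when DOM4J is already on the classpath.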