. it is simple to do simple XML processing tasks :)
. it is simple to have the XML processor stored in a single variable
(see example 4)
. it is simple to translate XML -> user-controlled complex Perl structures
with a compact "-type" definition (see last section)
Feedback welcome -> jj@di.uminho.pt
. based on XML::Parser (tree mode).
. designed to do simple and compact translation/processing of XML documents
. it includes some features of omnimark and sgmls.pm; functional approach
. it includes functions to automatically build user-controlled complex Perl
structures (see "working with structures" section)
. it was built to show my NLP perl students that it is easy to work with XML
. home page and download: http://www.di.uminho.pt/~jj/perl/XML/DT.html
. the user must define a handler and call the basic function :
dt($filename,%handler) or dtstring($string,%handler)
. the handler is a HASH mapping element names to functions. Handlers can
have a "-default" function and an "-end" function
. to keep the handler functions small, each function receives its 3 arguments
as global variables
$c - contents
$q - element name
%v - attribute values
. the default "-default" function is the identity. The function "toxml"
rebuilds the original XML text from the $c, $q and %v values.
. see some advanced features in the last examples
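. a minimal sketch of the points above, using dtstring so that no file is
needed. The input string and the element names "doc", "e" and "f" are
invented for illustration; only dtstring, toxml, $c, $q and %v come from the
module. The exact quoting of attributes in the output may depend on the
installed version.

```perl
use strict;
use warnings;
use XML::DT;    # exports dtstring, toxml and the globals $c, $q, %v

# Hypothetical input, made up for this sketch.
my $xml = '<doc><e a="ABC">text</e><f>other</f></doc>';

my %handler = (
    # Every element without its own entry is rebuilt unchanged
    # from $q (name), %v (attributes) and $c (contents).
    '-default' => sub { toxml() },

    # For <e>: lowercase attribute "a", then rebuild the element.
    'e' => sub { $v{a} = lc $v{a}; toxml() },
);

my $out = dtstring($xml, %handler);
print "$out\n";
```

Only the "e" entry does real work here; "-default" is spelled out to show
where the identity behaviour comes from.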
INDEX:
1. change to lowercase attribute named "a" in element "e"
2. better solution
3. make some statistics and output results in HTML (using side effects)
4. In a HTML like XML document, substitute <contents/>...<contents> by the
real table of contents (a dirty solution...)
5. a more realistic example: from XML gcapaper DTD to latex
WORKING WITH STRUCTURES INSTEAD OF STRINGS...
6. Build the natural perl structure of the following document (ARRAY,HASH)
7. Multi map on...
use XML::DT ;
my $filename = shift;
print dt($filename,
( e => sub{ "<e a='". lc($v{a}). "'>$c</e>" }));
print dt($filename,
( e => sub{ $v{a} = lc($v{a});
toxml();}));
use XML::DT ;
my $filename = shift;
%handler=( -default => sub{$elem_counter++;
$elem_table{$q}++;"";} # $q -> element name
);
dt($filename,%handler);
print "<H3>We have found $elem_counter elements in document</H3>";
print "<TABLE><TH>ELEMENT<TH>OCCURS\n";
foreach $elem (sort keys %elem_table)
{print "<TR><TD>$elem<TD>$elem_table{$elem}\n";}
print "</TABLE>";
%handler=( h1 => sub{ $index .= "\n$c"; toxml();},
h2 => sub{ $index .= "\n\t$c"; toxml();},
h3 => sub{ $index .= "\n\t\t$c"; toxml();},
contents => sub{ $c="__CLEAN__"; toxml();},
-end => sub{ $c =~ s/__CLEAN__/$index/; $c});
print dt($filename,%handler);
. "TITLE" is processed in a context-dependent way!
. output in ISOLATIN1 (this is dirty but my LaTeX doesn't support UNICODE)
. a stack of authors was necessary because LaTeX structure was different
from input structure...
. this example was partially created by the function mkdtskel
perl -MXML::DT -e 'mkdtskel "f.xml"' > f.pl
and took me about one hour to tune to a real LaTeX/XML example.
NAME gcapaper2tex.pl - a perl script to translate XML gcapaper DTD to latex
SYNOPSIS gcapaper2tex.pl mypaper.xml > mypaper.tex
use XML::DT ;
my $filename = shift;
my $beginLatex = '\documentclass{article} \begin{document} ';
my $endLatex = '\end{document}';
%handler=(
'-outputenc' => 'ISO-8859-1',
'-default' => sub{"$c"},
'RANDLIST' => sub{"\\begin{itemize}$c\\end{itemize}"},
'AFFIL' => sub{""}, # delete affiliation
'TITLE' => sub{
if(inctxt('SECTION')){"\\section{$c}"}
elsif(inctxt('SUBSEC1')){"\\subsection{$c}"}
else {"\\title{$c}"}
},
'GCAPAPER' => sub{"$beginLatex $c $endLatex"},
'PARA' => sub{"$c\n\n"},
'ADDRESS' => sub{"\\thanks{$c}"},
'PUB' => sub{"} $c"},
'EMAIL' => sub{"(\\texttt{$c}) "},
'FRONT' => sub{"$c\n"},
'AUTHOR' => sub{ push @aut, $c ; ""},
'ABSTRACT' => sub{
sprintf('\author{%s}\maketitle\begin{abstract}%s\end{abstract}',
join ('\and', @aut) ,
$c) },
'CODE.BLOCK' => sub{"\\begin{verbatim}\n$c\\end{verbatim}\n"},
'XREF' => sub{"\\cite{$v{REFLOC}}"},
'LI' => sub{"\\item $c"},
'BIBLIOG' =>sub{"\\begin{thebibliography}{1}$c\\end{thebibliography}\n"},
'HIGHLIGHT' => sub{" \\emph{$c} "},
'BIO' => sub{""}, #delete biography
'SURNAME' => sub{" $c "},
'CODE' => sub{"\\verb!$c!"},
'BIBITEM' => sub{"\n\\bibitem{$c"},
);
print dt($filename,%handler);
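. the context-dependent TITLE handling used in this script can be tried in
isolation. The sketch below is not part of gcapaper2tex.pl: the input and
the element names "doc", "sec", "title" and "p" are invented, and it assumes
that inctxt('sec') is true when the element being processed sits directly
inside a "sec" element, which is how the script above uses it.

```perl
use strict;
use warnings;
use XML::DT;    # exports dtstring, inctxt and the global $c

# Hypothetical input: one title at document level, one inside a section.
my $xml = '<doc><title>Top</title>'
        . '<sec><title>Inner</title><p>text</p></sec></doc>';

my %handler = (
    # Elements without their own entry just return their contents.
    '-default' => sub { $c },

    # The same element name gives different output depending on
    # the element that encloses it.
    'title' => sub {
        inctxt('sec') ? "[section: $c]" : "[document: $c]";
    },
);

my $out = dtstring($xml, %handler);
print "$out\n";
```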
the "-type" definition defines the way to build structures in each case:
. "HASH" or "MAP" -> makes a hash with the subelements;
keys are the subelement names; warns on repetitions;
returns the hash reference.
. "ARRAY" or "SEQ" -> makes an ARRAY with the subelements;
returns an array reference.
. "MULTIMAP" -> makes a HASH of ARRAYs; keys are the subelement names
. MMAPON(name1, ...) -> similar to HASH but accepts repetitions of
the subelements "name1"... (and makes an array with them)
. STR ->(DEFAULT) concatenates the values returned by all the subelements;
all the subelements should return strings to be concatenated
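. a small sketch of "MULTIMAP", the one type not covered by the examples
that follow. The input is invented, and the sketch assumes that under
MULTIMAP every subelement value, repeated or not, ends up in an array stored
under the subelement name. The input has no whitespace between elements, to
keep text nodes out of the picture.

```perl
use strict;
use warnings;
use XML::DT;    # exports dtstring and the global $c

# Hypothetical input, made up for this sketch.
my $xml = '<person><name>name0</name>'
        . '<address>address00</address>'
        . '<address>address01</address></person>';

my %handler = (
    # Leaf elements return their text contents.
    '-default' => sub { $c },

    # Build a HASH of ARRAYs for <person>.
    '-type' => { person => 'MULTIMAP' },
);

my $person = dtstring($xml, %handler);

# Every key maps to an array reference, even for a single value.
print join(", ", @{ $person->{address} }), "\n";
print $person->{name}[0], "\n";
```

Compared with MMAPON, there is no need to list the repeatable subelements in
advance, at the price of an extra array level on every key.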
<institution>
<id>U.M.</id>
<name>University of Minho</name>
<tels>
<item>1111</item>
<item>1112</item>
<item>1113</item>
</tels>
<where>Portugal</where>
<contacts>J.Joao; J.Rocha; J.Ramalho</contacts>
</institution>
use XML::DT;
%handler = ( -default => sub{$c},
-type => { institution => 'HASH',
tels => 'ARRAY' },
contacts => sub{ [ split(";",$c)] },
);
$a = dt("ex10.2.xml", %handler);
$a is a reference to a HASH:
{ 'tels' => [ 1111, 1112, 1113 ],
'name' => 'University of Minho',
'where' => 'Portugal',
'id' => 'U.M.',
'contacts' => [ 'J.Joao', ' J.Rocha', ' J.Ramalho' ] };
<people>
<person>
<name> name0 </name>
<address> address00 </address>
<address> address01 </address>
</person>
<person>
<name> name1 </name>
<address> address10 </address>
<address> address11 </address>
</person>
</people>
Now we are going to build a structure to store the address book and write a Christmas card to everyone's first address:
#!/usr/bin/perl
use XML::DT;
%handler = ( -default => sub{$c},
person => sub{ mkchristmascard($c); $c},
-type => { people => 'ARRAY',
person => MMAPON('address')});
$people = dt("ex11.1.xml", %handler);
print $people->[0]{address}[1]; # prints address01
sub mkchristmascard{ my $x=shift;
open(A,"|lpr") or die;
print A <<".";
$x->{name}
$x->{address}[0]
Dear $x->{name}
Merry Christmas from Braga perl mongers\n
.
close A; }