|
 | | | |  |  |  |  |  |  Building Xerces-C++ with ICU |  |  |  |  |  | 
 |  |  | Xerces-C++ may be built in stand-alone mode using
        native encoding support, or with ICU, which adds support for over 180
        different encodings and/or locale-specific messages. ICU stands for
        International Components for Unicode and is an open source distribution from
        IBM. You can get the
        ICU libraries from
        IBM's developerWorks site
        or go to the ICU
        download page
        directly. |  | Important: Please remember that ICU and
        Xerces-C++ must be built with the same compiler,
        preferably the same version. You cannot, for example,
        build ICU with a threaded version of the xlC compiler and
        build Xerces-C++ with a non-threaded one. | 
 |  |  |  | There are two ways to build Xerces-C++ with ICU on Windows. One is to use the
        MSDEV GUI environment, and the other is to invoke the compiler from the
        command line. Using the GUI environment requires editing the project files.
        Here, we describe only the second option, which uses the
        Perl script 'packageBinaries.pl'.

        Prerequisites: Perl 5.004 or higher; Cygwin tools or MKS Toolkit; zip.exe.

        Extract the Xerces-C++ source files from the .zip archive using WinZip, say
        into the root directory of an arbitrary drive x:. This should create a directory like
        'x:\xerces-c-src2_6_0'.

        Extract the ICU files, using WinZip, into the root directory of the disk
        where you extracted the Xerces-C++ sources. After extraction, there
        should be a new directory 'x:\icu' which contains all the ICU
        source files.

        Start a command prompt to get a new shell window. Make sure you have
        perl, the Cygwin tools (uname, rm, cp, ...), and zip.exe somewhere in the
        path. Next, set up the environment for MSVC using
        'VCVARS32.BAT' or a similar file. Then at the prompt
        enter: |  |  |  |  |  | set XERCESCROOT=x:\xerces-c-src2_6_0
set ICUROOT=x:\icu
cd x:\xerces-c-src2_6_0\scripts
 |  |  |  |  |  | 
To build with ICU, either select the ICU transcoding service with '-t icu', |  |  |  |  |  | 
perl packageBinaries.pl -s x:\xerces-c-src2_6_0 -o x:\temp\xerces-c2_6_0-win32 -t icu
 |  |  |  |  |  | 
 or select the ICU message loader service with '-m icu': |  |  |  |  |  | 
perl packageBinaries.pl -s x:\xerces-c-src2_6_0 -o x:\temp\xerces-c2_6_0-win32 -m icu
 |  |  |  |  |  | 
(Match the source directory to your system; the target directory can be
        anything you want.) If everything is set up correctly, you should see a
        binary drop created in the target directory specified above. This script
        builds both ICU and Xerces-C++ and copies the files relevant to the binary
        drop to the target directory.

        If the parser is built with the ICU message loader (as mentioned above) or the message
         catalog loader, you need to set the environment variable XERCESC_NLS_HOME to point to
         the directory $XERCESCROOT/msg, where the message files reside.

         For a description of the available options, you can enter: | 
 
 |  |  |  | Extract the Xerces-C++ source files into, say, the home directory ($HOME).
        This should create a directory like '$HOME/xerces-c-src2_6_0'.

        Extract the ICU files into the same directory
        into which you extracted the Xerces-C++ sources. After extraction, there
        should be a new directory '$HOME/icu' which contains all the ICU
        source files.

        Build ICU according to the
        build instructions in the ICU Readme. Then make its shared libraries, libicuuc* and libicudt*, available on your library search path.

        Then build Xerces-C++ with ICU. This is similar to building a stand-alone
        Xerces-C++ library as described in
        "Building Xerces-C++ on UNIX platforms", except that you have to specify
        the transcoder option '-t icu' and/or the message loader option '-m icu'. For example: |  |  |  |  |  | runConfigure -plinux -cgcc -xg++ -minmem -nsocket -ticu -rpthread |  |  |  |  |  | 
Alternatively, instead of building ICU and Xerces-C++ manually in two steps,
        you can use the bundled Perl script 'packageBinaries.pl', which
        builds both of them in one step. For example: |  |  |  |  |  | export XERCESCROOT=$HOME/xerces-c-src2_6_0
export ICUROOT=$HOME/icu
cd $HOME/xerces-c-src2_6_0/scripts
 |  |  |  |  |  | 
To build with ICU, either select the ICU transcoding service with '-t icu', |  |  |  |  |  | 
perl packageBinaries.pl -s $HOME/xerces-c-src2_6_0 -o $HOME/temp/xerces-c2_6_0-aix -t icu
 |  |  |  |  |  | 
 or select the ICU message loader service with '-m icu': |  |  |  |  |  | 
perl packageBinaries.pl -s $HOME/xerces-c-src2_6_0 -o $HOME/temp/xerces-c2_6_0-aix -m icu
 |  |  |  |  |  | 
If the parser is built with the ICU message loader (as mentioned above) or the message
         catalog loader, you need to set the environment variable XERCESC_NLS_HOME to point to
         the directory $XERCESCROOT/msg, where the message files reside.
         | 
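A missing XERCESC_NLS_HOME is easy to overlook, so an application may want to check for it before starting the parser. The following is a minimal sketch of such a check, not part of Xerces-C++ itself: the helper names `nlsHomeFromEnvironment` and `nlsHomeIsConfigured` are hypothetical, and only the standard `std::getenv` call is used.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not a Xerces-C++ API): returns the value of
// XERCESC_NLS_HOME, or an empty string if it is unset or empty.
std::string nlsHomeFromEnvironment()
{
    const char* value = std::getenv("XERCESC_NLS_HOME");
    if (value == 0 || *value == '\0')
        return std::string();
    return std::string(value);
}

// Hypothetical helper: true when XERCESC_NLS_HOME names a non-empty path.
// It does not verify that the directory exists or contains message files.
bool nlsHomeIsConfigured()
{
    return !nlsHomeFromEnvironment().empty();
}
```

An application could call `nlsHomeIsConfigured()` at startup and print a hint to set the variable to $XERCESCROOT/msg when it returns false.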
 
 | 
 | |  |  |  |  |  |  What should I define XMLCh to be? |  |  |  |  |  | 
 |  |  | XMLCh should be defined to be a type suitable for holding a
           UTF-16 encoded (16-bit) value, usually an unsigned short.

           All XML data is handled within Xerces-C++ as strings of
           XMLCh characters. Regardless of the size of the
           type chosen, the data stored in variables of type XMLCh
           will always be UTF-16 encoded values.

           Unlike XMLCh, the encoding
           of wchar_t is platform dependent. Sometimes it is UTF-16
           (AIX, Windows), sometimes UCS-4 (Solaris,
           Linux), and sometimes it is not based on Unicode at all
           (HP/UX, AS/400, System 390).

           Some earlier releases of Xerces-C++ defined XMLCh to be the
           same type as wchar_t on most platforms, with the goal of making
           it possible to pass XMLCh strings to library or system functions
           that were expecting wchar_t parameters. This approach has
           been abandoned for the following reasons:

           Portability problems arise with any code that assumes that
           the types of XMLCh and wchar_t are compatible.

           Memory usage is excessive, especially in the DOM, on
           platforms with a 32-bit wchar_t.

           UTF-16 encoded XMLCh is not always compatible with
           UCS-4 encoded wchar_t on Solaris and Linux. The
           problem occurs with Unicode characters with values
           greater than 0xFFFF (64K); in UCS-4 the value is stored as
           a single 32-bit quantity. With UTF-16, the value
           is stored as a "surrogate pair" of two 16-bit
           values. Even with XMLCh equated to wchar_t, Xerces-C++ will
           still create the UTF-16 encoded surrogate pairs, which
           are illegal in UCS-4 encoded wchar_t strings.
                 | 
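The surrogate-pair issue can be illustrated with a short sketch. It assumes a 16-bit unsigned type as a stand-in for XMLCh (the typedef name `Utf16Unit` and the function `appendUtf16` are hypothetical, not Xerces-C++ definitions) and shows how a single code point above 0xFFFF becomes two 16-bit values in UTF-16, whereas UCS-4 would hold it in one 32-bit value.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for XMLCh: one 16-bit UTF-16 code unit.
typedef std::uint16_t Utf16Unit;

// Appends the UTF-16 encoding of one Unicode code point to 'out'.
// Returns false (and appends nothing) if the value is outside the
// Unicode range or is itself a surrogate value.
bool appendUtf16(std::uint32_t codePoint, std::vector<Utf16Unit>& out)
{
    if (codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF))
        return false;

    if (codePoint < 0x10000) {
        // Fits in a single 16-bit code unit.
        out.push_back(static_cast<Utf16Unit>(codePoint));
    } else {
        // Split the remaining 20 bits across a high and a low surrogate.
        const std::uint32_t v = codePoint - 0x10000;
        out.push_back(static_cast<Utf16Unit>(0xD800 + (v >> 10)));
        out.push_back(static_cast<Utf16Unit>(0xDC00 + (v & 0x3FF)));
    }
    return true;
}
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF) is encoded as the two code units 0xD834 and 0xDD1E. Copying those two units into a UCS-4 wchar_t string unchanged would yield two invalid characters rather than the single value 0x1D11E.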
 
 
 | 
 |