## Creating Multiple Language PDFs using Apache FOP

by Balaji Loganathan

This article explains how to create PDFs in multiple languages using Apache FOP with XML and XSL. It assumes that the reader is familiar with the basics of Apache FOP, XML and XSL.

This is a five-step process.

Step 1: Locate font.

For English you don't need extra fonts unless you want more styles.
For other languages, you need to find either a TrueType or a Type 1 font that contains the glyphs for that language.
For example, for Arabic you can use the TrueType font Iqraa.ttf, downloadable at http://www7.bev.net/civic/icb/ICB_Arabic.html
Store the font file on your hard disk, say in the C:\ folder.
Note: You have to tell fo:block explicitly which font to use for rendering text in other languages; otherwise a ? or # symbol will appear in the generated PDF in place of each character the default font cannot display.

Step 2: Create a language resource XML file.

This XML file contains the text values in the various languages, together with a special element called "fontname" that tells FOP which font to use for displaying the text of that language. For example, Chinese text cannot be displayed using fonts like Helvetica or Arial, so we assign a specific font name to each language.

Sample XML structure (let's call it Lang.xml):


    <?xml version="1.0" encoding="utf-8" ?>
    <Lang>
        <en><!-- for English -->
            <fontname>Arial</fontname>
            <text1>Consignee!</text1>
        </en>
        <fr><!-- for French -->
            <fontname>Arial</fontname>
            <text1>Destinataire!</text1>
        </fr>
        <ar><!-- for Arabic; the name must match a font-triplet registered in userconfig.xml -->
            <fontname>Iqraa</fontname>
            <text1>المرسل إليه</text1>
        </ar>
        <jp><!-- for Japanese -->
            <fontname>MSGothic</fontname>
            <text1>荷受人!</text1>
        </jp>
        <ch/> <!-- Chinese -->
    </Lang>



Step 3: Configure userconfig.xml

Now read the document at http://xml.apache.org/fop/fonts.html carefully. It explains how to register and embed a new TrueType or Type 1 font so that FOP can map the input characters to glyphs and render them in that font.
For example:
To use the Arabic font C:\Iqraa.ttf, first generate its metrics file with FOP's TTFReader class (the FOP jars must be on the Java classpath):
    >java org.apache.fop.fonts.apps.TTFReader C:\Iqraa.ttf Iqraa.xml

Then add a font entry to your userconfig.xml file, like
    <font metrics-file="Iqraa.xml" kerning="yes" embed-file="C:\Iqraa.ttf">
        <font-triplet name="Iqraa" style="normal" weight="normal"/>
    </font>

This tells FOP which font file and metrics to use whenever a block asks for the font family "Iqraa", as the Arabic text will.
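
For orientation, the sketch below shows where such an entry might sit in a complete userconfig.xml. It assumes the FOP 0.20.x layout, in which font entries are grouped inside a fonts element under the configuration root. The bold triplet is an optional addition that is not part of the example above; it reuses the same font file so that bold text does not fall back to a default font. Check the fonts documentation for the FOP version you run before relying on this layout.

```xml
<?xml version="1.0"?>
<!-- Sketch of a userconfig.xml, assuming the FOP 0.20.x configuration layout -->
<configuration>
  <fonts>
    <!-- Metrics file generated by TTFReader; font file stored as in Step 1 -->
    <font metrics-file="Iqraa.xml" kerning="yes" embed-file="C:\Iqraa.ttf">
      <!-- The name that font-family attributes refer to -->
      <font-triplet name="Iqraa" style="normal" weight="normal"/>
      <!-- Optional (assumption): map bold requests to the same font file -->
      <font-triplet name="Iqraa" style="normal" weight="bold"/>
    </font>
  </fonts>
</configuration>
```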

Step 4: Configure the style sheet

Configure the XSL stylesheet that converts the XML into XSL-FO, which FOP then renders to PDF.
In the XSL file, load the data for a particular language and store it in XSL variables.
For example, the code below stores the value of fr/text1 in the variable "message" and the font name to use in the variable "font".
    <xsl:variable name="message" select="document('Lang.xml')/Lang/fr/text1"/>
    <xsl:variable name="font" select="document('Lang.xml')/Lang/fr/fontname"/>
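
As a variation, the language does not have to be fixed in the path. The fragment below is a minimal sketch that selects the language through a stylesheet parameter; the parameter name "lang" and the helper variable "strings" are hypothetical names chosen for illustration. How the parameter value is passed in depends on the XSLT processor and FOP version in use; without a value, the default 'fr' applies.

```xml
<!-- Top-level declarations inside xsl:stylesheet (XSLT 1.0) -->

<!-- Hypothetical parameter: must match an element name in Lang.xml (en, fr, ar, jp) -->
<xsl:param name="lang" select="'fr'"/>

<!-- The element in Lang.xml whose name equals the requested language code -->
<xsl:variable name="strings" select="document('Lang.xml')/Lang/*[name() = $lang]"/>

<!-- Same variables as before, now resolved for whichever language was requested -->
<xsl:variable name="message" select="$strings/text1"/>
<xsl:variable name="font" select="$strings/fontname"/>
```

Note that document('Lang.xml') with a relative name is resolved against the location of the stylesheet, so in this sketch Lang.xml is expected to sit next to the XSL file.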


Step 5: Use it

Now use these variables in fo:block like this:
    <fo:block font-family="{$font}"><xsl:value-of select="$message"/></fo:block>
It is important to make sure that FOP.bat or FOP.sh is able to locate userconfig.xml, Iqraa.xml, Iqraa.ttf and Lang.xml.
Make sure that you specify the option "-c userconfig.xml" when running FOP.
For example:
    >FOP -c userconfig.xml -xml InputXML.xml -xsl MultiLang.xsl -pdf MultiLang.pdf
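
To show how the pieces from Steps 4 and 5 fit together, here is a minimal sketch of a complete stylesheet. It is not the MultiLang.xsl sample itself: the page master name "page" and the A4 page geometry are assumptions made for illustration, and the stylesheet ignores the content of the input XML and prints only the French message.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Step 4: text and font name for one language, read from Lang.xml -->
  <xsl:variable name="message" select="document('Lang.xml')/Lang/fr/text1"/>
  <xsl:variable name="font" select="document('Lang.xml')/Lang/fr/fontname"/>

  <xsl:template match="/">
    <fo:root>
      <!-- Assumed page layout: A4 with 2cm margins -->
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page"
            page-height="29.7cm" page-width="21cm"
            margin-top="2cm" margin-bottom="2cm"
            margin-left="2cm" margin-right="2cm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>

      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <!-- Step 5: render the message in the language-specific font -->
          <fo:block font-family="{$font}">
            <xsl:value-of select="$message"/>
          </fo:block>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

</xsl:stylesheet>
```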


That's it.

With some XSL tricks you can make everything dynamic without hard-coding any part.
For example, Arabic text runs from right to left and is normally aligned to the right edge; this can be made dynamic by supplying some extra language-specific tags in Lang.xml.
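
One way this might look is sketched below. The element name "align" is hypothetical and not part of the sample Lang.xml; other languages would carry a value such as left. The sketch only controls where the block is positioned; whether the Arabic glyphs themselves are shaped and ordered correctly depends on the right-to-left support of the FOP version in use.

```xml
<!-- Lang.xml: "align" is a hypothetical extra element holding a text-align value -->
<ar>
  <fontname>Iqraa</fontname>
  <align>right</align>
  <text1>المرسل إليه</text1>
</ar>

<!-- Stylesheet: read the alignment alongside the text and the font name -->
<xsl:variable name="message" select="document('Lang.xml')/Lang/ar/text1"/>
<xsl:variable name="font" select="document('Lang.xml')/Lang/ar/fontname"/>
<xsl:variable name="align" select="document('Lang.xml')/Lang/ar/align"/>

<!-- Apply both language-specific settings to the block -->
<fo:block font-family="{$font}" text-align="{$align}">
  <xsl:value-of select="$message"/>
</fo:block>
```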

The sample files MultiLang.xsl and Lang.xml can be used for local testing; however, you must complete the configuration steps above for the text to display properly in the PDF. You can also have a look at the generated PDF, MultiLang.pdf.

That leaves the question of Unicode. Use XMLSPY, Visual Studio or an equivalent editor to edit your Lang.xml file.
For example, to display "Consignee" in Chinese, go to http://www.babylon.com/, then copy and paste the Chinese word into Lang.xml using the XML editor; XML editors (like XMLSPY) will take care of encoding it as UTF-8.
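
If you cannot be sure that an editor saves the file as UTF-8, an alternative is to write the characters as XML numeric character references, which keeps the file plain ASCII. The fragment below is a sketch of that approach using the Japanese entry from Lang.xml; an XML parser reads it exactly as if the characters had been typed directly.

```xml
<jp><!-- for Japanese -->
  <fontname>MSGothic</fontname>
  <!-- &#x8377; = 荷, &#x53D7; = 受, &#x4EBA; = 人 -->
  <text1>&#x8377;&#x53D7;&#x4EBA;!</text1>
</jp>
```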