Java Apache PDFBox Convert Multipage Tiff To PDF


This week I researched the best way to take a multipage TIFF file and convert it to PDF. I went straight to iText at first, as that was the only library I was familiar with. After getting a working example going, I checked the license for the most current version of iText and realized that it cannot be used in closed source applications without buying a license. That is what led me to search for other Java PDF libraries.

I came across Apache PDFBox and saw that it can add images to a PDF, and even has a class for adding a TIFF. I gave the TIFF example a try and got a complaint about unsupported compression. After pondering this for a while, I had the thought of reading in each page of the TIFF as a BufferedImage and then placing each one into the PDF as a JPEG. This requires two libraries:

  1. commons-imaging
  2. pdfbox

commons-imaging can be swapped for something else, as long as you can get a List<BufferedImage> out of the TIFF. At the time of writing it has not been released and is only available as a snapshot from the Apache sandbox. Here is the Maven dependency for it, along with the repository:


        <repository>
          <id>apache.snapshots</id>
          <name>Apache Development Snapshot Repository</name>
          <url>https://repository.apache.org/content/repositories/snapshots/</url>
          <releases>
            <enabled>false</enabled>
          </releases>
          <snapshots>
            <enabled>true</enabled>
          </snapshots>
        </repository>

        <dependency>
          <groupId>org.apache.commons</groupId>
          <artifactId>commons-imaging</artifactId>
          <version>1.0-SNAPSHOT</version>
        </dependency>

Here is the Maven dependency for pdfbox:

        <dependency>
          <groupId>org.apache.pdfbox</groupId>
          <artifactId>pdfbox</artifactId>
          <version>1.8.9</version>
        </dependency>
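Since commons-imaging is only needed to get a list of page images, here is a rough sketch of one possible substitute that uses nothing but the JDK's own ImageIO. It assumes Java 9 or later (which ships a TIFF reader plugin), or an older JDK with a third-party ImageIO TIFF plugin installed. The class and method names (TiffPages, readAllPages) are made up for illustration, and I have not checked which TIFF compression schemes the ImageIO reader handles.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper: reads every page of a multipage TIFF with plain ImageIO.
public class TiffPages {

    public static List<BufferedImage> readAllPages(File tiff) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(tiff)) {
            if (in == null) {
                throw new IOException("Cannot open " + tiff);
            }
            // Ask ImageIO for a reader that recognises the file contents.
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No ImageIO reader available for " + tiff);
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                // true = allow the reader to scan the file to count the pages
                int pageCount = reader.getNumImages(true);
                List<BufferedImage> pages = new ArrayList<>(pageCount);
                for (int i = 0; i < pageCount; i++) {
                    pages.add(reader.read(i));
                }
                return pages;
            } finally {
                reader.dispose();
            }
        }
    }
}
```

The returned list could then stand in for the result of Imaging.getAllBufferedImages(f) in the class below.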

Here is an example class that can be run against a directory of TIFF images. All TIFF files in the directory will be converted to PDF.

package com.iws.export;

import java.awt.Dimension;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.util.Iterator;
import java.util.List;

import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;

import org.apache.commons.imaging.Imaging;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDJpeg;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

import com.google.common.base.Stopwatch;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.io.FileChannelRandomAccessSource;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
import com.itextpdf.text.pdf.codec.TiffImage;

public class QueueToPdf {

    public static void main(String[] args) {
        try {
            new QueueToPdf().generatePdfFromTifPbox(new File("/mydir"));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void generatePdfFromTifPbox(File dir) {
        for (final File f : dir.listFiles()) {

            // try-with-resources closes the document even if a page fails
            try (PDDocument doc = new PDDocument()) {

                // one BufferedImage per TIFF page
                List<BufferedImage> bimages = Imaging.getAllBufferedImages(f);
                for (BufferedImage bi : bimages) {
                    PDPage page = new PDPage();
                    doc.addPage(page);
                    PDPageContentStream contentStream = new PDPageContentStream(doc, page);
                    try {
                        // the .08F can be tweaked. Go up for better quality, but the size of the PDF will increase
                        PDXObjectImage image = new PDJpeg(doc, bi, .08F);

                        Dimension scaledDim = getScaledDimension(
                                new Dimension(image.getWidth(), image.getHeight()),
                                page.getMediaBox().createDimension());
                        contentStream.drawXObject(image, 1, 1, scaledDim.width, scaledDim.height);
                    } finally {
                        contentStream.close();
                    }
                }

                doc.save(f.getAbsolutePath() + ".pdf");

            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    // taken from a stack overflow post http://stackoverflow.com/questions/23223716/scaled-image-blurry-in-pdfbox
    // Thanks Gyo!
    private Dimension getScaledDimension(Dimension imgSize, Dimension boundary) {
        int original_width = imgSize.width;
        int original_height = imgSize.height;
        int bound_width = boundary.width;
        int bound_height = boundary.height;
        int new_width = original_width;
        int new_height = original_height;

        // first check if we need to scale width
        if (original_width > bound_width) {
            // scale width to fit
            new_width = bound_width;
            // scale height to maintain aspect ratio
            new_height = (new_width * original_height) / original_width;
        }

        // then check if we need to scale even with the new height
        if (new_height > bound_height) {
            // scale height to fit instead
            new_height = bound_height;
            // scale width to maintain aspect ratio
            new_width = (new_height * original_width) / original_height;
        }

        return new Dimension(new_width, new_height);
    }

}
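To make the scaling step a bit more concrete, here is the same fit-inside-the-page arithmetic pulled out into a small standalone class so it can be run without PDFBox. The class name FitToPage and the sample sizes are my own, chosen for illustration. The 612 x 792 boundary is the US Letter media box that a default PDPage uses, measured in PDF points.

```java
import java.awt.Dimension;

// Standalone copy of the scaling logic above, for experimenting with sizes.
public class FitToPage {

    // Shrinks imgSize to fit inside boundary, keeping the aspect ratio.
    // Images that already fit are returned unchanged (never enlarged).
    public static Dimension fit(Dimension imgSize, Dimension boundary) {
        int newWidth = imgSize.width;
        int newHeight = imgSize.height;

        // too wide: clamp the width, derive the height from the aspect ratio
        if (imgSize.width > boundary.width) {
            newWidth = boundary.width;
            newHeight = (newWidth * imgSize.height) / imgSize.width;
        }

        // still too tall: clamp the height instead, derive the width
        if (newHeight > boundary.height) {
            newHeight = boundary.height;
            newWidth = (newHeight * imgSize.width) / imgSize.height;
        }

        return new Dimension(newWidth, newHeight);
    }

    public static void main(String[] args) {
        Dimension letter = new Dimension(612, 792);
        // a 1700 x 2200 pixel scan has the same proportions as the page
        System.out.println(fit(new Dimension(1700, 2200), letter));
        // a tall, narrow image is limited by the page height
        System.out.println(fit(new Dimension(1000, 3000), letter));
        // a small image is left alone
        System.out.println(fit(new Dimension(100, 100), letter));
    }
}
```

Working the first case by hand: 1700 is wider than 612, so the width becomes 612 and the height becomes 612 * 2200 / 1700 = 792, which fits the page exactly. The tall image ends up at 264 x 792. Note that the image's pixel counts are simply treated as page units here, so the helper only ever shrinks an image and never enlarges one.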

The only complaint I have about this solution is that, at the current JPEG quality level, the PDF comes out at double the size of the TIFF. I was impressed that when I first tried iText, the PDF actually ended up slightly smaller than the TIFF file I was converting. I am not sure if there is some other way the JPEG could be compressed, or if the TIFF compression is simply better than JPEG for these files. Please feel free to share your thoughts, or other solutions.
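One way to get a feel for the size side of this trade-off, without building a whole PDF each time, is to encode a single page image as JPEG in memory at a few quality settings and compare the byte counts. The sketch below does that with plain ImageIO. The class and method names (JpegSizeProbe, jpegSize) are invented for this example, and the numbers will only approximate what PDFBox embeds, since I am assuming rather than verifying that it encodes pages in a comparable way.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

// Hypothetical helper: measures how many bytes a page takes as a JPEG.
public class JpegSizeProbe {

    // quality runs from 0.0 (smallest file) to 1.0 (best quality).
    // The image should have no alpha channel, e.g. TYPE_INT_RGB.
    public static int jpegSize(BufferedImage image, float quality) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes)) {
            writer.setOutput(out);
            ImageWriteParam param = writer.getDefaultWriteParam();
            // explicit mode is required before a quality can be set
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
        // the stream has been closed and flushed by now
        return bytes.size();
    }
}
```

Calling jpegSize(page, 0.08f) and jpegSize(page, 0.5f) on a page taken from the TIFF and comparing the results against the original file size should show how much room there is to play with. If the TIFFs are black-and-white scans or faxes, they may be stored with a bilevel scheme such as CCITT Group 4, which is built for exactly that kind of content, so it would not be surprising for JPEG to come out larger.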

4 COMMENTS

  1. Thanks! I converted it to PDFBox 2.0 release candidate, removed your old itextpdf imports and cleaned up the source code a bit. This converts one TIFF multipage file (which you might have if you receive a fax, test.tif in this example) to a PDF file (test.tif.pdf) :

    import java.awt.Dimension;
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.util.List;

    import org.apache.commons.imaging.Imaging;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDPageContentStream;
    import org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory;
    import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

    public class Tiff2pdf {

    public static void main(String[] args) throws Exception {
    new Tiff2pdf().generatePdfFromTifPbox(new File("test.tif"));
    }

    public void generatePdfFromTifPbox(File file) throws Exception {
    PDDocument doc = new PDDocument();
    List<BufferedImage> bimages = Imaging.getAllBufferedImages(file);
    for (BufferedImage bi : bimages) {
    PDPage page = new PDPage();
    doc.addPage(page);
    PDPageContentStream contentStream = new PDPageContentStream(doc, page);
    try {
    // the .08F can be tweaked. Go up for better quality,
    // but the size of the PDF will increase
    PDImageXObject image = JPEGFactory.createFromImage(doc, bi, 0.08f);
    Dimension scaledDim = getScaledDimension(new Dimension(image.getWidth(), image.getHeight()),
    new Dimension((int) page.getMediaBox().getWidth(), (int) page.getMediaBox().getHeight()));
    contentStream.drawImage(image, 1, 1, scaledDim.width, scaledDim.height);
    } finally {
    contentStream.close();
    }
    }
    doc.save(file.getAbsolutePath() + ".pdf");
    // release the resources held by the document
    doc.close();
    }

    // taken from a stack overflow post
    // http://stackoverflow.com/questions/23223716/scaled-image-blurry-in-pdfbox
    // Thanks Gyo!
    private Dimension getScaledDimension(Dimension imgSize, Dimension boundary) {
    int original_width = imgSize.width;
    int original_height = imgSize.height;
    int bound_width = boundary.width;
    int bound_height = boundary.height;
    int new_width = original_width;
    int new_height = original_height;

    // first check if we need to scale width
    if (original_width > bound_width) {
    // scale width to fit
    new_width = bound_width;
    // scale height to maintain aspect ratio
    new_height = (new_width * original_height) / original_width;
    }

    // then check if we need to scale even with the new height
    if (new_height > bound_height) {
    // scale height to fit instead
    new_height = bound_height;
    // scale width to maintain aspect ratio
    new_width = (new_height * original_width) / original_height;
    }

    return new Dimension(new_width, new_height);
    }

    }

  2. This was very helpful! I appreciate both Paul and Frank who posted the PDFBox 2.0 version. It was a great boost to me. Thank you!

  3. You do not have to encode the TIFF images as JPEG images. Simply copy the source images into the PDF, which is a lot more efficient (400 ms instead of 4 seconds).

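    Improved Version:
    import java.awt.Dimension;
    import java.io.File;

    import org.apache.commons.io.FilenameUtils;
    import org.apache.pdfbox.io.RandomAccessFile;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDPageContentStream;
    import org.apache.pdfbox.pdmodel.graphics.image.CCITTFactory;
    import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

    // Note: "log" is a logger field that is not defined in this snippet.
    // RandomAccessFile is the PDFBox class, not java.io.RandomAccessFile.
    public class Tiff2pdf {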

    public static void main(String[] args) throws Exception {
    generatePdfFromTifPbox(new File("test.tif"));
    }

    public static void generatePdfFromTifPbox(File file) throws Exception {

    PDDocument doc = new PDDocument();
    log.info("Read Image");
    RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r");

    log.info("Process Image parts");

    int pageNo = 0;
    while(!randomAccessFile.isEOF()) {
    log.info("Add Page to PDF");
    PDPage page = new PDPage();
    doc.addPage(page);
    PDPageContentStream contentStream = new PDPageContentStream(doc, page);
    try {
    log.info("Render Image ");
    @SuppressWarnings("deprecation")
    PDImageXObject image = CCITTFactory.createFromRandomAccess(doc, randomAccessFile, pageNo);
    pageNo++;

    log.info("Scale and draw Image");
    Dimension scaledDim = getScaledDimension(new Dimension(image.getWidth(), image.getHeight()),
    new Dimension((int) page.getMediaBox().getWidth(), (int) page.getMediaBox().getHeight()));
    contentStream.drawImage(image, 1, 1, scaledDim.width, scaledDim.height);
    } finally {
    contentStream.close();
    }
    }

    log.info("Write target File ...");
    File targetFile = new File(file.getParent(), FilenameUtils.getBaseName(file.getName()) + ".pdf");
    targetFile.delete();
    doc.save(targetFile);
    doc.close();
    log.info("Write target File ... done.");
    }

    // taken from a stack overflow post
    // http://stackoverflow.com/questions/23223716/scaled-image-blurry-in-pdfbox
    // Thanks Gyo!
    private static Dimension getScaledDimension(Dimension imgSize, Dimension boundary) {
    int original_width = imgSize.width;
    int original_height = imgSize.height;
    int bound_width = boundary.width;
    int bound_height = boundary.height;
    int new_width = original_width;
    int new_height = original_height;

    // first check if we need to scale width
    if (original_width > bound_width) {
    // scale width to fit
    new_width = bound_width;
    // scale height to maintain aspect ratio
    new_height = (new_width * original_height) / original_width;
    }

    // then check if we need to scale even with the new height
    if (new_height > bound_height) {
    // scale height to fit instead
    new_height = bound_height;
    // scale width to maintain aspect ratio
    new_width = (new_height * original_width) / original_height;
    }

    return new Dimension(new_width, new_height);
    }

    }
