There are a lot of tools available online to extract images from a PDF, but most of them are shareware or trialware. If you need just a single image, you can right-click it in Adobe Acrobat Reader and copy and paste it into Microsoft Paint, Paint.NET or (overkill) Adobe Photoshop. But if you have a PDF with several pages and several images on each page, you'd like to have that automated. That's when you start your search for a good free or low-cost utility. Or, you can write your own! Much more fun guaranteed!

Getting started

It's easier than you might think. Start Visual Studio 11 and choose File –> New Project (or Ctrl + Shift + N). I selected .NET Framework 4, because not all PCs that will use this utility have .NET 4.5 installed.

The user interface of the utility

My WinForm design looks like this:

[Screenshot: the WinForm design]

So the UI ‘flow’ is a simple, top-down thing.

When you fire up the utility, it shows only the top button, which opens an openFileDialog. After you hit the OK button of that dialog, the next button becomes visible. That second button displays a folderBrowserDialog. After you hit OK there, the utility shows the groupbox with the two radio buttons (one of them checked by default) and the final button, which actually starts the process.
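The step-by-step reveal described above can be sketched roughly like this. It is a minimal illustration, not the form from this post: the class name `ExtractorForm`, the control names, the captions and the layout values are all assumptions.

```csharp
using System;
using System.Drawing;
using System.Windows.Forms;

// Hypothetical form: names, captions and layout are illustrative only.
public class ExtractorForm : Form
{
    private readonly Button pickPdfButton = new Button { Text = "Select PDF", Top = 10, Left = 10, Width = 200 };
    private readonly Button pickFolderButton = new Button { Text = "Select output folder", Top = 45, Left = 10, Width = 200, Visible = false };
    private readonly GroupBox formatGroup = new GroupBox { Text = "Save as", Top = 80, Left = 10, Width = 200, Height = 70, Visible = false };
    private readonly RadioButton jpgRadio = new RadioButton { Text = "jpg", Top = 20, Left = 10, Width = 80, Checked = true };
    private readonly RadioButton pngRadio = new RadioButton { Text = "png", Top = 42, Left = 10, Width = 80 };
    private readonly Button startButton = new Button { Text = "Extract images", Top = 160, Left = 10, Width = 200, Visible = false };

    public string PdfPath { get; private set; }
    public string OutputFolder { get; private set; }

    public ExtractorForm()
    {
        Text = "PDF image extractor";
        ClientSize = new Size(220, 200);
        formatGroup.Controls.Add(jpgRadio);
        formatGroup.Controls.Add(pngRadio);
        Controls.AddRange(new Control[] { pickPdfButton, pickFolderButton, formatGroup, startButton });

        pickPdfButton.Click += OnPickPdf;
        pickFolderButton.Click += OnPickFolder;
    }

    private void OnPickPdf(object sender, EventArgs e)
    {
        using (var dialog = new OpenFileDialog { Filter = "PDF files (*.pdf)|*.pdf" })
        {
            if (dialog.ShowDialog(this) != DialogResult.OK) return;
            PdfPath = dialog.FileName;
            pickFolderButton.Visible = true; // step two appears only after a PDF is chosen
        }
    }

    private void OnPickFolder(object sender, EventArgs e)
    {
        using (var dialog = new FolderBrowserDialog())
        {
            if (dialog.ShowDialog(this) != DialogResult.OK) return;
            OutputFolder = dialog.SelectedPath;
            formatGroup.Visible = true;  // format choice, jpg preselected
            startButton.Visible = true;  // final step: start the extraction
        }
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new ExtractorForm());
    }
}
```

Hiding the later controls until the earlier step has succeeded means the start button can never be clicked without both a PDF and an output folder.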

The code behind

My first step was to get a free and awesome PDF library into the project. It is called iTextSharp and it is on NuGet: http://nuget.org/packages/iTextSharp. So right-click the project, select Manage NuGet Packages, search for iTextSharp and hit Install.

After that I used some code from kuujinbo, which I found on Stack Overflow: http://stackoverflow.com/a/8511314/169714. It uses an implementation of the IRenderListener interface.

What the code actually does is parse the PDF file, iterate through all of its parts and check whether each one is an image. If it is, the code adds the image name to a list of strings and the image itself, as a byte array, to a second list. You can save the image with its original extension (jpg, png, etc.) or convert it if you require only jpg or only png. I have added support for both conversions in the code below.
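The parsing steps described above could look roughly like the sketch below, assuming the iTextSharp 5.x parser API. The class is named `MyImageRenderListener` only to line up with the name used in the conversion method later in this post. The body, the `PdfImageExtractor` helper and the `image1.jpg`-style naming scheme are assumptions for illustration, not the linked Stack Overflow code.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Illustrative listener: collects every image the parser reports.
public class MyImageRenderListener : IRenderListener
{
    public List<byte[]> Images = new List<byte[]>();
    public List<string> ImageNames = new List<string>();

    // Text callbacks are required by the interface but not needed here.
    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderText(TextRenderInfo renderInfo) { }

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        PdfImageObject image = renderInfo.GetImage();
        if (image == null) return;

        // Hypothetical naming scheme: image1.jpg, image2.png, ...
        string extension = image.GetFileType();
        ImageNames.Add(string.Format("image{0}.{1}", Images.Count + 1, extension));
        Images.Add(image.GetImageAsBytes());
    }
}

public static class PdfImageExtractor
{
    // Runs the listener over every page (pages are 1-based in iTextSharp).
    public static MyImageRenderListener Extract(string pdfPath)
    {
        var listener = new MyImageRenderListener();
        var reader = new PdfReader(pdfPath);
        try
        {
            var parser = new PdfReaderContentParser(reader);
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                parser.ProcessContent(page, listener);
            }
        }
        finally
        {
            reader.Close();
        }
        return listener;
    }
}
```

Depending on the iTextSharp version, `GetImage()` may throw for image encodings the library cannot decode, so a real utility would probably want a try/catch around that call.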

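// Requires: using System.Drawing.Imaging; using System.IO; using System.Linq;
private void ConvertPngToJpg(int i, MyImageRenderListener listener)
{
    // The JPEG encoder honors Encoder.Quality (0-100, passed as a long);
    // Encoder.Compression is meant for TIFF compression schemes.
    EncoderParameters parms = new EncoderParameters(1);
    parms.Param[0] = new EncoderParameter(System.Drawing.Imaging.Encoder.Quality, 100L);

    ImageCodecInfo jpegEncoder = ImageCodecInfo.GetImageEncoders().Single(p => p.CodecName.Contains("JPEG"));
    // label2.Text is used as the folder part of the output path.
    string path = Path.Combine(label2.Text, Path.ChangeExtension(listener.ImageNames[i].ToLower(), ".jpg"));
    // Dispose the stream and image so the GDI+ handles are released after saving.
    using (MemoryStream stream = new MemoryStream(listener.Images[i]))
    using (System.Drawing.Image img = System.Drawing.Image.FromStream(stream))
    {
        img.Save(path, jpegEncoder, parms);
    }
}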
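Only the PNG-to-JPG direction is shown here. The opposite conversion could be sketched as below, assuming the same kind of byte array and image name that `ConvertPngToJpg` works with. The `ImageConversion` class and `SaveAsPng` method are hypothetical names, not part of the original source.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

public static class ImageConversion
{
    // Hypothetical helper: decodes the raw image bytes and re-encodes them as PNG.
    // Returns the full path of the file that was written.
    public static string SaveAsPng(byte[] imageBytes, string folder, string imageName)
    {
        string path = Path.Combine(folder, Path.ChangeExtension(imageName, ".png"));
        using (var stream = new MemoryStream(imageBytes))
        using (var img = Image.FromStream(stream))
        {
            // PNG is lossless, so no encoder parameters are needed.
            img.Save(path, ImageFormat.Png);
        }
        return path;
    }
}
```

Keep in mind that converting a JPG to PNG does not restore detail lost to JPEG compression; it only changes the container format.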


You can select an ImageCodecInfo object by looping (for or foreach) through the GetImageEncoders array, or you can use a line of LINQ like this:

System.Drawing.Imaging.ImageCodecInfo jpegEncoder = 
ImageCodecInfo.GetImageEncoders().Single(p => p.CodecName.Contains("JPEG"));
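For comparison, the foreach variant might look like the sketch below. It matches on the codec's `FormatID` rather than on the codec name, which is a deliberate swap from the `CodecName.Contains("JPEG")` check above. `CodecLookup` and `FindEncoder` are hypothetical names.

```csharp
using System.Drawing.Imaging;

public static class CodecLookup
{
    // Returns the installed encoder for the given format, or null when none matches.
    public static ImageCodecInfo FindEncoder(ImageFormat format)
    {
        foreach (ImageCodecInfo codec in ImageCodecInfo.GetImageEncoders())
        {
            if (codec.FormatID == format.Guid)
            {
                return codec;
            }
        }
        return null;
    }
}
```

Usage would be `ImageCodecInfo jpegEncoder = CodecLookup.FindEncoder(ImageFormat.Jpeg);`. Unlike `Single`, this returns null instead of throwing when no encoder matches, so the caller has to check for that.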


Please forgive the form element names: button1, button2, label1, label2, etc. Here is the full source code:

Good luck!

Comments

Comment by Avinash

Cool .. :D

Good tool, Regards from India :)

Avinash