For image-based watermarks, there are several tools that promise their automatic removal. For example:
All of these are free to try, but require a license to actually produce the desired output.
However, the watermark of this specific PDF file isn't a single image that is repeated on all pages. As it turns out, PDFCreator hardcoded it (almost pixel by pixel) into every single one of them. This makes the watermark much more difficult to remove (and results in a rather bloated PDF file).
Since the watermark is actually composed of many tiny images, you can remove them with a PDF editor (e.g., Foxit Advanced PDF Editor), simply by selecting them and pressing Delete. Unfortunately, you have to repeat this for every page.
A less time-consuming solution would be to "manually" remove the watermark. We need:
http://superuser.com Dennis
- A tool to (un)compress and fix PDF streams (Pdftk).
- A text editor capable of replacing regular expressions (Notepad++, version 6.0 or higher).
Steps
- Download Pdftk and extract pdftk.exe and libiconv2.dll to %windir%\System32, a directory in the path or any other location of your choice.
- Download and install Notepad++.
- PDF streams are usually compressed using the DEFLATE algorithm. This saves space, but it makes the PDF file illegible.
  The command
      pdftk original.pdf output uncompressed.pdf uncompress
  uncompresses all streams, so they can be modified by a text editor.
- Open uncompressed.pdf with Notepad++ to reveal the structure of the watermark.
  In this specific case, every page begins with the block
      q 9 0 0 9 2997 4118.67 cm
      BI /CS/RGB /W 1 /H 1 /BPC 8 ID Ÿ®¼ EI Q
  and nearly 4,000 blocks just like this one. This particular block sets only one (/W 1 /H 1) of the watermark's pixels.
  Scrolling down until the pattern changes reveals that the watermark's stream is 95,906 bytes long (counting newlines). The exact same stream is repeated on every page of the PDF file.
- Press Ctrl + H and set the following:
      Find: q 9 0 0 9 2997 4118\.67 cm.{95881}
      Replace: (blank)
      Match case: checked
      Wrap around: checked
      Regular expression: selected
      . matches newline: checked
  The regular expression
      q 9 0 0 9 2997 4118\.67 cm.{95881}
  matches the first line of the above block (q 9 0 0 9 2997 4118.67 cm) and all following 95,881 characters, i.e., the watermark's stream.
  Clicking Replace All removes it from all pages of the PDF file.
- The watermark has now been removed, but the PDF file has errors (the streams' lengths are incorrect) and it's uncompressed.
  The command
      pdftk uncompressed.pdf output nowatermark.pdf compress
  takes care of both.
  uncompressed.pdf is no longer needed. You can delete it.
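
If you would rather script the search-and-replace step than do it in Notepad++, the same replacement can be sketched in Python. This is only a sketch under a few assumptions: the first line and the stream length (95,906 bytes) are the values found in this particular file and will differ for other PDFs, the function names are made up for illustration, and edited.pdf is a hypothetical output name. The file is read as bytes so that the byte count from the step above still applies.

```python
import re


def build_pattern(first_line: bytes, stream_len: int):
    """Compile a pattern matching first_line plus the rest of the stream.

    stream_len is the total length of the watermark stream in bytes,
    including first_line itself (95,906 in the file discussed above).
    """
    tail = stream_len - len(first_line)
    if tail < 0:
        raise ValueError("stream_len is shorter than first_line")
    # re.escape protects the "." in 4118.67; re.DOTALL plays the role of
    # Notepad++'s ". matches newline" option.
    return re.compile(re.escape(first_line) + b".{%d}" % tail, re.DOTALL)


def strip_watermark(data: bytes, first_line: bytes, stream_len: int):
    """Return (cleaned_data, number_of_streams_removed)."""
    return build_pattern(first_line, stream_len).subn(b"", data)


if __name__ == "__main__":
    # Values taken from this specific PDF; adjust them for any other file.
    FIRST_LINE = b"q 9 0 0 9 2997 4118.67 cm"
    STREAM_LEN = 95906

    with open("uncompressed.pdf", "rb") as src:
        original = src.read()

    cleaned, removed = strip_watermark(original, FIRST_LINE, STREAM_LEN)

    # "edited.pdf" is a hypothetical name; it still needs the pdftk
    # compress step to repair the stream lengths.
    with open("edited.pdf", "wb") as dst:
        dst.write(cleaned)

    print(f"Removed {removed} watermark stream(s)")
```

The number printed should match the page count. If it doesn't, the first line or the stream length is probably wrong for your file. Afterwards, run pdftk edited.pdf output nowatermark.pdf compress, as in the last step.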
I've got the uncompressed streams open in Notepad++, but I can't find anything that looks like the watermark code. (It's a repeating watermark, sort of "underneath the main picture".) Any ideas?