<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for jduck.net</title>
	<atom:link href="http://jduck.net/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://jduck.net</link>
	<description></description>
	<lastBuildDate>Tue, 13 Jul 2010 07:51:27 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>Comment on Scanning with sane&#8217;s scanimage from an ADF scanner to PDF and OCRed Text by Gerry Dobles</title>
		<link>http://jduck.net/2008/01/05/ocr-scanning/comment-page-1/#comment-4972</link>
		<dc:creator>Gerry Dobles</dc:creator>
		<pubDate>Tue, 13 Jul 2010 07:51:27 +0000</pubDate>
		<guid isPermaLink="false">http://jduck.net/2008/01/05/ocr-scanning/#comment-4972</guid>
		<description>Carlotta Lantz</description>
		<content:encoded><![CDATA[<p>Carlotta Lantz</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Histogram of an ASCII Raster by pyther</title>
		<link>http://jduck.net/2007/03/06/histogram-of-an-ascii-raster/comment-page-1/#comment-4846</link>
		<dc:creator>pyther</dc:creator>
		<pubDate>Fri, 02 Jul 2010 11:14:26 +0000</pubDate>
		<guid isPermaLink="false">http://jduck.net/2007/03/06/histogram-of-an-ascii-raster/#comment-4846</guid>
		<description>Cool piece of code.
There is a minor bug in the following line:
row = row[1:len(row)-1]

This should work instead:
row = row[:len(row)-1]
Otherwise the first column gets lost.</description>
		<content:encoded><![CDATA[<p>Cool piece of code.<br />
There is a minor bug in the following line:<br />
row = row[1:len(row)-1]</p>
<p>This should work instead:<br />
row = row[:len(row)-1]<br />
Otherwise the first column gets lost.</p>
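<p>A minimal sketch of the difference between the two slices. The sample row is an assumption (a list of cell values with one unwanted trailing element) and may not match how the original script builds its rows; the function names are hypothetical.</p>

```python
def trim_buggy(row):
    # Slice from the original line: drops the first AND the last element
    return row[1:len(row) - 1]


def trim_fixed(row):
    # Suggested slice: drops only the last element (same as row[:-1])
    return row[:len(row) - 1]


# Hypothetical raster row with an unwanted trailing empty cell
row = ["3", "7", "7", "1", ""]
print(trim_buggy(row))  # ['7', '7', '1']
print(trim_fixed(row))  # ['3', '7', '7', '1']
```

<p>Starting the slice at index 1 is what removes the first column; leaving the start empty keeps it.</p>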
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Scanning with sane&#8217;s scanimage from an ADF scanner to PDF and OCRed Text by links for 2010-05-28 &#171; Where Is All This Leading To?</title>
		<link>http://jduck.net/2008/01/05/ocr-scanning/comment-page-1/#comment-4548</link>
		<dc:creator>links for 2010-05-28 &#171; Where Is All This Leading To?</dc:creator>
		<pubDate>Sat, 29 May 2010 00:32:51 +0000</pubDate>
		<guid isPermaLink="false">http://jduck.net/2008/01/05/ocr-scanning/#comment-4548</guid>
		<description>[...] Scanning with sane’s scanimage from an ADF scanner to PDF and OCRed Text &#124; jduck.net (tags: and bash network scanning ocr opensource adf paperless pdf sane scan linux imaging hacks hack from cli ubuntu howto) [...]</description>
		<content:encoded><![CDATA[<p>[...] Scanning with sane’s scanimage from an ADF scanner to PDF and OCRed Text | jduck.net (tags: and bash network scanning ocr opensource adf paperless pdf sane scan linux imaging hacks hack from cli ubuntu howto) [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Getting to know PostGIS Part II by Getting to know PostGIS &#124; jduck.net</title>
		<link>http://jduck.net/2009/01/30/getting-to-know-postgis-part-ii/comment-page-1/#comment-3945</link>
		<dc:creator>Getting to know PostGIS &#124; jduck.net</dc:creator>
		<pubDate>Fri, 16 Apr 2010 19:11:44 +0000</pubDate>
		<guid isPermaLink="false">http://jduck.net/?p=131#comment-3945</guid>
		<description>[...] Onward to Part II&#8230; [...]</description>
		<content:encoded><![CDATA[<p>[...] Onward to Part II&#8230; [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Getting to know PostGIS by Getting to know PostGIS Part II &#124; jduck.net</title>
		<link>http://jduck.net/2007/11/06/getting-to-know-postgis/comment-page-1/#comment-3944</link>
		<dc:creator>Getting to know PostGIS Part II &#124; jduck.net</dc:creator>
		<pubDate>Fri, 16 Apr 2010 18:50:16 +0000</pubDate>
		<guid isPermaLink="false">http://jduck.net/2007/11/06/getting-to-know-postgis/#comment-3944</guid>
		<description>[...] it&#039;s been over six months since I made my first tutorial post about PostGIS. I now use PostGIS on a regular basis and thought it would be good to update the [...]</description>
		<content:encoded><![CDATA[<p>[...] it&#8217;s been over six months since I made my first tutorial post about PostGIS. I now use PostGIS on a regular basis and thought it would be good to update the [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Scanning with sane&#8217;s scanimage from an ADF scanner to PDF and OCRed Text by Jonah</title>
		<link>http://jduck.net/2008/01/05/ocr-scanning/comment-page-1/#comment-3472</link>
		<dc:creator>Jonah</dc:creator>
		<pubDate>Mon, 22 Feb 2010 21:48:18 +0000</pubDate>
		<guid isPermaLink="false">http://jduck.net/2008/01/05/ocr-scanning/#comment-3472</guid>
		<description>Thanks for the contribution, Joe.  I no longer have a scanner with an ADF, but I&#039;m happy to see this is the most popular post on my seldom-updated blog.  I&#039;ve added your code as a revision to my original script as a &lt;a href=&quot;http://gist.github.com/311548&quot; rel=&quot;nofollow&quot;&gt;gist&lt;/a&gt; on GitHub for others to work with and fork as they please.</description>
		<content:encoded><![CDATA[<p>Thanks for the contribution, Joe.  I no longer have a scanner with an ADF, but I&#8217;m happy to see this is the most popular post on my seldom-updated blog.  I&#8217;ve added your code as a revision to my original script as a <a href="http://gist.github.com/311548" rel="nofollow">gist</a> on GitHub for others to work with and fork as they please.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Scanning with sane&#8217;s scanimage from an ADF scanner to PDF and OCRed Text by Joe</title>
		<link>http://jduck.net/2008/01/05/ocr-scanning/comment-page-1/#comment-3467</link>
		<dc:creator>Joe</dc:creator>
		<pubDate>Sun, 21 Feb 2010 12:37:09 +0000</pubDate>
		<guid isPermaLink="false">http://jduck.net/2008/01/05/ocr-scanning/#comment-3467</guid>
		<description>Hi,
thanks for this nice script.
I&#039;ve made some changes and improvements:

- added a format to the scanimage batch option (--batch=out%02d.tif)
- compress with zip when calling tiff2pdf
- added ImageMagick image enhancement (a little more contrast) and two-bit TIFF output (scans in gray and reduces the image to 4 gray levels)

Scanning in gray causes no decrease in speed on my HP 5590.
I&#039;ve also added the possibility to scan more than one document (page sequence) with the ADF.
When you have only one page to scan, it&#039;s faster not to use the ADF.

So you can call scan2pdf like this:
scan2pdf myDocument -&gt; scans one page without the ADF (saved as myDocument.pdf)
scan2pdf 99 myDocument -&gt; uses the ADF to scan into myDocument.pdf
scan2pdf 3,8,2 myDocument -&gt; uses the ADF and scans 3 page sequences; the files are saved as myDocument.01.pdf (3 pages), myDocument.02.pdf (8 pages) and myDocument.03.pdf (2 pages)

Maybe someone will like it:
[code]

#!/bin/bash

SOURCE=&quot;&quot;

if [ $# -gt 1 ]
then

  SOURCE=&quot;--source ADF -l 3&quot;
  outname=$2
  pbreak=$1

  echo &quot;$pbreak&quot; &#124; egrep &quot;[^0-9,]+&quot;
  if [ $? -ne 1 ]
  then
    echo &quot;Check sequence list!&quot;
    exit 1
  fi
else

  pbreak=99
  outname=$1
  SOURCE=&quot;--batch-count=1&quot;

fi

startdir=$(pwd)
tmpdir=scan-$RANDOM

cd /tmp
mkdir $tmpdir
cd $tmpdir
echo &quot;################## Scanning ###################&quot;
scanimage -x 210 -y 297 --batch=out%02d.tif --format=tiff --mode Gray --resolution 300 $SOURCE

start=1
cnt=1
sc=$(echo &quot;$pbreak&quot; &#124; cut -d&quot;,&quot; -f1-99 --output-delimiter=&quot; &quot; &#124; wc -w)
for pb in $(echo &quot;$pbreak&quot; &#124; cut -d &quot;,&quot; -f1-99 --output-delimiter=&quot; &quot;)
do
    ende=$(expr $start + $pb - 1)
    pnr=0
    i=1
    echo &quot;############ Page-Sequence ($cnt), Pages: $pb, Start: $start, End: $ende ############&quot;
    tpages=&quot;&quot;
    for page in $(ls out*.tif); do
	pnr=$(expr $pnr + 1)
	if [ $pnr -ge $start -a $pnr -le $ende ]
	then
	    echo &quot;... Converting&quot;
	    # increase contrast and reduce colordepth 
	    convert $page -level 15%,85% -depth 2 &quot;b$page&quot; 
	    echo &quot;... OCRing&quot;
	    tpages=&quot;$tpages b$page&quot;
	    i=$(expr $i + 1)
	    echo -n &quot;    &quot;
            tesseract $page $page -l deu
            if [ $sc -gt 1 ]
            then
        	cnts=`printf %02d $cnt`
    		cat $page.txt &gt;&gt; $outname.$cnts.txt
    	    else
    		cat $page.txt &gt;&gt; $outname.txt
    	    fi

	fi
    done

    echo &quot;... Converting to PDF&quot;
    #Use tiffcp to combine output tiffs to a single multi-page tiff
    tiffcp $tpages output.tif
    #Convert the tiff to PDF
    if [ $sc -gt 1 ]
    then
    	cnts=`printf %02d $cnt`
        tiff2pdf -z output.tif &gt; $startdir/$outname.$cnts.pdf
	mv $outname.$cnts.txt $startdir
    else
        tiff2pdf -z output.tif &gt; $startdir/$outname.pdf
	mv $outname.txt $startdir
    fi

    start=$(expr $start + $pb)
    cnt=$(expr $cnt + 1)

done

cd ..
echo &quot;################ Cleaning Up ################&quot;
rm -rf $tmpdir
cd $startdir


[/code]</description>
		<content:encoded><![CDATA[<p>Hi,<br />
thanks for this nice script.<br />
I&#8217;ve made some changes and improvements:</p>
<p>- added a format to the scanimage batch option (--batch=out%02d.tif)<br />
- compress with zip when calling tiff2pdf<br />
- added ImageMagick image enhancement (a little more contrast) and two-bit TIFF output (scans in gray and reduces the image to 4 gray levels)</p>
<p>Scanning in gray causes no decrease in speed on my HP 5590.<br />
I&#8217;ve also added the possibility to scan more than one document (page sequence) with the ADF.<br />
When you have only one page to scan, it&#8217;s faster not to use the ADF.</p>
<p>So you can call scan2pdf like this:<br />
scan2pdf myDocument -&gt; scans one page without the ADF (saved as myDocument.pdf)<br />
scan2pdf 99 myDocument -&gt; uses the ADF to scan into myDocument.pdf<br />
scan2pdf 3,8,2 myDocument -&gt; uses the ADF and scans 3 page sequences; the files are saved as myDocument.01.pdf (3 pages), myDocument.02.pdf (8 pages) and myDocument.03.pdf (2 pages)</p>
<p>Maybe someone will like it:<br />
[code]</p>
<p>#!/bin/bash</p>
<p>SOURCE=""</p>
<p>if [ $# -gt 1 ]<br />
then</p>
<p>  SOURCE="--source ADF -l 3"<br />
  outname=$2<br />
  pbreak=$1</p>
<p>  echo "$pbreak" | egrep "[^0-9,]+"<br />
  if [ $? -ne 1 ]<br />
  then<br />
    echo "Check sequence list!"<br />
    exit 1<br />
  fi<br />
else</p>
<p>  pbreak=99<br />
  outname=$1<br />
  SOURCE="--batch-count=1"</p>
<p>fi</p>
<p>startdir=$(pwd)<br />
tmpdir=scan-$RANDOM</p>
<p>cd /tmp<br />
mkdir $tmpdir<br />
cd $tmpdir<br />
echo "################## Scanning ###################"<br />
scanimage -x 210 -y 297 --batch=out%02d.tif --format=tiff --mode Gray --resolution 300 $SOURCE</p>
<p>start=1<br />
cnt=1<br />
sc=$(echo "$pbreak" | cut -d"," -f1-99 --output-delimiter=" " | wc -w)<br />
for pb in $(echo "$pbreak" | cut -d "," -f1-99 --output-delimiter=" ")<br />
do<br />
    ende=$(expr $start + $pb - 1)<br />
    pnr=0<br />
    i=1<br />
    echo "############ Page-Sequence ($cnt), Pages: $pb, Start: $start, End: $ende ############"<br />
    tpages=""<br />
    for page in $(ls out*.tif); do<br />
	pnr=$(expr $pnr + 1)<br />
	if [ $pnr -ge $start -a $pnr -le $ende ]<br />
	then<br />
	    echo "... Converting"<br />
	    # increase contrast and reduce colordepth<br />
	    convert $page -level 15%,85% -depth 2 "b$page"<br />
	    echo "... OCRing"<br />
	    tpages="$tpages b$page"<br />
	    i=$(expr $i + 1)<br />
	    echo -n "    "<br />
            tesseract $page $page -l deu<br />
            if [ $sc -gt 1 ]<br />
            then<br />
        	cnts=`printf %02d $cnt`<br />
    		cat $page.txt &gt;&gt; $outname.$cnts.txt<br />
    	    else<br />
    		cat $page.txt &gt;&gt; $outname.txt<br />
    	    fi</p>
<p>	fi<br />
    done</p>
<p>    echo "... Converting to PDF"<br />
    #Use tiffcp to combine output tiffs to a single multi-page tiff<br />
    tiffcp $tpages output.tif<br />
    #Convert the tiff to PDF<br />
    if [ $sc -gt 1 ]<br />
    then<br />
    	cnts=`printf %02d $cnt`<br />
        tiff2pdf -z output.tif &gt; $startdir/$outname.$cnts.pdf<br />
	mv $outname.$cnts.txt $startdir<br />
    else<br />
        tiff2pdf -z output.tif &gt; $startdir/$outname.pdf<br />
	mv $outname.txt $startdir<br />
    fi</p>
<p>    start=$(expr $start + $pb)<br />
    cnt=$(expr $cnt + 1)</p>
<p>done</p>
<p>cd ..<br />
echo "################ Cleaning Up ################"<br />
rm -rf $tmpdir<br />
cd $startdir</p>
<p>[/code]</p>
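<p>The page-sequence arithmetic in the script can be hard to follow in shell, so here is a rough Python sketch of the same idea. It is not part of the script: the function name page_ranges is hypothetical, and it only mirrors the start/ende bookkeeping and the digits-and-commas check.</p>

```python
import re


def page_ranges(pbreak):
    """Turn a sequence list such as "3,8,2" into (start, end) page ranges."""
    # Same check as the script: only digits and commas are allowed
    if re.search(r"[^0-9,]", pbreak):
        raise ValueError("Check sequence list!")
    ranges = []
    start = 1
    for count in (int(n) for n in pbreak.split(",") if n):
        end = start + count - 1  # corresponds to "ende" in the script
        ranges.append((start, end))
        start += count
    return ranges


print(page_ranges("3,8,2"))  # [(1, 3), (4, 11), (12, 13)]
```

<p>For "3,8,2" the first PDF gets pages 1 to 3, the second pages 4 to 11 and the third pages 12 to 13, which matches the 3, 8 and 2 pages described above.</p>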
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Scanning with sane&#8217;s scanimage from an ADF scanner to PDF and OCRed Text by elio</title>
		<link>http://jduck.net/2008/01/05/ocr-scanning/comment-page-1/#comment-2460</link>
		<dc:creator>elio</dc:creator>
		<pubDate>Wed, 19 Aug 2009 12:24:47 +0000</pubDate>
		<guid isPermaLink="false">http://jduck.net/2008/01/05/ocr-scanning/#comment-2460</guid>
		<description>Excellent, Jonah. It works very well! Let me add a few annoyances I bumped into, so that other people can take advantage.
a) You have to find out the device name of your scanner. Running scanimage -L will tell you. In my case:
elio@gazelle:$ scanimage -L
device `hpaio:/net/Officejet_Pro_L7500?ip=192.168.1.98&#039; is a Hewlett-Packard Officejet_Pro_L7500 all-in-one

b) tesseract also installs language-dependent resources. On my Ubuntu 9.04 (English, US) it installed the German files by default. Go and install a matching language package. I also installed tesseract-ocr-eng.

c) In the last step of your program, tiffcp and tiff2pdf could not be found. Fixed by installing the package libtiff-tools.

Although this note is somewhat long, I want to state that your solution is very simple and effective. Again, my compliments. I encourage everyone to adopt your solution; it took me five minutes and three tries to be up and running.

I still have to find out how to scan a double-sided document. I&#039;m investigating the scanimage command and will post again if I find out how to do it.
Cheers, Elio</description>
		<content:encoded><![CDATA[<p>Excellent, Jonah. It works very well! Let me add a few annoyances I bumped into, so that other people can take advantage.<br />
a) You have to find out the device name of your scanner. Running scanimage -L will tell you. In my case:<br />
elio@gazelle:$ scanimage -L<br />
device `hpaio:/net/Officejet_Pro_L7500?ip=192.168.1.98' is a Hewlett-Packard Officejet_Pro_L7500 all-in-one</p>
<p>b) tesseract also installs language-dependent resources. On my Ubuntu 9.04 (English, US) it installed the German files by default. Go and install a matching language package. I also installed tesseract-ocr-eng.</p>
<p>c) In the last step of your program, tiffcp and tiff2pdf could not be found. Fixed by installing the package libtiff-tools.</p>
<p>Although this note is somewhat long, I want to state that your solution is very simple and effective. Again, my compliments. I encourage everyone to adopt your solution; it took me five minutes and three tries to be up and running.</p>
<p>I still have to find out how to scan a double-sided document. I&#8217;m investigating the scanimage command and will post again if I find out how to do it.<br />
Cheers, Elio</p>
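<p>Points b) and c) come down to missing commands, so a small pre-flight check may save a failed run. This is only a sketch: the tool list is taken from the commands named in this thread, and the names REQUIRED_TOOLS and missing_tools are made up for illustration.</p>

```python
import shutil

# Commands named in this thread; the tuple itself is an illustrative assumption
REQUIRED_TOOLS = ("scanimage", "tesseract", "tiffcp", "tiff2pdf")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the commands that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing commands:", ", ".join(missing))
    else:
        print("All required commands were found.")
```

<p>It only checks that the commands exist; it does not check which tesseract language files are installed.</p>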
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Scanning with sane&#8217;s scanimage from an ADF scanner to PDF and OCRed Text by Charles</title>
		<link>http://jduck.net/2008/01/05/ocr-scanning/comment-page-1/#comment-2246</link>
		<dc:creator>Charles</dc:creator>
		<pubDate>Sat, 27 Jun 2009 21:59:53 +0000</pubDate>
		<guid isPermaLink="false">http://jduck.net/2008/01/05/ocr-scanning/#comment-2246</guid>
		<description>Thank you; works nicely and smoothly.  I have added the option --batch-start=101 to the scanimage command, because as written the pages in the combined TIFF file end up in the wrong order when more than 10 pages are scanned (with 101, you can scan about 900 pages).</description>
		<content:encoded><![CDATA[<p>Thank you; works nicely and smoothly.  I have added the option --batch-start=101 to the scanimage command, because as written the pages in the combined TIFF file end up in the wrong order when more than 10 pages are scanned (with 101, you can scan about 900 pages).</p>
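<p>A small sketch of why the start number matters, assuming the files are named out1.tif, out2.tif and so on without zero padding (the naming is an assumption, and Python's plain string sort stands in for the shell's glob ordering). The helper names are hypothetical.</p>

```python
def batch_names(count, start=1):
    # File names in the order the pages were scanned
    return [f"out{n}.tif" for n in range(start, start + count)]


def in_scan_order(names):
    # True when a plain string sort leaves the scan order unchanged
    return sorted(names) == names


print(sorted(batch_names(12))[:4])
# ['out1.tif', 'out10.tif', 'out11.tif', 'out12.tif']
print(in_scan_order(batch_names(12)))             # False
print(in_scan_order(batch_names(12, start=101)))  # True
```

<p>Starting at 101 keeps every number three digits long, so string order and page order agree until the counter reaches 1000.</p>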
]]></content:encoded>
	</item>
</channel>
</rss>
