Ask Me Help Desk

Convert image (pdf, typewriter text) to text file (https://www.askmehelpdesk.com/showthread.php?t=306714)

  • Jan 21, 2009, 10:38 AM
    RickJ
    1 Attachment(s)
    Convert image (pdf, typewriter text) to text file
    I've played with 3 different OCR apps (including one that seems to be tops, Abbyy Fine Reader Pro) and find it too much. They're easy to use but far too time-consuming for my skills with them.

    See the attached pdf (38 pages of typed text). I have it and 5 others that need to be converted but am finding it too overwhelming.

    If there's someone out there who can convert it to [accurate] text (.doc, txt, html, etc), please give us a bid.

    We are a nonprofit organization on a limited budget. I am eager to present the best bid to the group to see if we can get these indexes put into text format.

    ... otherwise I'm left spending hours on it. I can't help but guess that some OCR guru out there can do it in minutes...
  • Jan 21, 2009, 11:59 AM
    seahwk83

    Go to this site and you can upload your file and the site will convert it for you to just about anything

    Zamzar - Free online file conversion
  • Jan 22, 2009, 07:24 AM
    RickJ

    Wow!

    They do far better than the expensive, and supposedly "top notch" software I tried.

    Thank you thank you thank you!!
  • Jan 22, 2009, 07:28 AM
    ScottGem

    Just as an aside, OCR software is not needed here. What you would need is PDF converter software. OCR is used when you have a scanned image from hard copy that you want to convert to editable text.
  • Jan 22, 2009, 08:02 AM
    RickJ

    Correction :(

    Zamzar is not an OCR program, just a file conversion program. It did not convert the old typed text in the pdf to text; it just put the pdf "image" into a .doc and .html file.

    My purpose is to get the pdf file to actual text...

    Thanks, though, seahawk. I do have other uses for a quick and easy file conversion app.
  • Jan 22, 2009, 08:06 AM
    RickJ
    Quote:

    Originally Posted by ScottGem
    Just as an aside, OCR software is not needed here. What you would need is PDF converter software. OCR is used when you have a scanned image from hard copy that you want to convert to editable text.

    ? OCR can be used for a pdf file or scanned image... and in most cases the software I use does well... but research has shown me that even good OCR software does poorly with old typewriter text, which is far more irregular than computer-typed text.

    The software I use does a perfect job when the scanned image or pdf is of computer text... but it just hates the old typewriter text...

    I'll keep looking... and look too at "pdf converter" apps to see if there's a difference.
  • Jan 22, 2009, 08:13 AM
    RickJ
    1 Attachment(s)
    In case it helps, attached here is just one (of the 38 attached to the original post) page, as a sample, of the old typewriter text...
  • Jan 22, 2009, 08:22 AM
    ScottGem
    1 Attachment(s)
    I know that OCR can be used on PDFs that are the result of scanned hard copy. But it's still a matter of the source being a scanned image.

    Typewriter text, if clean, is usually read well. The copy you have is not that clean. I used Paperport to convert to plain text. It's not 100% but pretty good.
  • Jan 22, 2009, 08:24 AM
    RickJ

    I'm still not comprehending what difference you are implying between pdf and scanned image.

    I scan the typed text to pdf or jpg and the results are the same: very poor.

    Paperport is one I've tried in the past and it does even worse than Abbyy. To date I've tried 5-6 different apps and Abbyy does best, but still poor.

    As we speak I'm going through some newer apps to see if they do better...
  • Jan 22, 2009, 09:06 AM
    RickJ

    What a bummer. I'd have thought that OCR software has "come a long way" since I purchased Abbyy Fine Reader... but the latest review at pcmag is from 2002 and the latest from cnet is from 2000! Gosh. Surprising...

    Still hunting :)

    PS. Wikipedia needs to be updated. The latest entry says modern OCR software is 99% accurate on typewritten text. Very wrong.
  • Jan 22, 2009, 09:20 AM
    ScottGem

    Ok, OCR packages work on a graphical image file. If it's already text, there is no need to convert. Many scan programs will save the scanned image as a pdf. So the key is the source of the pdf.

    Like I said, your problem is not so much the typewriter typeface, but the crispness of the scanned image. That's what's causing the errors.
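
    To illustrate the point about the source of the pdf: a rough way to tell the two kinds apart is to look inside the file. The sketch below is only a heuristic, not a real PDF parser. It assumes that a scan-only pdf contains embedded images and no fonts, while a pdf with real text declares at least one font. The function name classify_pdf_bytes is made up for this example, and files that store their resources in compressed object streams will come back as "unknown".

```python
import re

# Heuristic only: a scan-only PDF usually holds page images and no fonts,
# while a PDF with real text declares at least one /Font resource.
FONT_MARKER = re.compile(rb"/Font\b")
IMAGE_MARKER = re.compile(rb"/Subtype\s*/Image\b")


def classify_pdf_bytes(data: bytes) -> str:
    """Guess whether raw PDF bytes look like scanned images or real text."""
    has_font = FONT_MARKER.search(data) is not None
    has_image = IMAGE_MARKER.search(data) is not None
    if has_font:
        return "text"        # a PDF converter can likely extract the text directly
    if has_image:
        return "image-only"  # only pictures of text, so OCR is needed
    return "unknown"         # e.g. resources hidden in compressed object streams
```

    To try it on a file, read the file in binary mode and pass the bytes in, for example classify_pdf_bytes(open("sample.pdf", "rb").read()), where sample.pdf is a placeholder name.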
  • Jan 22, 2009, 09:27 AM
    RickJ
    2 Attachment(s)

    That's right. In using the supposedly best OCR software out there (Abbyy Fine Reader) I scan the typewritten document, in ultra high quality, to pdf or jpg but the results are the same: Poor.

    Yes, the "crispness" is key. Typewriter text is notoriously "uncrisp"... which is the issue.

    Can you suggest an app that will turn the attached (sample in both pdf and jpg format) typed text to editable text?
  • Jan 22, 2009, 09:30 AM
    ScottGem

    Are you scanning these yourself? If so, have you tried using a TIF file as the output and then running OCR on that?
  • Jan 22, 2009, 10:15 AM
    RickJ

    No, I've not. I'll give it a try and report back.
  • Jan 23, 2009, 07:08 AM
    RickJ

    TIFF is no better. After digging through the help and checking forums, I see that many are in the same boat as I am: with old typewritten text, there is much "training" needed for the software to convert it to editable text.

    I spent 2 hours "training" it and only got to page 6 of 38...

    So maybe that's the best that can be done with today's technology...
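
    For a rough sense of scale, the figures above (2 hours to reach page 6 of 38) can be extrapolated. This is only a back-of-the-envelope sketch: it assumes the pace stays constant, which may be pessimistic if the training carries over to later pages, and the function name estimate_total_hours is made up for this example.

```python
def estimate_total_hours(hours_spent: float, pages_done: int, total_pages: int) -> float:
    """Linearly extrapolate total effort from the pages finished so far."""
    if pages_done <= 0:
        raise ValueError("pages_done must be positive")
    return hours_spent / pages_done * total_pages


# 2 hours for 6 pages, 38 pages in the file
print(round(estimate_total_hours(2, 6, 38), 1))  # 12.7
```

    At that pace, one 38-page file would take a bit under 13 hours.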
  • Jan 23, 2009, 08:48 AM
    seahwk83

    There are Omnipage Pro and Abbyy, which you already tried.
  • Jan 26, 2009, 09:21 AM
    RickJ
    Quote:

    Originally Posted by seahwk83
    There are Omnipage Pro and Abbyy, which you already tried.

    Yes, both. Abbyy is much better - but neither does well with old typewriter text.
  • Jan 26, 2009, 09:28 AM
    NeedKarma
    How cheap can you get a student to type the stuff into Word? :)
  • Jan 26, 2009, 09:37 AM
    RickJ

    I tried that on one of them - and at 65 wpm (my rate), it took me longer than I'd like to spend...

    ... so the request for a bid can certainly include someone willing to type it manually :)

    ** Truth is, I did not think of seeking a bid for doing it manually. I'll check at my kids' High School to see what kids would charge for typing ;)

    ... PS/Edit: And I posted a Craigslist ad.
  • Mar 29, 2009, 10:59 AM
    onefilms
    I have the same problem. I have an old letter typed in the 1950s that I need to OCR. Do any of the online OCR programs handle old typewriter text?

  • All times are GMT -7.