    RickJ Posts: 7,762, Reputation: 864
    Uber Member
     
    #1

    Jan 21, 2009, 10:38 AM
    Convert image (pdf, typewriter text) to text file
    I've played with 3 different OCR apps (including the one that seems to be the top pick, Abbyy FineReader Pro) and find the process too much: easy to use, but far too time-consuming given my skill with it.

    See the attached PDF (38 pages of typed text). I have it and 5 others that need to be converted, but I'm finding the job overwhelming.

    If there's someone out there who can convert it to [accurate] text (.doc, .txt, .html, etc.), please give us a bid.

    We are a nonprofit organization on a limited budget. I am eager to present the best bid to the group to see if we can get these indexes put into text format.

    ... otherwise I'm left spending hours on it. I can't help but think that some OCR guru out there could do it in minutes...
    Attached file: CRSnames1975-1977.pdf (PDF, 899.7 KB, 750 views)
    seahwk83 Posts: 3,276, Reputation: 212
    Ultra Member
     
    #2

    Jan 21, 2009, 11:59 AM

    Go to this site: you can upload your file and it will convert it to just about any format.

    Zamzar - Free online file conversion
    RickJ Posts: 7,762, Reputation: 864
    Uber Member
     
    #3

    Jan 22, 2009, 07:24 AM

    Wow!

    They do far better than the expensive, supposedly "top-notch" software I tried.

    Thank you thank you thank you!!
    ScottGem Posts: 64,966, Reputation: 6056
    Computer Expert and Renaissance Man
     
    #4

    Jan 22, 2009, 07:28 AM

    Just as an aside, OCR software is not needed here. What you would need is PDF converter software. OCR is used when you have a scanned image from hard copy that you want to convert to editable text.
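ScottGem's distinction can be checked roughly before choosing a tool: a PDF saved from a word processor normally references fonts, while a PDF made straight from a scanner usually holds only page images. The Python sketch below is a crude byte-level heuristic, not a real PDF parser; the function name `pdf_content_hint` and the script name in the usage comment are hypothetical. Because many PDFs compress their dictionaries into object streams, it will often answer "unknown", and a proper PDF library is the more reliable route.

```python
import re
import sys


def pdf_content_hint(data: bytes) -> str:
    """Crude guess at what a PDF contains, based only on its raw bytes.

    Returns "text" if a font resource is visible (the PDF probably has a
    text layer, so a PDF-to-text converter may be enough), "image" if only
    image XObjects are visible (a scan, so OCR is needed), else "unknown".
    Dictionaries hidden inside compressed object streams are not seen.
    """
    if b"/Font" in data:
        # A scan that was already OCR'd can carry an invisible text
        # layer, so "text" says nothing about how accurate that text is.
        return "text"
    if re.search(rb"/Subtype\s*/Image", data):
        return "image"
    return "unknown"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python pdf_hint.py somefile.pdf
    with open(sys.argv[1], "rb") as f:
        print(pdf_content_hint(f.read()))
```

A PDF that comes back as "image" is the case where OCR, not a file converter, is the tool to reach for.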
    RickJ Posts: 7,762, Reputation: 864
    Uber Member
     
    #5

    Jan 22, 2009, 08:02 AM

    Correction :(

    Zamzar is not an OCR program, just a file conversion program. It did not convert the old typed text in the PDF to text; it just put the PDF "image" into a .doc and an .html file.

    My goal is to get the PDF file into actual text...

    Thanks, though, seahawk. I do have other uses for a quick and easy file conversion app.
    RickJ Posts: 7,762, Reputation: 864
    Uber Member
     
    #6

    Jan 22, 2009, 08:06 AM
    Quote (originally posted by ScottGem):
    Just as an aside, OCR software is not needed here. What you would need is PDF converter software. OCR is used when you have a scanned image from hard copy that you want to convert to editable text.
    OCR can be used on a PDF file or a scanned image, and in most cases the software I use does well. But research has shown me that even good OCR software does poorly with old typewriter text, which is far more irregular than computer-printed text.

    The software I use does a perfect job when the scanned image or PDF is of computer-printed text... but it just hates the old typewriter text...

    I'll keep looking, and I'll also look at "PDF converter" apps to see if there's a difference.
    RickJ Posts: 7,762, Reputation: 864
    Uber Member
     
    #7

    Jan 22, 2009, 08:13 AM
    In case it helps, attached is a single sample page (one of the 38 in the original post's attachment) of the old typewriter text...
    Attached file: CRSnames1975-1977_page1.pdf (PDF, 30.0 KB, 477 views)
    ScottGem Posts: 64,966, Reputation: 6056
    Computer Expert and Renaissance Man
     
    #8

    Jan 22, 2009, 08:22 AM
    I know that OCR can be used on PDFs that are the result of scanning hard copy. But it's still a matter of the source being a scanned image.

    Typewriter text, if clean, is usually read well. The copy you have is not that clean. I used PaperPort to convert it to plain text. It's not 100%, but it's pretty good.
    Attached file: CRSnames.txt (text, 2.0 KB, 406 views)
    RickJ Posts: 7,762, Reputation: 864
    Uber Member
     
    #9

    Jan 22, 2009, 08:24 AM

    I'm still not understanding what difference you mean between a PDF and a scanned image.

    I scan the typed text to PDF or JPG, and the results are the same: very poor.

    PaperPort is one I've tried in the past, and it does even worse than Abbyy. To date I've tried 5 or 6 different apps; Abbyy does best, but it's still poor.

    As we speak I'm going through some newer apps to see if they do better...
    RickJ Posts: 7,762, Reputation: 864
    Uber Member
     
    #10

    Jan 22, 2009, 09:06 AM

    What a bummer. I'd have thought that OCR software had "come a long way" since I purchased Abbyy FineReader... but the latest review at PCMag is from 2002, and the latest from CNET is from 2000! Gosh. Surprising...

    Still hunting :)

    PS: Wikipedia needs updating. The current entry says modern OCR software is 99% accurate on typewritten text. Very wrong.
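A figure like "99% accurate" is usually a character-level measure: compare the OCR output against a hand-checked transcription and count the edits needed to fix it. The sketch below computes character error rate (CER) from Levenshtein edit distance; the function names are hypothetical, and it assumes someone has typed a correct reference for at least one sample page. For scale, at 99% accuracy a page of roughly 2,000 characters would still contain about 20 errors.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn `a` into `b` (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """1 - CER, where CER = edit distance / length of the reference.
    Clamped at 0, since very noisy output can have a CER above 1."""
    if not reference:
        raise ValueError("reference text must not be empty")
    cer = levenshtein(reference, ocr_output) / len(reference)
    return max(0.0, 1.0 - cer)
```

Running each app's output for the same sample page through `character_accuracy` gives one comparable number per app, rather than an impression of "poor" versus "very poor".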
    ScottGem Posts: 64,966, Reputation: 6056
    Computer Expert and Renaissance Man
     
    #11

    Jan 22, 2009, 09:20 AM

    OK, OCR packages work on a graphical image file. If the PDF already contains text, there is no need to convert it. Many scanning programs will save the scanned image as a PDF, so the key is the source of the PDF.

    Like I said, your problem is not so much the typewriter typeface, but the crispness of the scanned image. That's what's causing the errors.
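If crispness is the culprit, one common preprocessing step is to turn the grayscale scan into pure black and white before OCR, so faint or fuzzy strokes are forced to one side of a cutoff. Below is a minimal sketch of Otsu's method for choosing that cutoff. It assumes the page has already been loaded as a flat list of 0-255 gray values (loading is left to whatever image library you use), and the function names are illustrative. Many OCR packages already do something like this internally, so whether it helps these particular pages is untested.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Pick the gray level t (0-255) that best splits a scan into ink and
    paper: pixels <= t are treated as ink. Otsu's method chooses the t that
    maximises the between-class variance of the two groups (computed here
    up to a constant factor, which does not change the best t)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    w_dark, sum_dark = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_dark += hist[t]            # pixels at or below t
        sum_dark += t * hist[t]
        w_light = total - w_dark     # pixels above t
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels: list[int], t: int) -> list[int]:
    """Map each pixel to pure black (0) or pure white (255)."""
    return [0 if p <= t else 255 for p in pixels]
```

A single global threshold can struggle when one part of the page is lighter than another (a faded ribbon, uneven scanner lighting); adaptive, region-by-region thresholding is the usual next step in that case.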
    RickJ Posts: 7,762, Reputation: 864
    Uber Member
     
    #12

    Jan 22, 2009, 09:27 AM

    That's right. Using supposedly the best OCR software out there (Abbyy FineReader), I scan the typewritten document at ultra-high quality to PDF or JPG, but the results are the same: poor.

    Yes, the "crispness" is key. Typewriter text is notoriously "uncrisp"... which is the issue.

    Can you suggest an app that will turn the attached typed text (a sample in both PDF and JPG format) into editable text?
    Attached file: CRSnames1975-1977_page1.pdf (PDF, 30.0 KB, 196 views)
    ScottGem Posts: 64,966, Reputation: 6056
    Computer Expert and Renaissance Man
     
    #13

    Jan 22, 2009, 09:30 AM

    Are you scanning these yourself? If so, have you tried saving the scan as a TIFF file and then running OCR on that?
    RickJ Posts: 7,762, Reputation: 864
    Uber Member
     
    #14

    Jan 22, 2009, 10:15 AM

    No, I've not. I'll give it a try and report back.
    RickJ Posts: 7,762, Reputation: 864
    Uber Member
     
    #15

    Jan 23, 2009, 07:08 AM

    TIFF is no better. After digging through the help files and checking forums, I see that many people are in the same boat as I am: with old typewritten text, the software needs a lot of "training" before it can convert it to editable text.

    I spent 2 hours "training" it and only got to page 6 of 38...

    So maybe that's the best that can be done with today's technology...
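The "training" described here amounts to the software asking a person to confirm characters it cannot read and remembering the answers. The toy sketch below illustrates that general idea with nearest-template matching on tiny binary bitmaps. It is not how FineReader works internally, and `GlyphTrainer`, the bitmap format and `max_distance` are hypothetical. It does hint at why training is slow on typewriter text: every smudged or unevenly struck variant of a letter that lands too far from the stored templates needs another confirmation.

```python
from typing import List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of pixels that differ between two same-sized glyph bitmaps."""
    if len(a) != len(b):
        raise ValueError("glyph bitmaps must be the same size")
    return sum(x != y for x, y in zip(a, b))


class GlyphTrainer:
    """Toy "training" store: remembers labelled glyph bitmaps (flattened
    to strings of '0'/'1') and labels new glyphs by nearest match."""

    def __init__(self) -> None:
        self.templates: List[Tuple[str, str]] = []  # (bitmap, character)

    def train(self, bitmap: str, char: str) -> None:
        """Record a user-confirmed example of a character."""
        self.templates.append((bitmap, char))

    def classify(self, bitmap: str, max_distance: int) -> Optional[str]:
        """Return the character of the closest template, or None if
        nothing is within max_distance (a person must then label it)."""
        if not self.templates:
            return None
        dist, char = min((hamming(bitmap, t), c) for t, c in self.templates)
        return char if dist <= max_distance else None
```

For example, after training "I" as the 3x3 bitmap 010010010 and "L" as 100100111, the glyph 010010011 is one pixel away from "I" and is labelled "I" with `max_distance=2`, while a glyph that is far from both comes back as `None` and would need a manual answer.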
    seahwk83 Posts: 3,276, Reputation: 212
    Ultra Member
     
    #16

    Jan 23, 2009, 08:48 AM

    There's OmniPage Pro, and Abbyy, which you already tried.
    RickJ Posts: 7,762, Reputation: 864
    Uber Member
     
    #17

    Jan 26, 2009, 09:21 AM
    Quote (originally posted by seahwk83):
    There's OmniPage Pro, and Abbyy, which you already tried.
    Yes, both. Abbyy is much better, but neither does well with old typewriter text.
    NeedKarma Posts: 10,635, Reputation: 1706
    Uber Member
     
    #18

    Jan 26, 2009, 09:28 AM
    How cheap can you get a student to type the stuff into Word? :)
    RickJ Posts: 7,762, Reputation: 864
    Uber Member
     
    #19

    Jan 26, 2009, 09:37 AM

    I tried that on one of them, and at 65 wpm (my rate) it took longer than I'd like to spend...

    ... so the request for a bid can certainly include someone willing to type it manually :)

    ** Truth is, I had not thought of seeking a bid for doing it manually. I'll check at my kids' high school to see what students would charge for typing ;)

    ... PS/Edit: And I posted a Craigslist ad.
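For the manual-typing option, a rough estimate of keyboard time helps when comparing bids. The sketch assumes about 2,000 characters per page, based on the 2.0 KB text file ScottGem produced, presumably from the one-page sample, and uses the standard typing-test "word" of 5 characters; both figures and the function name `typing_hours` are assumptions for illustration.

```python
def typing_hours(chars_per_page: float, pages: int, wpm: float,
                 chars_per_word: float = 5.0) -> float:
    """Hours of pure keyboarding for a transcription job.

    Uses the typing-test convention of 5 characters (including spaces) per
    "word". Breaks, proofreading and deciphering smudged characters are not
    included, so a real job will take longer."""
    words = chars_per_page * pages / chars_per_word
    minutes = words / wpm
    return minutes / 60


# Assumption: about 2,000 characters per page (see the lead-in).
print(f"{typing_hours(2000, 38, 65):.1f} hours for one 38-page index")
# prints "3.9 hours for one 38-page index"
```

That is keyboarding alone for one 38-page index; if the other five indexes are similar in size (the thread does not say), the total would be roughly six times that.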
    onefilms Posts: 1, Reputation: 1
    New Member
     
    #20

    Mar 29, 2009, 10:59 AM
    I have the same problem. I have an old letter, typed in the 1950s, that I need to OCR. Do any of the online OCR programs handle old typewriter text?


 



Check out some similar questions!

Excel 15 digit issue, tried converting to text, text to column feature negates fix [ 6 Answers ]

I have the following numbers that exceed 15 characters and need to be split into their own columns. Down the road, there would be thousands of such rows of data with the first couple set of unique numbers. 890432453253208820,5004500558,05CC,1,0,0,0,0,0,0, 0000,5.0000,2007-01-11...

Scanner that converts text image to real text [ 1 Answers ]

Dear Helpdesk advisors, I am looking for a scanner that reads any character from paper, converts it to text, and interfaces with Microsoft software. Usage area: reading product name, unit of product bought, price and other sorts of text info. The scanner is able to convert printed and possibly...

Save rich text box image to database [ 3 Answers ]

Hi, I'm very new to Access. What I want is for users to take a screenshot of an application and paste it into the form, possibly via an unbound OLE object or whatever is suitable or suggested. There are other fields in the form to be filled in. How can I do the same without asking the user...

Can a Video file be converted to a understandable text file [ 4 Answers ]

Hi, can you please let me know whether a video file can be converted to some readable text file? For example, if we read the text file, we should understand what the video shows, and vice versa. Is any such conversion possible? Please help me with this issue as soon as possible.

Convert text file for CD player [ 5 Answers ]

Can I convert a text file (presumably ASCII) to a format I can burn to and play on an ordinary CD?

