Thread: Scan to documents to Excel Automatically?

  1. #1
    Join Date
    Jan 2017
    Posts
    685

    Scan to documents to Excel Automatically?

    Is there an easy-to-use program out there that will take a scanned PDF or JPEG file, pull specific information out, and save it into an Excel file?

    I repeatedly enter data from documents into an Excel file as part of my job. We then track costs, etc. off the Excel sheet.

    Is there a program out there that I can set up so that it will pull specific information (date, ticket #, quantity) out of a field of other text I don't need and dump it straight into an Excel file (add the information to the next row)?

    I realize there are OCR scanner programs out there, but I just want to drop a stack of documents in the scanner and then point a program at it to pull in the information automatically.

  2. #2
    Join Date
    Feb 2005
    Posts
    19,745
    You need to have the idiots get you the data before they print it to paper or pdf.
    Is it radix panax notoginseng? - splat
    This is like hanging yourself but the rope breaks. - DTM
    Dude Listen to mtm. He's a marriage counselor at burning man. - subtle plague

  3. #3
    Join Date
    Oct 2003
    Location
    Golden BC
    Posts
    4,248
    I don't know the program but have used the result. Started off with 20-year-old docs that were not in the best shape; I used the scanned PDF to check, and it looked like a really crappy many-generation photocopy. The Excel file had some errors due to the crappy source, but not many.
    Mrs. Dougw- "I can see how one of your relatives could have been killed by an angry mob."

    Quote Originally Posted by ill-advised strategy View Post
    dougW, you motherfucking dirty son of a bitch.

  4. #4
    Join Date
    Jan 2017
    Posts
    685
    Quote Originally Posted by MakersTeleMark View Post
    You need to have the idiots get you the data before they print it to paper or pdf.
    Working on that end of it as well, just looking for alternatives.

  5. #5
    Join Date
    Oct 2005
    Location
    Wasatch
    Posts
    6,253
    Quote Originally Posted by MakersTeleMark View Post
    You need to have the idiots get you the data before they print it to paper or pdf.
    This is the best solution.

    Your next most straightforward solution will be to get an OCR program that saves to CSV and write a macro to dump the fields from the CSV to the spot you want in your Excel doc.
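
    A rough sketch of the "dump the fields from the CSV" step, written in Python rather than as an Excel macro. It assumes the OCR program writes a CSV with a header row; the column names `date`, `ticket`, and `quantity` are made up for illustration, and the tracking sheet is treated as a CSV file that Excel opens.

```python
import csv
import io

# Hypothetical column names; use whatever headers your OCR export produces.
WANTED = ["date", "ticket", "quantity"]

def append_fields(ocr_csv, tracking_csv, wanted=WANTED):
    """Copy only the wanted columns from ocr_csv onto the end of tracking_csv.

    Both arguments are open text file objects. Returns the number of rows appended.
    """
    reader = csv.DictReader(ocr_csv)
    writer = csv.writer(tracking_csv)
    count = 0
    for row in reader:
        # Missing or empty cells become empty strings instead of raising.
        writer.writerow([(row.get(name) or "").strip() for name in wanted])
        count += 1
    return count

# Small in-memory demo; with real files you would open the OCR export for
# reading and the tracking file in append mode ("a", newline="").
demo_in = io.StringIO("date,junk,ticket,quantity\n2019-03-01,x,T-1001,12\n")
demo_out = io.StringIO()
append_fields(demo_in, demo_out)
print(demo_out.getvalue())
```

    Opening the tracking file in append mode is what gives the "add to the next row" behavior.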

  6. #6
    Join Date
    Feb 2006
    Location
    Among Greatness All Around
    Posts
    6,866
    It is going to depend greatly on the formatting. Scanning to .pdf is not going to get you far enough, as you already know; it is still not in any sort of text format. You need to do OCR (optical character recognition), and then you are a bit closer. However, if the data is just characters, where the various fields or columns start and end still needs to be designated, and most OCR packages will just place spaces between the various items. If all the data is "standardized" (the first column is a date, the second column is, say, a variable that is always the same length, xxx.yy or something, and then maybe some more entries that can easily be set up into a database schema), then some coding could automatically do the work once accurate OCR scanning is completed. Otherwise you will just be looking at adding more errors or having to manually process each individual entry.

    So the suggestion about the original data source is your first step: was it first recorded on paper, or entered into the computer in a particular software package and then provided to you in printed format? Best case scenario is that the original data is stored electronically in a format that can be mapped or exported to something Excel can read, or that ODBC (Open Database Connectivity) can be used to suck all the data into Excel without any scanning, OCR, or scripting automation at all.

    Otherwise, if the information is in the really crappy paper-only format, it may be best to hire a temp worker to just re-enter the data into the computer (preferable), because they can determine the start and end of each field, and a good typist will make fewer errors than the OCR attempts plus the scripting or program needed to get the variables.
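
    For the "standardized" case described above, the coding step after OCR might look roughly like this in Python. The field layouts are assumptions made up for the example (dates as MM/DD/YYYY, a "Ticket #" label, a "Qty" or "Quantity" label); real tickets would need their own patterns.

```python
import re

# Hypothetical patterns; adjust to how the fields appear on your documents.
PATTERNS = {
    "date": re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b"),
    "ticket": re.compile(r"Ticket\s*#?\s*:?\s*(\d+)", re.IGNORECASE),
    "quantity": re.compile(r"(?:Qty|Quantity)\s*:?\s*(\d+(?:\.\d+)?)", re.IGNORECASE),
}

def extract_fields(text):
    """Return the first match for each field in OCR text, or None if not found."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result

sample = "ACME Hauling\nDate: 03/14/2019  Ticket # 48213\nMaterial: gravel  Qty: 12.5 tons\n"
print(extract_fields(sample))
```

    Any field that comes back as None is a page to check by hand, which is one way to catch OCR misreads instead of letting them add errors to the sheet.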

  7. #7
    Join Date
    Jan 2005
    Location
    Denver, CO
    Posts
    1,620
    Is it scanned or printed to a PDF? I have written Python scripts that extract certain text from a PDF. There is a Python library for OCR; I have no idea how well it works, though. This might get you started.
    https://pythontips.com/2016/02/25/oc...-using-python/
