Standards with scanned files?

  • Standards with scanned files?

    Code:
    [   Key     ) (Used to keep the record unique)
    Parent Child  Year MO DA HR MI SE EXTENSION
    ------ ------ ---- -- -- -- -- -- ---
    000321-000013-2020-09-11-12-32.55.PDF  '37 bytes
    000321-000013-2020-09-11-12-33.00.PDF  '37 bytes
    000400-000001-2020-09-11-13-34.04.PDF  '37 bytes
    
    
    (1) Is there any standard format for creating unique scanned file names?
    
    I padded the key on the left with zeroes, thinking it might be useful in the future with DIR.
    Or will that be of no value?
    
    (2) Should scanned records for each category be placed into a subfolder?
    
    c:\scan
    c:\scan\anything
    c:\scan\invoices
    c:\scan\purchases
    
    (3) Is there a limit on the number of files placed into any folder?
    
    (4) Should the length of file names be short to save disk space?
    
    You don't hear much about saving disk space these days.
    
    (5) I was considering putting scanned documents into a sqlite blob column.
    
    I have heard many people say to just leave them as files and point to them. Any suggestions?
    I am sure people will come up with many items that could be scanned.
    
    (6) Scanner software can create files named scan1.pdf, scan2.pdf, scan3.pdf, and so on.
    Would that scheme slow down considerably after hundreds of thousands of scans?
    
    (7) Should the path to the file not be stored with each scan so files can be moved?
    
    (8) If files are kept outside a database, XCOPY /D will back up only the new files, which is great.
    Are there any other good reasons not to keep the scanned document in a database column?
    
    Lots of questions, but this is a really big feature and I want to start out right.
    I'll post all the code, from running the scanner to adding records, when it is complete.
    It takes surprisingly little code.
    
    Any suggestions for creating a paperless office are appreciated.
    How long is an idea? Write it down.

  • #2
    Most of the software I have seen makes a file for each scanned item so
    let the end user decide how his data should be arranged and stored.
    Walt Decker



    • #3
      Standards with scanned files?
      [ Key ) (Used to keep the record unique)
      Parent Child Year MO DA HR MI SE EXTENSION
      ------ ------ ---- -- -- -- -- -- ---
      000321-000013-2020-09-11-12-32.55.PDF '37 bytes
      000321-000013-2020-09-11-12-33.00.PDF '37 bytes
      000400-000001-2020-09-11-13-34.04.PDF '37 bytes

      (1) Is there any standard format for creating unique scanned file names?


      No, it depends on the situation. I've seen systems that include
      an identifier for the person creating the document or for the client.

      2020-09-11T133404 will be handled correctly by MySQL, SQL Server, etc. without needing additional parsing; that is not so for a hyphenated time.
      (Why the dot before the seconds?)
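A minimal Python sketch (the thread itself uses PowerBASIC) of the naming scheme with the ISO 8601 basic-format timestamp suggested above. The helper names are hypothetical, and this only shows that the stamp round-trips through Python's `strptime`; how a particular database parses it is not demonstrated here.

```python
from datetime import datetime


def scan_filename(parent: int, child: int, scanned_at: datetime, ext: str = "PDF") -> str:
    """Hypothetical helper: zero-padded keys plus an ISO 8601 basic-format time."""
    return f"{parent:06d}-{child:06d}-{scanned_at:%Y-%m-%dT%H%M%S}.{ext}"


def timestamp_from_filename(name: str) -> datetime:
    """Recover the timestamp from a name built by scan_filename()."""
    stem = name.rsplit(".", 1)[0]   # drop the extension
    stamp = stem.split("-", 2)[2]   # everything after the two key fields
    return datetime.strptime(stamp, "%Y-%m-%dT%H%M%S")


name = scan_filename(400, 1, datetime(2020, 9, 11, 13, 34, 4))
print(name)                           # 000400-000001-2020-09-11T133404.PDF
print(timestamp_from_filename(name))  # 2020-09-11 13:34:04
```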

      I left-aligned the key with zeroes thinking it might be used in the future with DIR.
      It may be of no value in the future?


      It's always a good idea to make filenames sortable, so if you are going to include numeric data, zero-pad it.
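To illustrate why zero-padding matters for sortable names, here is a small Python sketch comparing a plain character-by-character string sort of unpadded and padded numeric keys. The key values are made up for the example.

```python
keys = (2, 10, 321, 13)  # hypothetical numeric keys

unpadded = [f"{n}.PDF" for n in keys]
padded = [f"{n:06d}.PDF" for n in keys]

# A plain string sort compares character by character, so "10" sorts before "2".
print(sorted(unpadded))  # ['10.PDF', '13.PDF', '2.PDF', '321.PDF']
# With fixed-width zero padding, string order matches numeric order.
print(sorted(padded))    # ['000002.PDF', '000010.PDF', '000013.PDF', '000321.PDF']
```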

      (2) Should scanned records for each category be placed into a subfolder?

      c:\scan
      c:\scan\anything
      c:\scan\invoices
      c:\scan\purchases

      (3) Is there a limit on the number of files placed into any folder?

      Maximum under NTFS: 4,294,967,295.
      But the practical limit for displaying, searching, and sorting is a lot lower.
      If there will be a lot of files, it's always a good idea to group them in subdirectories by some criterion.
      You need to decide whether you want all invoices/purchases in their respective folders, or all documents for a specific client (or whatever) in a specific folder.
      A more suitable approach could be:
      c:\Scans\000321-000013
      containing
      INV-2020-09-11T123255.PDF
      PO-2020-09-11T123300.PDF
      MISC-2020-09-11T133404.PDF
      or

      2020-09-11T123255-INV.PDF
      2020-09-11T123300-PO.PDF
      2020-09-11T133404-MISC.PDF


      This sort of decision is entirely application dependent and needs to be given a lot of thought during initial design.
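A short Python sketch of the two per-client layouts above, using the document types and timestamps from the first list. It shows the practical difference: with the type prefix first, a sorted listing groups files by type; with the timestamp first, it is chronological. The folder root is only an example.

```python
from pathlib import PureWindowsPath

# (doc_type, timestamp) pairs taken from the example listing above.
docs = [("INV", "2020-09-11T123255"), ("PO", "2020-09-11T123300"), ("MISC", "2020-09-11T133404")]
client_dir = PureWindowsPath(r"c:\Scans") / "000321-000013"

# Layout 1: type prefix first, so a sorted listing groups documents by type.
by_type = sorted(f"{t}-{ts}.PDF" for t, ts in docs)
# Layout 2: timestamp first, so a sorted listing is in scan order.
by_time = sorted(f"{ts}-{t}.PDF" for t, ts in docs)

for name in by_type:
    print(client_dir / name)
print(by_time)
```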

      (4) Should the length of file names be short to save disk space?

      Not an issue for disk space, but it still pays to keep the total path under 260 characters.
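A tiny Python sketch of the 260-character check. The classic Win32 `MAX_PATH` value of 260 counts the terminating NUL, so a path can have at most 259 visible characters. The function name and sample paths are hypothetical.

```python
MAX_PATH = 260  # classic Win32 limit; the count includes the terminating NUL


def fits_max_path(full_path: str) -> bool:
    """True if the path leaves room for the NUL, i.e. at most 259 characters."""
    return len(full_path) < MAX_PATH


folder = "c:\\Scans\\000321-000013\\"
print(fits_max_path(folder + "INV-2020-09-11T123255.PDF"))  # True
print(fits_max_path(folder + "X" * 250 + ".PDF"))           # False
```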

      (5) I was considering putting scanned documents into a sqlite blob column.

      I have heard many say just leave as is and point to them. Any suggestions?
      I am sure people will come up with many items that could be scanned.


      A lot depends on how many scans there will be and how large they are.
      Consider the ease of incremental backup of individual files vs. BLOBs in a database.

      A hierarchical directory structure is a form of database, and combining it with an SQLite database that stores the key data about each file is often a good solution.
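A minimal sketch of that combination using Python's built-in `sqlite3` module: the scans stay as files on disk and SQLite holds only the key data. The table and column names are assumptions for illustration, not a schema from the thread.

```python
import sqlite3

# Hypothetical schema: scans stay as files on disk; SQLite holds only key data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE scans (
        parent     INTEGER NOT NULL,
        child      INTEGER NOT NULL,
        doc_type   TEXT    NOT NULL,   -- e.g. INV, PO, MISC
        scanned_at TEXT    NOT NULL,   -- ISO 8601 basic form, e.g. 2020-09-11T123255
        PRIMARY KEY (parent, child, scanned_at)
    )""")

conn.executemany("INSERT INTO scans VALUES (?, ?, ?, ?)", [
    (321, 13, "INV",  "2020-09-11T123255"),
    (321, 13, "PO",   "2020-09-11T123300"),
    (400, 1,  "MISC", "2020-09-11T133404"),
])

# All documents for one client, oldest first; ISO 8601 text sorts chronologically.
client_docs = conn.execute(
    "SELECT doc_type, scanned_at FROM scans WHERE parent = ? AND child = ? "
    "ORDER BY scanned_at", (321, 13)).fetchall()
print(client_docs)  # [('INV', '2020-09-11T123255'), ('PO', '2020-09-11T123300')]
```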

      (6) Scanner software can create files scan1.pdf, scan2.pdf, scan3.pdf
      Would using that considerably slow down after hundreds of thousands of scans?


      That depends on how the software identifies the next number, but for locating specific scanned documents later, it's a terrible idea.
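One way such software *might* pick the next number is sketched below in Python; this is an assumption for illustration, since real scanner software may keep a counter instead. The naive approach reads the whole folder on every scan, so its cost grows with the file count, and the name still says nothing about the document.

```python
import os
import re
import tempfile

SCAN_NAME = re.compile(r"^scan(\d+)\.pdf$", re.IGNORECASE)


def next_scan_name(folder: str) -> str:
    """Naive 'next number' lookup: read every entry in the folder.

    The work grows with the number of files, and the resulting name still
    says nothing about what the document is.
    """
    highest = 0
    with os.scandir(folder) as entries:
        for entry in entries:
            m = SCAN_NAME.match(entry.name)
            if m:
                highest = max(highest, int(m.group(1)))
    return f"scan{highest + 1}.pdf"


with tempfile.TemporaryDirectory() as folder:
    for existing in ("scan1.pdf", "scan2.pdf", "scan10.pdf"):
        open(os.path.join(folder, existing), "w").close()
    print(next_scan_name(folder))  # scan11.pdf
```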

      (7) Should the path to the file not be stored with each scan so files can be moved?

      See the comments above about hierarchical directories with appropriate names. If the correct data is stored, storing the file path becomes redundant.
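A Python sketch of deriving the path from stored key data instead of storing it. Only the root folder would come from configuration (an assumption), so moving the archive means changing one setting rather than rewriting every record. The folder and file layout follow the per-client example in this post; the function name is hypothetical.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def scan_path(root: str, parent: int, child: int, doc_type: str,
              scanned_at: datetime) -> PureWindowsPath:
    """Rebuild a scan's full path from its key fields plus a configurable root."""
    folder = f"{parent:06d}-{child:06d}"
    name = f"{doc_type}-{scanned_at:%Y-%m-%dT%H%M%S}.PDF"
    return PureWindowsPath(root) / folder / name


when = datetime(2020, 9, 11, 12, 32, 55)
print(scan_path(r"c:\Scans", 321, 13, "INV", when))
# c:\Scans\000321-000013\INV-2020-09-11T123255.PDF
print(scan_path(r"d:\Archive\Scans", 321, 13, "INV", when))
# d:\Archive\Scans\000321-000013\INV-2020-09-11T123255.PDF
```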

      (8) If files are kept outside a database XCOPY /D will backup only the new files which is great
      Are there any other good reasons not to keep the scanned document in a database column?

      Agreed, see point 5 above.
      Apart from ease of backup, about the only reason not to keep them in a database file is the potential size of the file and the implications of that for any sort of backup.
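For comparison with XCOPY /D, here is a rough Python analogue (not a reimplementation of XCOPY) that copies only files that are missing from the backup or newer in the source. It illustrates why one-file-per-scan backs up incrementally so easily, whereas a single large database file changes as a whole.

```python
import os
import shutil
import tempfile


def copy_newer(src: str, dst: str) -> list:
    """Copy files that are missing from dst or newer in src.

    A rough analogue of XCOPY /D /S, not a reimplementation of it.
    Returns the relative paths that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src):
        rel_dir = os.path.relpath(dirpath, src)
        target_dir = os.path.join(dst, rel_dir)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            source = os.path.join(dirpath, name)
            target = os.path.join(target_dir, name)
            if not os.path.exists(target) or os.path.getmtime(source) > os.path.getmtime(target):
                shutil.copy2(source, target)  # copy2 also copies the modification time
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return copied


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    open(os.path.join(src, "INV-2020-09-11T123255.PDF"), "w").close()
    print(copy_newer(src, dst))  # ['INV-2020-09-11T123255.PDF']
    print(copy_newer(src, dst))  # [] because nothing changed since the last run
```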



      • #4
        Originally posted by Walt Decker:
          Most of the software I have seen makes a file for each scanned item so
          let the end user decide how his data should be arranged and stored.

        You obviously have a much better type of user than I do.

        Let me introduce you to Mx User. (I'd better not use a gender-identifying first name such as "Willy", and especially not "Suzy".)

        If you don't enforce consistency in an information system, you are going to run into major problems at some point in the future.



        • #5
          2020-09-11T133404 will be handled correctly by MySQL, SQL Server, etc. without needing additional parsing; that is not so for a hyphenated time.
          (Why the dot before the seconds?)

          Because I goofed, again. Thanks for catching that!
          How long is an idea? Write it down.



          • #6
            (1) Is there any standard format for creating unique scanned file names?
            If you are restricted to one folder, see the Windows API function GetTempFileName().

            If you are not so restricted, you can use GUIDTXT$(GUID$()) to create unique names. (It's what I use for memory objects when I need to guarantee uniqueness.)
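Python analogues of both suggestions, as a sketch only: `uuid.uuid4()` gives a GUID-style name (similar in spirit to GUIDTXT$(GUID$()), though the text format may differ), and `tempfile.mkstemp` creates a uniquely named empty file in one folder, comparable to what GetTempFileName does. The helper names and prefix are hypothetical.

```python
import os
import tempfile
import uuid


def guid_name(ext: str = "PDF") -> str:
    """GUID-based name: 36 characters of GUID text plus the extension."""
    return f"{uuid.uuid4()}.{ext}"


def reserve_name(folder: str, prefix: str = "scan-", suffix: str = ".pdf") -> str:
    """Let the OS create a uniquely named empty file in `folder` and return its path."""
    fd, path = tempfile.mkstemp(suffix=suffix, prefix=prefix, dir=folder)
    os.close(fd)  # the empty file now "owns" the name; the scan can be written over it
    return path


print(guid_name())
with tempfile.TemporaryDirectory() as folder:
    print(os.path.basename(reserve_name(folder)))
```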

            Michael Mattias
            Tal Systems Inc. (retired)
            Racine WI USA
            http://www.talsystems.com



            • #7
              No restrictions. I have the project completed and am trying to make it better.
              I have been using DATE + TIME, which is a bit shorter, but the GUID would definitely work.
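One caveat with DATE + TIME names is that two scans within the same second produce the same name. A minimal Python sketch, assuming a hypothetical helper, that claims the name with exclusive create (mode `"x"`) and falls back to a numbered suffix instead of overwriting:

```python
import os
import tempfile
from datetime import datetime


def claim_timestamp_name(folder: str, when: datetime, ext: str = "PDF") -> str:
    """Create an empty file named after `when`; if that second is taken, add -2, -3, ...

    Hypothetical helper: opening with mode "x" fails when the file already exists,
    so two scans in the same second cannot silently overwrite each other.
    """
    base = when.strftime("%Y-%m-%dT%H%M%S")
    n = 1
    while True:
        name = f"{base}.{ext}" if n == 1 else f"{base}-{n}.{ext}"
        try:
            with open(os.path.join(folder, name), "x"):
                pass
            return name
        except FileExistsError:
            n += 1


with tempfile.TemporaryDirectory() as folder:
    t = datetime(2020, 9, 11, 12, 32, 55)
    print(claim_timestamp_name(folder, t))  # 2020-09-11T123255.PDF
    print(claim_timestamp_name(folder, t))  # 2020-09-11T123255-2.PDF
```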

              I'm looking for a TWAIN SDK that would be less bulky than shelling to a .NET program through its command-line interface.
              I would give that program a ten, but would prefer not to shell to a command-line interface. It works exactly the way I want.

              https://www.naps2.com/doc-command-line.html

              Reviews of NAPS2:
              https://sourceforge.net/projects/naps2/reviews/

              I'm wondering whether creating a subfolder for each client would be a bad idea, and have created another thread on the subject:
              https://forum.powerbasic.com/forum/u...tore-pdf-files
              Last edited by Mike Doty; 12 Sep 2020, 03:58 PM.
              How long is an idea? Write it down.



              • #8
                Mr. McLachlan, no matter how you cut it, it is up to the end user. It does not matter whether it is plain J. Doe, Humana, or Gov.

                It has to be tailored to the end user. S/He is the one paying the bill.
                Walt Decker



                • #9
                  Originally posted by Walt Decker:
                    Mr. McLachlan, no matter how you cut it, it is up to the end user. It does not matter whether it is plain J. Doe, Humana, or Gov.

                    It has to be tailored to the end user. S/He is the one paying the bill.

                  Not in my experience.

                  The end users (pl) are generally employees of the organisation that is paying the bill.
                  If you let the end user (sing) decide how to arrange and store his/her files, you will have no consistency in your information system, and the person paying the bill may well end up very unhappy and not pay all of the bill.

                  (I recently built a system where employees had been doing what you suggest for quite some time. Locating specific documents was difficult and time-consuming. Getting the scanned documents sorted out and appropriately organised and stored required many man-hours.)



                  • #10
                    Stuart,
                    Did you ever put files for each client in their own subfolder?
                    I created 30,000 subfolders and it works, but not sure it is a good thing.
                    How long is an idea? Write it down.



                    • #11
                      Originally posted by Mike Doty:
                        Stuart,
                        Did you ever put files for each client in their own subfolder?
                        I created 30,000 subfolders and it works, but not sure it is a good thing.

                      See my response in your other thread:

                      https://forum.powerbasic.com/forum/u...660#post799660



                      • #12
                        I would give this program a ten, but would prefer not to shell to a command line interface. It is working exactly like I want.

                        You do not have to SHELL it. You can use one of the functions provided by Windows to execute it.

                        For example..

                        Win 32: Monitor Process with ShellExecuteEx June 22, 2001

                        You don't have to SHELL at all, plus you get additional "handle execution" options!

                        CreateProcess() is another function you can use.
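For readers outside PowerBASIC, a Python sketch of the same idea: `subprocess.run` starts the program (through CreateProcess on Windows), waits for it, and returns the exit code and output rather than firing and forgetting. The command below is a stand-in; the real scanner executable and its arguments are not shown here and would come from that program's documentation.

```python
import subprocess
import sys

# Stand-in for the scanner's command-line program (hypothetical; not a real scanner CLI).
SCAN_COMMAND = [sys.executable, "-c", "print('scan finished')"]


def run_scanner(cmd, timeout_s=300):
    """Start the program, wait up to timeout_s seconds, return (exit code, stdout)."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    return result.returncode, result.stdout.strip()


code, output = run_scanner(SCAN_COMMAND)
print(code, output)  # 0 scan finished
```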

                        Michael Mattias



                        • #13
                          It looks like your process monitor uses ShellExecuteEx, and it could eliminate my searching memory to see whether the scanner is active.
                          IF ISTRUE ShellExecuteEx(SEI) THEN ' function succeeded and returned
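The point about not searching memory can be sketched in Python: keeping the process object answers "is the scanner still running?" directly. In Win32 terms this corresponds to keeping the hProcess handle that ShellExecuteEx fills in when SEE_MASK_NOCLOSEPROCESS is set. The child command here is only a stand-in for a long-running scan.

```python
import subprocess
import sys
import time

# Stand-in for a long-running scan; the real scanner command line is not shown here.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.5)"])

polls = 0
while proc.poll() is None:   # None means the process is still running
    polls += 1
    time.sleep(0.1)          # a GUI could update a status line here instead
print("exit code:", proc.returncode, "after", polls, "polls")
```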
                          How long is an idea? Write it down.



                          • #14
                            I never really was comfortable with the PB SHELL statement, as it means you are giving up control. I have used the SHELL function, but that only provides some of the functionality available with either CreateProcess() or ShellExecuteEx().

                            I figure, why settle for half a loaf when I can have the whole thing?


                            Michael Mattias



                            • #15
                              Re 1) I suggest starting either with the date part in ISO format (YYYY-MM-DD) or with a descriptive prefix in front of it, e.g. INV2020-09-15. It helps us humans a lot.

                              Vaguely related to 2), I wrote this tool a couple of years ago: https://tools.basicaware.de/SortFilesByDate.aspx Its source is available at the linked GitHub repo.



                              • #16
                                (I recently built a system where employees had been doing what you suggest for quite some time. Locating specific documents was difficult and time-consuming. Getting the scanned documents sorted out and appropriately organised and stored required many man-hours.)


                                That's a different problem. So what exactly is the problem here?

                                Let's go back to post #1


                                (1) Is there any standard format for creating unique scanned file names?
                                No. (I think that has become pretty obvious)

                                For all the rest of the questions, the "best" (subjective) answer/design depends on what you are trying to accomplish, best expressed in "user speak."

                                This is clearly an 'extension' or companion to the "30,000 subfolders" thread in this forum.

                                Let's get the application requirements and restrictions defined before writing that first line of code.



                                Michael Mattias



                                • #17
                                  I am done with the project. I am considering keeping the file names in the database, but it isn't required.
                                  The filenames of the .pdf files point to a database key. I have never done it this way, but it works.
                                  The filename is the index back to the database.
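A Python sketch of using the filename as the index back to the database: parse the key fields out of the name, then look up the record by them. The exact layout is an assumption, taken from the first post with a hyphen in place of the dot before the seconds; the class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class ScanKey(NamedTuple):
    parent: int
    child: int
    stamp: str  # kept as text; it only needs to be unique and sortable


# Assumed layout: PPPPPP-CCCCCC-YYYY-MM-DD-HH-MI-SE.PDF
NAME = re.compile(r"^(\d{6})-(\d{6})-(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})\.pdf$",
                  re.IGNORECASE)


def key_from_filename(name: str) -> ScanKey:
    """Split a scan file name back into the database key fields."""
    m = NAME.match(name)
    if m is None:
        raise ValueError(f"not a scan file name: {name!r}")
    return ScanKey(int(m.group(1)), int(m.group(2)), m.group(3))


print(key_from_filename("000321-000013-2020-09-11-12-32-55.PDF"))
# ScanKey(parent=321, child=13, stamp='2020-09-11-12-32-55')
```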
                                  How long is an idea? Write it down.

