
Create 30,000 subfolders to store PDF files.


  • #1

    Folder for each client to store PDF files?

    No pointers needed to find files and third-party PDF readers would have access.
    It might take a lot of space, but would make things very easy.
    How much space would 30,000 empty folders take?

    I did a DIR before and after, and it appears to take about 50 million bytes.

    Anyone ever done something like this?
    Are there any big disadvantages?

    XCOPY /D/E/R/Y c:\junk\*.* c:\junk2\*.* takes about 40 seconds.
    I placed a file in USER10000, USER20000, and USER30000 to be sure it works.

    Creating the folders only took 14 seconds.
    The folders would only be created if a client had PDF files.
    Code:
    LOCAL sFolder AS STRING
    LOCAL x AS LONG
    MKDIR "c:\junk"                         'parent folder
    FOR x = 1 TO 30000
     sFolder = "c:\junk\USER" + FORMAT$(x)  'c:\junk\USER1 .. c:\junk\USER30000
     MKDIR sFolder
    NEXT
    Related thread on storing PDF files.
    https://forum.powerbasic.com/forum/u...les#post799642
    How long is an idea? Write it down.

  • #2
    I've done it with up to about 10,000 "member" folders for an organisation. A flat structure gets cumbersome when you are working with lots of entries in a single directory.

    The file system can handle it easily, but can the users?

    If users need to locate files manually by searching through directory entries, you are better off going more granular. It's easier for them to quickly locate something with a structure such as:
    C:\ClientDocs\A\Abcde\, C:\ClientDocs\B\Bdefg\ etc.
    or
    C:\ClientDocs\00\123\ for Client 123 and C:\ClientDocs\01\123\ for Client 1123 etc.
    rather than scrolling through 30,000 rows in a directory listing.

    (I built one of those with lots of clients and documents back in the early '90s.)
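    The "thousands bucket" scheme above can be sketched as a small helper. Python here is purely for illustration; the `bucket_path` name and the divmod rule are my assumptions about the layout (client 1123 lands in bucket 01 as \01\123, per the example paths):

```python
def bucket_path(client_id: int, root: str = "C:\\ClientDocs") -> str:
    # Two-digit "thousands" bucket: client 123 -> \00\123, client 1123 -> \01\123,
    # so no single first-level directory ever holds more than 1,000 client folders.
    bucket, leaf = divmod(client_id, 1000)
    return f"{root}\\{bucket:02d}\\{leaf}"

print(bucket_path(123))    # C:\ClientDocs\00\123
print(bucket_path(1123))   # C:\ClientDocs\01\123
```

    The same two-level split works in any language; the point is that each directory stays small enough to scroll comfortably.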

    > How much space would 30,000 empty folders take?
    > I did a DIR before and after and it appears to take about 50 million bytes.


    Every NTFS file or directory uses a 1 KB Master File Table (MFT) record as a minimum. If all of the attributes (such as user permissions etc.) don't fit into that 1 KB, more than one record will be used - so the minimum would be 30 MB for 30,000 folders.
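    As a quick sanity check of that estimate (assuming the default 1 KiB MFT record size and exactly one record per entry, which is only a lower bound):

```python
MFT_RECORD_BYTES = 1024  # default NTFS MFT file-record size

def min_mft_bytes(entries: int) -> int:
    # Lower bound: each file or directory consumes at least one MFT record;
    # attributes that overflow the record consume additional ones.
    return entries * MFT_RECORD_BYTES

print(min_mft_bytes(30_000))  # 30720000 bytes, i.e. roughly 30 MB
```

    That lines up with the ~50 million bytes seen in the DIR test once extra records and index allocations are counted.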



    • #3
      Thanks.
      Trying to decide on one of these:

      1. key + filename
      2. key + BLOB of PDF
      3. folder for each client, with PDFs in subfolder(s)

      With key + BLOB, the file would have to be written back to disk to display.
      How long is an idea? Write it down.



      • #4
        Originally posted by Mike Doty View Post
        With key, blob would have to write back to disk to display
        Yep, that is an issue.
        I favour files on disk, with a function in the application to open a specific file with the registered opener for that type. (Not all files need to be PDFs; they could be any type of file.)
        If you have one directory per client, it is simple to populate a list of files in that client's directory when required and pass a selected filename to that function (or attach it to an email, copy it elsewhere, rename or delete it, or whatever).



        • #5
          Originally posted by Stuart McLachlan View Post
          Yep, that is an issue.
          I favour files on disk, with a function in the application to open a specific file with the registered opener for that type. (Not all files need to be PDFs; they could be any type of file.)
          If you have one directory per client, it is simple to populate a list of files in that client's directory when required and pass a selected filename to that function (or attach it to an email, copy it elsewhere, rename or delete it, or whatever).
          Changing from option #1 (key + filename) to option #3 (folder for each client).
          I like the other uses mentioned.
          I have no idea if every client will have scanned documents, so I will not create any folders in advance.

          I noticed Adobe Reader doesn't display thumbnails for PDF files, so I switched to Foxit.
          That is another subject, though; the user will have the ability to use whatever PDF viewer software they want.

          With the directory per client, users can see all files in the root and subfolders without additional coding.
          Users can use third-party software for advanced features because it is a plain file system.
          I have been looking at some systems that work with PDF and image files that are very impressive and low cost.
          Users will be able to do whatever they want with files without relying upon my software.

          The plain file structure opens up options that I could not offer with SQLite and other database managers.
          For example, users will now be able to save Word documents in a subfolder of any client.

          Thanks, again!
          How long is an idea? Write it down.



          • #6
            Going back MANY years to my limited exposure to RDBMS design, I would think that organizing by entity (each client's files in its own folder/branch) would enable better control, especially if security is/becomes an issue... (...unless files are shared across clients/entities...)

            ...just thinking out loud here... ignore if not relevant!

            -John



            • #7
              Folder for each client to store PDF files?
              Are you providing some kind of "cloud storage service?" Or are you generating PDF files for your clients to download with some kind of at least rudimentary security (such as user/password)?

              If the latter, I might look at setting up FTP for them. Then you could create the files with their user ID, and set up each client's password, and they can download.

              I have a really nice, easy-to-use FTP program you might want to provide your clients. I helped set up something like this for Blue Cross Blue Shield of Wisconsin/Medicare to provide users their own private-labelled software they could use to pick up their remittance advice from their healthcare claims. (Hardware concerns are different, as we stored the report files on an IBM mainframe.)

              Or... you might want to think about storing the PDF files in a database and using a userid/password as the key for the rows the client is to pick up.

              Drop me a note or give me a call... this sounds like the kind of thing I might like to kick around as something to offset my "do nothing" retirement days.

              MCM
              Michael Mattias
              Tal Systems Inc. (retired)
              Racine WI USA
              [email protected]
              http://www.talsystems.com



              • #8
                This is for an office that wants to scan text documents kept in folders for each client into 3 categories.
                The number of documents may run into the millions as I've learned they are the largest user of the equipment that creates the original documents.
                It is one of the most interesting projects that I've worked on in years.

                I have been working through different methods, and I'm not sure about creating thousands of folders.
                I can read the .PDF into a database and copy any .PDF needed to a temp folder if they want to view it.

                As far as backup goes, 1Drive.com says they can patch a file and only back up what changed since the previous backup. I haven't tested it.
                If this gets too time-consuming, I'll definitely take you up on your offer!
                How long is an idea? Write it down.



                • #9
                  I might think about only creating the PDFs on demand, as needed. I do that for Hilco (I use the Haru library). I'd have to look at the "scan" requirements more closely to go a whole lot deeper into your design.

                  Speaking of which...

                  Don't let the client tell you HOW to do this (e.g store in separate folders). Make him tell you WHAT he wants to accomplish and YOU figure out the best way to meet those needs.
                  Michael Mattias
                  Tal Systems Inc. (retired)
                  Racine WI USA
                  [email protected]
                  http://www.talsystems.com



                  • #10
                    I was just working on a test of pulling a group of files from millions based upon DIR$ and DIR$ NEXT.
                    It would work, but I have no idea how fast it would be.

                    Parent + Child + "Category" + guid .PDF
                    3-1-ZZ-
                    3-1-ZZ-

                    If files were in a BLOB in a SQL database it would be very easy, and it would eliminate a tremendous number of files.
                    A new database could be created every year to keep the size down and eliminate backing up the same data.

                    I'm spinning back and forth between methods.
                    I've read so many people saying not to put files into a BLOB, but I'm leaning toward it.
                    How long is an idea? Write it down.



                    • #11
                      I was just working on a test of pulling a group of files from millions based upon DIR$ and DIR$ NEXT.
                      It would work, but I have no idea how fast it would be.
                      There is only one way to know how fast something will be in "Real Life" ... try it.


                      If files were in a BLOB in a SQL database it would be very easy, and it would eliminate a tremendous number of files.
                      A new database could be created every year to keep the size down and eliminate backing up the same data.
                      If you are starting with PDF files, storing BLOBs would certainly be superior. Actually, you could compress the BLOBs (ZIP, Windows compression, whatever) and save more space. Decompress as needed.

                      But if you are creating PDFs on request, you should be able to store only the underlying data (compressed optional) and create the PDFs only "as requested/as needed."

                      Storing the finished or presentation form of data is thinking from the days of the 6 MHz (six megahertz) single-CPU computers. Better technology today means you need not get stuck in "oldthink."
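                      A minimal sketch of the compress-then-store idea, using Python's sqlite3 and zlib purely for illustration (the `docs` table and its columns are made up, not anything from the thread):

```python
import sqlite3
import zlib

con = sqlite3.connect(":memory:")  # in-memory stand-in for the document database
con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, client INTEGER, body BLOB)")

pdf_bytes = b"%PDF-1.4 pretend this is a scanned document " * 200

# Compress on the way in ...
con.execute("INSERT INTO docs (client, body) VALUES (?, ?)",
            (123, zlib.compress(pdf_bytes)))

# ... and decompress only when the file is actually requested.
(body,) = con.execute("SELECT body FROM docs WHERE client = 123").fetchone()
assert zlib.decompress(body) == pdf_bytes
print(len(pdf_bytes), "bytes raw,", len(zlib.compress(pdf_bytes)), "bytes stored")
```

                      One caveat: real scanned PDFs mostly contain already-compressed images, so the savings will be far smaller than on repetitive test data like this.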

                      One of the reasons I always liked having younger programmers on my staff was, they had not yet learned what "can't be done" so they just went ahead and did it.

                      MCM



                      Michael Mattias
                      Tal Systems Inc. (retired)
                      Racine WI USA
                      [email protected]
                      http://www.talsystems.com



                      • #12
                        Storing files in PDF format gives the user the ability to use any PDF reader, without dependency upon my program to move or display data.
                        The user could even use an archive program. This has been a difficult decision, but since the original PDF files exist, the method could be changed.
                        It also turns out the user needs to catch up on scanning older records.
                        When they have caught up, they will be able to automatically save any document with a click, using a key created from where they are in my program.
                        How long is an idea? Write it down.



                        • #13
                          Originally posted by Michael Mattias View Post
                          If you are starting with PDF files, ...

                          But if you are creating PDFs on request, ..."

                          Storing the finished or presentation form of data is thinking from the days of the 6 Mhz (six megahertz) single CPU computers.
                          The requirement seemed fairly clear.
                          "This is for an office that wants to scan text documents kept in folders for each client into 3 categories."



                          • #14
                            I don't know how it is at the moment, but with older Windows NTFS versions you couldn't store more than 4,000-5,000 entries in a directory without problems.
                            Normally a DIR is almost instant. Above 4,000-5,000 entries it quickly rises to 45+ seconds. (The change is very sudden, not gradual.)
                            (We never found the reason, but after that we limited the number of directory entries to 3,000.)
                            Regards,
                            Peter



                            • #15
                              Or just name the files 1.pdf to 99999.pdf and use a database to keep track of which one is which.
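                              That scheme might look like the sketch below (sqlite3 in Python for illustration only; the `files` table, `register`, and `files_for` are hypothetical names, not part of anyone's actual code):

```python
import sqlite3

con = sqlite3.connect(":memory:")  # the tracking database
con.execute("""CREATE TABLE files (
    n        INTEGER PRIMARY KEY,  -- becomes the on-disk name, e.g. 42.pdf
    client   INTEGER NOT NULL,
    category TEXT NOT NULL)""")

def register(client: int, category: str) -> str:
    # Allocate the next number and return the flat filename to save the scan under.
    cur = con.execute("INSERT INTO files (client, category) VALUES (?, ?)",
                      (client, category))
    return f"{cur.lastrowid}.pdf"

def files_for(client: int) -> list:
    # Look up which numbered files belong to a client.
    rows = con.execute("SELECT n FROM files WHERE client = ?", (client,))
    return [f"{n}.pdf" for (n,) in rows]

print(register(123, "BLOOD"))    # 1.pdf
print(register(123, "INVOICE"))  # 2.pdf
print(register(456, "BLOOD"))    # 3.pdf
print(files_for(123))            # ['1.pdf', '2.pdf']
```

                              The filenames carry no meaning at all, so renaming clients or recategorising documents never touches the disk.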



                              • #16
                                https://www.howtogeek.com/246087/how...s-very-slowly/

                                I changed the folder optimization option, which was "Pictures", to "General".
                                Will Windows index files in an alpha sort order?

                                It must index left to right, because searching on 500,000 files is instantaneous. 9/18/20 2:16 AM (see post #17 below)

                                Changed the properties to "Documents" and back to "General", and it does slow down a search.

                                Note: having the \scan folder open while creating files causes a lot of extra work.
                                Last edited by Mike Doty; 18 Sep 2020, 05:37 AM.
                                How long is an idea? Write it down.



                                • #17
                                  Now over 1,000,000 files.
                                  I unchecked "Allow files on this drive to have contents indexed ..."
                                  I'm curious whether the search will be fast without Windows indexing.
                                  This will take a long time to process.

                                  Good news.
                                  With no Windows indexing the search time is still great with over 1,000,000 files.
                                  This test was done on Windows 10 32-bit.

                                  Searching for any group of indexed file names is instantaneous (even when returning over 10,000 matches.)
                                  Parent-*
                                  Parent-Child-*
                                  Parent-Child-Anything-*

                                  External files can point to records in a database without modifying the database.
                                  Any fixed- or variable-length file can be created to point to the keys in the main database.

                                  The function Search below is an example of searching for a key among the scanned files.
                                  The number of files found for the key is returned. It always takes under 1 second.

                                  Uses might include a memo file or any type of files for any records in the main database.
                                  The filename only needs to begin with a key to the record in the main database followed by a delimiter.
                                  I used "-" as the delimiter.

                                  30,000 subfolders are not needed. Keeping all the files in one folder has advantages:
                                  XCOPY /D will back up only the new files.

                                  Newer version that optionally copies found files to a temp folder for use by other programs.

                                  Code:
                                  #DIM ALL                    'c:\sql\bin\search.bas
                                  $ScanFolder = "c:\scan\"    'simulate creating scanned files in this folder
                                  $TempFolder = "c:\output\"  'folder to receive found files from a search
                                  %WriteToTempFolder=1        'files found in $scanfolder to $tempfolder
                                  %InsertFiles = 1            'insert this many random files into $ScanFolder
                                  
                                  'Example searches: "1"  "1-123"  "1-9999-INVOICE"  "-*" is appended to find all
                                  
                                  FUNCTION PBMAIN () AS LONG
                                   MKDIR $TempFolder:ERRCLEAR      'create temp folder
                                   LOCAL s,sProg,sParent,sChild,sCategory,sDate,sTime,sFileType,sScanFolder,sProgress,sSplit AS STRING,x AS LONG
                                   sScanFolder  = $ScanFolder
                                   IF RIGHT$(sScanFolder,1) <> "\" THEN sScanFolder = sScanFolder + "\" 'or creates in \user\virtual store
                                   'RANDOMIZE  '49-11 is first file if RANDOMIZE not used
                                   FOR x = 1 TO %InsertFiles       'each file must be unique
                                    sParent   = FORMAT$(RND(1,50)) '1 Parent
                                    sChild    = FORMAT$(RND(1,50)) '2  child
                                    sCategory = "BLOOD"            '3  category
                                    sDate     = GUIDTXT$(GUID$())  'or year+month+date+hour+minute+second
                                    sFileType = ".PDF"             '5 pass file type may prevent having to change in the future
                                    '1 parent, 2 child, 3 category, 4 datetime, 5 filetype, 6 outpath, 7 --progress, 8 --split
                                    s  = CHR$(sScanFolder,sParent,"-",sChild,"-",sCategory,"-",sDate,sFileType)
                                    '-----------------------------------------------------------------------------------------
                                    'SHELL(sProg + " " + s)        'program normally shells to scanner (tested)
                                    '-----------------------------------------------------------------------------------------
                                    OPEN s FOR OUTPUT AS #1
                                    IF ERR THEN ? "Open error" + STR$(ERR) + $CR + ERROR$,%MB_SYSTEMMODAL:EXIT FUNCTION
                                    PRINT #1,"data";
                                    IF ERR THEN ? "PRINT #1 error" + STR$(ERR) + $CR + ERROR$,%MB_SYSTEMMODAL:EXIT FUNCTION
                                    CLOSE #1
                                   NEXT
                                   BEEP  'random files were created
                                   Search sScanFolder,%WriteToTempFolder  'search for file(s)
                                  END FUNCTION
                                  
                                  FUNCTION Search(sScanPath AS STRING,WriteToTempFolder AS LONG) AS LONG
                                  
                                   LOCAL x AS LONG
                                   LOCAL sDefault   AS STRING
                                   LOCAL sFirstFile AS STRING
                                   LOCAL sFile      AS STRING
                                   LOCAL sChar      AS STRING
                                   LOCAL ErrorFlag  AS LONG
                                   LOCAL sPrompt    AS STRING
                                  
                                   sPrompt = CHR$("PARENT NUMBER",$CR,"PARENT - CHILD",$CR,"PARENT - CHILD - ANYTHING")
                                   sDefault = "3-2"
                                   DO
                                    sDefault = INPUTBOX$(sPrompt,"FILE SEARCH by one of three methods",sDefault)
                                    IF LEN(sDefault) = 0 THEN EXIT DO
                                    FOR x = 1 TO LEN(sDefault)
                                     schar = UCASE$(MID$(sDefault,x,1))
                                     IF INSTR("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-",sChar) = 0 THEN ERRORFLAG=1:BEEP:EXIT FOR
                                    NEXT
                                    IF ErrorFlag THEN ErrorFlag = 0:ITERATE
                                  
                                    IF  WriteToTempFolder THEN
                                      KILL $TempFolder + "*.*"  'potentially dangerous empties temp folder
                                    END IF
                                  
                                    DO WHILE RIGHT$(sDefault,1) = "-"
                                     sDefault = LEFT$(sDefault,-1)
                                    LOOP
                                    'directory search
                                    sFirstFile = sScanPath + sDefault + "-*"
                                    x = 0
                                    sFile = DIR$(sFirstFile)
                                    DO WHILE LEN(sFile)
                                     INCR x
                                     IF WriteToTempFolder THEN 'copy to temp folder if found
                                       CopyFiles(sScanPath + sFile,$TempFolder + sFile)
                                     END IF
                                     sFile =  DIR$(NEXT)
                                    LOOP
                                    IF WriteToTempFolder THEN 'results
                                     ? USING$("# FILES  & to &",x,UCASE$(sScanPath),UCASE$($TempFolder)),%MB_SYSTEMMODAL,"Copy Complete"
                                    ELSE
                                     ? USING$("Found # files",x),%MB_SYSTEMMODAL,"No copy"
                                    END IF
                                   LOOP 'outer input loop
                                  END FUNCTION
                                  
                                  FUNCTION CopyFiles(sInputFile AS STRING, sOutputFile AS STRING) AS LONG
                                   LOCAL hFile   AS LONG
                                   LOCAL sBuffer AS STRING
                                   hFile = FREEFILE
                                   OPEN sInputFile  FOR BINARY AS hFile
                                   GET$ #hFile,LOF(hFile),sBuffer
                                   CLOSE #hFile
                                   OPEN sOutputFile FOR OUTPUT AS hFile
                                   PRINT #hFile,sBuffer;
                                   CLOSE #hFile
                                  END FUNCTION
                                  
                                  FUNCTION CountAllFiles(sScanPath AS STRING,NumberOfTimes AS LONG) AS LONG
                                   LOCAL sFile AS STRING, counter AS LONG
                                   sFile = DIR$(sScanPath + "\")
                                   DO WHILE LEN(sFile)
                                    INCR counter
                                    sFile = DIR$(NEXT)
                                   LOOP
                                   FUNCTION = counter
                                  END FUNCTION
                                  Last edited by Mike Doty; 18 Sep 2020, 11:11 AM.
                                  How long is an idea? Write it down.



                                  • #18
                                    RE:
                                    Code:
                                      FUNCTION CopyFiles(sInputFile AS STRING, sOutputFile AS STRING) AS LONG
                                    ==>
                                    PowerBASIC FILECOPY?

                                    WinAPI SHFileOperation?


                                    Michael Mattias
                                    Tal Systems Inc. (retired)
                                    Racine WI USA
                                    [email protected]
                                    http://www.talsystems.com



                                    • #19
                                      MM,

                                      Good suggestions.
                                      I created the million files as you suggested, and using the file name as an index works very well.
                                      Now I'm wondering how an archive like 7-Zip would work. It might sacrifice control or performance.
                                      I'm also going to test it as a column in SQLite after taking a nap.
                                      SQLite has a zlib option that might be used. I'm not sure how it would work on a .PDF file.
                                      How long is an idea? Write it down.



                                      • #20
                                        Originally posted by Mike Doty View Post
                                        Now over 1,000,000 files.
                                        I unchecked "Allow files on this drive to have contents indexed ..."
                                        I'm curious whether the search will be fast without Windows indexing.
                                        This will take a long time to process.

                                        Good news.
                                        With no Windows indexing the search time is still great with over 1,000,000 files.
                                        Hardly surprising; you are not searching for file content.
                                        If you search for files containing certain data, "content indexing" will affect the search time.

