Announcement

Collapse
No announcement yet.

Question on limitations of recursive code

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Question on limitations of recursive code

    Calling Paul Dixon!

    In another thread (about RMDIR), message #4, you posted code showing how to capture files in folders. That code uses a recursive strategy.

    I've tried it on one of my hard drives, and as long as I'm searching one or two levels down, all is fine. But when I attempt to survey the entire drive, the program gets to a certain point and crashes. The only change I made was to add a time string in the filename so that each run is preserved (no overwrites). At the point of the crash, one of the .csv files had reached a size of over 4G!

    The Windows crash files only indicate "APPCRASH" with no other useful info. The 1T target drive has two partitions, and the one I'm searching is about 700G. A commercial file manager I use reports the drive is using over 337MB, in 643,460 files, in 43,972 directories.

    Guessing that the recursion was overflowing the stack, I tried to improve the program's memory use, by adding these compiler directives:
    #OPTION LARGEMEM32
    #STACK 2000000

    Still crashes...

    SO my question is about how to know when designing a recursive routine, is there any way to determine if it's going to run up against memory limits?
    How can those limits be determined in advance?

    And if the limits may be reasonably be reached, would you adopt a different strategy (non-recursive)?

    My interest is in the process of doing design and choosing strategy, and learning how to make reasonable assessments up-front.

    Thanks for any insights you can share! (And from anyone else with knowledge/experience in this area!)

    -John






  • #2
    To avoid the side effects, you may try to modify the routine to a non recursive one,
    like in Non recursive FileEnum() function
    It should improve speed also...

    Comment


    • #3
      At the point of the crash, one of the .csv files had reached a size of over 4G!
      Code not shown, but in there somewhere were you using a dword for file size or position in file? Should be a quad.

      Cheers,
      Dale

      Comment


      • #4
        John,
        the problem won't be the stack running out.
        My drive is a similar size to yours and the maximum recursion depth is only 23 and the maximum stack used is only 17,568 bytes.

        Although there are a million files on the drive, the code doesn't recursively call 1 million levels, it only adds 1 more level for each deeper level of sub directory and that deeper level is given up when the scan for that directory is completed.

        Comment


        • #5
          John

          I'm not sure, but it could be a path size problem.
          It can also be a problem defining variables, especially if you have files with unicode character names.
          Code:
           ...   
           LOCAL NextFromFolderName AS STRING
          See:
          https://forum.powerbasic.com/forum/u...-curdir-lenght

          Comment


          • #6
            I don't know how it is with 'modern' Windows (never tried), but old versions (Win2000, Xp) had great problems with directories
            with large numers of files in it (4500+)
            A DIR of directories with less then 4500 files took < 1second, above 4500 it took close to a minute.
            Windows then sometimes assumed the program had crashed (not responsive), in fact Windows caused it by itself...
            Regards,
            Peter

            Comment


            • #7
              Pierre,
              It should improve speed also...
              Recursion is not necessarily slow.

              Done non-recursively, you need to keep track of which folders have not yet been done.

              The recursion only occurs for a folder. On my drive there are only 40,000 folders so we have about 40,000 calls to the recursive routine compared to 40,000 strings being added to an array to track the folders.
              String creation is slow.


              In this case I would think it adds more overhead to track the 40,000 folders than it does to make 40,000 function calls but 99%+ of the time is spent doing the rest of the job so the overhead will be low in both cases.

              Comment


              • #8
                Originally posted by John Montenigro View Post
                At the point of the crash, one of the .csv files had reached a size of over 4G!
                That may be a hint that the problem is with the file writing, not the recursion. You're not writing to a FAT32 drive by any chance?)

                Comment


                • #9
                  John,
                  here's a modified version which looks at the stack usage.
                  It doesn't write the data to file as it's only a test.
                  Code:
                  PBCC6 program
                  'list all files on a drive or in a folder to a text file on disk that can be opened in Excel.
                  
                  
                  #COMPILE EXE
                  #DIM ALL
                  #BREAK ON
                  
                  #INCLUDE "win32api.inc"
                  
                  
                  GLOBAL gFilesCopied AS LONG
                  GLOBAL gDirectoriesCopied AS LONG
                  GLOBAL hFile             AS LONG
                  
                  
                  GLOBAL gInitialStack AS LONG
                  GLOBAL gCurrentStack AS LONG
                  GLOBAL gUsedStack AS LONG
                  GLOBAL gRecursionDepth AS LONG
                  GLOBAL gMaxRecursionDepth AS LONG
                  
                  GLOBAL gDeepestPath AS STRING
                  
                  
                  FUNCTION PBMAIN () AS LONG
                  
                  
                  
                  !mov gInitialStack,esp
                  
                  PRINT "gInitialStack=";gInitialStack
                  
                  
                  
                  
                  LOCAL StartDirectory  AS STRING
                  
                  INPUT "Start at which folder? (e.g c:\ or d:\ThisFolder\)";StartDirectory
                  
                  IF RIGHT$(StartDirectory,1) <> "\" AND RIGHT$(StartDirectory,1) <> "/" THEN
                      StartDirectory += "\"
                  
                  END IF
                  
                  
                  hFile = FREEFILE
                  OPEN "c:\FileList.csv" FOR OUTPUT AS #hFile
                  
                  
                  PRINT#hFile, "Started at " TIME$;" ";DATE$
                  
                  PRINT#hFile, RecursiveFileList(StartDirectory)
                  
                  
                  PRINT#hFile,
                  PRINT#hFile, "Finished at " TIME$;" ";DATE$
                  PRINT#hFile,
                  PRINT#hFile, "Files found =";gFilesCopied
                  PRINT#hFile, "Directories found =";gDirectoriesCopied
                  
                  PRINT#hFile, "On exit:";ERROR$
                  
                  CLOSE#hFile
                  
                  
                  
                  PRINT "gDeepestPath=";gDeepestPath
                  
                  BEEP
                  PRINT "Press key to end"
                  WAITKEY$
                  
                  END FUNCTION
                  
                  
                  
                  FUNCTION RecursiveFileList(StartFolderName AS STRING) AS LONG
                  
                  LOCAL FileData  AS WIN32_FIND_DATA
                  LOCAL hFindFile AS LONG
                  LOCAL NextFile  AS ASCIIZ * %MAX_PATH
                  LOCAL FileCount AS LONG
                  LOCAL temp AS LONG
                  LOCAL NextFromFolderName AS STRING
                  
                  
                  INCR gRecursionDepth
                  
                  IF gRecursionDepth > gMaxRecursionDepth THEN
                      gMaxRecursionDepth = gRecursionDepth
                  
                      !mov gCurrentStack,esp
                  
                      gUsedStack = gInitialStack - gCurrentStack
                  
                      PRINT "gMaxRecursionDepth";gMaxRecursionDepth, "gUsedStack = ";gUsedStack
                      gDeepestPath = StartFolderName
                  
                  END IF
                  
                  
                  NextFile = StartFolderName + "*.*"
                  
                  hFindFile = FindFirstFile(NextFile,FileData)
                  
                  IF hFindFile <> %INVALID_HANDLE_VALUE THEN
                      DO
                          'we found a file or directory
                          NextFile = StartFolderName + FileData.cFileName
                  
                          IF (FileData.dwFileAttributes AND %FILE_ATTRIBUTE_DIRECTORY) = 0 THEN
                  
                  
                  
                          'for this test, don't write the file data...
                           'it's a file
                           '#   PRINT#hFile, NextFile, ",",FileData.nFileSizeLow
                  
                  
                             INCR gFilesCopied
                  
                          ELSE
                              'it's a directory
                              IF (FileData.cFileName <> ".") AND (FileData.cFileName <> "..") THEN
                              'it's not the current or parent directory, so search it recursively
                                  NextFromFolderName = StartFolderName + FileData.cFileName
                  
                                  INCR gDirectoriesCopied
                  
                                  temp = RecursiveFileList(NextFromFolderName+"\")
                  
                                  FileCount =  temp + FileCount
                  
                              END IF
                  
                          END IF
                  
                      LOOP WHILE FindNextFile(hFindFile,FileData)
                  
                      FindClose(hFindFile)
                  
                  END IF
                  
                  FUNCTION = FileCount
                  
                  DECR gRecursionDepth
                  
                  END FUNCTION

                  Comment


                  • #10
                    And if the limits may be reasonably be reached, would you adopt a different strategy (non-recursive)?
                    Of course.
                    If the code can't do the job then re-write the code.

                    Recursion is a good technique for some tasks, not for others.
                    In this case there was never any chance of the stack overflowing.
                    There was negligible data being passed on the stack and with %MAXPATH being 260 the maximum possible folder depth would be around 100 and 100 recursive calls isn't going to come close to running out of stack space with a default of 1 Megabyte available.

                    Comment


                    • #11
                      Thanks for all the suggestions/ideas. I'm doing household move today, so will get back to code later tonight.

                      @Pierre, I will download and test that code tonight, and will post comparison to what I was using.

                      @Dale, sorry I forgot to include the link to Paul's code in msg#4 of this thread:
                      https://forum.powerbasic.com/forum/u...g/784312-rmdir
                      The only mods I made were mentioned in post #1 here. No DWORDs, just writing output to file.

                      @Paul (#4 above), Are the 23 levels built into the code, or just a reflection of the # of levels of folders on your drive? I have no idea of how many levels of folders I have, but I'm guessing that 23 might not cover me!

                      @Ribeiro, Good point; I will have to check how long the longest path would be. I tend to use long folder and file names, so this is worth checking. However, no Unicode names (that I know of); also worth checking...

                      @Peter, worth checking, but I tend to not crowd lots of files in a folder. However, Windows might, so again, this is worth checking.

                      @Stuart, Definitely not FAT32 - On Win7 Pro with all updates forever...

                      @Paul (#9), Thanks for the revised code; appreciate the addition of gRecursionDepth. I will check it tonight. I will be adding a time string into the output filename, and I may add some of the just-mentioned tests for things like LongestPathName and MostFilesInFolder and IsFileSystemFAT32...

                      I'll post again tonight,
                      Thanks, everyone!
                      -John

                      Comment


                      • #12
                        Basement cleanout stopped for "work break" for my real job, so a quick look at the results from the run of Paul's code from post #9...

                        Looks like it stopped before completing: the csv contains only the start time, and the screen looks like this:


                        Code:
                        gMaxRecursionDepth 1309     gUsedStack =  1000072
                        gMaxRecursionDepth 1310     gUsedStack =  1000836
                        gMaxRecursionDepth 1311     gUsedStack =  1001600
                        gMaxRecursionDepth 1312     gUsedStack =  1002364
                        gMaxRecursionDepth 1313     gUsedStack =  1003128
                        gMaxRecursionDepth 1314     gUsedStack =  1003892
                        gMaxRecursionDepth 1315     gUsedStack =  1004656
                        gMaxRecursionDepth 1316     gUsedStack =  1005420
                        gMaxRecursionDepth 1317     gUsedStack =  1006184
                        gMaxRecursionDepth 1318     gUsedStack =  1006948
                        gMaxRecursionDepth 1319     gUsedStack =  1007712
                        gMaxRecursionDepth 1320     gUsedStack =  1008476
                        gMaxRecursionDepth 1321     gUsedStack =  1009240
                        gMaxRecursionDepth 1322     gUsedStack =  1010004
                        gMaxRecursionDepth 1323     gUsedStack =  1010768
                        gMaxRecursionDepth 1324     gUsedStack =  1011532
                        gMaxRecursionDepth 1325     gUsedStack =  1012296
                        gMaxRecursionDepth 1326     gUsedStack =  1013060
                        gMaxRecursionDepth 1327     gUsedStack =  1013824
                        gMaxRecursionDepth 1328     gUsedStack =  1014588
                        gMaxRecursionDepth 1329     gUsedStack =  1015352
                        gMaxRecursionDepth 1330     gUsedStack =  1016116
                        gMaxRecursionDepth 1331     gUsedStack =  1016880
                        gMaxRecursionDepth 1332     gUsedStack =  1017644
                        gMaxRecursionDepth 1333     gUsedStack =  1018408
                        gMaxRecursionDepth 1334     gUsedStack =  1019172
                        gMaxRecursionDepth 1335     gUsedStack =  1019936
                        gMaxRecursionDepth 1336     gUsedStack =  1020700
                        gMaxRecursionDepth 1337     gUsedStack =  1021464
                        gMaxRecursionDepth 1338     gUsedStack =  1022228
                        gMaxRecursionDepth 1339     gUsedStack =  1022992
                        gMaxRecursionDepth 1340     gUsedStack =  1023756
                        gMaxRecursionDepth 1341     gUsedStack =  1024520
                        gMaxRecursionDepth 1342     gUsedStack =  1025284
                        gMaxRecursionDepth 1343     gUsedStack =  1026048
                        gMaxRecursionDepth 1344     gUsedStack =  1026812
                        gMaxRecursionDepth 1345     gUsedStack =  1027576
                        gMaxRecursionDepth 1346     gUsedStack =  1028340
                        gMaxRecursionDepth 1347     gUsedStack =  1029104
                        gMaxRecursionDepth 1348     gUsedStack =  1029868
                        gMaxRecursionDepth 1349     gUsedStack =  1030632
                        gMaxRecursionDepth 1350     gUsedStack =  1031396
                        gMaxRecursionDepth 1351     gUsedStack =  1032160
                        gMaxRecursionDepth 1352     gUsedStack =  1032924
                        gMaxRecursionDepth 1353     gUsedStack =  1033688
                        
                        D:\ProjCC\testbed - Remove Directory folder RMDIR
                        2019-08-27
                        >
                        (No that's not code; but the code tags use the fixed-width font for better appearance of the listing...) (and those last 3 lines are my "DOS-prompt")

                        If it had run to completion, the console would have been stopped at the WAITKEY$, and not be sitting at the prompt...

                        When I'm witting at the computer and see a crash, I can go to the Temp folder and view the error files.
                        But since I ran it and went downstairs, I didn't see the Windows error box, and so the tmp/xml files are gone...

                        OK, so now I'll run it again with the large memory and increased stack options. I'll check it when I stop for lunch (about an hour) and let you know the results.

                        -John


                        Comment


                        • #13
                          Actually there are two dwords in the type WIN32_FIND_DATA for file size (the variable of that type is FileData).

                          There no reference to nFileSizeHigh or nFileSizeLow, nor even is the variable used at all regarding the file that went over 4GB. So is likely just a coincidence and not cause of the problem.

                          Cheers,
                          Dale

                          Comment


                          • #14
                            Hmm... didn't take but a few minutes (while I'm checking work emails), and here's the result:

                            Code:
                            gMaxRecursionDepth 2596     gUsedStack =  1983340
                            gMaxRecursionDepth 2597     gUsedStack =  1984104
                            gMaxRecursionDepth 2598     gUsedStack =  1984868
                            gMaxRecursionDepth 2599     gUsedStack =  1985632
                            gMaxRecursionDepth 2600     gUsedStack =  1986396
                            gMaxRecursionDepth 2601     gUsedStack =  1987160
                            gMaxRecursionDepth 2602     gUsedStack =  1987924
                            gMaxRecursionDepth 2603     gUsedStack =  1988688
                            gMaxRecursionDepth 2604     gUsedStack =  1989452
                            gMaxRecursionDepth 2605     gUsedStack =  1990216
                            gMaxRecursionDepth 2606     gUsedStack =  1990980
                            gMaxRecursionDepth 2607     gUsedStack =  1991744
                            gMaxRecursionDepth 2608     gUsedStack =  1992508
                            gMaxRecursionDepth 2609     gUsedStack =  1993272
                            gMaxRecursionDepth 2610     gUsedStack =  1994036
                            gMaxRecursionDepth 2611     gUsedStack =  1994800
                            gMaxRecursionDepth 2612     gUsedStack =  1995564
                            gMaxRecursionDepth 2613     gUsedStack =  1996328
                            gMaxRecursionDepth 2614     gUsedStack =  1997092
                            gMaxRecursionDepth 2615     gUsedStack =  1997856
                            gMaxRecursionDepth 2616     gUsedStack =  1998620
                            gMaxRecursionDepth 2617     gUsedStack =  1999384
                            gMaxRecursionDepth 2618     gUsedStack =  2000148
                            gMaxRecursionDepth 2619     gUsedStack =  2000912
                            gMaxRecursionDepth 2620     gUsedStack =  2001676
                            gMaxRecursionDepth 2621     gUsedStack =  2002440
                            gMaxRecursionDepth 2622     gUsedStack =  2003204
                            gMaxRecursionDepth 2623     gUsedStack =  2003968
                            gMaxRecursionDepth 2624     gUsedStack =  2004732
                            gMaxRecursionDepth 2625     gUsedStack =  2005496
                            gMaxRecursionDepth 2626     gUsedStack =  2006260
                            gMaxRecursionDepth 2627     gUsedStack =  2007024
                            gMaxRecursionDepth 2628     gUsedStack =  2007788
                            gMaxRecursionDepth 2629     gUsedStack =  2008552
                            gMaxRecursionDepth 2630     gUsedStack =  2009316
                            gMaxRecursionDepth 2631     gUsedStack =  2010080
                            gMaxRecursionDepth 2632     gUsedStack =  2010844
                            gMaxRecursionDepth 2633     gUsedStack =  2011608
                            gMaxRecursionDepth 2634     gUsedStack =  2012372
                            gMaxRecursionDepth 2635     gUsedStack =  2013136
                            gMaxRecursionDepth 2636     gUsedStack =  2013900
                            gMaxRecursionDepth 2637     gUsedStack =  2014664
                            gMaxRecursionDepth 2638     gUsedStack =  2015428
                            gMaxRecursionDepth 2639     gUsedStack =  2016192
                            gMaxRecursionDepth 2640     gUsedStack =  2016956
                            
                            D:\ProjCC\testbed - Remove Directory folder RMDIR
                            2019-08-27
                            >
                            No Windows error panels, no crash notices, csv file contains only the Starting time...

                            After some emails, maybe I'll try something else.

                            -John

                            Comment


                            • #15

                              @Dale, I see what you're saying...
                              As for the output file, there are no pointers, just PRINT# statements in OUTPUT mode.
                              But these last two runs are using the code in post #9, and we're not writing filenames/paths to the cdv, just the results.
                              We're just not getting there.

                              -John

                              Comment


                              • #16
                                John,
                                "csv file contains only the Starting time..."

                                I REMmed out the write to the csv file to save time during the tests.
                                A directory with 2000+ nested subdirectories just doesn't look right.
                                You need to look at the deeply nested ones and see what the path is.


                                Comment


                                • #17
                                  Originally posted by Paul Dixon View Post
                                  John,
                                  "csv file contains only the Starting time..."

                                  I REMmed out the write to the csv file to save time during the tests.
                                  A directory with 2000+ nested subdirectories just doesn't look right.
                                  You need to look at the deeply nested ones and see what the path is.

                                  That does look improbable. Have you run a Chkdsk?

                                  Comment


                                  • #18
                                    I'll run CHKDSK now...

                                    Also, will PRINT to csv when new folder...

                                    -John

                                    Comment


                                    • #19
                                      Didn't run CHKDSK yet, but by capturing the folder name to the CSV, found a branch with more depth than I realized. I didn't need that anymore, so I used my file manager tool (zTreeWin) to PRUNE the entire branch. The program ran much further, but choked on another branch, so I grafted it to a higher level. Ran the program again, and it completed.

                                      Code:
                                      D:\ProjCC\testbed - Remove Directory folder RMDIR
                                      2019-08-27
                                      >Recursive_w_Depth_Counter.EXE
                                      gInitialStack= 4193832
                                      Start at which folder? (e.g c:\ or d:\ThisFolder\)d:\
                                      gMaxRecursionDepth 1        gUsedStack =  760
                                      gMaxRecursionDepth 2        gUsedStack =  1524
                                      gMaxRecursionDepth 3        gUsedStack =  2288
                                      gMaxRecursionDepth 4        gUsedStack =  3052
                                      gMaxRecursionDepth 5        gUsedStack =  3816
                                      gMaxRecursionDepth 6        gUsedStack =  4580
                                      gMaxRecursionDepth 7        gUsedStack =  5344
                                      gMaxRecursionDepth 8        gUsedStack =  6108
                                      gMaxRecursionDepth 9        gUsedStack =  6872
                                      gMaxRecursionDepth 10       gUsedStack =  7636
                                      gMaxRecursionDepth 11       gUsedStack =  8400
                                      gMaxRecursionDepth 12       gUsedStack =  9164
                                      gMaxRecursionDepth 13       gUsedStack =  9928
                                      gMaxRecursionDepth 14       gUsedStack =  10692
                                      gMaxRecursionDepth 15       gUsedStack =  11456
                                      gMaxRecursionDepth 16       gUsedStack =  12220
                                      gMaxRecursionDepth 17       gUsedStack =  12984
                                      gMaxRecursionDepth 18       gUsedStack =  13748
                                      gMaxRecursionDepth 19       gUsedStack =  14512
                                      gDeepestPath=d:\Users\John\Documents\PERSONAL\Computers\PROGRAMMING\JAVA\docs\gu
                                      ide\security\jaas\spec\com\sun\security\auth\callback\class-use\
                                      Press key to end
                                      
                                      D:\ProjCC\testbed - Remove Directory folder RMDIR
                                      2019-08-27
                                      >
                                      CSV file contains just the folder names (43,955 lines), and is 3.8MB, took 46 seconds to complete with no errors.

                                      (I'll still run CHKDSK, but I won't start it till just before bedtime...) (Also, at that time I'll run the code pointed to by Pierre.)

                                      Was this choking on too-long pathnames?

                                      Checked the max path size - surprised to see (in Jose Roca include set), it's 260... Somewhere along the foggy road, I thought that had been expanded, first to 1K or 4K, then more recently to around 32K!!!

                                      So, I think I answered my own question, yes, it probably choked on a path whose LEN > 260

                                      BUT, now I want to see if I can expand that 260 to something larger, graft that long branch back to where it was and repeat the tests...

                                      Refreshing break from cleaning out the basement!

                                      -John


                                      -John





                                      Comment


                                      • #20
                                        Originally posted by John Montenigro View Post
                                        So, I think I answered my own question, yes, it probably choked on a path whose LEN > 260

                                        BUT, now I want to see if I can expand that 260 to something larger, graft that long branch back to where it was and repeat the tests...

                                        Refreshing break from cleaning out the basement!
                                        See https://docs.microsoft.com/en-us/win.../naming-a-file

                                        In the Windows API (with some exceptions discussed in the following paragraphs), the maximum length for a path is MAX_PATH, which is defined as 260 characters.
                                        ...
                                        The Windows API has many functions that also have Unicode versions to permit an extended-length path for a maximum total path length of 32,767 characters.
                                        ...
                                        The shell and the file system have different requirements. It is possible to create a path with the Windows API that the shell user interface is not able to interpret properly.
                                        ...

                                        To specify an extended-length path, use the "\\?" prefix. For example, "\\?\D:\very long path".
                                        ...
                                        Starting in Windows 10, version 1607, MAX_PATH limitations have been removed from common Win32 file and directory functions. However, you must opt-in to the new behavior...

                                        Then see https://docs.microsoft.com/en-us/win...findfirstfilew

                                        In the ANSI version of this function, the name is limited to MAX_PATH characters. To extend this limit to 32,767 wide characters, call the Unicode version of the function and prepend "\?" to the path. For more information, see Naming a File....

                                        What happens if you replace FINDFIRSTFILE with FINDFIRSTFILEW and use the "\\?\..." format?

                                        Or if you rewrite it to use DIR$ instead?

                                        Comment

                                        Working...
                                        X