fragmentation of a file on disk

  • fragmentation of a file on disk

    In using several database products that utilize a flat file, the issue of file fragmentation crops up in one form or another. At some point it must be addressed.

    For now I would like to just programmatically determine a given file's fragmentation.

    How could this be done?

  • #2
    To the best of my knowledge, you can't get there via conventional means.

    To determine a file's fragmentation status, one would have to read the FAT and make calculations from there. M$ and Windows don't like the hoi polloi like you and me going there.
    There are no atheists in a fox hole or the morning of a math test.
    If my flag offends you, I'll help you pack.



    • #3
      Originally posted by Mel Bishop View Post
      To the best of my knowledge, you can't get there via conventional means.
      Not true. There are API calls to do this. I had a demo VB project from a magazine that created a defragmenter. Search for DeviceIoControl/File Storage/Defragmentation in the Platform SDK.



      • #4
        >At some point [file fragmentation] must be addressed.

        Why?

        You can always defragment a file yourself by rewriting the whole thing.
        Michael Mattias
        Tal Systems (retired)
        Port Washington WI USA
        http://www.talsystems.com


        • #5
          Knuth,

          Thanks for pointing me in the right direction:
          http://msdn2.microsoft.com/en-us/library/aa906596.aspx
          http://msdn2.microsoft.com/en-us/library/aa364572.aspx

          However, it seems this only works on Win2k
          http://www.powerbasic.com/support/fo...ML/012013.html

          Is there another method?



          • #6
            > You can always defragment a file yourself by rewriting the whole thing.

            Are you sure about that? That's not my understanding of disk fragmentation.

            -- Eric
            "Not my circus, not my monkeys."



            • #7
              Rewrite=defragment works here as long as you have enough free disk space and rewrite the whole file in one shot (one PUT/WriteFile).
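
For readers who want to see the "rewrite the whole file in one shot" idea spelled out, here is a minimal sketch in Python rather than PowerBASIC, so it can be run anywhere. The function name `rewrite_in_one_shot` and the temporary-file swap are assumptions made for illustration, not something from this thread. It reads the file into memory, writes it to a fresh file with a single write call, and swaps the copy into place. Whether the copy actually ends up contiguous is still decided by the file system and the free space available.

```python
import os
import tempfile


def rewrite_in_one_shot(path):
    """Copy `path` into a fresh file using one write call, then swap it in.

    Returns the number of bytes rewritten. The whole file is held in memory,
    so this is only a sketch for files that fit comfortably in RAM.
    """
    with open(path, "rb") as src:
        data = src.read()

    # Create the temporary copy in the same directory so the final
    # os.replace stays on one volume.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst:
            dst.write(data)          # a single write call at the Python level
            dst.flush()
            os.fsync(dst.fileno())   # push the data to disk before swapping
        os.replace(tmp_path, path)   # the fresh copy takes the original name
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return len(data)
```

Other handles to the file should be closed before calling this, because the swap replaces the file under its original name.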
              Michael Mattias
              Tal Systems (retired)
              Port Washington WI USA
              http://www.talsystems.com


              • #8
                From what I understand of the MS file system(s), the OS determines where the physical data is stored. If there are enough contiguous blocks, the file is written without fragmentation, but if not, the DOS places the file in separate locations and maps the chains. AFAIK, even if you use a single output operation, it's still the DOS that decides the actual placement of the data.
                Software makes Hardware Happen



                • #9
                  I always have lots of contiguous space, because I defrag my disks regularly. Maybe that's why it always worked for me.

                  But let's get back to the question... why bother with manual defragmentation of a single file? Regular maintenance (i.e. defrag) will handle that. And I have to believe that in sequential access mode, Windows' caching offsets any theoretical losses caused by file fragmentation. (In random access mode, fragmentation is moot unless the next random access just happens to be in the cache buffer anyway.)
                  Michael Mattias
                  Tal Systems (retired)
                  Port Washington WI USA
                  http://www.talsystems.com


                  • #10
                    Originally posted by Mike Trader View Post
                    However, it seems this only works on Win2k
                    http://www.powerbasic.com/support/fo...ML/012013.html

                    Is there another method?
                    Please note the subtle "+" in Greg's post:

                    However, FSCTL_GET_RETRIEVAL_POINTERS is Win2K+ only.
                    That means W2K and higher, so it should work on XP as well. I'm not sure about Vista, though.

                    But I agree with Michael. Why bother? A regular DEFRAG with the OS's defragmenter will take care of this and should be done on a regular basis.



                    • #11
                      Yes, doing a defrag will fix fragmentation. The question I am attempting to answer is WHEN a defrag is required.

                      The file can get quite large. Copying the entire file each time the app launches is one answer, but it makes the user wait whether the file is fragmented or not.



                      • #12
                        Originally posted by Mike Trader View Post
                        yes, doing a defrag will fix defragmentation. The question I am attempting to answer, is WHEN a defrag is required.

                        The file can get quite large, Copying the entire file each time the app launches is one answer, but makes the user wait whether the file is fragmented or not.
                        >The question I am attempting to answer, is WHEN a defrag is required

                        The fourth Thursday of each month?

                        Seriously though, I think it was the old Norton Utilities for MS-DOS that had a thing which could give you a "percent fragmented" number without actually starting a defrag operation. Maybe there is something for Windows which can do that, too. (The Microsoft defrag just starts running after showing you a really pretty picture which means nothing.)

                        Maybe some 'techie' site like Sourceforge or one of its sisters?

                        I have no clue what value you should be looking for, but this sounds like a reasonable starting point.



                        MCM
                        Michael Mattias
                        Tal Systems (retired)
                        Port Washington WI USA
                        http://www.talsystems.com


                        • #13
                          Do I read correctly? You know HOW to "Defragment" (AKA: Start from Scratch)
                          If this is the case then I think from the input, the obvious is being overlooked.

                          I always have lots of contiguous space, because I defrag my disks regularly. Maybe that's why it always worked for me.
                          Would be an example of an "IT guy" (regular maintenance, etc.)

                          yes, doing a defrag will fix fragmentation. The question I am attempting to answer, is WHEN a defrag is required.
                          Would be an example of someone worried about when the best time is. (Valid question... the answer is "It depends.")

                          "The program is slow, and getting slower every day, my system is full, what do I do? I can't lose all my data!!!!" (When asked about backups, the reply is "Ummm... what's a backup?")
                          It's not really funny, but it is: the typical user has no clue how things work, and doesn't care, until there is a problem, and when there is one... somehow it's YOUR fault.

                          Anyways, back to the point.

                          The file can get quite large, Copying the entire file each time the app launches is one answer, but makes the user wait whether the file is fragmented or not.
                          Yep, I agree. It would be like other products that call home to check for an update when in reality an update may only be available every couple of years (when done right, like the PB compilers), so what's the point of checking every time I open the program?

                          I think the overlooked options are to add a couple of options to your program.
                          1. An option to "Defrag Now"
                          2. An option to schedule a defrag (so you can "Set It and Forget It")
                          3. An option (probably in the "Schedule") to defrag at or above a certain percentage

                          #3 could be a double-edged sword, depending on what your question really is:
                          • How do I determine % of Fragmentation? (I believe can be found on Sourceforge, and may be worth while porting to PB)
                          • I can determine the % of Fragmentation but what is the best value? (Answer: "It Depends")
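
As a rough illustration of option 3, the sketch below (Python, for portability) turns a fragment count into a percentage and compares it with a threshold. Both the metric and the default threshold of 10% are assumptions invented for this example: the thread does not define how "% of fragmentation" should be calculated, and the names `fragmentation_percent` and `should_defrag` are hypothetical.

```python
def fragmentation_percent(extent_count, cluster_count):
    """Hypothetical metric: 0% for one contiguous extent, 100% when every
    cluster of the file sits in its own extent."""
    if extent_count < 1 or cluster_count < extent_count:
        raise ValueError("need 1 <= extent_count <= cluster_count")
    if cluster_count == 1:
        return 0.0  # a one-cluster file cannot be fragmented
    return 100.0 * (extent_count - 1) / (cluster_count - 1)


def should_defrag(extent_count, cluster_count, threshold_percent=10.0):
    """True when the metric is at or above the threshold (an assumed default)."""
    return fragmentation_percent(extent_count, cluster_count) >= threshold_percent
```

For example, a file of 101 clusters split into 11 extents scores 10% under this metric, so `should_defrag(11, 101)` is true with the default threshold. The "best" threshold remains a case of "It Depends".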

                          It is all open to interpretation as to when, why, how... but in the end, it's not what you think about when and how, but "what does the user want?"... and if they don't care, you pick what you feel most comfortable with.
                          Engineer's Motto: If it aint broke take it apart and fix it

                          "If at 1st you don't succeed... call it version 1.0"

                          "Half of Programming is coding"....."The other 90% is DEBUGGING"

                          "Document my code????" .... "WHYYY??? do you think they call it CODE? "



                          • #14
                            Mike
                            Back in the MS-DOS days, I used a Norton Utilities program to sort a directory of my program files.

                            I had no current backup of my files. The computer locked up and I had to reboot. I lost all the files in the directory plus all its subdirectories. If you look hard enough in the forum, you may see a quote I use from one of my friends: "Computer experience is directly proportional to the amount of files ruined".

                            Our company has offices at different locations, and the databases are kept at each office's site. At each location I have partitioned every Windows server and workstation, placing files into partitions based on how large and how static they are. So I have at least one partition on each workstation set aside for files that do not change often, where I also add a few other files for archiving purposes. I try my best to use drive C on the workstations for programs only.

                            I am moving over to Linux for file servers and, as I understand it, defragmenting is not as necessary there as on Windows. Frankly, I do not understand why, but for now I will go along with what I have read.

                            On the server machines where I have not made that move, I defrag only every 5 to 8 months, doing a backup before defragging. The file system is NTFS; thanks to that, I have not lost much data on the hard drives at those locations.
                            Any files on a file server should at least be on some kind of transaction-based file system, not a FAT-formatted partition.

                            I also keep our major database files on a partition other than the one used for the operating system. This speeds up defragging and backups, and if the OS partition goes down for some reason, I can place the drive in another machine and access the data partition to retrieve the data files.


                            One of the biggest reasons not to defrag often, in my view, is the risk of the computer going down during the defrag process.

                            I started using Auslogics' defragging software this year.
                            It is fast, but it will not compact the drive; it just defragments files.
                            If you have Windows XP, there is a nice MS-DOS program to defrag your files and compact them, which I run after Auslogics if I want to compact the files.

                            If you are worried about speed on flat database files that you are mostly reading, then a file that is small enough might fit in the memory cache. Memory is cheaper now, and an upgrade to more memory may be enough to speed up your programs, but of course if you are doing a lot of writes to flat files, that may not help you.

                            I just wanted to pass along what I have been doing, from long experience.
                            Last edited by Paul Purvis; 4 Dec 2007, 07:49 PM.
                            p purvis



                            • #15
                              The first step here is to just get the metrics.
                              We can debate all day long when the right time is to defrag or copy the file, but this thread is about getting the file fragmentation information!

                              I am noticing that database errors are reported after a while from some users who do not defrag. It should be fairly simple to chart errors against fragmentation and go from there...

                              Can someone help with the conversion of the structures? I am not sure how to convert the nested structures and the union.


                              http://msdn2.microsoft.com/en-us/library/aa364572.aspx
                              http://msdn2.microsoft.com/en-us/library/aa365521.aspx
                              http://msdn2.microsoft.com/en-us/library/ms684342.aspx

                              Code:
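                              #INCLUDE "Win32API.inc"   ' CreateFile, DeviceIoControl, CloseHandle, %MAX_PATH

                              ' CTL_CODE(FILE_DEVICE_FILE_SYSTEM, 28, METHOD_NEITHER, FILE_ANY_ACCESS).
                              ' Uncomment if your Win32API.inc does not already define it:
                              ' %FSCTL_GET_RETRIEVAL_POINTERS = &H00090073

                              TYPE STARTING_VCN_INPUT_BUFFER
                                  StartingVcn AS QUAD
                              END TYPE

                              TYPE ExtentsType
                                  NextVcn AS QUAD
                                  Lcn     AS QUAD          ' -1 = run with no disk space (sparse/compressed)
                              END TYPE

                              TYPE RETRIEVAL_POINTERS_BUFFER
                                  ExtentCount AS DWORD
                                  Padding     AS DWORD     ' the C struct aligns StartingVcn on 8 bytes
                                  StartingVcn AS QUAD
                                  Extents(0 TO 511) AS ExtentsType   ' room for 512 extents per call
                              END TYPE

                              ' OVERLAPPED is normally declared in Win32API.inc already, with the C union
                              ' flattened into two DWORDs (Offset, OffsetHigh), so it is not redeclared
                              ' here. The handle below is synchronous, so a NULL pointer is passed instead.

                              ' Returns the number of extents reported for sFile, or -1 on error.
                              FUNCTION FragInfo( BYVAL sFile AS STRING ) AS LONG

                                LOCAL zFile AS ASCIIZ * %MAX_PATH
                                LOCAL hFile AS DWORD
                                LOCAL BytesReturned, RetVal, ErrCode, Fragments AS LONG
                                LOCAL VCNBuff AS STARTING_VCN_INPUT_BUFFER
                                LOCAL RetBuff AS RETRIEVAL_POINTERS_BUFFER

                                  zFile = sFile            ' e.g. "C:\Test.db3"

                                  hFile = CreateFile(zFile, %GENERIC_READ, (%FILE_SHARE_READ OR %FILE_SHARE_WRITE), _
                                                     BYVAL %NULL, %OPEN_EXISTING, BYVAL %NULL, BYVAL %NULL)

                                  IF hFile = %INVALID_HANDLE_VALUE THEN
                                     FUNCTION = -1
                                     EXIT FUNCTION
                                  END IF

                                  VCNBuff.StartingVcn = 0  ' start at the beginning of the file
                                  DO
                                     RetVal = DeviceIoControl( hFile, _
                                                               %FSCTL_GET_RETRIEVAL_POINTERS,  _
                                                               BYVAL VARPTR(VCNBuff), _        ' INPUT buffer
                                                               SIZEOF(VCNBuff), _              ' SIZE OF INPUT buffer
                                                               BYVAL VARPTR(RetBuff), _        ' OUTPUT buffer
                                                               SIZEOF(RetBuff), _              ' SIZE OF OUTPUT buffer
                                                               BYVAL VARPTR(BytesReturned), _  ' number OF bytes returned
                                                               BYVAL %NULL )                   ' no OVERLAPPED structure

                                     IF RetVal = 0 THEN ErrCode = GetLastError() ELSE ErrCode = 0
                                     ' ERROR_MORE_DATA only means the buffer filled up; anything else ends the loop
                                     IF RetVal = 0 AND ErrCode <> %ERROR_MORE_DATA THEN EXIT DO

                                     Fragments = Fragments + RetBuff.ExtentCount
                                     ' continue from the end of the last extent returned
                                     VCNBuff.StartingVcn = RetBuff.Extents(RetBuff.ExtentCount - 1).NextVcn
                                  LOOP WHILE ErrCode = %ERROR_MORE_DATA

                                  CloseHandle hFile

                                  IF RetVal THEN
                                     FUNCTION = Fragments
                                  ELSEIF ErrCode = %ERROR_HANDLE_EOF AND Fragments = 0 THEN
                                     FUNCTION = 0          ' no extents at all, e.g. a tiny file stored in the MFT
                                  ELSE
                                     FUNCTION = -1         ' error: DeviceIoControl
                                  END IF

                              END FUNCTION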
                              Last edited by Mike Trader; 5 Dec 2007, 03:00 AM.
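
The byte layout behind the structure conversion can be checked without Windows. The sketch below is Python, not PowerBASIC, and works on a hand-built buffer rather than real `DeviceIoControl` output. It unpacks a `RETRIEVAL_POINTERS_BUFFER` and counts fragments. The layout assumed is the C one: a 4-byte `ExtentCount`, 4 bytes of alignment padding, an 8-byte `StartingVcn`, then 16-byte (`NextVcn`, `Lcn`) pairs. The helper names `parse_retrieval_pointers` and `count_fragments` are invented for illustration.

```python
import struct

HEADER = struct.Struct("<I4xq")  # ExtentCount, 4 padding bytes, StartingVcn
EXTENT = struct.Struct("<qq")    # NextVcn, Lcn


def parse_retrieval_pointers(buf):
    """Return a list of (lcn, length_in_clusters) runs from the raw buffer."""
    extent_count, starting_vcn = HEADER.unpack_from(buf, 0)
    runs = []
    vcn = starting_vcn
    for i in range(extent_count):
        next_vcn, lcn = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
        runs.append((lcn, next_vcn - vcn))  # each run ends where the next begins
        vcn = next_vcn
    return runs


def count_fragments(runs):
    """Count physically separate pieces; adjacent runs count as one piece."""
    fragments = 0
    expected_lcn = None
    for lcn, length in runs:
        if lcn == -1:             # sparse/compressed run: occupies no disk space
            continue
        if lcn != expected_lcn:   # does not continue the previous run on disk
            fragments += 1
        expected_lcn = lcn + length
    return fragments
```

A quick check with three made-up extents, of which the second continues the first on disk:

```python
buf = HEADER.pack(3, 0) + EXTENT.pack(10, 100) + EXTENT.pack(15, 110) + EXTENT.pack(20, 500)
runs = parse_retrieval_pointers(buf)   # [(100, 10), (110, 5), (500, 5)]
print(count_fragments(runs))           # 2
```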



                              • #16
                                File Frag

                                Why not write out a suitably large file up front and then add your data to it until it gets too large, then write a new one out?
                                You don't have to worry about defragging it, since you don't actually rewrite the file; you only append to it. What I mean is: you have a 1 gig file filled with zeros or ones, you add your real data starting at the front of the file, and you append to that data until you fill it up, so to speak. Store the data end as a number however you like, then just read data from the file up to that point.
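
A minimal sketch of this preallocation idea, in Python so it runs anywhere. Storing the "data end" as an 8-byte number at the very start of the file is an assumption made for this example (the post leaves the storage method open), and the function names are hypothetical.

```python
import os
import struct

HEADER = struct.Struct("<Q")  # assumed layout: 8-byte "data end" offset at offset 0


def create_preallocated(path, total_size, chunk=1 << 20):
    """Write a zero-filled file of total_size bytes and mark it as empty."""
    if total_size < HEADER.size:
        raise ValueError("file too small to hold the header")
    with open(path, "wb") as f:
        remaining = total_size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(b"\0" * n)           # real zeros, as the post describes
            remaining -= n
        f.seek(0)
        f.write(HEADER.pack(HEADER.size))  # data starts right after the header


def append_record(path, payload):
    """Write payload at the current data end; return the offset it was written at."""
    with open(path, "r+b") as f:
        (data_end,) = HEADER.unpack(f.read(HEADER.size))
        total = os.fstat(f.fileno()).st_size
        if data_end + len(payload) > total:
            raise OSError("preallocated file is full; start a new one")
        f.seek(data_end)
        f.write(payload)
        f.seek(0)
        f.write(HEADER.pack(data_end + len(payload)))  # the file size never changes
    return data_end


def read_data(path):
    """Return only the real data, i.e. everything up to the stored data end."""
    with open(path, "rb") as f:
        (data_end,) = HEADER.unpack(f.read(HEADER.size))
        return f.read(data_end - HEADER.size)
```

Because the file never grows, the file system has no reason to allocate new extents for it after creation. Whether the initial allocation is contiguous still depends on the free space available when the file is created.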
                                Warped by the rain, Driven by the snow...

                                jimatluv2rescue.com



                                • #17
                                  Originally posted by Michael Mattias View Post
                                  (The Microsoft defrag just starts running after showing you a really pretty picture which means nothing).
                                  Ehrm... no, it doesn't. At least not if you click "Analyze". It then tells you whether you should start a defragmentation or not.

                                  If you click "Defragment" OTOH, it actually does what you told it to do... surprising, eh?

                                  I would strongly suggest that you STAY AWAY from defragmentation within your own applications (unless the application is a disk defragmenter). Imagine every piece of software provided such an option, your users activated this feature in every application, and they all started defragmenting... the system would immediately slow down enormously... not to mention that the processes would lock each other's files so those can't be defragmented. What a mess.

                                  Defragmentation is the job of a *single* dedicated application! There are so many unknowns involved (think of RAIDs, flash disks, power interruptions, etc.) that you risk more (heavy data loss) than you might gain.
                                  Last edited by Knuth Konrad; 5 Dec 2007, 06:38 AM.



                                  • #18
                                    I have been using Diskeeper Pro, which defrags during idle time, since February, so fragmentation ceased to be an issue then.

                                    However, I did spot Defraggler recently which allows the defragging of one or more files up to the whole shooting match. Needless to say I have not tried it and it should be noted that it is still beta. It is by the same outfit as Crap Cleaner and is free.



                                    • #19
                                      Jim,
                                      That would mean rewriting all my database functionality for this purpose. It is much simpler to just copy the file.

                                      Knuth,
                                      >I would strongly suggest to STAY AWAY from defragmentation within your own applications
                                      I can just copy the flat database file.

                                      David,
                                      Defraggler is cool, but I probably can't ship it and install it with my app. I really need to just figure out how to determine a file's fragmentation...



                                      • #20
                                        > I really need to just figure out how to determine a files fragmentation

                                        >>In using several database products that utilize a flat file, the issue of file >>fragmentation crops up in one form or another

                                        Are you talking about internal fragmentation, e.g. "gaps" in the physical file where no records exist because they have been deleted, or Windows fragmentation, where the file is not currently located in contiguous sectors (clusters?)?

                                        If the former, Count(valid records) * recordsize / filesize; if the latter, don't bother, as previously suggested by several.

                                        Except that in the former case, your figure will be off due to any overhead requirements of the "database product" in use. That figure is more than likely proprietary and therefore unavailable to you, so you may as well ignore it.
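
The internal-fragmentation ratio above is simple enough to sketch directly (in Python, for portability). The function name `record_utilisation` and the optional `overhead_bytes` parameter are assumptions for illustration. The parameter stands in for the database product's own overhead, which, as noted, is usually unknown and can be left at zero.

```python
def record_utilisation(valid_records, record_size, file_size, overhead_bytes=0):
    """Fraction of the file occupied by live records.

    1.0 means no gaps; lower values mean more space held by deleted records.
    overhead_bytes is a hypothetical allowance for headers and indexes.
    """
    usable = file_size - overhead_bytes
    if usable <= 0:
        raise ValueError("file_size must exceed overhead_bytes")
    return valid_records * record_size / usable
```

For example, 750 live records of 128 bytes in a 128,000-byte file give `record_utilisation(750, 128, 128000) == 0.75`, i.e. a quarter of the file is internal gaps.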

                                        Pray, what is the question to which you believe the answer is some kind of programmatic defragmentation, with which you require some assistance?
                                        Michael Mattias
                                        Tal Systems (retired)
                                        Port Washington WI USA
                                        http://www.talsystems.com
