What is the fastest PB command to read data at a position in a large string


  • #81
    Another WOW... I missed that. I just removed the DIALOG DOEVENTS and the processing finished in 10 seconds!!! At that speed there is almost no need to show the % of processing. Would a progress bar be faster? I may check... Thanks for that tip. That is the big one: I used DIALOG DOEVENTS in all of our 2D processing, with a similar dialog showing the % done, and it was the bottleneck there as well. Of course, for our multithreaded processing, which is an option now, we just use progress bars and things are very fast.

    Comment


    • #82
      Paul, I just moved the DIALOG DOEVENTS up one tier to the main z FOR loop; the processing now takes about 10.5 seconds and the % processing dialog still works. Thanks for that. We are now about 34 times faster than the original processing, which took 340 seconds!!!

      Comment


      • #83
        Now you need to re-read this thread and add in the other speed improvements, which will now be significant...
        ...and I'm sure you could get it down to around 1 second.

        Comment


        • #84
          Paul, thank you and the others for taking the time to provide valuable insight on the solution!!!

          Comment


          • #85
            Just thought I would share the results of a 3D migration of a site in Italy. Everyone's efforts made this possible. The following dialog and image were posted on our FB and LinkedIn sites. Thanks again:

            GPR-SLICE new option - increased speed of processing for 3D migration!

            With help from the PowerBASIC forums I received assistance from more seasoned programmers on the bottlenecks in 3D migration processing in our current software version. Several innocuous-looking lines of code were flagged as hampering faster calculation, and these have just been updated and repaired in the website update.

            3D migration is really reserved for those surveyors who have collected dense datasets, usually with a multichannel system, and have generated a full 3D pulse volume. For 3D migration we do not use Hilbert-transformed volumes, as constructive/destructive interference from migrating pulses in the volume is a requirement. 3D migration for concrete surveying using single-channel equipment can also be done when sufficient areal coverage is collected.

            With the new update for 3D migration, a speedup of over 30 times was measured on one test site. The example data were collected with an IDS Stream X system and the volume was generated at 10cm spacing in the x and y directions. The Zscan and Xscan through the volume, where a clear hyperbola is seen in the unmigrated data, are migrated very effectively via 3D migration.
            Attached Files

            Comment


            • #86
              There is one other bottleneck that only looks serious when reading in a large 3D volume. The initial step of reading a 508MB binary data file into a single character string takes about 35 seconds on my computer. The data are written bottom up, so I have to read each z plane first and then append them together into a single character string. Is there a faster way to do the following:

              ' Reverse the volume definition, which is written bottom up.
              temp$ = ""
              file3d$ = ""
              FOR i = ngz-1 TO 0 STEP -1
                  SEEK #1, i*ngxngy2+1
                  GET$ #1, ngxngy2, temp$
                  file3d$ = file3d$ + temp$
              NEXT i

              Comment


              • #87
                Don't read files backwards.
                Files are stored sequentially on disk.
                If you have a mechanical hard drive, the next sector is next on the disk and can be read immediately, but the previous sector has just passed the heads and you must wait a full revolution of the disk for it to come around again.

                The way to do what you want is to read the entire file into memory in one go using GET or GET$ and THEN re-order it while it's in memory.
                You can read a 500MB file in about 5 seconds on an average hard drive, maybe in 1 second on an SSD drive.
                Re-ordering it should take less than that.


                Don't repeatedly append strings.
                Each time you append to a string, the OS must create a new string of the new size, copy the old string, add the appended data, then delete the old string.
                It's very slow.
                Instead, create a single string of the total size at the start then fill that string with the data.
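                (The same contrast can be sketched outside PB. Below is a small Python illustration of the two patterns - grow-by-append versus filling a preallocated buffer - with purely illustrative sizes; the PB equivalents are noted in the comments.)

```python
# Illustration: repeated append vs. preallocated buffer.
# Repeated append copies the whole accumulated string on every
# iteration (quadratic work); filling a preallocated buffer
# touches each byte once (linear work).

chunks = [bytes([65 + i]) * 1000 for i in range(100)]  # 100 chunks of 1000 bytes

# Slow pattern: grow-by-append (like file3d$ = file3d$ + temp$)
slow = b""
for c in chunks:
    slow = slow + c

# Fast pattern: preallocate once, then fill in place
fast = bytearray(sum(len(c) for c in chunks))  # like file3d$ = SPACE$(total)
pos = 0
for c in chunks:
    fast[pos:pos + len(c)] = c                 # like MID$(file3d$, pos+1, n) = temp$
    pos += len(c)

assert bytes(fast) == slow
```

On large inputs the append pattern slows down dramatically as the accumulated string grows, while the preallocated version stays linear.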

                Comment


                • #88
                  Paul, thanks for that education. I never knew these details about the hard disk and the reading issues... With the following code, the 508MB file is read and reordered in about 2 seconds!!!

                  GET$ #1, LOF(1), temp$
                  CLOSE #1
                  file3d$ = SPACE$(ngx*ngy*ngz*2)
                  FOR i = 1 TO ngz
                      MID$(file3d$, (ngz-i)*ngxngy2+1, ngxngy2) = MID$(temp$, (i-1)*ngxngy2+1, ngxngy2)
                  NEXT i
                  temp$ = ""
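                  (For anyone wanting to experiment outside PB, the same plane reversal can be sketched in Python; plane_size stands in for ngxngy2 and the sizes are purely illustrative.)

```python
# Reverse the order of fixed-size z-planes in a buffer, writing each
# plane once into a preallocated output (mirrors the MID$ loop above).
ngz, plane_size = 4, 8                      # illustrative sizes
temp = bytes(range(ngz * plane_size))       # stand-in for the file contents

out = bytearray(len(temp))                  # like file3d$ = SPACE$(...)
for i in range(1, ngz + 1):
    src = (i - 1) * plane_size
    dst = (ngz - i) * plane_size
    out[dst:dst + plane_size] = temp[src:src + plane_size]

# Each plane keeps its internal byte order; only the plane order flips.
assert out[:plane_size] == temp[-plane_size:]
assert out[-plane_size:] == temp[:plane_size]
```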

                  Comment


                  • #89
                    Just be careful not to exceed the memory available.
                    500MBytes is manageable. 1000MBytes might be too much as you only have 2GBytes of memory available for everything.

                    Comment


                    • #90
                      Dean,
                      It would be easier to test using a few bytes and to optimize the MID$ loop.
                      Something generic, without ngx, ngy and ngz, that could be reused to replace one string with another.

                      Code:
                      #DIM ALL
                      %ShowAll=1
                      
                      FUNCTION PBMAIN () AS LONG
                       LOCAL sOldBuffer,sNewBuffer AS STRING
                       LOCAL recnum,ngx,ngy,ngz,reclen AS LONG
                       sOldBuffer = CHR$(65 TO 90)
                       ngx = 2
                       ngy = 2
                       ngz = 2    'records
                       reclen = 1 'unknown
                       sNewBuffer=SPACE$(ngx*ngy*ngz*2)
                      
                       FOR recnum=1 TO ngz
                        'optimize this line
                        MID$(sNewBuffer,(ngz-recnum)*reclen+1,reclen)=MID$(sOldBuffer,(recnum-1)*reclen+1,reclen)
                      
                      
                        IF %ShowAll THEN
                         ? USING$("sNewBuffer=&  position=#  length=#",sNewBuffer, (recnum-1)*reclen+1 ,reclen) + $CR +_
                           USING$("Mid OldBuffer=&  position=#  length=#",MID$(sOldBuffer,(recnum-1)*reclen+1,reclen), (recnum-1)*reclen+1 ,reclen),,"OldBuffer"
                        END IF
                      
                       NEXT recnum
                       ? sNewBuffer,,"Done"
                       sOldBuffer=""
                      END FUNCTION
                      Use [ code ] [ /code ] in source code before pasting to keep indentation

                      Comment


                      • #91
                        Originally posted by dean goodman View Post
                        Paul, thanks for that education. I never knew these details about the hard disk and the reading issues....With the following code the 508mb file reads and gets reordered in about 2 seconds!!!

                        [code]
                        GET$#1,LOF(1),temp$
                        CLOSE#1
                        file3d$=SPACE$(ngx*ngy*ngz*2)
                        FOR i=1 TO ngz
                        MID$(file3d$,(ngz-i)*ngxngy2+1,ngxngy2)=MID$(temp$,(i-1)*ngxngy2+1,ngxngy2)
                        NEXT i
                        temp$=""
                        [/code]
                        Again you are doing unnecessary, slow MID$() operations and a huge number of duplicate calculations in a simple sequential loop.

                        If I understand you correctly, you need to reverse the string before you do your little endian word resolution.

                        You don't need a separate string, you can swap bytes in situ.

                        That makes the original problem much easier to solve. The code below does both in one simple loop: it reverses and "little end"s the words in the string.

                        Code:
                        #COMPILE EXE
                        #DIM ALL
                        
                        FUNCTION PBMAIN () AS LONG
                            LOCAL file3d AS STRING
                            LOCAL q AS QUAD
                            file3d = "A" & STRING$(50000000,"B") & "..." & STRING$(50000000,"C") & "D"
                            ? LEFT$(file3d ,4)  & "..." &  RIGHT$(file3d ,4)
                            TIX q
                            ReverseString file3d
                            TIX END q
                            ? LEFT$(file3d ,4)  & "..." &  RIGHT$(file3d ,4)
                            ? "Reversal took " & STR$(q) & " tix"
                        END FUNCTION
                        
                        FUNCTION ReverseString (s AS STRING) AS LONG
                            LOCAL x,lngEnd AS LONG
                            LOCAL p AS BYTE PTR
                            p = STRPTR(s)
                            lngEnd = LEN(s) -1
                            FOR x = 0 TO FIX(LEN(s)/2) -1
                               SWAP @p[x], @p[lngEnd - x]
                            NEXT
                        END FUNCTION

                        Comment


                        • #92
                          Mike and Stuart, thank you for those other suggestions to speed up the reordering of a string. We are reaching "critical mass" in our data volumes at 500MB. We have had to scale the volumes down to about 500MB to work consistently, but our data volumes need to be 2-10 gigabytes or more, as collection with multichannel equipment is on the rise. 500MB volumes are starting to look small, and unless we can get a PB 64 in the next year or two, the viability of our software into the future will be limited to small survey sites and decreasing market share. We solved the immediate issues by breaking large sites into grid blocks... but that has not been fun. We need to operate on a single volume in the future, hence the need for PB 64... I will be forced into natural retirement if it doesn't get done, though... Again, thanks for everyone's expertise in speeding up our code!

                          Comment


                          • #93
                            Dean,
                            If you send an email to support @ opttech.com, or call them, you may find a third-party solution. I would attach post #90 and post #91.
                            It can read, sort and write files back to disk much faster than I could do with PowerBASIC.
                            There is a post somewhere on the BBS comparing different ways to process and sort a 100-megabyte file.
                            It took minutes for some, hours for others, and some failed outright. OptTech blew everything out of the water.
                            OptTech doesn't just sort; it can transform files.
                            Since there is no sorting involved here, it would be even faster.

                            www.opttech.com

                            Comment


                            • #94
                              Thanks Mike... All in all we got the initial read and reordering of the 500MB volume done in 2 seconds. That is good enough, considering that the 3D migration using a 20 degree angle juncture (around the hyperboloid) takes 23.5 minutes to process!! Those are really good times compared to the previous code design, which was an overnight process taking 9 hours and made 3D migration of the data volume not a viable everyday working process...

                              Comment


                              • #95
                                >We need to operate on single volume for the future and the need for PB 64.
                                I based my suggestion on these comments. OptTech has no file-size limit and is normally one line of code.

                                Comment


                                • #96
                                  Originally posted by dean goodman View Post
                                  Mike and Stuart, thank you for those other suggestions to speed up that reordering of a string. We are reaching "critical mass" in our data volumes at 500mb volume. And we have had to scale the volumes down to about 500mb to consistently work but our data volumes need to be 2-10 gigabytes or more data as collection with multichannel equipment is on the rise. 500Mb volumes are starting to look like small volumes and unless we can get PB 64 in the next year or two, the viability of our software into the future will be limited to small survey sites and decreasing market share. We solved immediate issues with breaking large sites in grid blocks...but that has not been fun...We need to operate on single volume for the future and the need for PB 64... Will be forced to naturally retire if it dont get done though.....Again, thanks for everyone's expertise in speeding up our code!
                                  OK, we've all spent a lot of time optimising your commercial application for you. Normally I'd be charging a consultation fee, given how much it has improved the value of your commercial application.

                                  So here's my last freebie for now.

                                  You are not limited to 500MB, you are limited to chunks of data of about that size. That doesn't mean you can't handle very large files.

                                  Here's a suggestion for handling a big file: work outward from the centre in pairs of chunks.

                                  Take your 10GB file.
                                  Open it for BINARY.
                                  Get its exact length and load its midpoint into qMidPoint.

                                  1. Get a 100MB chunk below the midpoint: seek to (qMidPoint - 100000001) and read 100000000 bytes into strA.
                                  2. Get the corresponding 100MB chunk above the midpoint: seek to qMidPoint and read 100000000 bytes into strB.
                                  3. Use the code in post #91 to swap the first byte of strA with the last byte of strB, the second byte of strA with the second-last byte of strB, etc.
                                  4. Write strA and strB back into the file at their original locations.

                                  Repeat steps 1 to 4, processing (qMidPoint - 200000001) to (qMidPoint - 100000000) and (qMidPoint + 100000001) to (qMidPoint + 200000000), etc., until you've processed the whole file working outwards from the centre in 100MB chunks; the final two chunks will be sized according to how much data is still unprocessed.

                                  The 100MB chunk size is just a suggestion; the major bottleneck will be reading/writing the data, and a bit of experimentation will reveal the optimum chunk size.

                                  Note: You could also work from the ends of the file to the centre doing the same thing.

                                  You now have the original file reversed and little endian. Presumably, it's now in a sequence that you can process. If not, then a similar process of moving chunks around could be used.
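                                  (The pairwise-chunk scheme above can be sketched in Python, shrunk to a tiny temp file and a 4-byte chunk so it actually runs; this version works inward from both ends, which is equivalent to working outward from the midpoint. Filenames and sizes are illustrative.)

```python
import os, tempfile

# Reverse a file in place by swapping chunk pairs, working inward from
# both ends; only two chunks are held in memory at a time.
def reverse_file_in_place(path, chunk=4):      # tiny chunk for the demo
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        lo, hi = 0, size
        while hi - lo >= 2 * chunk:
            f.seek(lo);         a = f.read(chunk)   # chunk at the front
            f.seek(hi - chunk); b = f.read(chunk)   # matching chunk at the back
            f.seek(lo);         f.write(b[::-1])    # reversed back chunk to front
            f.seek(hi - chunk); f.write(a[::-1])    # reversed front chunk to back
            lo += chunk
            hi -= chunk
        f.seek(lo); mid = f.read(hi - lo)           # final odd-sized middle piece
        f.seek(lo); f.write(mid[::-1])

path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"ABCDEFGHIJKLMNOPQRS")                # 19 bytes, not chunk-aligned
reverse_file_in_place(path)
with open(path, "rb") as f:
    assert f.read() == b"SRQPONMLKJIHGFEDCBA"
```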

                                  Comment


                                  • #97
                                    Originally posted by dean goodman View Post
                                    Thanks Mike... All in all we got the initial read and reordering of the 500mb volume done in 2 seconds...That is good enough considering that the 3D migration using a 20 degree angle juncture (around the hyperboloid) takes 23.5 minutes to process!! Those are really good times compared to the previous code design that was an overnite process and took 9 hours and made 3D migration of the data volume not a viable everyday working process...
                                    Given the examples of code we've seen so far, what's the betting that the 23.5 minutes to process the migration could also be reduced considerably?

                                    Comment


                                    • #98
                                      'Without the little endian?
                                      '
                                      Code:
                                      FUNCTION PBMAIN () AS LONG
                                       LOCAL file3d AS STRING, q AS QUAD
                                       file3d = "A" & STRING$(50000000,"B") & "..." & STRING$(50000000,"C") & "D"
                                       TIX q
                                       file3d=STRREVERSE$(file3d) '  752,001,178
                                       'ReverseString(file3d)     '1,102,684,046
                                       TIX END q
                                        ? USING$("Reversal took #, tix",q)
                                       END FUNCTION
                                      
                                      FUNCTION ReverseString (s AS STRING) AS LONG
                                       LOCAL x,lngEnd AS LONG
                                       LOCAL p AS BYTE PTR
                                       p = STRPTR(s)
                                       lngEnd = LEN(s) -1
                                       FOR x = 0 TO FIX(LEN(s)/2) -1
                                        SWAP @p[x], @p[lngEnd - x]
                                       NEXT
                                       END FUNCTION

                                      Comment


                                      • #99
                                        Originally posted by Mike Doty View Post
                                        'Without the little endian?
                                        D'oh! Talk about re-inventing the wheel. Guess I had tunnel vision after all the word-reversal logic. You are correct of course; a simple STRREVERSE$ does the job perfectly.

                                        Comment


                                        • Hi everyone... I actually thought this thread was finished, but there are still more good suggestions. I decided to extract the "final" code into a separate *.bas file that you can run to see the changes I included based on all the suggestions. The set migration is 10 degrees around the hyperboloid, which gives a nicer migration. On my computer this standalone needs about 28 seconds to run (which is really good). If you have more suggestions, please have a look. I guess everyone is at home sheltering and programming! Thanks for your expertise...

                                          If you want some compensation let me know your rates... honestly (please send your invoices to [email protected])... I think we need to start back with Bob Zale, as his compiler is still so amazing. There are so many functions I will still never use; as I grow into the software I am seeing so many advancements. Also, Stuart, thanks for the outline on large volumes; we "solved" our immediate issues by automatically creating hundreds of 3D volume blocks over large areas (3-10 kilometers) and then quickly placing them all back into a 2D display in OpenGL very effectively...

                                          However, a note to Drake Software... If you have seen - and I imagine many have - those YouTube videos showing the popularity of programming languages from about 1970 to the present day ( https://www.youtube.com/watch?v=Og847HVwRSI ), in the last couple of years Python has taken over and put even C++, C and Java to rest... There is always the opportunity that a PB 64, and some youth using the software, could help promote it to the forefront in a very short time... just thinking... I would pay for a PB 64 compiler and subscribe to it at $100/month, and that would be inexpensive for us... Perhaps Drake needs some more financial incentives from the dwindling masses to make this happen...
                                          Attached Files

                                          Comment
