Improving the concatenation speed of strings

  • #21
    Dean is building a bitmap from scratch based on radar data.
    The first part of the full code he posted is DIB header information. The rest assigns values to bytes, with padding as appropriate.

    For his purpose, using a string is highly inefficient. A byte array of dimensions * 3 is much more efficient.
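
    A minimal sketch of the byte-array idea, not Dean's actual code. The image size, the fixed color, and the names nWidth, nHeight, rowBytes, ofs and pixels() are assumptions for illustration. It assumes the usual 24-bit BMP layout, where each row is padded to a multiple of 4 bytes and each pixel is stored as blue, green, red.

    ```powerbasic
    #COMPILE EXE
    #DIM ALL

    FUNCTION PBMAIN () AS LONG
        LOCAL nWidth, nHeight, rowBytes, x, y, ofs AS LONG

        nWidth  = 5      ' assumed image size, for illustration only
        nHeight = 4

        ' Each 24-bit row is padded up to a multiple of 4 bytes
        rowBytes = ((nWidth * 3 + 3) \ 4) * 4

        ' One byte array holds all pixel data, including the row padding
        DIM pixels(0 TO rowBytes * nHeight - 1) AS BYTE

        FOR y = 0 TO nHeight - 1
            FOR x = 0 TO nWidth - 1
                ofs = y * rowBytes + x * 3
                pixels(ofs)     = 255   ' blue
                pixels(ofs + 1) = 128   ' green
                pixels(ofs + 2) = 0     ' red
            NEXT
        NEXT

        ? "Row bytes:" + STR$(rowBytes) + "  Total bytes:" + STR$(UBOUND(pixels) + 1)
    END FUNCTION
    ```

    Each pixel is three direct byte assignments, so no string is created or concatenated inside the loop.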
    Walt Decker



    • #22
      This should be many times faster.
      Use pointers, a byte array, or check out StringBuilder.

      Here is a byte array with a BINARY JOIN at the end for fun.
      Everything within CASE statement(s).
      Code:
      FUNCTION PBMAIN AS LONG
      
       LOCAL s, sFileName AS STRING
      
       sFileName = "junk.txt"
       OPEN sFileName FOR OUTPUT AS #1
       s = CHR$(0 TO 25)
       PRINT #1, s;
       CLOSE
      
       ? Test(sFileName),,"Binary Join Results"  'ABCDEFGHIJKLMNOPQRSTUVWXYZ
      
      END FUNCTION
      
      
      
      FUNCTION test(sFileName AS STRING) AS STRING
      
       LOCAL bytenum AS LONG
       LOCAL bytes AS LONG
       LOCAL b     AS LONG
       LOCAL hFile AS LONG
      
       hFile = FREEFILE
       OPEN sFileName FOR BINARY AS #hFile
       bytes = LOF(hFile)
       REDIM bArray(1 TO bytes) AS BYTE
       GET #hFile,1,bArray()
       CLOSE #hFile
      
       FOR bytenum = 1 TO bytes
        b = bArray(bytenum)
        SELECT CASE AS LONG b
         CASE 0 TO 25:bArray(bytenum)= b+65
         CASE 255
        END SELECT
       NEXT
      
       FUNCTION = JOIN$(bArray(),BINARY)
      
      END FUNCTION
      https://www.tesla.com/roadster



      • #23
        As previously mentioned, try the PB "StringBuilder" object.

        Another way might be to simply handle your data as bytes, write to disk (use a LEN= big enough on the OPEN and it's really fast) and don't convert to "STRING" until the string data are complete.

        In other words, if concatenating strings is too slow, don't concatenate strings!



        • #24
          MCM
          I did not read the OP.
          If he is going to do that, I also suggest block writes of no less than 4K bytes, and preferably 64K bytes, to match the 4K or larger sectors found on today's SSD drives. Networking is also slow with blocks smaller than 64K bytes. You never really know what is in store tomorrow, and keeping networking in mind when it comes to storage is the best approach in today's programming.
          p purvis



          • #25
            Paul, my point was more "do not handle every three bytes as a string, handle it as three bytes."

            And FWIW, for a temp file like this, don't even worry about networking, just use a local temp file.

            Code:
            GetTempFilename() TO TheFile                  ' pseudocode: obtain a local temp file name
            OPEN TheFile FOR BINARY AS hFile LEN=enough BASE=0
            DO
                ReceiveData                               ' three-byte packet? Something else?
                PUT #hFile, , theData                     ' write data to file without converting to STRING first
            LOOP UNTIL theData = "last of one image" OR "Contains EOF marker"   ' pseudocode end-of-image test
            SEEK hFile, 0                                 ' back to start
            GET$ hFile, LOF(hFile), TheStringVariable     ' load total data into a single STRING variable
            Since strings are (relatively) slow, I never even use a STRING data type until I need all the data as a string.

            World's second-worst reason to do something a certain way: " Because we've always done it that way."

            World's worst reason to do something a certain way " I read it on the internet."

            MCM



            • #26
              Dean,

              In post 1 you mention both display and file in the same sentence, then 3 bytes (for RGB). The question is which one (display or file)?

              For file (24 bit BMP) there is the file header then 3 bytes per pixel.

              But for display a DIB is 8 bytes for size ( X and Y) then 4 bytes per pixel.

              The suggested speed-up methods can be "adapted" to either (3 or 4), but you've got to know which.

              See GRAPHIC GET BITS, or GRAPHIC SET BITS Help; and RGB or BGR Help

              If you build the display DIB (and put it in a graphic), then GRAPHIC SAVE will convert it to the 3-byte format and add the header for you.
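
              A rough sketch of that route, under stated assumptions rather than a tested implementation: the bit string is taken to be width and height as two 4-byte LONGs followed by one 4-byte pixel each, as described above, with pixel colors in BGR order (check the GRAPHIC SET BITS and BGR Help). The image size, the fill color, and the file name "sketch.bmp" are hypothetical.

              ```powerbasic
              #COMPILE EXE
              #DIM ALL

              FUNCTION PBMAIN () AS LONG
                  LOCAL w, h, i AS LONG
                  LOCAL hBmp AS DWORD
                  LOCAL bits AS STRING
                  LOCAL p AS LONG PTR

                  w = 64 : h = 32                      ' assumed size, for illustration only

                  ' 8-byte size header, then 4 bytes per pixel, allocated once
                  bits = MKL$(w) + MKL$(h) + STRING$(w * h * 4, 0)

                  ' Fill the pixels in place through a pointer (no concatenation)
                  p = STRPTR(bits) + 8
                  FOR i = 0 TO w * h - 1
                      @p[i] = BGR(255, 128, 0)         ' one 4-byte pixel per LONG
                  NEXT

                  GRAPHIC BITMAP NEW w, h TO hBmp      ' memory bitmap
                  GRAPHIC ATTACH hBmp, 0
                  GRAPHIC SET BITS bits                ' load the display DIB
                  GRAPHIC SAVE "sketch.bmp"            ' header and 3-byte conversion are done by GRAPHIC SAVE
                  GRAPHIC BITMAP END
              END FUNCTION
              ```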

              Cheers,
              Dale



              • #27
                MM,

                I'm not sure the LEN statement is an optimization with OPEN for BINARY LEN = enough (post #23)
                I use OPEN for BINARY (see post #22), but never use LEN.
                It sounds to me like LEN is an optimization for sequential and random access files.
                Specifies the size of each record of a random access file. The default record length is 128 if not specified. If record_size is specified for a sequential file, it instructs PowerBASIC to use internal buffering to improve I/O performance. A random access file is limited to 32768 bytes per record, to ensure consistent behavior across all Win32 platforms.
                https://www.tesla.com/roadster



                • #28
                  ((why let Help stand in the way of "shooting from the hip?"))
                  Dale



                  • #29
                    I'm not sure the LEN statement is an optimization with OPEN for BINARY LEN = enough (post #23)
                    Actually, LEN works not just for RANDOM and BINARY access, but for the sequential access modes as well. I'm not sure you can find it in the help but that post was made by Bob Zale many years ago and has worked well for me for a long time.

                    Hmmmm.... from 10x help for OPEN.... ...

                    record_size Specifies the size of each record of a random access file. The default record length is 128 if not specified. If record_size is specified for a sequential file, it instructs PowerBASIC to use internal buffering to improve I/O performance. A random access file is limited to 32768 bytes per record, to ensure consistent behavior across all Win32 platforms.
                    (Bold mine)

                    MCM



                    • #30
                      I believe you, but a BINARY file is of file type Binary.
                      I'll have to try to see if it improves performance.

                      Thanks, Stuart, for running the tests with different LENs and file names.
                      See post #33 below.
                      Conclusion- established empirically: LEN makes absolutely no difference in the speed of writing large files OPENed for BINARY.
                      Last edited by Mike Doty; 15 Apr 2018, 01:10 AM.
                      https://www.tesla.com/roadster



                      • #31
                        MCM,

                        I see "random access" in the quote from Help.

                        I see "sequential file" (and your bolding) in the quote from help.

                        I do not see any mention of "binary mode" or "binary access" in the quote. The question was on use of "LEN = reclen" with OPEN FOR BINARY.

                        So, why bring up sequential mode, which we could already see? You say "Actually, LEN works not just for RANDOM and BINARY access, ..." (now I added the bold)

                        I'll re-ask - "What does 'LEN =' do for OPEN FOR BINARY?"
                        Dale



                        • #32
                          Originally posted by Michael Mattias

                          Actually, LEN works not just for RANDOM and BINARY access, but for the sequential access modes as well. I'm not sure you can find it in the help but that post was made by Bob Zale many years ago and has worked well for me for a long time.

                          Hmmmm.... from 10x help for OPEN.... ...


                          (Bold mine)

                          MCM
                          You've been assuming/suggesting/stating this for 18 years now. (Your stance has hardened over time).

                          https://forum.powerbasic.com/forum/u...237#post210237

                          https://forum.powerbasic.com/forum/u...649#post509649

                          https://forum.powerbasic.com/forum/u...690#post509690

                          Yet I've never seen anyone else make the same claim and I've seen tests which show that it has no effect.

                          https://forum.powerbasic.com/forum/u...705#post509705



                          • #33
                            I've just "compiled and run" this code dozens of times:

                            Code:
                            #COMPILE EXE
                            #DIM ALL
                            
                            FUNCTION PBMAIN () AS LONG
                                LOCAL Cyclecount AS QUAD
                                LOCAL s AS STRING
                                s=STRING$(50000000,"a")
                                 TIX CycleCount
                                 OPEN "i.txt" FOR BINARY AS #1 LEN = 1 ' 'change filename and LEN each time you "Compile and Run"
                                PUT$ #1, s
                                CLOSE #1
                                TIX END CycleCount
                                ? STR$(cyclecount)
                            END FUNCTION
                            changing the filename for each run and varying LEN between 1, 32648, 99999 and 999999.

                            There was NO correlation between the size of LEN and the number of TIX it took to write the file. On average LEN =1 gave about the same time as LEN = 999999



                            • #34
                              Just for fun I tried 0 and -1 for LEN in the code above. No change in TIX.

                              I upped the string size from 50MB to 100MB and re-ran the above code multiple times with LENs varying between 1 and 9999999. Times on my laptop over a lot more tests ranged between 2.2 and 2.5 billion TIX. LENs of both 1 and 9999999 frequently returned between 2.2 and 2.3 billion TIX. Only a few runs exceeded 2.4 billion TIX, and they were just as likely to be with LEN= 99999999 as with smaller LENs.

                              Conclusion- established empirically: LEN makes absolutely no difference in the speed of writing large files OPENed for BINARY.



                              • #35
                                Originally posted by Michael Mattias

                                Actually, LEN works not just for RANDOM and BINARY access, but for the sequential access modes as well. I'm not sure you can find it in the help but that post was made by Bob Zale many years ago and has worked well for me for a long time.

                                Hmmmm.... from 10x help for OPEN.... ...


                                (Bold mine)

                                MCM
                                You bolded the section "for sequential files" as though that was relevant to your assertion. Did you not read the table beside mode on the same Help page, which explains that "sequential files" are files OPENed FOR INPUT or FOR OUTPUT, as opposed to FOR RANDOM and FOR BINARY?

                                ISTM logical that Help doesn't mention LEN with BINARY because it doesn't apply.



                                • #36
                                  Sorry... I did not realize this thread was still active. Dale, I am doing both display and file. I will give my background here. I have graphics in our 2D display menu where we can show as many radar profiles in one graphic dialog as one wants. I developed this in Perfect Sync software years ago and have been updating it with features all these years.

                                  The display of the profiles would be delayed, since I was originally drawing each individual pixel to the screen and doing my own stretching/decimation to fit the radar profile sizes to the screen. It could take up to 10-20 seconds to draw a dialog with many radar profiles, and this was becoming too painful. So I decided to experiment with making each radar profile a bitmap and then pasting it to its location on the screen. And with everyone's help here we went from 10-20 seconds to 1 second or less in some instances... This new optimization will keep our software competitive in terms of graphic time...

                                  The idea for "pasting" bitmaps came from our more advanced 3D graphics, built in OpenGL, and from working with texture methods to improve speed. I did not know if the similar, but not identical, idea could work with our Perfect Sync routines, which have an option to display a bitmap in an area on the screen. And it did work... The only difference is that in OpenGL with blending we have pixel interpolation, and in Perfect Sync we get square pixels above the native resolution. It is not a problem, since in general our data densities are usually larger than the displays on the screen...



                                  • #37
                                    Hi Dean,

                                    If you are still around: which compiler are you using? What form do the GPR binary records take? Are they RGB values in a disk file?

                                    I wondered if you could use PB's GRAPHIC statements to simply create a memory bitmap, load it with GRAPHIC SET BITS b$, then save it as a bitmap file with GRAPHIC SAVE "temp99.bmp"?
                                    Rgds, Dave



                                    • #38
                                      Hi Dave, I am using PB 10.03 - waiting for PB 11 - 64!!!! The radar data is not RGB data; it is either 16-bit or 32-bit binary, and I then convert that to color based on other information such as the color transform and color table. But I can have a look at PB graphics here to see if it would be more efficient.... I have a lot of code in Perfect Sync, but last year I actually ventured into PB graphics to write a simple display for a new feature in the software, and it worked fine. I realize that I could probably just paste the memory bitmap directly to portions of the screen without having to actually write out a BMP and then paste it, which could save some time. In Perfect Sync the stretchimage option can only read from a file.... Thanks for the hint...



                                      • #39
                                        Dean,
                                        If you will allow me some liberty.
                                        I think you can POKE those values into a string and transfer up to 8 bytes (a QUAD) at a time with the POKE statement.
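
                                        A small sketch of what that could look like, assuming the string is allocated to its full size first and that POKE is used with an explicit data type. The 8-byte buffer and the values written are made up for illustration.

                                        ```powerbasic
                                        #COMPILE EXE
                                        #DIM ALL

                                        FUNCTION PBMAIN () AS LONG
                                            LOCAL s AS STRING

                                            s = STRING$(8, 0)                       ' allocate the full string once

                                            ' Write 4 bytes per POKE directly into the string's data (little-endian)
                                            POKE LONG, STRPTR(s),     &H44434241    ' bytes "ABCD"
                                            POKE LONG, STRPTR(s) + 4, &H48474645    ' bytes "EFGH"

                                            ? s
                                        END FUNCTION
                                        ```

                                        The string length never changes, so nothing is reallocated or copied between writes.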
                                        p purvis



                                        • #40
                                          Yet I've never seen anyone else make the same claim [LEN= on OPEN supported for and can help improve performance of sequential operations] and I've seen tests which show that it has no effect. ... [more]
                                          You know, sir, I think I'll just stand on my record and let you try it and make your own decisions.

                                          MCM
