Substrings from Offsets in a string

  • Substrings from Offsets in a string

    I've got a situation where I've got a long string (dynamic) to which I've been passed a series of offset/length pairs to substrings within the long string.

    What's the best way to get at these substrings so they can be passed to other functions which themselves expect strings? In my case they're input to a GRAPHIC PRINT statement.

    MID$(longstring, offset, length) doesn't strike me as overly efficient (I have LOTS of these to do).

    Any other suggestions?

  • #2
    Since you haven't posted sample code showing what you are doing, this is a shot in the dark: would it help to use an array?
    A dozen what?


    • #3
      Actually, MID$ is very efficient. However, depending on how the substrings are delineated, you might consider PARSE.


      • #4
        Poke$

        There is always the Poke$ statement, if you feel it might be more efficient to avoid the memory allocations involved with dynamic strings and the Mid$ statement/function. It would be especially useful if you knew, for example, that none of the strings were longer than some specific amount. Then perhaps you could get away with just one allocation?
        Fred
        "fharris"+Chr$(64)+"evenlink"+Chr$(46)+"com"


        • #5
          FIELDs

          You could look up the FIELD statement and FIELD STRINGS in the help system. New in version 8.04
          This is the kind of thing they were (re)invented for.

          regards, Ian
          :) IRC :)


          • #6
            Given that you have the starting positions and length, the only practical solution is to use the MID$ function to extract your sub-strings. The MID$ function is very efficient.
            Sincerely,

            Steve Rossell
            PowerBASIC Staff


            • #7
              Code:
              #COMPILE EXE
              #DIM ALL
              
              FUNCTION PBMAIN () AS LONG
                  LOCAL origString, s2 AS STRING, ii, ii2, osPtr AS LONG
                  DIM offset(1000) AS LONG, sLen(1000) AS LONG, subStr(20) AS STRING
                  RANDOMIZE
                  origString = "1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZnowivesaidmyabcswontyou"
                  
                  'let's get a quick 1000 random offsets/len pairs:
                  FOR ii = 1 TO 1000
                     offset(ii) = RND(1, 41)
                     sLen(ii)   = RND(1, 41)
                  NEXT
                  
                  'now show 15 of them randomly picked
                  osPtr = STRPTR(origString)
                  FOR ii = 1 TO 15
                     ii2 = RND(1, 1000)
                     subStr(ii) = PEEK$(osPtr + offset(ii2), sLen(ii2))
                     ? """" & subStr(ii) & """ is the string from offset/len pair:" & STR$(ii2)
                  NEXT
                  
                  ? "done"
              END FUNCTION


              • #8
                I did some timings and MID$ was nearly the same speed as the PEEK$ function above. Looks like you already had the answer:
                Code:
                       subStr(ii) = MID$(origString, offset(ii), sLen(ii)) 'or offset(ii) + 1 depending on 1 or zero based
                btw George, if you've already done the MID$ technique and still need more speed, you could convert to an integer rather than string read. This would almost certainly give a significant speed boost, especially if the substring lengths are divisible by 4.
                Last edited by John Gleason; 4 Apr 2008, 03:25 PM. Reason: added btw...


                • #9
                  I did some timings and MID$ was nearly the same speed as the PEEK$ function above
                  Um, that's because they are identical in what they do when you think about it:

                  Start at beginning of string
                  Advance "X" characters
                  PEEK the next "N" bytes into a new string.

                  MID$ does, however, offer better error checking, since it will never try to PEEK beyond the end of the string, whereas PEEK$ will PEEK as far from the starting point as you tell it to, even if the length value puts you beyond the end of the string (i.e., into The Forbidden Memory Zone Where Your Likely Result is a General Protection Fault).

                  MCM
                  Michael Mattias
                  Tal Systems (retired)
                  Port Washington WI USA
                  http://www.talsystems.com
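
                  A minimal sketch of the bounds check described above, for anyone who wants to stay with PEEK$: the wrapper clamps the request to the string before peeking. SafePeek is a hypothetical helper name (not from the thread), and the offset is assumed to be zero-based, as in the PEEK$ example in post #7.
                  Code:
                  #COMPILE EXE
                  #DIM ALL

                  'Hypothetical helper: PEEK$ with MID$-style clamping.
                  'offset is zero-based; an out-of-range request returns an empty string.
                  FUNCTION SafePeek (s AS STRING, BYVAL offset AS LONG, BYVAL length AS LONG) AS STRING
                      IF offset < 0 OR length <= 0 OR offset >= LEN(s) THEN EXIT FUNCTION
                      IF offset + length > LEN(s) THEN length = LEN(s) - offset   'trim to end of string
                      FUNCTION = PEEK$(STRPTR(s) + offset, length)
                  END FUNCTION

                  FUNCTION PBMAIN () AS LONG
                      LOCAL s AS STRING
                      s = "The power of BASIC!"
                      ? SafePeek(s, 4, 5)      'characters 5 through 9 of s
                      ? SafePeek(s, 13, 100)   'length is trimmed at the end of s
                  END FUNCTION
                  The extra comparisons cost very little next to the string copy, which is consistent with the timings reported in post #8.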


                  • #10
                    Originally posted by Steve Rossell
                    the only practical solution is to use ... MID$ function is very efficient.
                    Steve, other fast techniques would include using a byte pointer, and using MoveMemory, is MID$ faster than these?


                    • #11
                      other fast techniques would include using a byte pointer, and using MoveMemory, is MID$ faster than these?
                      "IF" you are creating a string of the bytes required, the RELATIVE difference isn't worth screwing around with because most of the time is spent copying the bytes to the new string anyway.

                      Code not shown.
                      Michael Mattias
                      Tal Systems (retired)
                      Port Washington WI USA
                      http://www.talsystems.com


                      • #12
                        Originally posted by Chris Holbrook
                        Steve, other fast techniques would include using a byte pointer, and using MoveMemory, is MID$ faster than these?

                        What timings did you get when you tried it, Chris?

                        Bob Zale
                        PowerBASIC Inc.


                        • #13
                          Originally posted by Bob Zale
                          What timings did you get when you tried it, Chris?
                          I thought I'd save the trouble by asking One Who Knows!


                          • #14
                            4 functions FIELD in memory and FIELD from disk

                            Code:
                            'FIELD.BAS
                            '4 functions to use FIELD statement
                             
                              #COMPILE EXE
                              #DIM ALL
                            DECLARE FUNCTION GetOneFieldInMemory(s AS STRING, Offset AS LONG, Length AS LONG) AS STRING
                            DECLARE SUB      GetAllFieldsInMemory(s AS STRING, OffSet() AS LONG, Length() AS LONG,MyArray() AS STRING)
                            DECLARE FUNCTION GetOneDiskField(FileNumber AS LONG,StartPosition AS LONG,Length AS LONG) AS STRING
                            DECLARE SUB      GetAllDiskFields(hFile AS LONG, StartPosition() AS LONG, Length() AS LONG,MyArray() AS STRING)
                             
                            FUNCTION PBMAIN AS LONG
                              LOCAL s AS STRING, x AS LONG, Elements AS LONG, hFile AS LONG
                              Elements = 4
                              REDIM StartPosition(1 TO Elements) AS LONG
                              REDIM Length(1 TO Elements) AS LONG
                              REDIM MyArray(Elements) AS STRING
                              StartPosition(1) = 1:  Length(1) = 3
                              StartPosition(2) = 5:  Length(2) = 5
                              StartPosition(3) = 11: Length(3) = 2
                              StartPosition(4) = 14: Length(4) = 6
                              s = "The power of BASIC!"
                            'Test FIELD with dynamic string in memory
                              FOR x = 1 TO Elements: ? "GetOneFieldInMemory: " + GetOneFieldInMemory(s,StartPosition(x),Length(x)):NEXT
                              GetAllFieldsInMemory (s,StartPosition(),Length(),MyArray())
                              FOR x = 1 TO Elements: ? "GetAllFieldsInMemory: " + MyArray(x):NEXT
                            '----------------------------------------------------------------------------------------------------------
                            'Create some data on disk
                              hFile = FREEFILE
                              OPEN "ABC.TXT" FOR OUTPUT AS #hFile             'write some data to disk
                              PRINT #hFile, "The power of BASIC!"             '19 bytes
                              CLOSE #hFile
                             
                            'Open random file and get each field
                              hFile= FREEFILE
                              OPEN "ABC.TXT" FOR RANDOM AS #hFile LEN = 19
                              GET #hFile,1
                              FOR x = 1 TO UBOUND(StartPosition)
                                ? "GetOneDiskField: " + GetOneDiskField(hFile,StartPosition(x),Length(x))
                              NEXT
                              CLOSE #hFile
                            'Open random file and place all fields into an array
                              hFile= FREEFILE
                              OPEN "ABC.TXT" FOR RANDOM AS #hFile LEN = 19
                              GET #hFile,1
                              GetAllDiskFields(hFile, StartPosition(), Length(),MyArray())
                              CLOSE #hFile
                              FOR x = 1 TO Elements
                                  ? "GetAllDiskFields: " + MyArray(x)
                              NEXT
                              #IF %DEF(%PB_CC32)
                                WAITKEY$
                              #ENDIF
                            END FUNCTION
                            FUNCTION GetOneFieldInMemory(s AS STRING, Offset AS LONG, Length AS LONG) AS STRING
                                'Get a single field in memory using a dynamic string
                                LOCAL OffsetField AS FIELD
                                LOCAL DataField AS FIELD
                                FIELD s, Offset-1 AS OffsetField,Length AS DataField
                                FUNCTION = DataField
                            END FUNCTION
                            SUB GetAllFieldsInMemory(s AS STRING, StartPosition() AS LONG, Length() AS LONG,MyArray() AS STRING)
                                'Get all fields in memory using a dynamic string
                                LOCAL x AS LONG
                                LOCAL Elements AS LONG
                                LOCAL OffsetField AS FIELD
                                LOCAL FieldName() AS FIELD
                                Elements = UBOUND(StartPosition)
                                REDIM TheData(1 TO Elements) AS FIELD
                                FOR x = 1 TO UBOUND(StartPosition)
                                  FIELD s,StartPosition(x)-1 AS OffsetField, Length(x) AS TheData(x)
                                  MyArray(x) = TheData(x)
                                NEXT
                            END SUB
                            FUNCTION GetOneDiskField(FileNumber AS LONG, StartPosition AS LONG,Length AS LONG)AS STRING
                                'Get a single field from disk
                                LOCAL OffsetField AS FIELD
                                LOCAL TheData     AS FIELD
                                FIELD #FileNumber,StartPosition-1 AS OffsetField, Length AS TheData
                                FUNCTION = TheData
                            END FUNCTION
                             
                            SUB GetAllDiskFields(FileNumber AS LONG, StartPosition() AS LONG, Length() AS LONG, MyArray() AS STRING)
                                'Get all fields from disk
                                LOCAL x AS LONG
                                LOCAL Elements AS LONG
                                LOCAL OffsetField AS FIELD
                                LOCAL FieldName() AS FIELD
                                Elements = UBOUND(StartPosition)
                                REDIM TheData(1 TO Elements) AS FIELD
                                FOR x = 1 TO UBOUND(StartPosition)
                                  FIELD #FileNumber,StartPosition(x)-1 AS OffsetField, Length(x) AS TheData(x)
                                  MyArray(x) = TheData(x)
                                NEXT
                            END SUB


                            • #15
                              Since 'longstring' is in RAM then offset is irrelevant for timing considerations since seek time in RAM is zero. 'Length' will have relevance.

                              More important, however, is the number of expected transactions.

                              Your opening post, George, mentions a 'series'. What is the expected value of series?

                              If series is not large then it is unlikely that anything other than MID$ will be of real benefit and anything other than MID$ will not be anything like as readable.


                              • #16
                                Wow, what a response.

                                Maybe I should have been clearer. The long string has been passed to a parsing routine which returns an array of 'interesting' offset/length pairs.

                                However part of what I need to do is pass these interesting 'answers' to a GRAPHIC PRINT statement which expects a STRING as input.

                                The FIELD statement seems to be what I need as I suspect it simply builds a string control area pointing into the original string. I'm assuming (will test it) that I could continually re-issue the FIELD using each of my offset/length pairs.

                                MID$ is the simplest answer of course; I was mainly concerned with the overhead of creating all these new, temporary strings and whether there was some better/faster/cheaper way to get them across to the GRAPHIC PRINT statement.

                                I may be worrying overly about this; for a user interface interaction I would have maybe 750 to 1500 pairs to process.

                                I'll go 'play' now, I was looking for suggestions before going off trying all the possibilities. Thanks for the pointer to FIELD, I was browsing STRING functions for alternatives and missed it totally.

                                George

                                Addition:
                                Yes, the FIELD seems to do just what I'm after and it can be re-issued for each of the offset/length pairs. I didn't try timing tests, but thinking of what it does, it's gotta be pretty efficient. Seems exactly what I was looking for.

                                Thanks again to all of you.
                                Last edited by George Deluca; 5 Apr 2008, 03:20 PM.
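
                                For reference, a minimal sketch of re-issuing FIELD against the in-memory string for each pair, using the FIELD syntax from the code in post #14. The variable names and the two sample pairs are illustrative assumptions, start positions are taken as 1-based, and the ? (PRINT) line stands in for the real GRAPHIC PRINT call.
                                Code:
                                #COMPILE EXE
                                #DIM ALL

                                FUNCTION PBMAIN () AS LONG
                                    LOCAL s     AS STRING
                                    LOCAL x     AS LONG
                                    LOCAL fSkip AS FIELD      'covers the bytes before the substring
                                    LOCAL fData AS FIELD      'covers the substring itself
                                    DIM StartPos(1 TO 2) AS LONG, Length(1 TO 2) AS LONG

                                    s = "The power of BASIC!"
                                    StartPos(1) = 5  : Length(1) = 5     'illustrative pair
                                    StartPos(2) = 14 : Length(2) = 6     'illustrative pair

                                    FOR x = 1 TO 2
                                        'Re-map the two fields onto s for this pair
                                        FIELD s, StartPos(x) - 1 AS fSkip, Length(x) AS fData
                                        ? fData                          'stand-in for GRAPHIC PRINT
                                    NEXT
                                END FUNCTION
                                Whether this actually beats MID$ for 750 to 1500 pairs has not been timed in this thread, so it is worth measuring before committing to it.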


                                • #17
                                  Code:
                                  'FIELD3.BAS
                                  'This should be efficient, FIELD once and no function CALL overhead.
                                  'This applies to disk files, but food for thought.
                                  #COMPILE EXE
                                  #DIM ALL
                                  FUNCTION PBMAIN AS LONG
                                    LOCAL sFileName       AS STRING
                                    LOCAL FieldNumber     AS LONG
                                    LOCAL FieldsPerRecord AS LONG
                                    LOCAL hFile           AS LONG
                                    LOCAL RecordNumber    AS LONG
                                    LOCAL RecordLength    AS LONG
                                    LOCAL OffsetLength    AS LONG
                                    LOCAL TotalRecords    AS LONG
                                    LOCAL fDummy          AS FIELD
                                   
                                  'These could be read from disk to create a dynamic database
                                    sFileName = "ABC.TXT"
                                    KILL "ABC.TXT":ERRCLEAR
                                    RecordLength = 20         'required
                                    TotalRecords = 3          'required
                                    FieldsPerRecord = 4       'required
                                    REDIM StartPosition(1 TO FieldsPerRecord) AS LONG
                                    REDIM Length (1 TO FieldsPerRecord) AS LONG
                                   
                                    StartPosition(1) = 1:  Length(1) = 4
                                    StartPosition(2) = 5:  Length(2) = 6
                                    StartPosition(3) = 11: Length(3) = 3
                                    StartPosition(4) = 14: Length(4) = 7
                                  '-------------------------------------------------------------
                                  'Create some data for testing
                                    hFile = FREEFILE          'obtain a free file number before the first OPEN
                                    OPEN sFileName FOR RANDOM AS hFile LEN = RecordLength
                                    FIELD #hFile, RecordLength AS fDummy
                                    FOR RecordNumber = 1 TO TotalRecords
                                      fDummy = "The power of BASIC!" + FORMAT$(RecordNumber)
                                      PUT #hFile
                                    NEXT
                                    CLOSE #hFile
                                  '-------------------------------------------------------------
                                  'Open database
                                    hFile= FREEFILE
                                    OPEN sFileName FOR RANDOM AS #hFile LEN = RecordLength
                                   
                                  'Setup FIELD only once
                                    REDIM f(1 TO FieldsPerRecord) AS FIELD
                                    FOR FieldNumber = 1 TO FieldsPerRecord
                                      OffSetLength = StartPosition(FieldNumber)-1
                                      FIELD #hFile,OffsetLength AS fDummy, Length(FieldNumber) AS f(FieldNumber)
                                    NEXT
                                   
                                  'Get record and display
                                    FOR RecordNumber = 1 TO LOF(hFile)\RecordLength
                                      GET #hFile,RecordNumber
                                      FOR FieldNumber = 1 TO FieldsPerRecord
                                        ? f(FieldNumber)
                                       NEXT
                                    NEXT
                                    CLOSE hFile
                                    WAITKEY$
                                  END FUNCTION


                                  • #18
                                    I was mainly concerned with the overhead of creating all these new, temporary strings...
                                    Not often you answer your own question BEFORE you ask it....
                                    part of what I need to do is pass these interesting 'answers' to a GRAPHIC PRINT statement which expects a STRING as input.
                                    Michael Mattias
                                    Tal Systems (retired)
                                    Port Washington WI USA
                                    http://www.talsystems.com


                                    • #19
                                      Another possibility, if you can see your way clear to forgoing the use of GRAPHIC PRINT statements, is DrawText(). That API function call requires exactly the information you specified you have, i.e., offset/length data pairs. This is another way of eliminating two time-consuming operations:

                                      1) memory allocation (memory has already been allocated for the 'in place' larger string);

                                      2) movement of bytes from one buffer to another.

                                      I know you haven't been with PowerBASIC a really long time, George, but you should be able to create your own window with a CreateWindowEx() API call, or I imagine a DDT dialog would work too, although I've personally never DrawText()'ed or TextOut()'ed to DDT windows.
                                      Fred
                                      "fharris"+Chr$(64)+"evenlink"+Chr$(46)+"com"
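
                                      A rough sketch of that idea, under several assumptions: that the DrawText declaration in WIN32API.INC takes its text parameter as a by-reference ASCIIZ (so BYVAL with a pointer overrides it), that offsets are zero-based, and that the caller has already validated each pair against LEN(s). DrawSubstring is a hypothetical wrapper name, and obtaining hDC and the RECT (for example in a WM_PAINT handler) is not shown.
                                      Code:
                                      #INCLUDE "WIN32API.INC"

                                      'Hypothetical wrapper: draw length bytes of s starting at a zero-based offset,
                                      'without first copying them into a new string.
                                      SUB DrawSubstring (BYVAL hDC AS DWORD, s AS STRING, BYVAL offset AS LONG, _
                                                         BYVAL length AS LONG, rc AS RECT)
                                          'The pointer is passed by value and the byte count tells DrawText where
                                          'to stop, so the substring needs no terminating null of its own.
                                          DrawText hDC, BYVAL STRPTR(s) + offset, length, rc, _
                                                   %DT_LEFT OR %DT_SINGLELINE OR %DT_NOPREFIX
                                      END SUB
                                      As with PEEK$, nothing here stops a bad offset/length pair from reading past the end of the string, so the pairs need checking before they reach this call.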


                                      • #20
                                        Originally posted by Fred Harris
                                        ... I imagine a DDT dialog would work too
                                        it does, see Chris Boss's post in this thread
