This is my way of giving back to all of you who posted.
Besides, it's always fun to see what shakes out in a head-to-head.
Input file = 5 MB of 81-char lines derived from data available at:
ftp.cme.com/pub/time (July 01 files)
5 runs were made for each technique and the results averaged.
Exactly the same number of processing statements was used for each technique.
If a technique put each line into an array, a TempStr was substituted for the array element
and used by the MID$ statements, to avoid a second loop.
Testing was done on a 400 MHz Win98SE box - your mileage may vary.
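The timing code itself is not shown in this post. A minimal sketch of one way to average five runs, assuming TIMER is accurate enough for runs of several seconds and that no run crosses midnight. ProcessFile and AverageSeconds are made-up names for illustration, not the routines actually used:

```basic
#COMPILE EXE
#DIM ALL

' Hypothetical stand-in for whichever technique is being timed.
SUB ProcessFile()
    ' open the file, parse every line, close the file
END SUB

' Runs ProcessFile the requested number of times and returns the mean time.
FUNCTION AverageSeconds(BYVAL Runs AS LONG) AS DOUBLE
    LOCAL i AS LONG
    LOCAL t0 AS DOUBLE
    LOCAL Total AS DOUBLE
    FOR i = 1 TO Runs
        t0 = TIMER                      ' seconds since midnight
        CALL ProcessFile
        Total = Total + (TIMER - t0)    ' elapsed time for this run
    NEXT
    FUNCTION = Total / Runs
END FUNCTION

FUNCTION PBMAIN() AS LONG
    MSGBOX FORMAT$(AverageSeconds(5), "0.00") & " secs"
END FUNCTION
```

On a console compiler, PRINT would take the place of MSGBOX.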
There were two basic approaches suggested by you guys.
Read the whole file into a string in memory, three different techniques:
  1. Work it with regular strings, then use MID$s
  2. Work it with pointers, then use MID$s
  3. Work it with a UDT (DIM AT)
Read the file line by line, two techniques:
  4. Read one line at a time into a string, then use MID$s
  5. Read one line at a time into a UDT
My first attempt was Method 4.
LineLength = 81
GET$ #100, LineLength, Buf
SymbStr = MID$(Buf, 1, 3)
' etc (7 more MID$ and function calls like DateToJulian)
This took an average of 5.53 seconds
Next I learned that you can read straight into a UDT! Method 5:
TYPE CMETimeAndSales
  SymbStr AS STRING*3 ' 1 - 3
  ' etc
END TYPE
GLOBAL TnSLine AS CMETimeAndSales
GET #100,,TnSLine
SymbStr = TnSLine.SymbStr
' etc (some function calls like DateToJulian)
This took an average of 5.42 secs
Then I decided to try loading the whole file into memory and PARSE$ out a line at a time.
Method 1.
OPEN Filename FOR BINARY AS #1 LEN = 32768
GET$ #1, LOF(1), Buf
CLOSE #1
After studying Scot's code I used
REPLACE $LF WITH "|" IN A$ ' A$ = Buf; not sure why the swap is needed
N = TALLY(A$, "|") ' Count line feeds.
FOR L = 1 TO N ' Get N lines in total
  TempStr = PARSE$(A$, "|", L) ' Extract line L
  SymbStr = MID$(TempStr, 1, 3)
  ' etc (7 more MID$ and function calls like DateToJulian)
NEXT
It could not get through the 5 MB file even in 5 mins. It took 45 secs for a 500 KB file.
(I expect the time taken to rise roughly with the square of the file size, because PARSE$ starts
at the beginning each time.)
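A rough back-of-envelope for that, assuming each PARSE$ call really does rescan the string from the start and that the scanning dominates the run time (both are assumptions, not measurements). With a file of S bytes and 81-byte lines there are N = S/81 lines, and fetching line L means scanning about 81·L bytes:

```latex
\begin{aligned}
N &= \frac{S}{81}, \\
\text{bytes scanned} &\approx \sum_{L=1}^{N} 81\,L \;=\; \frac{81\,N(N+1)}{2} \;\approx\; \frac{S^{2}}{162}, \\
\frac{T(5\ \text{MB})}{T(500\ \text{KB})} &\approx \left(\frac{5\ \text{MB}}{500\ \text{KB}}\right)^{2} = 10^{2} = 100
\;\;\Rightarrow\;\; T(5\ \text{MB}) \approx 100 \times 45\ \text{s} = 4500\ \text{s} \approx 75\ \text{min}.
\end{aligned}
```

Under those assumptions the growth is quadratic in file size, which would fit with the 5 MB run not finishing inside 5 minutes.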
Then there is the INSTR() technique, also Method 1:
Erik:
K = 0
N = TALLY(A$, CHR$(10)) ' Count line feeds.
FOR L = 1 TO N ' Get N lines in total
  M = K + 1 ' Starting point of search for each successive line
  K = INSTR(M, A$, CHR$(10)) ' Find next line feed
  TempStr = MID$(A$, M, K - M) ' Extract line from M to K-1. Length = K - M.
  SymbStr = MID$(TempStr, 1, 3)
  ' etc (7 more MID$ and function calls like DateToJulian)
NEXT
This took an average of 4.52 secs
Then you guys posted some clever variations on the pointer technique, Method 2.
Borje:
p1 = 1
Letter = STRPTR(Buf) 'point Letter (a BYTE PTR) to beginning of string
FOR I = 1 TO LEN(Buf) 'same value as LOF(1), but still valid after CLOSE #1
  IF @Letter = 10 THEN 'Letter's value = 10, $LF
    TempStr = MID$(Buf, p1, I - p1) 'pick out line (TempStr stands in for the array element)
    SymbStr = MID$(TempStr, 1, 3)
    ' etc (7 more MID$ and function calls like DateToJulian)
    p1 = I + 1 'store position + 1 for the LF we skip
  END IF
  INCR Letter 'set byte pointer to next char
NEXT
This took an average of 4.45 secs
jc: (very similar to above)
Length = -1 ' for first line
Start = 1 ' for first line
bPtr = STRPTR(S1) ' S1 = Buf
DO WHILE @bPtr <> 0
  IF @bPtr = 10 THEN
    TempStr = MID$(S1, Start, Length)
    Start = Start + Length + 2
    Length = 0
    INCR bPtr
    SymbStr = MID$(TempStr, 1, 3)
    ' etc (7 more MID$ and function calls like DateToJulian)
  ELSE
    INCR Length
  END IF
  INCR bPtr
LOOP
I had to modify this a little. Length must be -1 on entry into the loop, otherwise
it is 1 char out on lines 2-n. I also removed the second IF statement inside the loop.
This took an average of 4.45 secs (there was a wider variance - 4.28 to 4.56 secs)
Semen:
p = STRPTR(Buf): pp = p + LEN(Buf): @pp = 10 ' put a sentinel LF just past the last char
n = 0 ' number of lines
WHILE p < pp AND Done = 0
  b = p: WHILE @b <> 10: INCR b: WEND: @b = 0 ' find the next LF and overwrite it with a null
  SymbStr = MID$(@p, 1, 3)
  ' etc (7 more MID$ and function calls like DateToJulian)
  p = b + 1
  INCR n
WEND
This took an average of 5.51 secs (there was a wider variance - 5.33 to 5.66secs)
Finally there is the Ultra clever and downright sneaky cool Method 3:
This language amazes me. Whoever thought this up is a genius.
Overlay a UDT format on a string in memory and just read out the elements - brilliant!
John:
TYPE CMETimeAndSales
  SymbStr AS STRING*3 ' 1 - 3
  ' etc
END TYPE
GLOBAL TnSLine AS CMETimeAndSales, TnSFile() AS CMETimeAndSales
'
FileLength = LOF(100)
LineLength = LEN(TnSLine) ' Length of a line
LinesInFile = FileLength \ LineLength ' integer division: whole lines only
GET$ #100, FileLength, Buf ' Read whole file into a string
CLOSE #100
Start = STRPTR(Buf)
REDIM TnSFile(LinesInFile-1) AT Start ' Got to use REDIM because it's called many times
FOR n = 0 TO LinesInFile-1
  SymbStr = TnSFile(n).SymbStr ' SP
  ' etc (some more function calls like DateToJulian)
NEXT
This took an average of 4.1 secs. A clear winner, with the added advantage that all the data
is available in the array.
Conclusion:
The only problem with Method 3 is that every file must use the same line length.
Unfortunately it turns out that there are at least two different line lengths I need to work with,
so I am forced to use a pointer technique, Method 2 style.
This will at least ensure that if they change the line length in the future
(but leave the position of the elements the same), my prog will still work.
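For what it's worth, another way to cope with more than one line length is to measure the first line instead of hard-coding 81. This is not the pointer loop above, just a minimal sketch. It assumes every line within a given file has the same length and ends in a line feed. DetectLineLength and the sample data are made up for illustration:

```basic
#COMPILE EXE
#DIM ALL

' Bytes per line, counting the terminating line feed.
' Returns 0 if the buffer contains no line feed at all.
FUNCTION DetectLineLength(Buf AS STRING) AS LONG
    FUNCTION = INSTR(Buf, $LF)
END FUNCTION

FUNCTION PBMAIN() AS LONG
    LOCAL Buf AS STRING
    LOCAL TempStr AS STRING
    LOCAL SymbStr AS STRING
    LOCAL LineLength AS LONG
    LOCAL Start AS LONG

    Buf = "ABC12345" & $LF & "XYZ67890" & $LF   ' made-up data: two 9-byte lines
    LineLength = DetectLineLength(Buf)
    IF LineLength = 0 THEN EXIT FUNCTION

    ' Step through the buffer one whole line at a time.
    FOR Start = 1 TO LEN(Buf) - LineLength + 1 STEP LineLength
        TempStr = MID$(Buf, Start, LineLength - 1)   ' the line without its LF
        SymbStr = MID$(TempStr, 1, 3)                ' field positions unchanged
        ' etc
    NEXT
END FUNCTION
```

If the lines end in CR/LF rather than a bare LF, TempStr will still carry the trailing CR, which does not matter as long as the fields are picked out by position.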
Final Note:
If the purpose is to load the file into an array, and subsequently process that array,
then the pointer methods would need an additional loop to go through the array. The UDT
method would not.
If any of you would like all the code, I can e-mail it, no prob.
Thank you all soooooooo much for posting and not just sitting back and watching.
------------------
Kind Regards
Mike
[This message has been edited by Mike Trader (edited July 17, 2001).]