There’s a lot of air in sequential files. Suppose you have data in two arrays
a() as string, b() as long
(with no double quotes inside the strings) and save it all in a sequential file:
For i = 0 to N
Write #1, a(i), b(i)
Next
In an editor the resulting file will look something like:
"asdasd",23
"fdfasf,affdsa",9
"",85
"asdfsadf",0
"ffddsa",4
etc., with a CRLF (carriage return, linefeed) after each line, including the last. A comma or a CRLF (not both at the same time) marks the end of a field.
Though it may not be in the BASIC language specification, the quotes around a string are unnecessary when the program comes to input this sequential file, provided the string contains no comma or CR. Both an empty string and a numeric zero can be omitted (as long as the comma is kept); when input, the nothingness will be read as "" or 0 depending on the data type of the input variable. Finally, every CRLF can be replaced with a comma, including the last one.
Thus the above data can be saved to a less airy file, one that in an editor looks like:
asdasd,23,"fdfasf,affdsa",9,,85,asdfsadf,,ffddsa,4,
with no CRLF at the end. And this file can be read using exactly the same code that reads the first file:
For i = 0 to N
Input #1, a(i), b(i)
Next
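To try this out, here is a minimal sketch (PBCC) that writes the five records from the first listing in compact form, byte for byte, and reads them back with the same Input loop. The file name fooAir.txt is made up for the illustration. The line is written with Put$ to a file opened For Binary, which the rest of this post explains. Whether the omitted fields really come back as "" and 0 depends on the input rules described above holding for your compiler version.

```powerbasic
Function PBMain
  Local i As Long, s As String, n As Long
  Local f As String
  f = "fooAir.txt"                     'hypothetical scratch file

  'start from an empty file, since binary output does not truncate
  If Len(Dir$(f)) Then Kill f

  'write the compact form with no CRLF anywhere
  Open f For Binary As #1
  Put$ #1, "asdasd,23,""fdfasf,affdsa"",9,,85,asdfsadf,,ffddsa,4,"
  Close #1

  'read it back as an ordinary sequential file
  Open f For Input As #1
  For i = 0 To 4
    Input #1, s, n
    Print s, n
  Next
  Close #1
  WaitKey$
End Function
```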
The problem is how to save the data the less airy way. The solution is to consider the file as a binary file simulating a sequential file. (Of course in the last analysis a file is just bytes and is called this or that type of file only because it is written or read a certain way.) The following code (PBCC) illustrates this idea.
Note the Kill instruction in the code. Unlike opening a file as sequential (“For Output”), opening it for binary, writing to it, and closing it again doesn’t truncate the file to the length of what was written. If the written data contains fewer bytes than the original file, the old bytes lying beyond the written data remain. When the program later reads the file sequentially, you’re in trouble when it gets to the old data.
So before writing the binary file you must first either kill the original, or rename it, or open it for sequential output and immediately close it, which last sets its length to zero. (Another solution, if the file grows in the long run, is to have a special ending record that simply marks where the sequential input must stop.)
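The open-and-close trick can be wrapped in a small helper. This is only a sketch: TruncateFile is a name I made up, and fooNew.txt is the file name used in the listing. The SetEof call at the end is a further option not mentioned above. As I read the PowerBASIC documentation it cuts an open file off at the current file position, so check it against your compiler's help before relying on it.

```powerbasic
'Hypothetical helper: cut a file to zero length (or create it empty).
Sub TruncateFile(ByVal FileName As String)
  Local h As Long
  h = FreeFile
  Open FileName For Output As #h     'sequential output truncates
  Close #h
End Sub

Function PBMain
  Local h As Long
  TruncateFile "fooNew.txt"

  h = FreeFile
  Open "fooNew.txt" For Binary As #h
  Put$ #h, "asdasd,23,"
  SetEof #h        'assumed: ends the file at the current position
  Close #h
End Function
```

Either step alone should be enough; both are shown only so they can be compared side by side.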
I shrank a “real world” sequential data file by 21% using this simple technique, and since only my program reads it, the format doesn’t matter.
(Of course going binary all the way would be the most compact because numbers could be represented by the number of bytes they take up in use, instead of by strings of the decimal version, and strings with commas or carriage returns could be delimited by a one byte special character. But it’s a lot of trouble for little improvement if most of the numbers are single digits and most of the strings are made of letters.)
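For comparison, going binary all the way might look roughly like the sketch below. It is not what my program does. It stores each string with a one-byte length in front, instead of the special delimiter character mentioned above (so strings must be shorter than 256 characters), and each number as the four bytes of a Long. The file name fooBin.dat and the sample data are invented.

```powerbasic
Function PBMain
  Local i As Long, n As Byte, s As String, v As Long
  Dim a(1 To 3) As String, b(1 To 3) As Long
  a(1) = "asdasd"        : b(1) = 23
  a(2) = "fdfasf,affdsa" : b(2) = 9
  a(3) = ""              : b(3) = 85

  'binary output does not truncate, so remove any old copy first
  If Len(Dir$("fooBin.dat")) Then Kill "fooBin.dat"

  Open "fooBin.dat" For Binary As #1
  For i = 1 To 3
    n = Len(a(i))
    Put #1, , n          '1 byte: string length
    Put$ #1, a(i)        'the characters, no quotes or commas needed
    Put #1, , b(i)       '4 bytes: the Long as stored in memory
  Next
  Close #1

  'read the records back in the same order they were written
  Open "fooBin.dat" For Binary As #1
  For i = 1 To 3
    Get #1, , n
    Get$ #1, n, s
    Get #1, , v
    Print s, v
  Next
  Close #1
  WaitKey$
End Function
```

With mostly single-digit numbers the four-byte Long is actually larger than the text form, which is the trade-off described above.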
Now I didn’t want the customer to feel cheated when he saw his shrunken data file, so I have my program pad it back up to its original size using a custom-made bloat procedure.
Code:
$comma = ","

'========================================
'This puts quotes around a string only if it contains a comma or
'carriage return, then adds a comma.
Function Format1(a As String) As String
  If InStr(a, $comma) > 0 Or InStr(a, $Cr) > 0 Then
    Function = $Dq + a + $Dq + $comma
  Else
    Function = a + $comma          'empty strings go here
  End If
End Function

'========================================
'This converts numbers to strings, but if zero
'makes it nothing, then adds a comma.
Function Format2(n As Long) As String
  If n Then
    Function = Format$(n) + $comma
  Else
    Function = $comma              'zeros go here
  End If
End Function

'========================================
'Create data and a standard sequential file.
'Then create a shrunken sequential file as a
'binary file, then read it as a sequential file.
'Are the printouts the same?
Function PBMain
  Dim i As Long, k As Long, p As String
  Dim a(10) As String, b(10) As Long
  Dim aa As String, bb As Long
  Dim OurFileNew As String, OurFileOld As String

  OurFileNew = "fooNew.txt"
  OurFileOld = "fooOld.txt"

  'create data a(), b() and save to a sequential file
  Open OurFileOld For Output As #1
  For i = 1 To 10
    k = Rnd(1,7)
    a(i) = Mid$("abcd,efg", k, Rnd(2, 9-k))   'random string
    b(i) = Rnd(0,10)                          'random number
    Write #1, a(i), b(i)
    Print a(i), b(i)
  Next
  Close #1

  Print "-----------------"

  'Must kill or rename or make zero length
  'any existing OurFileNew because binary output
  'doesn't truncate.
  If Len(Dir$(OurFileNew)) Then Kill OurFileNew

  'open/create another file as a binary file,
  'write data a new way
  Open OurFileNew For Binary As #1
  For i = 1 To 10
    p = Format1(a(i)) + Format2(b(i))
    Put #1, , p
  Next
  Close #1

  'open as a sequential file, see if data is the same
  Open OurFileNew For Input As #1
  Do Until Eof(1)
    Input #1, aa, bb
    Print aa, bb
  Loop
  Close #1

  Print
  WaitKey$
End Function