I have a voter registration file in .CSV format. Each individual's data is recorded in 48 fields. The fields are simply things like last name, first name, middle name, addresses, etc. There are 7,894,412 records in the file. I have discovered that 6 of the records are corrupt. By corrupt I mean that those 6 records have 49 fields instead of 48. The reason for this is that some clerks recorded address information with quote marks inside a field that is itself enclosed in quotes. Here's an example:
18 fields followed by: ,"4TH PLT "IMMORTALS"", ...> remaining fields. By enclosing the platoon nickname in quotes the record becomes corrupt.
I can use: INPUT #1, RECORDS
because each record ends with CR/LF. However, if I try to read in each field, i.e.:
INPUT #1, LAST_NAME
INPUT #1, FIRST_NAME
INPUT #1, MIDDLE_NAME
etc., all goes well until I reach the record with the quoted nickname. That record yields one field too many, so the read of the next record begins with the last field of the corrupt record, and the first field of the new record, LAST_NAME, is incorrect.
I tried to use the PBCC REPLACE command to get rid of the quote marks around the word IMMORTALS. I've tried various methods, including using CHR$(34), for example:
REPLACE "CHR$(34)IMMORTALS CHR$(34)" and "IMMORTALS""
but these don't work. I am applying the REPLACE command to the entire record of 48 / 49 fields.
So, what is a way to eliminate the extra " marks around the word IMMORTALS?
I'm asking just to learn how to handle this situation. I'm not working for any governmental agency or political party. My alternate path forward is just to eliminate the corrupt records, but then I learn nothing about programming this problem.
Thanks
Tim
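For the quoted-nickname case described above, here is a minimal sketch of how the REPLACE statement might be written. It assumes the PB/CC form `REPLACE MatchString WITH NewString IN MainString`, and that CHR$(34) is concatenated outside the string literals (inside a literal it is just the text "CHR$(34)"). The sample record, its 4 fields, and the variable names `rec` and `q` are made up for illustration; they are not from the real file.

```powerbasic
#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN () AS LONG
    LOCAL rec AS STRING
    LOCAL q   AS STRING

    q = CHR$(34)   ' a single double-quote character

    ' Hypothetical shortened record with the same defect:
    ' SMITH,JOHN,"4TH PLT "IMMORTALS"",TX
    rec = "SMITH,JOHN," + q + "4TH PLT " + q + "IMMORTALS" + q + q + ",TX"

    ' Strip only the inner pair of quotes around the nickname.
    ' The outer quotes that delimit the whole field are left in place.
    REPLACE q + "IMMORTALS" + q WITH "IMMORTALS" IN rec

    PRINT rec      ' SMITH,JOHN,"4TH PLT IMMORTALS",TX
    WAITKEY$
END FUNCTION
```

With the real file, the same REPLACE could presumably be applied to each whole record after it is read (for example with LINE INPUT #) and before the record is split into fields. This sketch only handles the one literal nickname; the other corrupt records would need their own match strings or a more general check.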