Announcement

Collapse

Forum Guidelines

This forum is for finished source code that is working properly. If you have questions about this or any other source code, please post it in one of the Discussion Forums, not here.
See more
See less

CSV Parser

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • CSV Parser

    I needed a CSV parser as the PB Parse() function doesn't handle quoted strings so I put this together.
    It seems to work okay but I'd be glad to hear about improvements or bugs. If a comma is in quotation marks it should be ignored. I looked around for a CSV spec to find out if tabs should be treated special but I couldn't find anything, feel free to post a URL of a spec that may help me standardise this.

    Code:
    #Include "win32api.inc"
    
    Declare Function CSV$(CSV_line As String, Column As Word) As String
    
    ' CSV Parser Include file, used as the Parse$() command in PB except it will recognise quotation marks
    
    Function CSV$(CSV_line As String, Column As Word) As String
    
        Dim CSV_Text As String
        Dim i As Long
        Dim InQuote As Integer
        Dim CurrentCol As Integer
        Dim StartFound As Integer
        Dim StartChr As Long
        Dim ReturnLen As Long
    
        InQuote = %False
        CurrentCol = 1
        StartFound = %False
        ReturnLen = 0
        
        ' Catch from Beginning if Column = 1
        If Column = 1 Then
            StartFound = %true
            StartChr = 0
        End If
        
        'strip code to first $crlf
        If Instr(1,CSV_Line,$crlf) Then
            CSV_Text = Left$(CSV_Line, Instr(1,CSV_Line,$crlf))
        Else
            CSV_Text = CSV_Line & Chr$(0)
        End If
        
        'pass string left to right
        For i = 1 To Len(CSV_Text)
    
            If CurrentCol = Column Then Incr ReturnLen      'Increment length of Cell
            
            Select Case Mid$(CSV_Text,i,1)
    
                Case """"            'Entering or Exiting a quoted field or section of field
                    
                    If InQuote = %True Then
                        InQuote = %false
                    Else
                        InQuote = %True
                    End If
                    
                Case ","
                    
                    If InQuote = %True Then
                        ' ignore
                    Else
                        If CurrentCol = Column Then         'end of cell found
                            Exit For
                        End If
                        
                        If CurrentCol = Column -1 Then      'Beginning of Cell Found
                            StartChr = i                    'Mark Beginning of Cell
                            StartFound = %true
                        End If
                        
                        If CurrentCol < Column Then Incr CurrentCol
    
                    End If
    
            End Select
    
    
        Next
    
        If StartFound = %True Then                          'String pass is done, was the column found?
            
            If ReturnLen = 1 Then
                CSV$ = ""                                   'zero length cell, two comma's together
            Else
                CSV$ = Mid$(CSV_Text,StartChr + 1, ReturnLen -1 )
            End If
            
        Else
            CSV$ = ""                                       'error, field doesn't exist
        End If
    
    End Function
    ------------------

    Paul Dwyer
    Network Engineer
    Aussie in Tokyo
    (Paul282 at VB-World)

  • #2
    Paul,

    If you want raw speed you can try to adapt this code.
    I tried to extract the most important things, I'm not sure if it's complete. Also there's very little comments but I don't have time to add any...


    Code:
    $File1 = "Sample.txt"
    
    %Quote      = 34
    '%Separator  = 44      ',
    %Separator  = 59      ';
    %Return     = 13
    %EOFile     = 26
    %Comma      = 44
    %Point      = 46
    
    %TRUE = 1
    %FALSE = 0
    
    GLOBAL FileBuffer   AS STRING
    GLOBAL FilePoint    AS BYTE PTR
    GLOBAL FileEnd      AS BYTE PTR
    GLOBAL FileToStr    AS LONG
    GLOBAL sFldName()   AS STRING
    GLOBAL CurrentFile  AS STRING
    
    '------------------------------------------------------------------------------
    FUNCTION GetFieldNames(fIn AS LONG) AS LONG
      LOCAL p       AS LONG
      LOCAL p1      AS LONG
      LOCAL i       AS LONG
      LOCAL quotes  AS LONG
      LOCAL retflag AS LONG
      LOCAL s       AS STRING
    
      FUNCTION = 0
    
      DIM sFldName(%flds)
      i= LOF(fIn)
      FileBuffer = STRING$(i, 0)
      GET #fIn, , ABS FileBuffer
      IF ERR THEN
        Problem ERRCLEAR, "GetFieldNames " + CurrentFile
        EXIT FUNCTION
      END IF
      FilePoint = STRPTR(FileBuffer)
      FileEnd = FilePoint + i + 1        'points after last byte in string
      FileToStr = 1 - FilePoint           'offset for midstr
      i = 0
      retflag = %FALSE
      DO
        IF FilePoint >= FileEnd THEN
          EXIT DO
        END IF
        INCR i
        IF i > %flds THEN
          Problem -1, "GetFieldNames nFields > %flds in " + CurrentFile + STR$(i)+STR$(%flds)
          EXIT FUNCTION
        END IF
        quotes = 0
        p = 0
        p1 = 0
        DO
          IF @FilePoint = %Quote THEN
            INCR quotes
            IF quotes = 1 THEN
              p = FilePoint + 1
            ELSE
              p1 = FilePoint
            END IF
          ELSEIF @FilePoint = %Separator THEN
            IF (quotes AND 1) = 0 THEN
              EXIT DO
            END IF
          ELSEIF @FilePoint = %Return THEN
            IF (quotes AND 1) = 0 THEN
              INCR FilePoint    'skip linefeed
              retflag = %TRUE
              EXIT DO
            END IF
          END IF
          INCR FilePoint
          IF FilePoint >= FileEnd THEN
            EXIT DO
          END IF
        LOOP
        IF FilePoint >= FileEnd THEN
          EXIT DO
        END IF
        IF p1 = 0 THEN
          EXIT DO
        END IF
        p1 = p1 - p
        p = p + FileToStr
        sFldName(i) = MID$(FileBuffer, p, p1)
        INCR FilePoint
      LOOP UNTIL retflag
      'reached here so function was successful:
      FUNCTION = i
    END FUNCTION
    '------------------------------------------------------------------------------
    
    FUNCTION GetFields(fIn AS LONG, ndx AS LONG, nFlds AS LONG, sArray() AS STRING) AS LONG
      LOCAL p       AS LONG
      LOCAL p1      AS LONG
      LOCAL i       AS LONG
      LOCAL l       AS LONG
      LOCAL q1      AS LONG
      LOCAL quotes  AS LONG
      LOCAL retflag AS LONG
      LOCAL s       AS STRING
      LOCAL RecBuf  AS STRING
      LOCAL sp      AS BYTE PTR
    
      FUNCTION = 0
    
      IF ndx >= UBOUND(sArray(2)) THEN
        REDIM PRESERVE sArray(nFlds, ndx + 1000)
        IF ERR THEN
          Problem ERRCLEAR, "redim "
          EXIT FUNCTION
        END IF
      END IF
    
      i = 0
      retflag = %FALSE
      DO
        IF FilePoint >= FileEnd THEN
          EXIT DO
        END IF
        INCR i
        IF i > %flds THEN
          Problem -1, "GetFields nFields > %flds in " + CurrentFile + "  (" + FORMAT$(nFlds) + ")"
          EXIT FUNCTION
        END IF
        'scan to next separator outside quotes
        quotes = 0
        p = 0
        p1 = 0
        DO
          IF @FilePoint = %Quote THEN
            INCR quotes
            IF quotes = 1 THEN
              p = FilePoint + 1
            ELSE
              p1 = FilePoint
              sp = FilePoint
              INCR sp
              IF @sp <> %Separator THEN
                IF @sp <> %Return THEN
                  ' quote IN TEXT...
                  DECR quotes
                END IF
              END IF
            END IF
          ELSEIF @FilePoint = %Separator THEN
            IF (quotes AND 1) = 0 THEN
              EXIT DO
            ELSE
              'if separator = comma...
              IF @FilePoint = %Comma THEN
                @FilePoint = %Point
              END IF
            END IF
          ELSEIF @FilePoint = %Return THEN
            IF (quotes AND 1) = 0 THEN
              INCR FilePoint    'skip linefeed
              retflag = %TRUE
              EXIT DO
            END IF
          ELSEIF @FilePoint = %Comma THEN
            @FilePoint = %Point
          END IF
          INCR FilePoint
          IF FilePoint >= FileEnd THEN
            EXIT DO
          END IF
        LOOP
        IF FilePoint >= FileEnd THEN
          EXIT DO
        END IF
        IF p1 = 0 THEN
          sArray(i, ndx) = MID$(FileBuffer, p, p1)
        ELSE
          p1 = p1 - p
          p = p + FileToStr
          sArray(i, ndx) = MID$(FileBuffer, p, p1)
        END IF
        INCR FilePoint
      LOOP UNTIL retflag
    
      'reached here so function was successful:
      FUNCTION = i
    END FUNCTION
    
    '------------------------------------------------------------------------------
    FUNCTION ReadFiles() AS LONG
    LOCAL fIn     AS LONG
    LOCAL i AS LONG
    LOCAL s AS STRING
    
      'default to error state in case we exit the function:
      FUNCTION = %FALSE
    
      PRINT "Reading file 1..."
      CurrentFile = $File1
      fIn = FREEFILE
      OPEN CurrentFile FOR BINARY AS fIn
      IF ERR THEN
        Problem ERRCLEAR, "Open " + CurrentFile
        EXIT FUNCTION
      END IF
      fCount = GetFieldNames(fIn)
      IF fCount = 0 THEN
        EXIT FUNCTION
      END IF
      DIM sArray(fCount, 10)
      DO
        IF FilePoint >= FileEnd THEN
          EXIT DO
        ELSEIF @FilePoint = %EOFile THEN    'some still appear to be using this...
          EXIT DO
        END IF
        INCR nLeden
        IF GetFields(fIn, nLeden, fCount, sArray()) = 0 THEN
          EXIT FUNCTION
        END IF
      LOOP
      CLOSE fIn
      '
      'release global memory
      FileBuffer = ""
      '
      'function success
      FUNCTION = %TRUE
    END FUNCTION
    
    '------------------------------------------------------------------------------

    ------------------
    [email protected]
    [email protected]

    Comment


    • #3
      I sympathise with your quest.
      Basic is a unique language, it provides a WRITE staement which outputs CSV format, however its own INPUT statement can not reliably read it's own output!
      I have tested many versions of basic starting with the original IBM pc basic, GWBASIC, Visual basic from versions 1 to 6, Quick Basic, Power Basic from pbdll 1.1 to 6 and others. Not one will INPUT correctly it's WRITE output.
      The only program I have found that will always correctly read output from a WRITE statement is Excell!
      I know of no specification in the basic language which says it shouldn't INPUT its own WRITE (which is CSV) maybe one day Power Basic will break the unwritten rule and actually do this.


      ------------------

      Comment


      • #4
        They don't need to now, I've done it



        ------------------

        Paul Dwyer
        Network Engineer
        Aussie in Tokyo
        (Paul282 at VB-World)

        Comment

        Working...
        X