Unknown String Length

  • Unknown String Length

    I'm currently working with a file of HL7 (healthcare) messages and am trying to read in the messages in order to parse them. Each message - and each segment within a message - is a different length.

    What is the 'recommended' way to read in such a string to make it available for parsing?

    Tnx in advance.

    Tom

  • #2
    Well, I would first read all the file data in memory and store it in a string.


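    FUNCTION ReadWholeFile(BYVAL FileName AS STRING) AS STRING
        LOCAL ff AS LONG
        LOCAL AllData$

        ff = FREEFILE

        '=== Open file for (binary) data retrieval
        OPEN FileName FOR BINARY AS #ff
        '=== Read all LOF(ff) bytes of the file into AllData$ in one go
        GET$ #ff, LOF(ff), AllData$
        CLOSE #ff

        FUNCTION = AllData$
    END FUNCTION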
    Then you can parse 'AllData$', but of course you need some information about the file format.
    For example, are the messages and segments separated by some kind of separator character? I assume so, since you say they have different lengths.
    Alternatively, the format might use length fields to specify the message lengths. Either way, you definitely need the file specifications.
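A minimal sketch of the "read it all, then split" approach follows. It is written in Python instead of PowerBASIC, and the function names are hypothetical. It assumes three things. First, CR (0x0D) is the segment terminator, as the HL7 excerpt quoted later in this thread states. Second, every message begins with an MSH segment. Third, any LF left behind by CRLF line endings should simply be dropped, which is a guess about how the file was saved.

```python
def split_segments(all_data: bytes) -> list[bytes]:
    """Split an in-memory HL7 v2 buffer into segments (CR = segment terminator)."""
    # An LF left behind by CRLF line endings is dropped (an assumption about
    # how the file may have been saved; HL7 itself uses CR only).
    parts = (part.lstrip(b"\n") for part in all_data.split(b"\r"))
    return [part for part in parts if part]


def group_messages(segments: list[bytes]) -> list[list[bytes]]:
    """Group segments into messages, starting a new message at every MSH."""
    messages: list[list[bytes]] = []
    for seg in segments:
        if seg.startswith(b"MSH") or not messages:
            messages.append([])
        messages[-1].append(seg)
    return messages


def read_hl7_file(path: str) -> list[list[bytes]]:
    """Same idea as OPEN ... FOR BINARY plus GET$ #ff, LOF(ff): read everything at once."""
    with open(path, "rb") as f:
        all_data = f.read()
    return group_messages(split_segments(all_data))
```

Each inner list is one message, so segment m of message n is simply `messages[n][m]`.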

    Does this help?

    Kind regards
    Last edited by Eddy Van Esch; 22 Jan 2008, 05:24 PM.
    Eddy



    • #3
      Source: http://www.uhc.com.pl/teksty/HL7/ch200019.htm#E11E16
      In constructing a message certain special characters are used. They are the segment terminator, the field separator, the component separator, subcomponent separator, repetition separator, and escape character. The segment terminator is always a carriage return (in ASCII, a hex 0D). The other delimiters are defined in the MSH segment, with the field delimiter in the 4th character position, and the other delimiters occurring as in the field called Encoding Characters, which is the first field after the segment ID.
      Using "soft" delimiters like this makes the data self-describing, much like ANSI ASC X12 data, which I parse a lot of. (The exception is that in ANSI ASC X12 the segment terminator is also soft. However, it follows a known-length opening segment, so you can always find it.)
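Because the delimiters are self-describing, a parser can read them from the MSH segment before splitting anything else. The Python sketch below uses hypothetical names and is not part of any HL7 library. It takes the field separator from the 4th character and the remaining delimiters from the Encoding Characters field. It assumes the standard order component, repetition, escape, subcomponent, as in the usual `^~\&`. Repetitions and escape sequences are deliberately not handled.

```python
from dataclasses import dataclass


@dataclass
class Delimiters:
    field: str
    component: str
    repetition: str
    escape: str
    subcomponent: str


def read_delimiters(msh: str) -> Delimiters:
    """Pull the delimiters out of an MSH segment such as 'MSH|^~\\&|...'."""
    if not msh.startswith("MSH") or len(msh) < 8:
        raise ValueError("not an MSH segment")
    field_sep = msh[3]                      # 4th character position
    enc = msh[4:].split(field_sep, 1)[0]    # Encoding Characters field
    # Assumed order: component, repetition, escape, subcomponent.
    component, repetition, escape, subcomponent = enc[:4]
    return Delimiters(field_sep, component, repetition, escape, subcomponent)


def split_fields(segment: str, d: Delimiters) -> list[list[str]]:
    """Split a segment into fields, then each field into components."""
    # Repetitions, subcomponents and escape sequences are left untouched.
    return [field.split(d.component) for field in segment.split(d.field)]
```

For MSH itself, the field numbering is off by one after splitting, because MSH-1 is the field separator character.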

      Needless to say, since each segment is terminated by a CR, LINE INPUT should get you exactly one segment at a time. (PB's LINE INPUT does NOT look for CRLF as documented; it looks for CR only and ignores the LF. That holds at least through version 8.03.)
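The same one-segment-at-a-time behaviour can be sketched in Python. The file is read in binary chunks and split on CR only, and an LF that follows a CR is ignored, as PB's LINE INPUT is described as doing above. The function name and the 64 KB default chunk size are arbitrary choices for illustration.

```python
import io
from typing import BinaryIO, Iterator


def iter_segments(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield one CR-terminated HL7 segment at a time, LINE INPUT-style.

    A CR ends a segment; an LF that follows it (CRLF endings) is ignored.
    At most one chunk plus one unfinished segment is held in memory.
    """
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        *complete, pending = pending.split(b"\r")   # last piece may be partial
        for seg in complete:
            seg = seg.lstrip(b"\n")
            if seg:
                yield seg
    tail = pending.lstrip(b"\n")
    if tail:  # final segment that had no trailing CR
        yield tail


if __name__ == "__main__":
    demo = io.BytesIO(b"MSH|^~\\&|A\r\nPID|1||123\r\nOBX|1|TX")
    for segment in iter_segments(demo, chunk_size=4):
        print(segment)
```

With a real file, open it with `open(path, "rb")` and pass the file object to `iter_segments`.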

      If you want to change from HL7 to ANSI ASC X12 (e.g., use the '837' for claims), I have some tools available for licensing which would make your life a lot easier.



      [ADD]

      FWIW, I do not read the whole file into a string. I memory-map the file and just use a BYTE PTR to read it character by character. Loading the whole file into a string is OK, too, though. Memory mapping is more efficient, but as long as you have enough memory relative to the file size, you'll be fine.
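Python has no BYTE PTR, but its standard `mmap` module gives a comparable memory-mapped view of the file. This sketch uses hypothetical names. Instead of testing every byte in a loop, it calls `find()` on the mapping to locate each CR, so it substitutes for the character-by-character scan described above rather than reproducing it. Only the segment currently being returned is copied out of the mapping.

```python
import mmap
import os
from typing import Iterator, Tuple


def segment_offsets(buf) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) offsets of each CR-terminated segment in buf.

    buf can be bytes or an mmap object; both support find() and
    integer indexing, so nothing is copied while scanning.
    """
    pos, size = 0, len(buf)
    while pos < size:
        end = buf.find(b"\r", pos)
        if end == -1:          # last segment without a trailing CR
            end = size
        start = pos
        while start < end and buf[start] == 0x0A:  # skip LF left by CRLF
            start += 1
        if start < end:
            yield start, end
        pos = end + 1


def mapped_segments(path: str) -> Iterator[bytes]:
    """Memory-map the file read-only and yield each segment as bytes."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return  # mmap refuses to map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for start, end in segment_offsets(mm):
                yield mm[start:end]
```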
      Last edited by Michael Mattias; 22 Jan 2008, 05:59 PM.
      Michael Mattias
      Tal Systems (retired)
      Port Washington WI USA
      http://www.talsystems.com



      • #4
        Also FWIW, there are lots of commercial software tools available for HL7. I'm pretty sure there are "HL7 to XML" utilities, and you might find XML easier to handle than 'native' HL7.


        MCM



        • #5
          Thanks to both of you. I need this to stay in HL7 because, as an integration analyst, I'm constantly having to parse out individual messages so I can point out to a user who doesn't know HL7 exactly where his/her error lies in the message. We have several other HL7 tools, but they're cumbersome, so I'm writing my own.
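For the goal of showing a user exactly where a message is wrong, one option is to label every value with its HL7 position. The Python sketch below is a hypothetical helper, not part of any existing tool. It assumes the default `|` and `^` delimiters, so pass in the ones read from MSH if they differ. It numbers MSH fields so that MSH-1 is the field separator. It ignores repetitions, subcomponents and escapes.

```python
def annotate(segment: str, field_sep: str = "|", comp_sep: str = "^") -> list[str]:
    """Label each non-empty component with its HL7 position, e.g. 'PID-5.2 = JOHN'."""
    fields = segment.split(field_sep)
    seg_id = fields[0]
    if seg_id == "MSH":
        # MSH-1 is the field separator itself, so insert it to keep numbering right.
        fields.insert(1, field_sep)
    report = []
    for f_no, field in enumerate(fields[1:], start=1):
        # MSH-1 and MSH-2 hold the delimiters, so don't split them on comp_sep.
        comps = [field] if seg_id == "MSH" and f_no <= 2 else field.split(comp_sep)
        for c_no, comp in enumerate(comps, start=1):
            if comp:
                report.append(f"{seg_id}-{f_no}.{c_no} = {comp}")
    return report
```

For example, `annotate("PID|1||12345^^^HOSP||DOE^JOHN")` lists `PID-5.1 = DOE` and `PID-5.2 = JOHN`. That output can be shown to the user next to the raw segment.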

          Reading it all at once is, I think, the key here.

          Thanks again.

          Tom



          • #6
            >Reading it all at once is, I think, the key here.

            ???

            PARSE'ing (or REGEXPR'ing) for the CR delimiter, you end up with one segment at a time.
            LINE INPUT'ing you end up with one segment at a time.
            LINE INPUT'ing ArrayName() gets you an array of all segments.
            PARSE'ing into ArrayName() also gets you an array of all segments.

            So where's the advantage in one method over the other?

            Inquiring Minds Want To Know!



            MCM

