Topic: Versatile String Parsing Function by RhoSigma

Posted by Junior Librarian
« on: September 19, 2021, 04:52:02 am »
Versatile String Parsing Function

Author: @RhoSigma
Source: qb64.org Forum
URL: https://www.qb64.org/forum/index.php?topic=4142.0
Version: 2021-08-27

Author's Description:
I guess every developer sooner or later needs a parsing function like this, whether to split a simple text line into its individual words, quickly read CSV data into an array, break up a path specification into its folder names, or extract the individual options of a command line or a URL query string.

Obviously, such a function must recognize several separator characters and must be able to suppress splitting inside quoted sections. A special feature of this function is that it can optionally use different characters for the opening and closing quotes, which allows it, for example, to read out sections in parentheses or brackets.

For usage details, see the full description in the separate HTML document (compressed file).
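
As a rough sketch of the quoting behavior described above (based on reading the source code below, not on the HTML documentation): the snippet assumes the ParseLine& function from the Source Code section is pasted at the end of the same program. The names src$, parts$(), ub& and i& are made up for illustration. From the source, an empty quoChars$ appears to fall back to the double quote, and a two-character quoChars$ supplies separate opening and closing characters.

Code: (qb64)
    '--- illustrative only, ParseLine& must follow below in the same file ---
    DIM src$, ub&, i&
    REDIM parts$(0 TO 0) 'dynamic array (REDIM, not DIM), as ParseLine& resizes it

    '--- CSV-style line: the comma inside the double quotes does not split ---
    src$ = "a," + CHR$(34) + "b,c" + CHR$(34) + ",d"
    ub& = ParseLine&(src$, ",", "", parts$(), 0)
    FOR i& = LBOUND(parts$) TO ub&
        PRINT i&; "["; parts$(i&); "]"
    NEXT i&
    'tracing the function by hand gives the components: a / b,c / d

    '--- parentheses as opening/closing quotes: the space inside does not split ---
    src$ = "name (first last) end"
    ub& = ParseLine&(src$, " ", "()", parts$(), 0)
    FOR i& = LBOUND(parts$) TO ub&
        PRINT i&; "["; parts$(i&); "]"
    NEXT i&
    'tracing by hand gives the components: name / first last / end

In both cases the quote characters themselves are not part of the returned components. Also note that the function skips runs of separators, so an empty CSV field such as the one in "a,,b" seems to be dropped rather than returned as an empty string.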



Source Code:
Code: (qb64)
     
    '--- Full description available in separate HTML document.
    '---------------------------------------------------------------------
    FUNCTION ParseLine& (inpLine$, sepChars$, quoChars$, outArray$(), minUB&)
    '--- option _explicit requirements ---
    DIM ilen&, icnt&, slen%, s1%, s2%, s3%, s4%, s5%, q1%, q2%
    DIM oalb&, oaub&, ocnt&, flag%, ch%, nest%, spos&, epos&
    '--- so far return nothing ---
    ParseLine& = -1
    '--- init & check some runtime variables ---
    ilen& = LEN(inpLine$): icnt& = 1
    IF ilen& = 0 THEN EXIT FUNCTION
    slen% = LEN(sepChars$)
    IF slen% > 0 THEN s1% = ASC(sepChars$, 1)
    IF slen% > 1 THEN s2% = ASC(sepChars$, 2)
    IF slen% > 2 THEN s3% = ASC(sepChars$, 3)
    IF slen% > 3 THEN s4% = ASC(sepChars$, 4)
    IF slen% > 4 THEN s5% = ASC(sepChars$, 5)
    IF slen% > 5 THEN slen% = 5 'max. 5 chars, ignore the rest
    IF LEN(quoChars$) > 0 THEN q1% = ASC(quoChars$, 1): ELSE q1% = 34
    IF LEN(quoChars$) > 1 THEN q2% = ASC(quoChars$, 2): ELSE q2% = q1%
    oalb& = LBOUND(outArray$): oaub& = UBOUND(outArray$): ocnt& = oalb&
    '--- skip preceding separators ---
    plSkipSepas:
    flag% = 0
    WHILE icnt& <= ilen& AND NOT flag%
        ch% = ASC(inpLine$, icnt&)
        SELECT CASE slen%
            CASE 0: flag% = -1
            CASE 1: flag% = ch% <> s1%
            CASE 2: flag% = ch% <> s1% AND ch% <> s2%
            CASE 3: flag% = ch% <> s1% AND ch% <> s2% AND ch% <> s3%
            CASE 4: flag% = ch% <> s1% AND ch% <> s2% AND ch% <> s3% AND ch% <> s4%
            CASE 5: flag% = ch% <> s1% AND ch% <> s2% AND ch% <> s3% AND ch% <> s4% AND ch% <> s5%
        END SELECT
        icnt& = icnt& + 1
    WEND
    IF NOT flag% THEN 'nothing else? - then exit
        IF ocnt& > oalb& GOTO plEnd
        EXIT FUNCTION
    END IF
    '--- redim to clear array on 1st word/component ---
    IF ocnt& = oalb& THEN REDIM outArray$(oalb& TO oaub&)
    '--- expand array, if required ---
    plNextWord:
    IF ocnt& > oaub& THEN
        oaub& = oaub& + 10
        REDIM _PRESERVE outArray$(oalb& TO oaub&)
    END IF
    '--- get current word/component until next separator ---
    flag% = 0: nest% = 0: spos& = icnt& - 1
    WHILE icnt& <= ilen& AND NOT flag%
        IF ch% = q1% AND nest% = 0 THEN
            nest% = 1
        ELSEIF ch% = q1% AND nest% > 0 THEN
            nest% = nest% + 1
        ELSEIF ch% = q2% AND nest% > 0 THEN
            nest% = nest% - 1
        END IF
        ch% = ASC(inpLine$, icnt&)
        SELECT CASE slen%
            CASE 0: flag% = (nest% = 0 AND (ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
            CASE 1: flag% = (nest% = 0 AND (ch% = s1% OR ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
            CASE 2: flag% = (nest% = 0 AND (ch% = s1% OR ch% = s2% OR ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
            CASE 3: flag% = (nest% = 0 AND (ch% = s1% OR ch% = s2% OR ch% = s3% OR ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
            CASE 4: flag% = (nest% = 0 AND (ch% = s1% OR ch% = s2% OR ch% = s3% OR ch% = s4% OR ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
            CASE 5: flag% = (nest% = 0 AND (ch% = s1% OR ch% = s2% OR ch% = s3% OR ch% = s4% OR ch% = s5% OR ch% = q1%)) OR (nest% = 1 AND ch% = q2%)
        END SELECT
        icnt& = icnt& + 1
    WEND
    epos& = icnt& - 1
    IF ASC(inpLine$, spos&) = q1% THEN spos& = spos& + 1
    outArray$(ocnt&) = MID$(inpLine$, spos&, epos& - spos&)
    ocnt& = ocnt& + 1
    '--- more words/components following? ---
    IF flag% AND ch% = q1% AND nest% = 0 GOTO plNextWord
    IF flag% GOTO plSkipSepas
    IF (ch% <> q1%) AND (ch% <> q2% OR nest% = 0) THEN outArray$(ocnt& - 1) = outArray$(ocnt& - 1) + CHR$(ch%)
    '--- final array size adjustment, then exit ---
    plEnd:
    IF ocnt& - 1 < minUB& THEN ocnt& = minUB& + 1
    REDIM _PRESERVE outArray$(oalb& TO (ocnt& - 1))
    ParseLine& = ocnt& - 1
    END FUNCTION
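
A minimal calling sketch, derived only from reading the function above and not from the author's HTML documentation. It assumes this code is placed in the main module above the FUNCTION; the names src$, words$(), kv$(), n& and i& are hypothetical. As far as the source shows, the return value is the upper bound of the filled array, or -1 if the input contains no components, and minUB& guarantees a minimum upper bound by padding with empty strings.

Code: (qb64)
    '--- illustrative only, not part of the original post ---
    DIM src$, n&, i&
    REDIM words$(0 TO 0) 'must be dynamic, ParseLine& uses REDIM on it

    '--- split a text line into words at spaces ---
    src$ = "Hello World"
    n& = ParseLine&(src$, " ", "", words$(), 0)
    IF n& >= 0 THEN
        'hand trace: n& = 1, words$(0) = "Hello", words$(1) = "World"
        FOR i& = LBOUND(words$) TO n&
            PRINT words$(i&)
        NEXT i&
    ELSE
        PRINT "nothing to parse" 'empty input or separators only
    END IF

    '--- minUB& = 2: the array is padded up to index 2 ---
    REDIM kv$(0 TO 0)
    src$ = "k=v"
    n& = ParseLine&(src$, "=", "", kv$(), 2)
    'hand trace: n& = 2, kv$(0) = "k", kv$(1) = "v", kv$(2) = "" (padding)

Because a padded result returns minUB& rather than the index of the last real component, callers that need the true component count would have to pass a minUB& no larger than the array's lower bound, or check for trailing empty strings themselves. Only the first five characters of sepChars$ are used; any further characters are ignored.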
     

« Last Edit: September 25, 2021, 06:22:26 am by Junior Librarian »