SQL based String Tokenizer – another approach using CONNECT BY

I just read a nice SQL puzzle on the AMIS blog called “Writing a pure SQL based String Tokenizer”, and because Lucas wrote:

Please share with me your thoughts on this – but I will share my attempt with you anyway

That’s why I thought I had to train my brain a little with a nice SQL statement… and here is my result, using CONNECT BY instead of CUBE:

SELECT LEVEL
     , SUBSTR
         ( STRING_TO_TOKENIZE
           -- start of the token: position 1 for the first token,
           -- otherwise one character after the previous delimiter
         , DECODE(LEVEL, 1, 1, INSTR(STRING_TO_TOKENIZE, DELIMITER, 1, LEVEL-1)+1)
           -- length of the token: distance from that start up to the next delimiter
         , INSTR(STRING_TO_TOKENIZE, DELIMITER, 1, LEVEL) -
           DECODE(LEVEL, 1, 1, INSTR(STRING_TO_TOKENIZE, DELIMITER, 1, LEVEL-1)+1)
         )
  FROM ( -- append a trailing delimiter so the last token is terminated as well
         SELECT '&String_To_Tokenize'||'&Delimiter' AS STRING_TO_TOKENIZE
              , '&Delimiter'                        AS DELIMITER
           FROM DUAL
       )
  -- generate one row per delimiter occurrence, i.e. one row per token
 CONNECT BY INSTR(STRING_TO_TOKENIZE, DELIMITER, 1, LEVEL)>0
 ORDER BY LEVEL ASC
;
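
If you want to try the statement without being prompted for substitution variables, here is the same query with literal values filled in; the sample string 'red,green,blue', the delimiter ',' and the column aliases POS and TOKEN are just my own illustrative choices. It should return the three tokens shown in the comment at the end.

SELECT LEVEL AS pos
     , SUBSTR
         ( string_to_tokenize
         , DECODE(LEVEL, 1, 1, INSTR(string_to_tokenize, delimiter, 1, LEVEL-1)+1)
         , INSTR(string_to_tokenize, delimiter, 1, LEVEL) -
           DECODE(LEVEL, 1, 1, INSTR(string_to_tokenize, delimiter, 1, LEVEL-1)+1)
         ) AS token
  FROM ( SELECT 'red,green,blue'||',' AS string_to_tokenize
              , ','                   AS delimiter
           FROM DUAL
       )
 CONNECT BY INSTR(string_to_tokenize, delimiter, 1, LEVEL) > 0
 ORDER BY LEVEL ASC
;
-- POS  TOKEN
-- ---  -----
--   1  red
--   2  green
--   3  blue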

Does anybody have another pure SQL solution to this puzzle?

5 thoughts on “SQL based String Tokenizer – another approach using CONNECT BY”

  1. Hi,
    I found, and have already used, a nice solution from O’Reilly’s “SQL Cookbook”.
    Back then I needed to convert delimited data into a multi-valued IN-list.
    I just changed the query a bit to work as a string tokenizer. See:

    SELECT RTRIM(
             SUBSTR( x.token
                   , INSTR(x.token, '&Delimiter', 1, iter.pos) + 1
                   , INSTR(x.token, '&Delimiter', 1, iter.pos + 1) - INSTR(x.token, '&Delimiter', 1, iter.pos)
                   )
           , '&Delimiter'
           ) AS token
         , iter.pos AS position
      FROM (SELECT '&Delimiter' || '&StringToTokenize' || '&Delimiter' AS token
              FROM dual
           ) x
         , (SELECT ROWNUM AS pos
              FROM all_objects
           ) iter
     WHERE iter.pos <= (LENGTH(x.token) - LENGTH(REPLACE(x.token, '&Delimiter'))) - 1;

    The key, as explained in the book, is to walk the whole string and then tokenize it.
    The table used in the “iter” subquery must have at least as many rows as the string to tokenize has tokens (a variation with a self-contained row generator is sketched below the comments).

    Hope it helps.

  2. Hi, I have to copy the attributes from one table to another, but one of these attributes must be tokenized before the insert into the second table. For example:

    in the first table the attribute has this content:
    a1.b2.c3.d4.e5.f6.g7.h8.i9

    in the second table the insert must stop at the 7th token, like this:
    a1.b2.c3.d4.e5.f6.g7

    do you have any solution?

    Thanks a lot…

  3. Raffaele,

    Use the above SQL statement and adapt it to return just 7 tokens, e.g.:

    SELECT * FROM ( original sql ) WHERE LEVEL <= 7

    (A concrete version of this is sketched below the comments.)

    Patrick

  4. Hi, I need to tokenize a string from the end. The REVERSE function does not work because the string contains numbers: '06,05' becomes '50,60', but I need '05,06'. Please help, thanks.
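
A small variation on the SQL Cookbook query from comment 1, replacing the ALL_OBJECTS row source with a self-contained CONNECT BY LEVEL row generator so the query does not depend on how many objects happen to be visible. The literal sample string, the ',' delimiter and the upper bound of 100 tokens are my own assumptions.

SELECT RTRIM(
         SUBSTR( x.token
               , INSTR(x.token, ',', 1, iter.pos) + 1
               , INSTR(x.token, ',', 1, iter.pos + 1) - INSTR(x.token, ',', 1, iter.pos)
               )
       , ',') AS token
     , iter.pos AS position
  FROM (SELECT ',' || 'red,green,blue' || ',' AS token
          FROM dual
       ) x
     , (SELECT LEVEL AS pos
          FROM dual
        CONNECT BY LEVEL <= 100   -- assumed upper bound on the number of tokens
       ) iter
 WHERE iter.pos <= (LENGTH(x.token) - LENGTH(REPLACE(x.token, ','))) - 1;
-- Returns red, green, blue with positions 1, 2, 3.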
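
To make Patrick’s suggestion from comment 3 concrete, here is a minimal sketch that wraps the CONNECT BY query from the post around Raffaele’s sample string, aliases LEVEL inside the inline view so the outer WHERE clause can reference it, and keeps only the first seven tokens; the aliases POS and TOKEN and the '.' delimiter are assumptions based on the example above.

SELECT token
  FROM ( SELECT LEVEL AS pos
              , SUBSTR
                  ( string_to_tokenize
                  , DECODE(LEVEL, 1, 1, INSTR(string_to_tokenize, delimiter, 1, LEVEL-1)+1)
                  , INSTR(string_to_tokenize, delimiter, 1, LEVEL) -
                    DECODE(LEVEL, 1, 1, INSTR(string_to_tokenize, delimiter, 1, LEVEL-1)+1)
                  ) AS token
           FROM ( SELECT 'a1.b2.c3.d4.e5.f6.g7.h8.i9'||'.' AS string_to_tokenize
                       , '.'                               AS delimiter
                    FROM DUAL
                )
         CONNECT BY INSTR(string_to_tokenize, delimiter, 1, LEVEL) > 0
       )
 WHERE pos <= 7
 ORDER BY pos
;
-- Returns a1, b2, c3, d4, e5, f6, g7 (one token per row).

Whether those seven tokens are then re-assembled into a single string or inserted as separate rows depends on the layout of the target table.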
