Monday, April 2, 2012

Some parsing fun

    So I ran into some issues the other day when I wanted to allow nested variable declarations in a script parser I am working on. I'm sort of partial to Ant and JSP var declarations, which take the format ${varname}, and have quite happily been using a regex to handle these.
String varName = currentStringBuffer.toString().replaceFirst(".*\\$\\{(.+)}.*", "$1");
    Sure, this is a little simplistic, but it worked for the majority of cases: I would simply loop over the string until I had replaced all of the declarations in it with their appropriate values. Then I ran into a problem...

<server:when test=".['${currentModified}' != '${${attribute}}']"> 
    I was passing in a variable called attribute, whose value was intended to be the name of the variable to resolve. The first pass should convert ${${attribute}} to ${lastModified}, as the attribute variable is equal to 'lastModified'. The second pass should then convert ${lastModified} to the appropriate value.
  <server:when test=".['${currentModified}' != '${lastModified}']">
  <server:when test=".['1330459479383' != '1330459479383']">
    Well, my regex just wouldn't handle it, so it was time to do it properly. Just one problem: after some serious research on the net, I discovered that regexes, at least in Java, really aren't meant to handle arbitrarily nested constructs like this one. Those call for what is known as a context-free grammar.
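
    To make the failure concrete, here is a small sketch that runs the same pattern against a flat and a nested declaration. The class and method names (RegexNestingDemo, extractName) are made up for illustration. Because the leading .* is greedy, the pattern latches onto the last ${ in the string and the capture group swallows one closing brace too many.

```java
class RegexNestingDemo {
    // Same pattern as the one-liner above: capture whatever sits between "${" and "}"
    static String extractName(String text) {
        return text.replaceFirst(".*\\$\\{(.+)}.*", "$1");
    }

    public static void main(String[] args) {
        // Flat declaration: the name comes out cleanly
        System.out.println(extractName("'${currentModified}'")); // currentModified
        // Nested declaration: the capture picks up a trailing brace
        System.out.println(extractName("'${${attribute}}'"));    // attribute}
    }
}
```

    A lookup for "attribute}" finds nothing, so the nested declaration never resolves.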

    Searching the net, I found a number of algorithms for checking that parentheses are balanced, but nothing that was really useful for what I wanted, which was to replace all of the vars with their values and keep the rest of the original string. After some adaptation, here's what I came up with, which may be of some help to people.

// Requires: import java.util.Stack;
// varStringBuffer holds the original string; getVarValue() looks up a variable by name.
Stack<StringBuffer> stack = new Stack<StringBuffer>();
StringBuffer currentStringBuffer = new StringBuffer();
for (int index = 0; index < varStringBuffer.length(); index++)
{
    char c = varStringBuffer.charAt(index);
    // The bounds check keeps a trailing '$' from reading past the end of the buffer
    if (c == '$' && index + 1 < varStringBuffer.length() && varStringBuffer.charAt(index + 1) == '{')
    {
        // Start of a declaration: park the enclosing text and start a fresh buffer
        stack.push(currentStringBuffer);
        currentStringBuffer = new StringBuffer();
        currentStringBuffer.append(c);
    }
    else if (c == '}' && !stack.empty() && varStringBuffer.charAt(index - 1) != '\\')
    {
        // The stack is checked first, so index > 0 whenever charAt(index - 1) runs
        // pop, and evaluate
        currentStringBuffer.append(c);
        String varName = currentStringBuffer.toString().replaceFirst(".*\\$\\{(.+)}.*", "$1");
        String value = getVarValue(varName);
        if (value == null)
        {
            value = "";
        }
        currentStringBuffer = stack.pop();
        currentStringBuffer.append(value);
    }
    else
    {
        // Ordinary character: keep it as-is
        currentStringBuffer.append(c);
    }
}
  • There is a StringBuffer called varStringBuffer. This is where the original string is stored. 
  • We make a new StringBuffer called currentStringBuffer as a place to keep whatever we are working on. 
  • Walking the original, one char at a time, we append that char to the currentStringBuffer if it's not important. 
  • If it is the start of a var declaration, we push the currentStringBuffer onto the stack and allocate a new one. 
  • If we've found a closing brace, it means there should be a variable declaration in the currentStringBuffer. 
  • We evaluate it, get the previous currentStringBuffer off of the stack, and then append our value to it. 
  • Rinse and repeat. 
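
The steps above can be packaged into something you can run on its own. This is only a sketch under a few assumptions: the variables live in a plain Map instead of behind getVarValue, ArrayDeque and StringBuilder stand in for Stack and StringBuffer, and the "${" is skipped up front so the name can be read straight out of the buffer without the regex. The class name NestedVarResolver is made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class NestedVarResolver {
    // Replaces every ${name}, including nested ones, with its value from vars.
    // Unknown variables resolve to the empty string; unbalanced input is not detected.
    static String resolve(String input, Map<String, String> vars) {
        Deque<StringBuilder> stack = new ArrayDeque<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '$' && i + 1 < input.length() && input.charAt(i + 1) == '{') {
                // Start of a declaration: park the enclosing text, start collecting the name
                stack.push(current);
                current = new StringBuilder();
                i++; // skip the '{'
            } else if (c == '}' && !stack.isEmpty() && input.charAt(i - 1) != '\\') {
                // End of a declaration: current holds the (already resolved) name
                String value = vars.get(current.toString());
                current = stack.pop();
                if (value != null) {
                    current.append(value);
                }
            } else {
                current.append(c);
            }
        }
        return current.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("attribute", "lastModified");
        vars.put("lastModified", "1330459479383");
        vars.put("currentModified", "1330459479383");
        // prints .['1330459479383' != '1330459479383']
        System.out.println(resolve(".['${currentModified}' != '${${attribute}}']", vars));
    }
}
```

The inner ${attribute} is resolved first, and its value becomes the name that the outer declaration looks up when its own closing brace arrives.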
There are at least two assumptions being made here. One is that the braces will always be balanced. The second is that the resolution of a variable will never result in an additional variable declaration. We could probably handle the second by putting the value back into the original StringBuffer so that it gets scanned again, as opposed to appending it to the buffer we just popped off of the stack. Violating the first assumption will just result in a failed variable resolution, which will probably throw some cryptic error someplace else. So unless we are feeling nice, we can ignore it for now and file it under Technical Debt.
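
If we were feeling nice, both assumptions could be dropped with a few extra lines. The sketch below is one possible way, not the approach suggested above: instead of writing values back into the original buffer, it resolves each looked-up value recursively before appending it, with an arbitrary depth limit (MAX_DEPTH, my own choice) to stop self-referencing variables. It also complains loudly when a ${ is never closed. A stray } with nothing open is still treated as ordinary text. The class name StrictVarResolver and the Map-based lookup are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

class StrictVarResolver {
    private static final int MAX_DEPTH = 10; // arbitrary guard against cyclic definitions

    static String resolve(String input, Map<String, String> vars) {
        return resolve(input, vars, 0);
    }

    private static String resolve(String input, Map<String, String> vars, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException("Variable expansion nested too deeply (cycle?)");
        }
        Deque<StringBuilder> stack = new ArrayDeque<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '$' && i + 1 < input.length() && input.charAt(i + 1) == '{') {
                stack.push(current);
                current = new StringBuilder();
                i++; // skip the '{'
            } else if (c == '}' && !stack.isEmpty() && input.charAt(i - 1) != '\\') {
                String value = vars.get(current.toString());
                current = stack.pop();
                if (value != null) {
                    // The value may itself contain declarations, so resolve it too
                    current.append(resolve(value, vars, depth + 1));
                }
            } else {
                current.append(c);
            }
        }
        if (!stack.isEmpty()) {
            // At least one "${" never saw its closing brace
            throw new IllegalArgumentException("Unbalanced ${ in: " + input);
        }
        return current.toString();
    }
}
```

With a = "${b}" and b = "x", resolving "[${a}]" gives "[x]", while "${a" throws an IllegalArgumentException instead of failing cryptically someplace else.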