Converting XML to JSON with a few nice touches

During my recent outings in heavyweight programming, one of the things we needed to do was convert a large XML structure from the server into a JSON object in the browser, to make it easy to manipulate and inspect.

Also, the XML from the server was not the nice kind. The tag names were consistent, but the content was wildly inconsistent. For example, we received all of the following:

<!-- different variations of a particular tag -->
<BgSize>100,23</BgSize>
<BgSize>0,0</BgSize>
<BgSize>,</BgSize>

Ideally, we wanted to parse and validate the node in all its variations, and convert it to an X,Y pair only if it held valid data. A lot of these were common tags that, as you might expect, showed up in many different entities in the XML, so we wanted these rules applied once, centrally and early, rather than dealt with in disparate places further downstream.

The other reason was that a lot of the nodes had structured data crammed into a single tag, which we wanted parsed into a JavaScript object so that we could manipulate it easily:

<!-- xml data with structured content -->
<!-- font, size, color, bold, italic-->
<Font>Arial;Lucida,14,0x0044,True,False</Font>
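
To make the target concrete, here is a rough sketch of what parsing that Font value could look like. It is written in TypeScript rather than the CoffeeScript used in this post, and it is only a guess: the FontSpec shape, its field names, and the decision to treat missing bold/italic flags as false are my assumptions, not the actual Utils.parseFontSpec.

```typescript
// Hypothetical shape for a parsed <Font> value; the field names are assumptions.
interface FontSpec {
  families: string[];   // "Arial;Lucida" -> ["Arial", "Lucida"]
  size: number;
  color: number | null; // null when the color field is empty or malformed
  bold: boolean;
  italic: boolean;
}

// Split "families,size,color,bold,italic" into a FontSpec.
// Returns null when the field count is wrong or the size is not a whole number.
// Missing bold/italic flags are treated as false (an assumption).
function parseFontSpec(str: string): FontSpec | null {
  const parts = str.split(",");
  if (parts.length < 4 || parts.length > 5) return null;
  const [families = "", size = "", color = "", bold = "", italic = ""] = parts;
  if (!/^\d+$/.test(size)) return null;
  return {
    families: families.split(";").filter((f) => f.length > 0),
    size: parseInt(size, 10),
    color: /^0x[0-9a-f]+$/i.test(color) ? parseInt(color, 16) : null,
    bold: /^true$/i.test(bold),
    italic: /^true$/i.test(italic),
  };
}

console.log(parseFontSpec("Arial;Lucida,14,0x0044,True,False"));
```

Returning null for anything that does not fit keeps the "only convert valid data" rule in one place, so callers never see a half-filled font object.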

So that prompted a search for the best way to convert XML to JSON, and of course Stack Overflow had a question about it. The article linked in the answer makes for very interesting reading on all the different conditions that have to be handled. The associated script at http://goessner.net/download/prj/jsonxml/ is the solution I picked. There is not much going on below other than using its xml2json function to convert the XML to a raw JSON object.

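@parseXML2Json: (xmlstr) ->
    log xmlstr
    # xml2json returns a JSON string, so parse it into an object
    json = $.parseJSON(xml2json($.parseXML(xmlstr)))
    # Walk the raw object, applying type conversions and validators
    destObj = Utils.__parseTypesInJson(json)
    log "raw and parsed objects", json, destObj
    return destObj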

Now to the more interesting part. Once the XML is converted to JSON, we need to work our magic on top of it by applying validations and conversions. This is where the Utils.__parseTypesInJson method comes in.

What we’re doing here is walking through the JSON object recursively. At each step, we keep track of the path we have descended into in the XML, so that we can apply validations or conversions based on that path. At each step, we also need to check what type of value we’re dealing with: undefined, null, string, array or object.

If it’s a string, we delegate further to a __parseString function to convert the string to an object if needed.

@__parseTypesInJson: (obj, path = "") ->
    if typeof obj is "undefined"
        return undefined
    else if obj is null
        return null
    else if typeof obj is "string"
        newObj = Utils.__parseString(obj, path)
        validator = _.find Utils.CUSTOM_VALIDATORS, (v) ->
            v.regex.test path
        return validator.fn(newObj) if validator?
        return newObj
    else if Object.prototype.toString.call(obj) is '[object Array]'
        # Array items share the array's path; drop items that came back null
        destObj = (Utils.__parseTypesInJson(o, path) for o in obj)
        destObj = _.reject destObj, (item) ->
            item == null
        return destObj
    else if typeof obj is "object"
        destObj = {}
        destObj[k] = Utils.__parseTypesInJson(obj[k], "#{path}.#{k}") for k of obj
        validator = _.find Utils.CUSTOM_VALIDATORS, (v) ->
            v.regex.test path
        # Hand the converted object to the validator so converted children are kept
        return validator.fn(destObj) if validator?
        return destObj
    else
        return obj

At each step, once the object is formed, we check whether a custom validator is defined for it in the array of custom validators. Each validator is a regex plus a callback function. If the regex matches the path, the callback is passed the JSON object, which it may manipulate before returning it.

@CUSTOM_VALIDATORS = [
    # Keep <choice> nodes only when they carry text content
    {
        regex: /choice$/
        fn: (obj) ->
            if obj["#text"]?
                return obj
            else
                log "returning null"
                return null
    }
]
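
To see the path matching in action without the rest of Utils, here is a stripped-down sketch of the same walk in TypeScript (not the CoffeeScript above). The names Json, Validator, applyValidators and walk are made up for this illustration, the sample data is invented, and the string conversion step is left out to keep the focus on paths and validators.

```typescript
type Json = null | string | number | boolean | Json[] | { [key: string]: Json };

interface Validator {
  regex: RegExp;             // tested against the dotted path
  fn: (value: Json) => Json; // may replace the value, or return null to drop it
}

// Same rule as the `choice` validator above: keep only nodes that have text.
const validators: Validator[] = [
  {
    regex: /choice$/,
    fn: (value) =>
      value !== null && typeof value === "object" && !Array.isArray(value) && value["#text"] != null
        ? value
        : null,
  },
];

// Run the first validator whose regex matches the path, if there is one.
function applyValidators(value: Json, path: string): Json {
  const match = validators.find((v) => v.regex.test(path));
  return match ? match.fn(value) : value;
}

function walk(value: Json, path = ""): Json {
  if (Array.isArray(value)) {
    // Array items share the array's path; items that come back null are dropped.
    return value.map((item) => walk(item, path)).filter((item) => item !== null);
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = walk(child, `${path}.${key}`);
    }
    return applyValidators(out, path);
  }
  // Strings are validated too; numbers, booleans and null pass through unchanged.
  return typeof value === "string" ? applyValidators(value, path) : value;
}

const input: Json = { quiz: { choice: [{ "#text": "A" }, { "@id": "2" }, { "#text": "B" }] } };
console.log(JSON.stringify(walk(input)));
// {"quiz":{"choice":[{"#text":"A"},{"#text":"B"}]}}
```

The middle choice is reached at the path ".quiz.choice", matches /choice$/, has no "#text", and so comes back as null and is filtered out of the array.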

The parseString method, for completeness. You can tweak this to your taste; there’s nothing complicated going on in it.

@__parseString: (str, path) ->
    if not str?
        return str
    if _.any(Utils.SKIP_STRING_PARSING_REGEXES, (r) -> r.test path)
        log "Skipping string parsing for:", path, str
        return str
    if /^\d+$/.test str
        # Plain integer
        return parseInt str, 10
    else if /^\d+,\d+$/.test str
        # "x,y" pair
        [first, second] = str.split(",")
        return {"x": parseInt(first, 10), "y": parseInt(second, 10)}
    else if str == ','
        # Empty pair: treat as no value
        return null
    else if /^true$/i.test str
        return true
    else if /^false$/i.test str
        return false
    else if /^[^,]+,\d+,(0x[0-9a-f]{0,6})?,((True|False),(True|False))?$/i.test str
        log "Matched font: ", str
        return Utils.parseFontSpec(str)
    else
        # Anything else, including the empty string, stays a string
        return str
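
If you do end up tweaking the rules often, one option is to keep them in an ordered table instead of an if/else chain. The sketch below is in TypeScript rather than CoffeeScript, and it is an alternative layout I am suggesting, not what the code above does. The names Parsed, rules and convertString are invented for the example, and the path-based skip list and the font rule are left out for brevity.

```typescript
type Parsed = string | number | boolean | null | { x: number; y: number };

// Ordered rules: the first pattern that matches decides the conversion.
const rules: Array<[RegExp, (s: string) => Parsed]> = [
  // Plain integer
  [/^\d+$/, (s) => parseInt(s, 10)],
  // "x,y" pair
  [/^\d+,\d+$/, (s) => {
    const comma = s.indexOf(",");
    return { x: parseInt(s.slice(0, comma), 10), y: parseInt(s.slice(comma + 1), 10) };
  }],
  // Empty pair: treat as no value
  [/^,$/, () => null],
  [/^true$/i, () => true],
  [/^false$/i, () => false],
];

function convertString(str: string): Parsed {
  for (const [pattern, convert] of rules) {
    if (pattern.test(str)) return convert(str);
  }
  return str; // nothing matched: leave the text as it is
}

for (const sample of ["100,23", "0,0", ",", "True", "42", "hello"]) {
  console.log(JSON.stringify(sample), "->", JSON.stringify(convertString(sample)));
}
```

Adding a new format then means adding one row, and the order of the rows makes it obvious which rule wins when two patterns could match the same string.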