Wednesday, August 03, 2005

SEXML

XML has the unfortunate side effect of being bulky. While bulk is good for the large intestine, it's pretty much useless in computing, which is why I devised SEXML: Simple Elemental XML. This is how it was derived. First, we get rid of attributes. What good are attributes, anyway? They can easily be replaced by elements:
<order qty="12" price="13.5" brokerage="15%">
<order-type>buy</order-type>
<effective-time>market-close</effective-time>
</order>
can be rewritten as:
<order>
<qty>12</qty>
<price>13.5</price>
<brokerage>15%</brokerage>
<order-type>buy</order-type>
<effective-time>market-close</effective-time>
</order>
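The attribute-to-element step is mechanical enough to automate. Here is a minimal sketch using Python's standard xml.etree.ElementTree. The function name attributes_to_elements is hypothetical and is not part of any SEXML tooling. The sketch assumes that attribute names are valid element names and do not clash with existing child tags. An element that has both attributes and text would end up as mixed content.

```python
import xml.etree.ElementTree as ET

def attributes_to_elements(elem):
    """Return a copy of elem in which every attribute becomes a child element.

    Attribute-derived children come first, in document order, followed by
    the original children, which are converted recursively.
    """
    new = ET.Element(elem.tag)
    new.text, new.tail = elem.text, elem.tail
    for name, value in elem.attrib.items():
        ET.SubElement(new, name).text = value
    for child in elem:
        new.append(attributes_to_elements(child))
    return new

src = ('<order qty="12" price="13.5" brokerage="15%">'
       '<order-type>buy</order-type>'
       '<effective-time>market-close</effective-time></order>')
print(ET.tostring(attributes_to_elements(ET.fromstring(src)), encoding="unicode"))
# <order><qty>12</qty><price>13.5</price><brokerage>15%</brokerage><order-type>buy</order-type><effective-time>market-close</effective-time></order>
```

The output order depends on Python 3.8 or later, where ElementTree keeps attributes in document order.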
Next, we get rid of the irritating double tags. We accomplish this by turning <qty>12</qty> into qty<12>. The XML above becomes:
order<
qty<12>
price<13.5>
brokerage<15%>
order-type<buy>
effective-time<market-close>
>
To avoid confusion between XML and SEXML, and to allow embedding one in the other, let's replace <> with something else, say {}, and while we are at it, put everything on one line:
order{qty{12}price{13.5}brokerage{15%}order-type{buy}effective-time{market-close}}
Perhaps we should separate the items with a comma, in case there are empty elements:
order{qty{12},price{13.5},brokerage{15%},order-type{buy},effective-time{market-close}}
There we go. If we had used parentheses, it would look too much like Lisp and scare people off. Of course, you would only put everything on one line for use in URLs and the like.

The above has the same informational content as the XML above, but is much more compact and simpler to parse.
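To put "simpler to parse" to the test, here is a rough round-trip sketch in Python. It has two parts: a serializer from an attribute-free ElementTree element to the comma-separated brace form, and a small recursive-descent parser back to (name, value) pairs. The names to_sexml and parse_sexml and the tuple shape are assumptions made for illustration. The sketch handles no escaping, comments, or mixed content, and leaf text must not contain braces or commas.

```python
import xml.etree.ElementTree as ET

def to_sexml(elem):
    """Serialize an attribute-free ElementTree element as one-line SEXML:
    leaves become name{text}, other elements name{child,child,...}."""
    if elem.attrib:
        raise ValueError(f"<{elem.tag}> has attributes; turn them into elements first")
    children = list(elem)
    if not children:
        # Leaf: keep the trimmed text (assumed to contain no braces or commas).
        return f"{elem.tag}{{{(elem.text or '').strip()}}}"
    # Mixed content is out of scope: text around child elements is dropped.
    return f"{elem.tag}{{{','.join(to_sexml(child) for child in children)}}}"

def parse_sexml(text):
    """Parse one element such as order{qty{12},price{13.5}} into (name, value),
    where value is a str for a leaf or a list of (name, value) pairs."""
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def read_token():
        # Everything up to the next brace or comma, with whitespace trimmed.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "{},":
            pos += 1
        return text[start:pos].strip()

    def expect(ch):
        nonlocal pos
        skip_ws()
        if pos >= len(text) or text[pos] != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse_element():
        nonlocal pos
        name = read_token()
        if not name:
            raise ValueError(f"missing element name at position {pos}")
        expect("{")
        start = pos
        content = read_token()
        if pos < len(text) and text[pos] == "}":
            pos += 1          # leaf: plain text (possibly empty), then '}'
            return (name, content)
        pos = start           # not a leaf: rewind and read child elements
        children = [parse_element()]
        skip_ws()
        while pos < len(text) and text[pos] == ",":
            pos += 1
            children.append(parse_element())
            skip_ws()
        expect("}")
        return (name, children)

    result = parse_element()
    skip_ws()
    if pos != len(text):
        raise ValueError(f"unexpected trailing text at position {pos}")
    return result

xml = """<order>
<qty>12</qty>
<price>13.5</price>
<brokerage>15%</brokerage>
<order-type>buy</order-type>
<effective-time>market-close</effective-time>
</order>"""

sexml = to_sexml(ET.fromstring(xml))
print(sexml)
# order{qty{12},price{13.5},brokerage{15%},order-type{buy},effective-time{market-close}}
print(parse_sexml(sexml))
# ('order', [('qty', '12'), ('price', '13.5'), ('brokerage', '15%'), ('order-type', 'buy'), ('effective-time', 'market-close')])
```

The parser skips whitespace between elements, so it also accepts the multi-line layout used for the header example.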

"But wait!" I hear screams. "What about schemas? What about the XML header? Character encoding? Entities?"

Nothing prevents you from using a schema. After all, a schema only describes structure and type information, not whether you use braces or angle brackets. Your schema just won't mention any attributes. As an exercise, try writing a schema for the above in SEXML.
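The schema itself is left as an exercise, so the following does not answer it. It is a rough Python stand-in that shows the kind of structure and type checking a schema performs on attribute-free data. The sketch assumes the document has already been parsed into (name, value) pairs, with a string for leaves and a list of pairs otherwise. ORDER_SCHEMA and validate are hypothetical names.

```python
# Hypothetical schema for the order example: each name maps either to a
# converter (for leaves) or to a nested dict (for elements with children).
ORDER_SCHEMA = {
    "order": {
        "qty": int,
        "price": float,
        "brokerage": str,
        "order-type": str,
        "effective-time": str,
    }
}

def validate(node, schema):
    """Check a (name, value) tree against schema; raise ValueError on mismatch."""
    name, value = node
    if name not in schema:
        raise ValueError(f"unexpected element {name!r}")
    rule = schema[name]
    if isinstance(rule, dict):
        if not isinstance(value, list):
            raise ValueError(f"{name!r} should contain elements, not text")
        for child in value:
            validate(child, rule)
        missing = set(rule) - {child_name for child_name, _ in value}
        if missing:
            raise ValueError(f"{name!r} is missing {sorted(missing)}")
    else:
        if not isinstance(value, str):
            raise ValueError(f"{name!r} should contain text")
        rule(value)  # e.g. int("twelve") raises ValueError

order = ("order", [("qty", "12"), ("price", "13.5"), ("brokerage", "15%"),
                   ("order-type", "buy"), ("effective-time", "market-close")])
validate(order, ORDER_SCHEMA)  # passes silently
```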

As for the header, I don't see why you can't SEXML it up:

sexml {
 version{1.0}, encoding{utf-8},
 order{
     qty{12},price{13.5},brokerage{15%},order-type{buy},effective-time{market-close}
 }
}
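The XML declaration has a catch, and so does this header: the encoding is declared inside the document, so a reader must find the header before it can decode the rest. Here is a minimal sketch in Python. It assumes that the header comes first, with version before encoding as in the example above, and that the encoding is ASCII-compatible (UTF-8, Latin-1 and similar, but not UTF-16). The names add_header, read_header and HEADER are hypothetical.

```python
import re

def add_header(payload, version="1.0", encoding="utf-8"):
    """Wrap a payload as sexml{version{...},encoding{...},payload}."""
    return f"sexml{{version{{{version}}},encoding{{{encoding}}},{payload}}}"

# The header is plain ASCII, so it can be matched on raw bytes before we
# know how to decode the rest (assumes an ASCII-compatible encoding).
HEADER = re.compile(
    rb"\s*sexml\s*\{\s*version\s*\{([^{},]*)\}\s*,\s*encoding\s*\{([^{},]*)\}"
)

def read_header(data):
    """Return (version, encoding, decoded_text) for header-prefixed bytes."""
    m = HEADER.match(data)
    if m is None:
        raise ValueError("no sexml{version{...},encoding{...}} header found")
    version = m.group(1).decode("ascii").strip()
    encoding = m.group(2).decode("ascii").strip()
    return version, encoding, data.decode(encoding)

data = add_header("order{qty{12},price{13.5}}").encode("utf-8")
print(read_header(data))
# ('1.0', 'utf-8', 'sexml{version{1.0},encoding{utf-8},order{qty{12},price{13.5}}}')
```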
Entities such as &lt; can live as is.
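Python's xml.sax.saxutils.escape and unescape already handle &amp;, &lt; and &gt;, and both functions accept an extra mapping. The sketch below uses that mapping to give braces and commas entities as well, so arbitrary text fits inside name{...}. The entity names &lbrace;, &rbrace; and &comma; are an assumption borrowed from HTML5's named character references. SEXML as described here does not define them, and encode_text and decode_text are hypothetical helpers.

```python
from xml.sax.saxutils import escape, unescape

# Assumed convention: HTML5-style names for the three structural characters.
SEXML_ESCAPES = {"{": "&lbrace;", "}": "&rbrace;", ",": "&comma;"}
SEXML_UNESCAPES = {v: k for k, v in SEXML_ESCAPES.items()}

def encode_text(text):
    """Make arbitrary text safe to place inside name{...}."""
    return escape(text, SEXML_ESCAPES)

def decode_text(text):
    """Reverse encode_text (unescape handles &amp; last, so it round-trips)."""
    return unescape(text, SEXML_UNESCAPES)

raw = "a<b & {c,d}"
safe = encode_text(raw)
print(f"note{{{safe}}}")  # note{a&lt;b &amp; &lbrace;c&comma;d&rbrace;}
print(decode_text(safe))  # a<b & {c,d}
```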

I believe CDATA is quite unnecessary for SEXML; in fact, it seems to be rarely used in XML. If you believe otherwise, please enlighten me.

/*
Since the original reason XML comments became the unwieldy <!-- --> was compatibility with HTML, we can break with that here and use C-style /* */ comments. (Do you really need 4 chars to start a comment?)
*/
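One simple way to support these comments is a preprocessing pass before parsing. Here is a minimal sketch that uses a non-greedy regular expression, so each comment ends at the first */. strip_comments is a hypothetical name. The approach is deliberately naive: it would also remove a /* ... */ that appears inside element text.

```python
import re

# Non-greedy, and DOTALL so a comment may span several lines.
COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

def strip_comments(text):
    """Remove /* ... */ comments before handing SEXML text to a parser."""
    stripped = COMMENT.sub("", text)
    if "/*" in stripped:
        raise ValueError("unterminated /* comment")
    return stripped

doc = "order{ /* a buy order */ qty{12}, /* at the close */ effective-time{market-close}}"
print(strip_comments(doc))  # order{  qty{12},  effective-time{market-close}}
```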

Let's see what it looks like with parentheses:

sexml (
 version(1.0), encoding(utf-8),
 order (
     qty(12),price(13.5),brokerage(15%),order-type(buy),effective-time(market-close)
 )
)

Guess the Lisp folks were right. Everything is a list of lists.

5 comments:

Anonymous said...

I find your proposal interesting, but you still need a CDATA-like mechanism, since without it you cannot include parentheses as legal content.

Maybe an escape mechanism à la C, \( \), could suffice.
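Here is a minimal Python sketch of that backslash escape. It is written for the brace form, which the author switches to later in the thread, so it escapes {, } and the comma plus the backslash itself. For the parenthesized form, swap in ( and ). All function names are hypothetical. find_unescaped shows the one change a tokenizer needs: when it looks for the closing brace, it skips whatever follows a backslash.

```python
# Characters with structural meaning in the brace form, plus the backslash
# itself so that a literal backslash can be written as \\.
SPECIAL = set("\\{},")

def escape_text(text):
    """Prefix every special character with a backslash, C style."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

def unescape_text(text):
    """Drop the backslash in front of each escaped character."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\":
            if i + 1 == len(text):
                raise ValueError("dangling backslash at end of text")
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

def find_unescaped(text, ch, start=0):
    """Index of the first unescaped ch at or after start, or -1."""
    i = start
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif text[i] == ch:
            return i
        else:
            i += 1
    return -1

raw = "f(x) = {a, b}"
print("note{" + escape_text(raw) + "}")  # note{f(x) = \{a\, b\}}
```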

Michael Chermside said...

Here are two not-quite-valid xhtml documents. Please convert them to your SEXML format.


<html><body><pre>This is in <i>italics</i>.</pre></body></html>

<html><body><pre>This
is
in
<i>italics</i>.</pre></body></html>

I think you'll discover a weakness in your proposed format.
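The difficulty here is mixed content: text and elements interleaved inside one parent, where whitespace matters because the parent is <pre>. The sketch below uses Python's xml.etree.ElementTree to show what a converter has to carry over for each of the two documents. pre_parts is a hypothetical helper, and its paths assume exactly the structure shown. Note that the trailing "." belongs to the tail of <i>, not to <i> itself.

```python
import xml.etree.ElementTree as ET

def pre_parts(doc):
    """Return (text before <i>, text of <i>, text after </i>) inside <pre>."""
    pre = ET.fromstring(doc).find("body/pre")
    i = pre.find("i")
    # ElementTree stores mixed content as .text (before the first child)
    # and .tail (after a child's end tag, still inside the parent).
    return pre.text, i.text, i.tail

print(pre_parts("<html><body><pre>This is in <i>italics</i>.</pre></body></html>"))
# ('This is in ', 'italics', '.')
print(pre_parts("<html><body><pre>This\nis\nin\n<i>italics</i>.</pre></body></html>"))
# ('This\nis\nin\n', 'italics', '.')
```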

code said...

html (
body (
pre (
This is in, i (italics)
)
)
)

html (
body (
pre (
This
is
in,
i (italics)
)
)
)

Actually, I have updated the format to use braces, and replaced the commas with {}.

So the above becomes:
html { body { pre {
This is in {} i { italics }
} } }

I also have a parser in under 100 lines of Java which supports entities, e.g. &lt; &rt; for { }.

Anonymous said...

What if you replace curly braces by newline+indentation? You'd have... YAML!

Which is a great alternative to XML, too.

code said...

For me, the big reason to use SEXML was to pass it easily in a URL, i.e. http://blogger.com/do?a=test{one{}two{}}
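Strictly speaking, { and } are not among the characters that RFC 3986 allows unencoded in a URI. A SEXML payload should therefore be percent-encoded when it goes into a query string, even though raw braces are often tolerated in practice. Here is a short sketch with Python's urllib.parse. The blogger.com URL and the parameter a simply reuse the example from the comment.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

payload = "test{one{}two{}}"

# Sender: urlencode percent-encodes each brace as %7B or %7D.
url = "http://blogger.com/do?" + urlencode({"a": payload})
print(url)  # http://blogger.com/do?a=test%7Bone%7B%7Dtwo%7B%7D%7D

# Receiver: parse_qs decodes the query string back to the original SEXML.
received = parse_qs(urlsplit(url).query)["a"][0]
print(received)  # test{one{}two{}}
```

Each brace grows to three characters once encoded, so deeply nested payloads give back some of their compactness.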