Saturday, 18 July 2015

ToDo. Switching text EVD parsing over to parsing with special script axiomatics

Original in Russian: http://programmingmindstream.blogspot.ru/2015/07/todo-evd.html


This is a follow-up to “ToDo. Parsing the ‘old patterns’ using parse tree forming” (in Russian).

Actually, everything there fits within the common scheme.

Since the scripts historically grew out of EVD, the circle has now closed.

I promise to update you on the results.

The whole axiomatics fits within the current EVD scheme plus the processing of reserved characters such as = { } and %XXX.

We would define these as IMMEDIATE characters that use the stack and call the interface of the EVD generator.

In addition, we would have to “lay a trap” in the form of UnknownWordHook and catch “unknown” tokens with it.

This “trap” would determine which token was passed (based on the existing EVD scheme I have already published) and what the stack contains, and then call the appropriate methods of the generator interface.
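To make the idea more concrete, here is a minimal sketch of immediate characters plus an unknown-word trap. Everything in it is an assumption made for illustration: the toy input syntax, the `TraceGenerator` class and its method names (`start_child`, `add_atom`, `finish`) are hypothetical and are not the real EVD generator interface, and `%XXX` is not handled at all.

```python
class TraceGenerator:
    """Hypothetical stand-in for a generator interface; it only records calls."""

    def __init__(self):
        self.calls = []

    def start_child(self, name):
        self.calls.append(("start_child", name))

    def add_atom(self, name, value):
        self.calls.append(("add_atom", name, value))

    def finish(self):
        self.calls.append(("finish",))


class Machine:
    """Toy stack machine: immediate characters run at once, the rest is trapped."""

    def __init__(self, generator):
        self.gen = generator
        self.stack = []
        self.pending_assign = False
        # IMMEDIATE characters: executed as soon as the parser yields them.
        self.immediate = {
            "{": self.open_brace,
            "}": self.close_brace,
            "=": self.equals,
        }

    def open_brace(self):
        # The child name is whatever the trap pushed last.
        self.gen.start_child(self.stack.pop())

    def close_brace(self):
        self.gen.finish()

    def equals(self):
        # Remember that the next unknown token is a value, not a name.
        self.pending_assign = True

    def unknown_word_hook(self, token):
        # The "trap": decide what to do from the token and the stack contents.
        if self.pending_assign:
            self.gen.add_atom(self.stack.pop(), token)
            self.pending_assign = False
        else:
            self.stack.append(token)

    def run(self, tokens):
        for token in tokens:
            handler = self.immediate.get(token)
            if handler is not None:
                handler()
            else:
                self.unknown_word_hook(token)


gen = TraceGenerator()
Machine(gen).run("document { title = Hello }".split())
print(gen.calls)
```

In this sketch the machine itself knows nothing about the words `document` or `title`; only the trap and the immediate handlers decide which generator method to call.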

We should also keep CheckBrackets in the parser in mind.

Maybe we’d get to HTML and RTF afterwards.

Most likely, though, that would require non-greedy quantification or possibly recursive parsing inside the active token.

Perhaps we can simply set the lists WordChars, DelimChars and SpaceChars correctly.

Perhaps these lists can change depending on the stack machine contents.
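As a rough illustration of how much the three lists alone decide, here is a small tokenizer sketch parameterized by them. Only the names WordChars, DelimChars and SpaceChars come from the text above; the function `tokenize` and the concrete contents of the sets are assumptions.

```python
import string


def tokenize(text, word_chars, delim_chars, space_chars):
    """Split text into words and single-character delimiters."""
    tokens = []
    word = []
    for ch in text:
        if ch in word_chars:
            word.append(ch)
            continue
        # Any non-word character ends the word collected so far.
        if word:
            tokens.append("".join(word))
            word = []
        if ch in delim_chars:
            tokens.append(ch)
        elif ch not in space_chars:
            raise ValueError(f"character {ch!r} is in none of the three lists")
    if word:
        tokens.append("".join(word))
    return tokens


# Assumed contents of the lists, chosen only for this example.
WORD_CHARS = set(string.ascii_letters + string.digits + "_%")
DELIM_CHARS = set("={}")
SPACE_CHARS = set(" \t\r\n")

print(tokenize("a{b=1}", WORD_CHARS, DELIM_CHARS, SPACE_CHARS))
# Moving "=" from the delimiters to the word characters changes the result:
print(tokenize("b=1", WORD_CHARS | {"="}, DELIM_CHARS - {"="}, SPACE_CHARS))
```

Since the sets are ordinary arguments, a caller could pass different ones depending on the state of the stack machine, which is the variation the previous paragraph hints at.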

We have to do something with it.

By contrast, XML fits easily within the existing scheme by means of a formal grammar. But we do not really need it now.

Afterwards, it would be possible to rework the parsing of binary EVD by adding a special parser that picks out “tokens” with a fixed structure.

Then we can think about other binary formats like DOC, etc.

Actually, the same stack structures are used all over the place there, or else tokens are processed with mandatory parameters on the right.

In other words, there are not many kinds of binary formats, namely:

1. Either XML:
<a>
 <b>
 ...
 </b>
</a>

2. Or asm:
[prefix1 .. prefixK] instruction I1 [param1 .. paramM]
...
[prefix1 .. prefixK] instruction IN [param1 .. paramM]

3. Or the combination.
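The second kind, a token followed by its mandatory parameters on the right, can be sketched as a table-driven reader. The instruction names, their parameter counts and the prefix set below are invented for the example and do not describe any real format.

```python
# Hypothetical tables: how many parameters each instruction takes,
# and which tokens count as prefixes.
ARITY = {"push": 1, "add": 0, "mov": 2}
PREFIXES = {"lock", "rep"}


def parse_records(tokens):
    """Read [prefixes] instruction [params] records from a flat token list."""
    records = []
    pos = 0
    while pos < len(tokens):
        prefixes = []
        while pos < len(tokens) and tokens[pos] in PREFIXES:
            prefixes.append(tokens[pos])
            pos += 1
        if pos == len(tokens):
            raise ValueError("prefix without an instruction")
        name = tokens[pos]
        pos += 1
        if name not in ARITY:
            raise ValueError(f"unknown instruction {name!r}")
        count = ARITY[name]
        params = tokens[pos:pos + count]
        if len(params) < count:
            raise ValueError(f"{name!r} expects {count} parameter(s)")
        pos += count
        records.append((prefixes, name, params))
    return records


print(parse_records("push 1 lock mov a b add".split()))
```

The reader only ever moves forward: once the instruction is known, the number of tokens to take is known too, so nothing already read has to be revisited.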

In this context, the least convenient format is HTML.

Moreover, we usually DO NOT NEED to backtrack over the data we’ve already parsed.

Generally, people keep their data formats simple and prefer not to deal with backtracking.

If, however, backtracking is possible, then we have a point where the stack is fixed and a point of “returning to the fixed value” or “resetting the value”.

In other words, we merely need a “second stack” or a “stack of stacks”.
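One possible reading of the “stack of stacks” is a value stack plus a second stack of saved snapshots. The sketch below takes that reading; the class `BacktrackingStack` and its method names are hypothetical, and copying the whole stack at each mark is chosen for simplicity, not efficiency.

```python
class BacktrackingStack:
    """Value stack with a second stack of snapshots for backtracking."""

    def __init__(self):
        self.values = []
        self.snapshots = []  # the "second stack": saved copies of self.values

    def push(self, value):
        self.values.append(value)

    def pop(self):
        return self.values.pop()

    def mark(self):
        # Point of stack fixation: remember the current contents.
        self.snapshots.append(list(self.values))

    def rollback(self):
        # Return to the most recently fixed contents.
        self.values = self.snapshots.pop()

    def commit(self):
        # The attempt succeeded: drop the snapshot and keep the current values.
        self.snapshots.pop()


stack = BacktrackingStack()
stack.push(1)
stack.push(2)
stack.mark()
stack.pop()
stack.push(3)
stack.rollback()
print(stack.values)
```

Marks nest naturally, because each `mark` pushes a new snapshot and each `rollback` or `commit` removes only the latest one.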

That does not seem to be impossible.

Namely:

Input flow parser for tokens -> Stack machine (+) Axiomatics -> Filter1 .. FilterN -> Generator

We also extend the scheme from “Text processing. Generators, filters, transformers and ‘SAX developed on its own’” (in Russian).

1. The parser splits the input stream into minimal tokens.
2. The stack machine converts the set of tokens into a parse tree.
3. The filters transform the parse tree.
4. The generator converts the parse tree to the target language.
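The four steps can be sketched end to end on a toy input. This is only an illustration under assumptions: the brace-based input syntax, the nested-list tree, the two filters and the parenthesized output format are all invented here, and the sketch builds the whole tree in memory instead of streaming events between stages.

```python
def parse(text):
    """Step 1: split the input into minimal tokens (whitespace-separated here)."""
    return text.split()


def build_tree(tokens):
    """Step 2: a stack machine that turns tokens into a nested-list parse tree."""
    stack = [[]]
    for token in tokens:
        if token == "{":
            stack.append([])
        elif token == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unbalanced '{'")
    return stack[0]


def drop_empty_filter(tree):
    """Step 3, filter 1: remove nodes that have no children."""
    result = []
    for node in tree:
        if isinstance(node, list):
            node = drop_empty_filter(node)
            if not node:
                continue
        result.append(node)
    return result


def upper_filter(tree):
    """Step 3, filter 2: upper-case every leaf."""
    return [upper_filter(n) if isinstance(n, list) else n.upper() for n in tree]


def generate(tree):
    """Step 4: render the tree in a parenthesized target notation."""
    parts = (generate(n) if isinstance(n, list) else n for n in tree)
    return "(" + " ".join(parts) + ")"


def run_pipeline(text, filters):
    tree = build_tree(parse(text))
    for apply_filter in filters:
        tree = apply_filter(tree)
    return generate(tree)


print(run_pipeline("a { b { } c } d", [drop_empty_filter, upper_filter]))
```

Each stage only depends on the output shape of the previous one, so a filter can be added or removed without touching the parser or the generator.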

This corresponds closely to (in Russian):

"Болье Л. Методы построения компиляторов. В кн.: Языки программирования / Под ред. Ф. Женюи. М.: Мир, 1972, с. 87-276."
(Quoted via the disserCat dissertation library: http://www.dissercat.com/content/razrabotka-adaptivnogo-metoda-postroeniya-i-organizatsii-kross-kompilyatorov-protsedurno-ori#ixzz3g5rXwthj)

In two or so years I should have “universal parsing”, if anyone still needs it by then.

Actually, the time has come for me to port the implementation of “my scripts” to some “sane” and “immortal” language like C++.

Again, let’s not forget about STL and Boost, as well as cross-platform support.

If anybody is interested in this project – come join me.
