Data Management Blog: Code lists, parsers, and passes

Tuesday

Code lists, parsers, and passes

I was reading about the Genericode work being done and started to get a bit worried. To be clear, the theory behind what they are doing is very good. Creating flexible, validatable, and extensible code lists has been a perennial problem for schema designers, and none of the solutions tried so far feel like a "best practice".

What the Genericode folks are saying is that code lists need to be handled differently. Their design moves code-list validation out of the parser and into a second layer of validation. In that second layer, code list management could finally have a best practice that meets all three requirements (flexibility, validatability, and extensibility).
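The two-layer idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Genericode's or any product's implementation. The first pass stands in for a validating parser: it checks well-formedness plus one hypothetical structural rule, because Python's standard library has no XML Schema validator. The second pass checks code values against a code list that is managed separately. The `Invoice`/`Amount` document, its `currency` attribute, and the trimmed code-list layout (loosely modeled on genericode's `SimpleCodeList`/`Row`/`Value` structure) are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed code list loosely modeled on genericode's SimpleCodeList
# layout. A real genericode file also carries Identification and ColumnSet sections.
CODE_LIST = """\
<gc:CodeList xmlns:gc="http://docs.oasis-open.org/codelist/ns/genericode/1.0/">
  <SimpleCodeList>
    <Row><Value ColumnRef="code"><SimpleValue>USD</SimpleValue></Value></Row>
    <Row><Value ColumnRef="code"><SimpleValue>EUR</SimpleValue></Value></Row>
    <Row><Value ColumnRef="code"><SimpleValue>JPY</SimpleValue></Value></Row>
  </SimpleCodeList>
</gc:CodeList>
"""


def load_codes(code_list_xml: str, column: str = "code") -> set[str]:
    """Collect the values of one column from the (simplified) code list."""
    root = ET.fromstring(code_list_xml)
    return {
        (value.findtext("SimpleValue") or "").strip()
        for value in root.iter("Value")
        if value.get("ColumnRef") == column
    }


def pass1_structure(doc_xml: str) -> ET.Element:
    """Pass 1: stand-in for a validating parser.

    Raises ET.ParseError if the document is not well-formed, and ValueError
    if a hypothetical structural rule fails. A real deployment would apply
    an XML Schema here.
    """
    root = ET.fromstring(doc_xml)
    if root.tag != "Invoice":
        raise ValueError(f"unexpected root element {root.tag!r}")
    for amount in root.iter("Amount"):
        if amount.get("currency") is None:
            raise ValueError("Amount is missing its currency attribute")
    return root


def pass2_code_lists(root: ET.Element, codes: set[str]) -> list[str]:
    """Pass 2: check code values against the separately managed code list."""
    return [
        f"currency {amount.get('currency')!r} is not in the code list"
        for amount in root.iter("Amount")
        if amount.get("currency") not in codes
    ]


def validate(doc_xml: str, codes: set[str]) -> list[str]:
    """Run both passes. Structural errors raise; code-list errors are returned."""
    return pass2_code_lists(pass1_structure(doc_xml), codes)


codes = load_codes(CODE_LIST)
doc = ('<Invoice><Amount currency="USD">10.00</Amount>'
       '<Amount currency="XYZ">5.00</Amount></Invoice>')
print(validate(doc, codes))  # ["currency 'XYZ' is not in the code list"]
```

Adding a currency here means editing only the code list, and `pass1_structure` stays untouched. That separation is what the second layer buys, at the cost of a second traversal of the document.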

I have no problem with either their identification of the problem or their solution. What worried me was how my customers would react to the solution. I've heard consistently that they want the parser to do more, and to do it in one pass. In short, they want to put as much of the data structure integrity and validation workload as possible on the parser, in a single pass. This runs against the trend toward two-stage validation that Genericode describes. I had a similar concern years ago when OAGIS proposed two-stage validation as a way to add further restrictions on top of a schema using the Schematron standard.

So I find myself in the unenviable position of liking the thinking behind these trends, which pull workload away from the parser in order to solve some perennial problems. However, clients tell me they want to put more workload onto the parser and would never go to the effort needed to do second-stage validation.

Ultimately, of course, the clients decide for all of us, whether they are clients of an individual consultant or of a large ERP vendor. I think some real training will be needed before the marketplace is ready to decide what is right and what is not. In the meantime, the message I hear from clients is clear: make the parser do more work, not less.

2 comments:

Unknown 5:22 PM EST
I'm the editor of the OASIS genericode specification. The 2-pass validation is proposed as a way to do genericode-based code list validation in conjunction with existing XML validating parsers. It's actually consistent with applying business rules engines after schema validation to check rules that XML Schemas can't express.

That said, the genericode specification and committee don't have a fixed opinion on this. If someone builds a piece of software that does schema and genericode validation in a single pass, that will be great. People worry too much about how many passes have to be done. The important questions are (i) how long does the total validation take, (ii) how big a percentage of your total end-to-end business process time is taken up by validation, and (iii) how much time can you save in writing validation code by moving data validation out into separate layers that simplify the coding and debugging of core application code.

If someone wants to use genericode files as a way to generate XML Schemas that just contain simple type enumerations for code lists, that is also fine, as long as they have an appropriate process for managing change to the code lists and republication of the affected Schemas to users and/or systems. There is no right or wrong way. Genericode is just a way to communicate that code list information, and I'll be disappointed if we don't end up with a few different ways of using that code list information, so that users can choose the approach that best suits their particular situation.

Cheers, Tony.
http://kontrawize.blogs.com/kontrawize/
http://www.genericode.org/
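Tony's alternative, generating XML Schemas that contain simple type enumerations from code list data, can be sketched with Python's standard library. This is an illustration under stated assumptions, not a genericode tool. The codes arrive as a plain list (in practice they would be read from a genericode file), and the function name `codes_to_simple_type`, the `CurrencyCodeType` type name, and the `xs:token` base type are all hypothetical choices.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
# Serialize the schema namespace with the usual xs: prefix, so the
# base="xs:token" attribute value below resolves to a declared prefix.
ET.register_namespace("xs", XS)


def codes_to_simple_type(type_name: str, codes: list[str]) -> str:
    """Render a code list as an xs:simpleType that enumerates its values.

    Duplicates are dropped and values sorted so regenerated schemas diff cleanly.
    """
    schema = ET.Element(f"{{{XS}}}schema")
    simple_type = ET.SubElement(schema, f"{{{XS}}}simpleType", name=type_name)
    restriction = ET.SubElement(simple_type, f"{{{XS}}}restriction", base="xs:token")
    for code in sorted(set(codes)):
        ET.SubElement(restriction, f"{{{XS}}}enumeration", value=code)
    return ET.tostring(schema, encoding="unicode")


# A literal list stands in for codes read from a genericode file.
print(codes_to_simple_type("CurrencyCodeType", ["USD", "EUR", "JPY", "USD"]))
```

With this approach an ordinary validating parser enforces the code list in one pass. The trade-off is the one Tony names: every change to the code list means rerunning the generator and republishing the affected schemas.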
Paul 12:34 PM EST
I agree with you, Tony, that the important questions are the ones you stated. This is spot on. Using application logic to validate this code data is not the best approach.

I respectfully disagree, however, with your characterization that people worry too much about how many passes there are. Simplicity often wins, whether or not it is the best method, and that matters.

Some folks are looking at single-pass validation of code lists in a different light, and some of the architects in OAGi are doing the same.

I'll end with total agreement with your statement: "I'll be disappointed if we don't end up with a few different ways of using that code list information, so that users can choose the approach that best suits their particular situation." So true. I'll be learning and reading up on genericode, as I do with other approaches.
Paul
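For contrast with a two-pass design, here is a rough sketch of what single-pass code-list validation could look like: structural rules and code-list rules are checked in the same streaming traversal, using the standard library's `xml.etree.ElementTree.iterparse`. It is only an illustration under stated assumptions, not a schema validator and not anything the genericode committee or OAGi has published. The `Invoice`/`Amount` document, the `currency` attribute, the rules themselves, and the function name `validate_single_pass` are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def validate_single_pass(doc_xml: str, codes: set[str]) -> list[str]:
    """Check structure and code-list values in one streaming traversal.

    Raises ET.ParseError if the document is not well-formed; every other
    problem is collected as an error message.
    """
    errors: list[str] = []
    seen_root = False
    source = io.BytesIO(doc_xml.encode("utf-8"))
    # "start" events arrive as each element opens, with its attributes already
    # parsed, so structural and code-list checks can run in the same step.
    for _event, elem in ET.iterparse(source, events=("start",)):
        if not seen_root:
            seen_root = True
            if elem.tag != "Invoice":  # hypothetical structural rule
                errors.append(f"unexpected root element {elem.tag!r}")
        if elem.tag == "Amount":
            currency = elem.get("currency")
            if currency is None:  # structural rule
                errors.append("Amount is missing its currency attribute")
            elif currency not in codes:  # code-list rule
                errors.append(f"currency {currency!r} is not in the code list")
    return errors


codes = {"USD", "EUR", "JPY"}
print(validate_single_pass(
    '<Invoice><Amount currency="XYZ">5.00</Amount><Amount>1.00</Amount></Invoice>',
    codes,
))
```

The document is read once, which is the simplicity Paul's clients ask for. The cost is that the code-list rule now lives in the same function as the structural rules, which is the coupling Genericode's separate layer is meant to avoid.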