DEMML

Distributable Educational Material Markup Language™

Assigning the Codes to all the Branches of the Tree

Once the structure of the tree has been decided upon, all those code numbers can be assigned. The way this is done is yet another unique aspect of DEMCS™ that may take some getting used to. When most people assign code numbers to a list of things, they just start at 1 and count up from there. This is quick and simple. Unfortunately, it leads to problems later down the line. Even with all the work that will have been done to define the tree in as much detail as possible, there are still going to be things that need to be added to the list as time goes on. Most of those things will need to be squeezed in between other things that already have code numbers. If everything has been numbered sequentially, then the only way to squeeze something in between is to add another character to the end of the codes for that level. On the other hand, if the codes used are spread out evenly over the range of available values, then there will be plenty of room between used codes for adding more things.

[Diagram: a list with sequential codes, where it is hard to add things in between, next to a list with spread-out codes, where it is easy to add things in between.]
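The difference between the two numbering styles can be illustrated with a small sketch. The function name `free_slots` is made up for this illustration, and the "spread" list simply reuses the four-subject column from the table further down the page.

```python
# Base-36 digits in counting order: 0-9 first, then a-z.
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def free_slots(codes):
    """Count the unused one-digit codes between each neighbouring pair of codes."""
    values = sorted(DIGITS.index(code) for code in codes)
    return [high - low - 1 for low, high in zip(values, values[1:])]

sequential = ["1", "2", "3", "4"]  # counted up from 1
spread = ["7", "e", "l", "s"]      # spread over the range of one-digit codes

print(free_slots(sequential))  # [0, 0, 0]
print(free_slots(spread))      # [6, 6, 6]
```

With sequential codes there is nothing free between neighbours, while the spread-out codes leave six unused values in every gap.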

Generating Subject Code Numbers

It is important to understand the context within which we are assigning the numbers. Within any one parent-subject there are a number of child-subjects. Here we do not care about the code number for the parent-subject; it already has its number assigned, based on either the LCC system or previous iterations of this very procedure. All we are concerned with is numbering the child-subjects so as to distinguish them from their siblings and leave room for additional siblings to be added later.

Choosing the Number of Digits to Use

Before we can start assigning numbers we have to figure out how many digits to use. As stated above, if we just start at 1 and count up from there, then it would be natural to just add an additional digit once we run out of variations using only one digit. In other words, once we hit 9 we add a digit and move up to 10. But in DEMCS™ we want to spread the code numbers used evenly over the range of possible values. To do this, we need to know how many digits there will be. Fortunately that is a relatively simple process.

  1. Estimate the maximum number of child-subjects that the subject will have.
  2. Express that as a base-36 number.
  3. Count the number of base-36 digits.
  4. Take 36^n - 1 to determine how many different combinations can be expressed by that many digits, not counting zero (where n is the number of digits counted in step 3).
  5. If the estimated maximum number of child-subjects is well under the number of possible combinations, then use that many digits.
  6. If the estimated maximum number of child-subjects is within about 10 to 20% of the number of possible combinations, then use one more digit. This will leave more room within the range of possible values and lead to better consistency down the line. Always assume that more child-subjects may be created later.
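The steps above might be sketched as follows. The function names and the `headroom` parameter are assumptions made for this illustration; the text only says "about 10 or 20%", so the cut-off of 20% used here is one possible reading, not a fixed rule.

```python
def base36_digit_count(value):
    """Count the base-36 digits needed to write a positive integer (steps 2 and 3)."""
    count = 0
    while value > 0:
        value //= 36
        count += 1
    return max(count, 1)

def digits_to_use(estimated_max, headroom=0.2):
    """Pick the number of code digits for a subject's children (steps 4 to 6).

    headroom is a hypothetical knob: if the estimate comes within this
    fraction of the capacity, one more digit is used.
    """
    n = base36_digit_count(estimated_max)
    capacity = 36 ** n - 1  # combinations available, not counting zero
    if estimated_max > capacity * (1 - headroom):
        n += 1
    return n

print(digits_to_use(12))    # 1
print(digits_to_use(30))    # 2
print(digits_to_use(1200))  # 3
```

A dozen child-subjects fit comfortably in one digit, while 30 is close enough to the 35 available one-digit codes that the sketch moves up to two digits.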

Since most subjects will only have a dozen or so different child-subjects, the code for most subject branches will likely be only one character. Since a two-digit base-36 number has 1,296 different possibilities, it is highly unlikely that any subject code will be more than two characters. This means that a 10-level subject tree with only 10 child-subjects per subject would still have 10,000,000,000 (10^10) different individual topics and only use up 20 characters in the path name, including the slashes. A 10-level subject tree with just 15 child-subjects per subject would create 15^10, or about 5.77 x 10^11, different subjects while still only using up that same 20 characters in the path name. This would still leave room for an additional 36^10 - 15^10, or about 3.66 x 10^15 (roughly 3,655,581,790,000,000), topics to be added later without disturbing the existing topics. Again, all within the same 20-character code number.
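The figures in this paragraph can be checked with a few lines of arithmetic. The variable names are only illustrative, and the path length assumes one code character plus one slash per level.

```python
levels = 10

topics_10_wide = 10 ** levels               # 10 child-subjects at every level
topics_15_wide = 15 ** levels               # 15 child-subjects at every level
still_free = 36 ** levels - topics_15_wide  # one-digit paths not yet used

path_chars = levels * (1 + 1)               # one code character plus one slash per level

print(topics_10_wide)  # 10000000000
print(topics_15_wide)  # 576650390625
print(still_free)      # 3655581789672351
print(path_chars)      # 20
```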

Determining the code for each child-subject

The next step is to choose the actual codes to use. The whole idea is to spread out the codes used throughout the possible values so that there is plenty of room between codes to insert things later if necessary. You want to avoid using the absolute first or last code because you could not squeeze anything past those ends. The following algorithm seems complicated, but a simple chart can tell people what codes to use for the most commonly occurring numbers of child-subjects.

  1. If the order of the child-subjects matters, then sort them into whatever logical order is desired.
  2. The code for any subject in the list is [(36^D - 1) / (N + 1)] * n, rounded to the nearest integer (the table below rounds halves up) and then expressed as a base-36 number.
      • D = the number of digits determined above.
      • N = the total number of child-subjects.
      • n = the position of any particular child-subject in the list, starting with 1.
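A minimal sketch of this formula is shown here. The names `to_base36` and `spread_codes` are invented for the illustration, and the rounding rule is an assumption: the sketch rounds to the nearest integer with halves rounded up, which reproduces the table columns for 1, 3, 4 and 6 child-subjects, though some entries elsewhere in the table appear to differ by one position from this reading.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base36(value, width):
    """Write a non-negative integer as a fixed-width base-36 string."""
    out = ""
    for _ in range(width):
        value, remainder = divmod(value, 36)
        out = DIGITS[remainder] + out
    return out

def spread_codes(total, digits=1):
    """Return evenly spread codes for `total` child-subjects using `digits` digits."""
    span = 36 ** digits - 1
    codes = []
    for n in range(1, total + 1):
        # Integer form of round-half-up applied to span * n / (total + 1).
        value = (2 * span * n + (total + 1)) // (2 * (total + 1))
        codes.append(to_base36(value, digits))
    return codes

print(spread_codes(1))  # ['i']
print(spread_codes(4))  # ['7', 'e', 'l', 's']
print(spread_codes(6))  # ['5', 'a', 'f', 'k', 'p', 'u']
```

Integer arithmetic is used for the rounding so that results do not depend on floating-point behaviour at exact halves such as 17.5.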

The following table shows the code numbers to use for up to 20 different child-subjects.

   
         Number of Subjects
Nth
Subject  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
      1  i  c  9  7  6  5  4  4  4  3  3  3  3  2  2  2  2  2  2  2
      2     n  i  e  c  a  9  8  7  6  6  5  5  5  4  4  4  4  4  3
      3        q  l  i  f  d  c  b  a  9  8  8  7  7  6  6  6  5  5
      4           s  n  k  i  g  e  d  c  b  a  9  9  8  8  7  7  7
      5              u  p  m  j  i  g  f  e  d  c  b  a  a  9  9  8
      6                 u  q  n  l  j  i  g  f  e  d  c  c  b  b  a
      7                    v  r  p  m  k  j  i  g  f  e  e  d  c  c
      8                       v  s  q  n  m  k  j  i  h  g  f  e  d
      9                          w  t  q  o  n  l  k  j  i  h  g  f
     10                             w  u  r  p  n  m  l  j  i  i  h
     11                                w  u  s  q  o  n  l  k  j  i
     12                                   w  u  s  q  p  n  m  l  k
     13                                      x  u  s  r  p  o  n  m
     14                                         x  v  u  r  q  p  n
     15                                            x  v  t  s  q  p
     16                                               x  v  t  s  r
     17                                                  x  v  u  s
     18                                                     x  w  u
     19                                                        x  w
     20                                                           x

Notice that once the number of subjects reaches 15, there is only one unused value between most subject codes. With 20 different subjects, many consecutive subjects have no unused values between them. This means there is no room to insert additional codes without adding a digit to the number. Therefore, it makes sense to try to keep the number of child-subjects below 15 when building the list of child-subjects for any one subject. If there are more than 20 different child-subjects, then it may be time to move up to 2-digit code numbers. On the other hand, it may be better to rethink the organization of the tree so that there aren't so many different child-subjects. Some subjects may require thousands of child-subjects, but most do not.
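The crowding described here can be counted directly. This sketch recomputes the one-digit code values with the spreading formula, assuming halves round up; the function names are made up for the illustration.

```python
def spread_values(total, digits=1):
    """Numeric code values for `total` child-subjects (round-half-up assumed)."""
    span = 36 ** digits - 1
    return [(2 * span * n + total + 1) // (2 * (total + 1))
            for n in range(1, total + 1)]

def unused_between(values):
    """Count the unused code values between each neighbouring pair."""
    return [high - low - 1 for low, high in zip(values, values[1:])]

for total in (15, 20):
    gaps = unused_between(spread_values(total))
    print(total, "subjects:", gaps.count(0), "pairs with no room,",
          gaps.count(1), "pairs with one free value")
# 15 subjects: 0 pairs with no room, 11 pairs with one free value
# 20 subjects: 7 pairs with no room, 12 pairs with one free value
```

With 15 subjects, 11 of the 14 neighbouring pairs have a single free value between them; with 20 subjects, 7 of the 19 pairs have none.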

Adding Additional Child Subjects Later

When the time comes to insert additional child-subjects for a given branch in the tree, new numbers are chosen which lie between the appropriate two pre-existing child-subjects.
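The text does not say which in-between value to pick, so the sketch below assumes the midpoint is wanted. The function name `code_between` is hypothetical, and both codes are assumed to have the same number of digits.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def code_between(low, high):
    """Return a same-length code midway between two codes, or None if they are adjacent."""
    low_value, high_value = int(low, 36), int(high, 36)
    if high_value - low_value < 2:
        return None  # no unused value lies between the two codes
    middle = (low_value + high_value) // 2
    out = ""
    for _ in range(len(low)):
        middle, remainder = divmod(middle, 36)
        out = DIGITS[remainder] + out
    return out

print(code_between("7", "e"))    # a
print(code_between("3d", "3e"))  # None
```

Choosing the midpoint keeps the remaining free values split as evenly as possible on both sides of the new code.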

If all the possible positions are used up, or if something absolutely needs to be inserted between two sequentially numbered child-subjects, then a digit is added and the code numbers for that extra digit are spread out as above.

For instance: if there are already child-subjects coded 3D and 3E and an additional child-subject absolutely must be squeezed in between them, then simply add a digit and code that new subject as 3DI, which will be half-way between 3D and 3E just as if they were 3D0 and 3E0. Remember that in base-36 numbers we use the digits 0-9 before using letters, so I, the 9th letter, has the value 18. If you need to squeeze 2 subjects in between 3D and 3E, then you would use 3DC and 3DN, because C and N have the values 12 and 23. These are the same digits the table above gives for one and two child-subjects.

Only add additional digits where they are needed. Just because you created 3DH is no reason why all the other codes in a branch need to have 3 digits. They will all still sort alphabetically anyway and 3DH will show up right between 3D and 3E as you would expect. This means it is perfectly legal to have a different number of digits for different child-subjects within a branch. The point is to keep the total number of characters used in a path-name to a minimum.
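A short sketch of adding a digit, assuming the extra digit is chosen with the same spreading formula used for ordinary codes (halves rounded up). The function name `squeeze_between` is made up for the illustration. The sketch counts by digit value, where 0 has value 0 and A has value 10, so a single insertion lands on I (value 18); counting 0 as the first digit instead would shift each result down by one letter.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def squeeze_between(prefix, count):
    """Codes for `count` new subjects placed directly after `prefix` by adding one digit."""
    return [prefix + DIGITS[(2 * 35 * n + count + 1) // (2 * (count + 1))]
            for n in range(1, count + 1)]

print(squeeze_between("3D", 1))  # ['3DI']
print(squeeze_between("3D", 2))  # ['3DC', '3DN']

# Codes of mixed length still sort into the expected order.
print(sorted(["3E", "3DI", "3D", "3C"]))  # ['3C', '3D', '3DI', '3E']
```

Plain string sorting places the longer code right after its two-character prefix, which is why the other codes in the branch do not need a third digit.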


First Published: May 15, 2007 — Last Modified: Dec. 7, 2009
©2010 Grant Sheridan Robertson