Mathematics in X(HT)ML
Recently I wrote an XML DTD that can be used to put mathematical articles on the web, but it turned out that a lot of users prefer an XHTML-based solution to a pure XML one. So after some discussion of this issue I decided to write a mathematics-oriented XHTML extension that will enrich XHTML with a set of new elements/attributes able to capture the basic structure of mathematical formulae and the general structure of math articles.
Generally speaking, this is difficult to do. For example, the W3C Math WG wasted more than ten years on the HTML+ and MathML markup languages but failed to write a reasonable DTD that would address the needs of the mathematical community.
There are a lot of other markup languages that can be used to publish mathematical articles but so far none of them managed to gain momentum. So I decided to try it once again.
But now I am not alone: together with Grand Moose and Pragma Inline we decided to join forces and write a new XHTML-based markup language that will be able to carry mathematical formulae. It will be entirely based on web standards like XML, XHTML, Unicode and CSS.
Currently we have to agree on the basic principles of this markup language and determine who is interested in the issue and who else can help us. Later we will switch to more concrete tasks.
Progress will be documented on Moose's site
http://www.literarymoose.info/-/category/css-mathematics
(and maybe on some other sites that will be posted later).
Most of the related discussions will be public and may take place on this forum (if Moose and PI agree).
We hope that the Opera community will actively participate in the process.
I uploaded a quick remake of your example: http://kwi.pb.bialystok.pl/~sdkf/firem/mathxml/zxml_pi.xml . I also sent you a minor question, hoping you could reply before I finish my work.
pi
Java Sun Java Runtime Environment version 5.0
Currently ignoring: MidNiteRaver, lazik_s, operetka.
The purpose of the gallery is just to provide markup samples, and your sample is interesting because it uses separate markup for powers and upper indices; this is why I think it should be included in the gallery.
Once I update the gallery, the link will be posted here.
Thanks once again.
http://www.geocities.com/csssite/gal.xml
that basically contains one simple sample page written in different markup languages, from the more than ten-year-old HTML+ to XML MAIDEN, released just several hours ago (at the moment the gallery does not contain ultrasemantical languages like content MathML, OpenMath and OMDoc).
Sample 7 was prepared by Tomasz Wojcikowski (PragmaInline) while TPML sample was corrected by Philip Feinsilver.
@PI If something should be changed on gal.xml page (link, title of Sample7 etc.) let me know.
Only a few notes. The link to my samples is also the link to the working draft, so the content will be changing constantly. I don't know if that is what you want. Also, if you could add my name to the title (Tomasz Wójcikowski, with a 0x00F3 character inside) I would be very happy. Thanks.
pi
I've noticed that in most languages the dividers are "misaligned". Is that a language-specific problem or just a styling problem?
For HTML+ it is in some sense a language-specific problem, as it is hard to design style sheets that keep them aligned correctly; for the rest it is a matter of styles that still has to be taken care of. Those pages are rendered with a minimal style sheet that does not handle some subtle issues.
@PI I will correct page ASAP.
Edit. Page is corrected, alignment is also corrected (for all samples excluding HTML+)
I. First of all we have to design XML markup that will be used to capture the basic
structure and probably some general semantics of mathematical formulae.
I think we agree that the markup must fit well within the general scope of current web standards,
in particular XHTML, XML, Unicode and CSS. I also assume that we agree that the markup
must be human processable (in fact a markup language is human processable by definition,
but since not everybody agrees with this definition I prefer to state it explicitly).
We have to determine which mathematical issues should be addressed first,
as the requirements of mathematicians vary from very essential things like indices and fractions
to highly sophisticated constructions that require extra effort.
My suggestion is to start with the most essential things, which could be
1. subscripts, superscripts
2. fractions
3. common operators like sums, integrals, products etc.
4. under and over scripts
5. matrices, vectors and similar stuff
6. brackets and fences
7. radicals
I expect that rendering of diacritical marks will be handled via
the Unicode standard (combining diacritical marks and precomposed glyphs),
so no special markup is necessary to settle this issue.
We also have to agree to what extent the markup should capture semantical information.
For example, in most of the markup languages listed in the gallery there is only one
element for superscripts and one for subscripts. If it is necessary to provide more
detailed semantical information (for example to distinguish powers, upper indices
and tokens, or prescripts and postscripts) then one can add semantically oriented
attributes, but in PI's sample this is done by introducing different elements for
different superscripts. We have to decide which way to go. Both ways have their
(dis)advantages.
II. Second, we have to determine what markup should be used to outline document structure
(headers, paragraphs, theorems, remarks, references). Should it be pure XHTML?
If so, which one (1.0|1.1|2.0)? Or should we introduce extra markup?
If so, should it be done by introducing new elements, or is just adding attributes enough?
III. If we reach agreement on the first two issues, we will be able to write a DTD.
Here we also have to determine what kind of DTD we need. Should it be a strict DTD that
accurately describes the markup language, should it be a more liberal but slightly inconsistent
one (like the XHTML DTDs that allow you to write things like
<span><object><div>Block inside inline</div></object></span> or
<a href="back.html"><span><a href="forward.html">click me</a></span></a>)
or do we not need a DTD at all? Of course there are other options like
RELAX NG and XML Schema, but I think a DTD is needed in any case as the notion of validity in
XML is defined by DTD.
IV. Last but not least, we have to write style sheets, samples, tutorials and maybe
some other resources.
I expect that we will write CSS style sheets for Opera, Safari and maybe Mozilla,
XSLT + CSS for M$IE and Mozilla, and a cross browser one that could be used on the server side.
V. If we manage to write something decent then we
can think about formal standardization of the DTD (maybe via OASIS, or otherwise).
This is basically my vision of the issue. Let me know what you think and what else we can do.
Also it is better to know in advance how much time we plan to waste on all this stuff.
I hope to tackle it in several months, what do ya think?
Yes, our main tool should be unicode, but I have problems with finding font that covers all the characters. So the user will have to download it before they can see anything (as with MathML in Gecko)?
I have provided separate markup for power because I believe that it differs from mere index - power index is only an abbreviation (a^3 for a*a*a).
Ad.II. I think that extending XHTML 1.1 is a thought to consider - mixing common markup (p, h1, h2, del, ins) - but then there is the problem of namespaces, validation etc. Pure XML is pretty straightforward - developing language, then DTD (optional), then CSS.
Ad.III. If we create language for describing mathematics let the rules be strict.
Ad.IV. I have lost hope for cross browser stylesheet. Even Gecko is driving me crazy.
So, these are my <money country="pl" unit="0.01">3</money>.
pi
Ad.I. My idea of the whole project is to keep it as simple (in language structure) as possible.
Agree
But that brings the question: what are we trying to do?
There is no way one could capture all the semantics a mathematical equation has.
Definitely we cannot capture all semantics. I prefer to have a simple
core markup that reflects the basic structure of a math expression and some mechanism that
allows us to enrich this general structure with semantics. For example
ax<sup>2</sup> + bx + c = 0
is quite sufficient to reflect general structure of equation in the way
suitable for further rendering with style sheets.
But the question is whether the markup should carry additional information:
for example, should it say that 2 is a power? If so, should it be done via semantical
attributes
ax<sup role="power">2</sup> + bx + c = 0
or do we have to reserve elements for this purpose?
ax<pow>2</pow> + bx + c = 0
If so, to what extent should we go?
Should we introduce separate elements for prescripts?
The problem with attributes is verbosity, which people try to avoid.
The problem with elements is that they make markup fragile. When one introduces
different elements for objects that are structurally and presentationally the same
then it is difficult to track errors in a document. For instance, suppose we use
different notations like 'pow' for powers and 'sup' for other superscripts.
Then someone will definitely abuse the notations and either intentionally or
unintentionally write something like
ax<sup>2</sup> + bx + c = 0
So what is it? The markup says that 2 is not a power, when in fact it is.
Thus we end up in a situation where the markup provides detailed but unreliable semantics.
In my opinion it is better to keep semantics general but reliable.
With an attribute-based solution this is not a problem, but the problem there is
verbosity, which basically means that practically no one will attach semantical attributes.
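To make the attribute-based variant concrete, here is a minimal sketch of how it could be declared. The root element name formula and the role values index and token are made up for illustration; nothing here is taken from an existing DTD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE formula [
  <!-- hypothetical container; only sup/sub scripts are modelled -->
  <!ELEMENT formula (#PCDATA | sup | sub)*>
  <!ELEMENT sup (#PCDATA | sup | sub)*>
  <!ELEMENT sub (#PCDATA | sup | sub)*>
  <!-- role is optional (#IMPLIED): a bare sup is still valid, it just says less -->
  <!ATTLIST sup role (power | index | token) #IMPLIED>
]>
<formula>ax<sup role="power">2</sup> + bx + c = 0</formula>
```

With such a declaration both ax<sup>2</sup> and ax<sup role="power">2</sup> validate, and a value outside the list is rejected by a validating parser. It cannot, of course, detect a power that was simply left unmarked.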
I think that the general structure of mathematics lies in the difference between
variables and operators. The common operators are quite simple (=, +, -, power, factorial,
determinant) and they can be easily transformed into markup.
But later on one gets into trouble: how to describe lower and upper brackets,
graphs, logical expressions (they have meaning by themselves,
is the structure needed then? <if>a</if><then>b</then> or a => b).
I think it is better to put as much burden on Unicode as possible;
in this way one can keep the markup simple and readable.
a => b should be quite sufficient as the corresponding Unicode character is reserved
for this purpose.
Yes, our main tool should be unicode, but I have problems with finding font that
covers all the characters. So the user will have to download it before they can
see anything (as with MathML in Gecko)?
Yep, it is a common problem for everyone. But it is a matter of time,
as the strategic direction is clear: Unicode slowly but definitely gains momentum.
We don't yet have good quality fonts with really comprehensive coverage of the math ranges.
The STIX project should produce these fonts http://www.stixfonts.org but they are not ready yet
(I hope they will be ready later this year).
I have provided separate markup for power because I believe that it differs from mere index -
power index is only an abbreviation (a^3 for a*a*a).
Yep, it differs. The only question here is error tracking.
Do we need general reliable markup like in LaTeX, EMS etc.
or a more specific but fragile one?
Ad.II. I think that extending XHTML 1.1 is a thought to consider - mixing common markup
(p, h1, h2, del, ins) - but then there is the problem of namespaces, validation etc.
Yep, the problem with elements can be solved by declaring appropriate namespaces
in the DTD; then the source will be free from namespace prefixes. But the problem
with attributes remains: they must be prefixed in any case (if we want to use XHTML
attributes on custom elements). So overall the markup will not look uniform.
Thus on document structure level it is probably better to keep XHTML as is.
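A minimal sketch of the DTD trick for elements, assuming a made-up namespace URI http://example.org/math and the short names f and r for a fraction and its rows. The xmlns attribute gets a fixed default in the DTD, so the source itself carries neither prefixes nor xmlns attributes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE p [
  <!ELEMENT p (#PCDATA | f)*>
  <!ATTLIST p xmlns CDATA #FIXED "http://www.w3.org/1999/xhtml">
  <!-- f is a hypothetical fraction element; its rows inherit the default namespace -->
  <!ELEMENT f (r, r)>
  <!ATTLIST f xmlns CDATA #FIXED "http://example.org/math">
  <!ELEMENT r (#PCDATA)>
]>
<p>One half is written as <f><r>1</r><r>2</r></f>.</p>
```

This only works when the parser actually reads the DTD, and it does nothing for attributes: an XHTML attribute used on f would still have to be prefixed.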
Pure XML is pretty straightforward - developing language, then DTD (optional), then CSS.
True. But when it comes to things that are beyond the abilities of CSS one encounters
a lot of problems. The simplest example is the hyperlink. The only bullet proof way to link
one XML document to another is to use XHTML hyperlinks, as XLink seems to be dead.
Ad.III. If we create language for describing mathematics let the rules be strict.
Agree. But it requires extra effort. A DTD imposes some constraints on the architecture
of the markup language. For example, suppose you prohibit nesting matrices in indices
(logical, as indices may be nested almost anywhere while matrices for natural
reasons cannot be). Then you cannot introduce general purpose elements that
may be nested anywhere and may contain anything.
Apparently if you prohibit <sup><matrix>...</matrix></sup>
but allow <group><matrix>...</matrix></group> and <sup><group>...</group></sup>
then the DTD will be consistent on parent/child level but broken on ancestor/descendant level:
<sup><group><matrix>...</matrix></group></sup>.
So in this case writing the DTD requires extra effort and in some cases one has
to introduce two or more elements that carry the same semantics but differ by content model.
This is how XML MAIDEN is constructed. For example it defines two kinds of fractions,
normal ones and compact ones; they basically differ by content model.
In addition, the holes that already exist in the XHTML DTD and that were mentioned
above will remain uncovered (as we can't drastically change W3C's DTD).
Overall I also prefer to keep the DTD consistent, even if it requires extra effort.
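The ancestor/descendant hole can be closed in a DTD only by splitting the general container in two. A minimal sketch, using sup, group and matrix from the example above plus a made-up root m and a made-up sgroup for grouping inside scripts:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE m [
  <!ELEMENT m (#PCDATA | sup | group | matrix)*>
  <!-- scripts accept only the restricted container, never group or matrix -->
  <!ELEMENT sup (#PCDATA | sup | sgroup)*>
  <!-- same meaning as group, narrower content model (hypothetical name) -->
  <!ELEMENT sgroup (#PCDATA | sup | sgroup)*>
  <!ELEMENT group (#PCDATA | sup | group | matrix)*>
  <!ELEMENT matrix (#PCDATA)>
]>
<m>x<sup><sgroup>n + 1</sgroup></sup> + <group><matrix>a b c d</matrix></group></m>
```

Here <sup><group>...</group></sup> no longer validates, so no path from sup leads down to matrix. The price is two elements with the same meaning, which is the extra effort mentioned above.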
Ad.IV. I have lost hope for cross browser stylesheet. Even Gecko is driving me crazy.
Take a look at http://www.freepgs.com/maiden/xsl/ in MSIE6 (or 5.5 with the MSXML 3.0 parser).
The style sheet is unfinished and may contain tons of errors, as the idea was abandoned at an early stage (we can reuse it once we finish work on the markup), but in principle it is possible (note that the main problems
are Unicode related, not CSS).
Of course XSLT is not a perfect solution, but do MSIE users deserve more?
For what it's worth, I spent a little time on a related project.
The plan was to make use of existing HTML as best as possible to code math that
was readable in a wide variety of browsers, and to use CSS to make it beautiful.
Sounds like most of you have also tried this, but here's my effort:
http://www.goldang.org/docs/math/math.html
I also put some pressure on browser writers to clean up their act. I think I
had some effect on Opera and Mozilla (the other major player has in no way
responded).
It would be nice if you took part in the discussions.
Our goal is to enrich XHTML with the simple XML markup needed to outline the basic structure of mathematical formulae.
From a formatting point of view XHTML itself is OK
http://www.geocities.com/csssite/zxhtml.xml
but it is not the best solution for authoring as it does not have math-oriented elements. Using general purpose containers like
<span class="fraction">
<span>numerator</span>
<span>denominator</span>
</span>
is possible and of course quite a viable solution, but it
makes the markup more difficult to read, edit, search, transform, validate etc.
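For comparison, a sketch of the same fraction written with math-oriented elements. The short names m, f and r are assumptions used only for illustration; the exact vocabulary is still under discussion.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- m = math container, f = fraction, r = row; names are illustrative -->
<m>
  <f>
    <r>numerator</r>
    <r>denominator</r>
  </f>
</m>
```

A CSS selector such as f > r or an XPath such as //f is then enough to reach every fraction, whereas the span version has to be matched through class values.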
So the original idea was to replace XHTML with XML that is simpler and allows us to use more convenient notations. XML MAIDEN was designed for this purpose.
http://www.geocities.com/csssite/zxml.xml
But most people prefer to have some XHTML extension rather than a pure XML solution, and currently we want to add math-oriented XML markup to XHTML. Of course we want to preserve the basic principles of XHTML and keep the markup human processable and compatible with core web standards like CSS, DOM etc. In this case MathML is not an option as it is neither human processable nor CSS compatible.
http://www.geocities.com/csssite/zmathml.xml
First of all it would be nice to reach agreement on basic architecture of DTD.
In particular we can start with the following two questions:
1. Do we need to modify XHTML on document structure level?
In other words will we use
<div class="theorem">...</div>
or
<theorem>...</theorem>
One opinion is that on document level it is better to preserve XHTML as is.
In this way one can avoid a lot of trouble with namespaces; in addition
it is better to use <div class="theorem">...</div> as the browser knows that
div is a block level element, while <theorem> will be displayed inline (until the style sheet loads).
So a possible way is not to touch XHTML (on document level).
A sample document written in this manner is available at
http://geocities.com/csssite/xhtml.xml
Another way is to introduce new elements for stuff like theorems, proofs,
statements, remarks, abstracts etc. We can invent these elements ourselves or
reuse existing notations used in EMS, DocBook, TEI etc.
Such elements essentially enrich the markup but the problem is that ugly namespace prefixes (damn them) would inevitably emerge everywhere.
We have to decide which way to go (or cover both directions).
Also we have to decide whether we will stick to XHTML only or
allow users to combine the markup with other markup languages (DocBook, TEI etc.).
In that case it is better to decide in advance which markup languages to keep in mind
(to avoid possible namespace overlaps).
2. What kind of DTD do we need?
That is another important question.
Let us temporarily suppose that we write a DTD that covers only indices and fractions.
Then it is easier to provide concrete samples. I have two questions here.
2.1 Do we need a strict or a liberal DTD?
For example, do we need a DTD that says you can nest elements
wherever you want (you can nest fractions inside nested indices,
nest fractions infinitely in inline equations etc.)
or a strict DTD that says you may not nest fractions inside nested
indices like <m>base<t>superscript<t>nested superscript<f>...</f></t></t></m>,
in inline mode you may not nest fractions into fractions like <m><f><r><f>...</f></r><r>den</r></f></m>
etc. I hope you understand what I mean.
The problem with a liberal DTD is that it is
not quite consistent and in addition it is impossible to implement it (to write a more or less general style sheet); also, if the DTD does not describe the language accurately it is necessary to write long specs that fill all the gaps.
However it is easier to write such a DTD and for the average user it is easier to use it.
Simple sample of such DTD is available at
http://geocities.com/csssite/temp/draft1.xml
(ISO EMS is written in this manner; MathML is written in this manner too but is broken much more heavily than EMS, XHTML or any normal markup language).
A strict DTD is preferable since it describes the language more accurately
and is much easier to implement. It also tracks possible errors in markup more effectively.
The problem is that it is more difficult to write and, for the average user,
more difficult to understand.
An example of an accurate DTD is here
http://geocities.com/csssite/temp/draft2.xml
(XML MAIDEN is written in more or less this manner)
Note that here we have two types of fractions: normal (f) and compact (c);
and two types of indices: normal (l|t) and extended (xl|xt).
Normal indices may be nested anywhere to any level, but may not contain
fractions. Extended indices may appear only in block level equations and
normal fractions (they cannot be used in inline equations as there is no space
between lines to accommodate indices with fractions or other complex content inside,
and cannot be used in compact fractions for the same reason) but may carry complex content
including compact fractions. Note that normal and compact fractions are distinguished
in other languages too: some LaTeX classes define two commands, \frac{}{} and
\case{}{}, and ISO EMS uses two types of fractions and two types of indices. In EMS they are distinguished by additional attributes ("built|compact" for fractions and "compact|stagger" for indices) and the difference is presentational, while in draft2 the difference between fractions is mainly structural (content model).
A possible solution is to write a DTD that at the user's option can validate a document either strictly or liberally (when necessary).
A sample of such a DTD (no annotations yet) is available at
http://geocities.com/csssite/xhtml.txt
2.2 To what extent do we have to capture semantics?
This is another important issue on which I would like to receive an explicit answer.
Should we distinguish between fractions and derivatives, upper indices, powers, raised prescripts and raised tokens, lower indices, lower prescripts and lowered tokens?
To illustrate what I mean look at
http://geocities.com/csssite/temp/draft3.xml
It uses separate notations for fractions (f|c), derivatives (der), upper indices (t), powers (pow), raised prescripts (pt), raised postscripts (top), lower indices (l), lowered prescripts (pl), lowered postscripts (low) and extended indices (xl|xt) that may carry any content. Should we go this way?
There is one definite benefit: structural markup is enriched with semantics.
The problem is that this semantics is not reliable, as it is easy to make a mistake
and there is no mechanism to detect such mistakes during formal validation.
Also it is not hard to imagine how average users will abuse the notations.
Again, a possible solution is to write a compound DTD that at the user's option distinguishes
between different types of indices. Then those users who do not want to or cannot
(e.g. in transformations from LaTeX to XML) distinguish powers and superscripts
may use the general notations, while ultra semantical radicals can turn on the extra elements when necessary.
Any opinions on these issues would be appreciated.
http://geocities.com/csssite/index.xml
now they include XHTML samples too.
New style sheet that renders maths as plain Unicode text is added.
DTD is slightly modified now. Changelog for DTD looks as follows
1. core attributes: ref, refs, token and url added
2. xmlns attribute added to root element
(namespace is now http://google.com/search?q=xml-maiden)
3. markup for tables is simplified and resembles xhtml.
4. elements br, b and i renamed to lf, bf and it to avoid
overlap with xhtml namespace.
5. markup for indices is enriched with prescripts and conditional postscripts. Base of indices is better defined now and writing style sheets is easier.
6. empty element 'rad' added to carry simple radicals (experimental)
7. breach in XML MAIDEN 1.0 DTD that allowed to embed block level elements 'box' and 'seq' into inline container 'env' is fixed now. New inline container 'enc' is introduced.
8. DTD tutorial updated and several typos fixed in annotated DTD
9. compound DTD now can combine XML MAIDEN with XHTML 1.1 and XHTML Basic 1.0 (XHTML 1.0 Strict is still there).
The new DTD is fully integrated with CSS2 in the sense that native support can be completely replaced with a default CSS style sheet.
With Tomasz Wójcikowski we are trying to find out what else can be done with XML and CSS, but so far there is no tangible progress compared to what is already known.
Needless to say, there is still no joy with STIX fonts and buggy browsers.
I am not an expert in CSS. I would like to write some HTML articles with
mathematical expressions, so I am looking for related information. Through this topic
I have found the useful links here http://freewebtown.com/csssite/rel.xml
Do you know any additional links to tutorials about CSS mathematics?
I need some step by step instructions on developing such HTML pages.
There are some general articles about formatting XML with CSS
http://tech.irt.org/articles/js198/
http://www.w3.org/Style/styling-XML
http://www-106.ibm.com/developerworks/xml/library/x-tipcss2/
http://developer.apple.com/internet/css/xmltransformations.html
http://www.w3.org/People/Janne/porject/paper.html
http://academ.hvcc.edu/~kantopet/xml/index.php?page=xml+and+css
but I don't know any additional resources about rendering mathematics with CSS. If someone knows of any, please post them here.
If you have any concrete questions feel free to post them and we will try to help. Also let us know what kind of math expressions you need to embed in a web page, then we will be able to say something more specific.
Just one comment. The standard in scientific journals is that math variables are in italic(roman), while numbers are in normal non-italic font. Is it possible to make the style sheet distinguish between numbers and variables, or do I have to put in the italic manually?
Being used to read scientific papers, there was something wrong about the layout, and it took a while to realize that it was the missing italic on variables.
The standard in scientific journals is that math variables are in italic(roman), while numbers are in normal non-italic font. Is it possible to make the style sheet distinguish between numbers and variables, or do I have to put in the italic manually?
I see, LaTeX formatters usually put letters in italic and keep numbers/operators in roman. In this respect current web standards are slightly awkward.
The W3C Math WG 'addresses' this issue by putting individual characters in tags, which makes formulae difficult to read and edit: <mn>2</mn><mo>+</mo><mn>2</mn><mo>=</mo><mn>4</mn>
The Unicode consortium suggested a different but still impractical solution by introducing
new Unicode characters located in Unicode Plane 1:
Latin bold math characters (𝐀-𝐳)
Latin bold italic math characters (𝑨-𝒛)
Latin italic math characters (𝐴-𝑧)
Greek bold math characters (𝚨-𝛡)
Greek bold italic math characters (𝜜-𝝕)
Greek italic math characters (𝛢-𝜛)
Bold face numbers (𝟎-𝟗)
These characters are specially shaped and kerned for usage in mathematical formulae
(and semantically they are expected to represent math variables).
But the problem is that Unicode Plane 1 is poorly supported by current browsers, text editors and even printer drivers. So at the moment this solution does not work.
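For illustration, this is roughly what the Plane 1 approach looks like in source when the characters are written as numeric character references (U+1D44E, U+1D44F, U+1D450 and U+1D465 are MATHEMATICAL ITALIC SMALL A, B, C and X). The root element name m is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- a, b, c, x as mathematical italic letters; digits and operators stay plain -->
<m>&#x1D44E;&#x1D465;<sup>2</sup> + &#x1D44F;&#x1D465; + &#x1D450; = 0</m>
```

Without a font that covers these code points the reader only sees placeholder boxes, which is the support problem described above.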
Quoted from http://www.geocities.com/csssite/xml.txt
Note that XML MAIDEN provides no markup for diacritical marks
(over dots, hats, tildes, caps) as rendering of diacritical marks is entirely
governed by Unicode standard (via combining diacritical marks and
precomposed glyphs) and no extra markup is needed for this purpose.
Shaping and kerning of glyphs is also addressed by Unicode
standard that defines (in Plane 1) set of mathematical alphanumerical characters
that are specially shaped (usually those Latin letters that represent mathematical
variables are slightly slanted and look smoother than ordinary Latin glyphs)
and kerned (kerning of characters in mathematics differs from plain text kerning)
for usage in mathematical formulae http://www.unicode.org/reports/tr25
Shaping and kerning of individual glyphs as well as rendering of diacritical marks
can not be adjusted via style sheets, as CSS does not provide mechanism
for styling individual characters (in addition CSS selectors apply only
to (pseudo) elements).
A third option, taken by the jsMath author, is to use JavaScript/DOM for this purpose.
See details at http://www.math.union.edu/~dpvc/jsMath/