About Constructor

A constructor looks like a method but has no return type, and its name must be the same as the class name. The advantage of constructors over methods is that they are called implicitly whenever an object is created, whereas methods must be called explicitly. An object cannot be created without calling a constructor. A constructor gives properties to an object at the time of creation only. Programmers use constructors to initialize variables, instantiate objects and set colors. A constructor is comparable to the init() method of an applet.
Default Constructor – No Argument Constructor
A constructor without parameters is called a "default constructor" or "no-args constructor". It is called default because if the programmer does not write any constructor, the compiler creates one and supplies it. The default constructor supplied by the compiler does not have any functionality (it produces no output).
public class Demo
{
  public Demo()
  {
    System.out.println("From default constructor");
  }
  public static void main(String args[])
  {
    Demo d1 = new Demo();
    Demo d2 = new Demo();
  }
}
Output screen of Demo.java
public Demo()
"public" is the access specifier and "Demo()" is the constructor. Notice that it has no return type and its name is the same as the class name.
Demo d1 = new Demo();
In the above statement, d1 refers to an object of the Demo class. To create the object, the constructor "Demo()" is called. Any number of objects, such as d2, can be created in the same way, and the constructor is called for each of them.
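To see the compiler-supplied constructor at work, here is a minimal sketch. The Counter class and its fields are hypothetical names used only for illustration. No constructor is written, yet "new Counter()" compiles, and the fields hold their default values.

```java
// Hypothetical class for illustration: no constructor is written here,
// so the compiler supplies a no-argument one.
class Counter {
    int count;      // numeric fields default to 0
    String label;   // object references default to null
}

class DefaultConstructorDemo {
    public static void main(String[] args) {
        Counter c = new Counter();    // calls the compiler-supplied constructor
        System.out.println(c.count);  // prints 0
        System.out.println(c.label);  // prints null
    }
}
```

As soon as any constructor is written in Counter, the compiler stops supplying the no-argument one.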

Constructor Overloading

Just like methods, constructors can be overloaded. Declaring several constructors with different parameter lists in the same class is known as constructor overloading. The compiler decides which constructor to call from the number of arguments and the sequence of their data types.
public class Perimeter
{
  public Perimeter()                                                     // I
  {
    System.out.println("From default");
  }
  public Perimeter(int x)                                                // II
  {
    System.out.println("Circle perimeter: " + 2*Math.PI*x);
  }
  public Perimeter(int x, int y)                                       // III
  {
    System.out.println("Rectangle perimeter: " +2*(x+y));
  }
  public static void main(String args[])
  {
    Perimeter p1 = new Perimeter();                     // I
    Perimeter p2 = new Perimeter(10);                  // II
    Perimeter p3 = new Perimeter(10, 20);            // III
  }
}
Output screen of Perimeter.java

The Perimeter constructor is overloaded three times. Depending on the arguments, the appropriate constructor is called. To call all three constructors, three objects are created. Using this(), all three constructors can be called while creating a single object.

this() with Constructors

Suppose the programmer needs the functionality of several constructors while creating only one object. For this, Java provides this(). "this()" is used to call one constructor from another "within the same class". Depending on the arguments supplied, the matching constructor is called.
public class Perimeter
{
  public Perimeter()                                                      // I
  {
    System.out.println("From default");
  }
  public Perimeter(int x)                                                // II
  {
    this();
    System.out.println("Circle perimeter: " + 2*Math.PI*x);
  }
  public Perimeter(int x, int y)                                         // III
  {
    this(100);
    System.out.println("Rectangle perimeter: " +2*(x+y));
  }
  public static void main(String args[])
  {
    Perimeter p3 = new Perimeter(10, 20);            // III
  }
}
Output screen of Perimeter.java

In the code, when object p3 is created, constructor III is called. From III, the statement "this(100)" calls constructor II. From II, constructor I is called with the statement "this()". The arguments supplied to this() determine which constructor is called. Because this() runs before the rest of the constructor body, the output appears in the order I, II, III.

Rules of using this()
A few restrictions exist for the usage of this().
  1. If included, this() statement must be the first one in the constructor. You cannot write anything before this() in the constructor.
  2. With the above rule, there cannot be two this() statements in the same constructor (because both cannot be the first).
  3. this() can be used only inside constructors, and only to call another constructor of the same class (a superclass constructor is called with super() instead).
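A common use of this() is to let simpler constructors pass default values to the most complete one. The sketch below follows the rules above. The Rectangle class and its default side of 1 are assumptions made up for this example.

```java
// Hypothetical Rectangle class; the names and default values are illustrative only.
class Rectangle {
    final int width;
    final int height;

    Rectangle() {
        this(1);               // this() is the first statement, as the rules require
    }

    Rectangle(int side) {
        this(side, side);      // delegates to the two-argument constructor
    }

    Rectangle(int width, int height) {
        this.width = width;    // only this constructor assigns the fields
        this.height = height;
    }

    int perimeter() {
        return 2 * (width + height);
    }
}

class ThisChainingDemo {
    public static void main(String[] args) {
        System.out.println(new Rectangle().perimeter());        // prints 4
        System.out.println(new Rectangle(5).perimeter());       // prints 20
        System.out.println(new Rectangle(10, 20).perimeter());  // prints 60
    }
}
```

Keeping the field assignments in one constructor means the initialization logic exists in a single place.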

Multitasking and Multithreading

In this paper, you will understand the importance of leveraging multitasking and multithreading in your application.

Table of Contents

  1. Background
  2. Multitasking
  3. Multithreading
  4. Multithreading with LabVIEW
  5. Multitasking in LabVIEW
  6. More Resources on Multicore Programming

Background

A multicore system is a single-processor CPU that contains two or more cores, with each core housing independent microprocessors. A multicore microprocessor performs multiprocessing in a single physical package. Multicore systems share computing resources that are often duplicated in multiprocessor systems, such as the L2 cache and front-side bus.
Multicore systems provide performance that is similar to multiprocessor systems but often at a significantly lower cost because a motherboard with support for multiple processors, such as multiple processor sockets, is not required.

Multitasking

In computing, multitasking is a method by which multiple tasks, also known as processes, share common processing resources such as a CPU. With a multitasking OS, such as Windows XP, you can simultaneously run multiple applications. Multitasking refers to the ability of the OS to quickly switch between each computing task to give the impression the different applications are executing multiple actions simultaneously.
As CPU clock speeds have increased steadily over time, not only do applications run faster, but OSs can switch between applications more quickly. This provides better overall performance. Many actions can happen at once on a computer, and individual applications can run faster.

Single Core

In the case of a computer with a single CPU core, only one task runs at any point in time, meaning that the CPU is actively executing instructions for that task. Multitasking solves this problem by scheduling which task may run at any given time and when another waiting task gets a turn.


Figure 1. Single-core systems schedule tasks on 1 CPU to multitask

Multicore

When running on a multicore system, multitasking OSs can truly execute multiple tasks concurrently. The multiple computing engines work independently on different tasks.
For example, on a dual-core system, applications such as word processing, e-mail, Web browsing, and antivirus software can be spread across the two processor cores, with two of them executing at the same time. You can multitask by checking e-mail and typing a letter simultaneously, thus improving overall performance for applications.


Figure 2. Dual-core systems enable multitasking operating systems to execute two tasks simultaneously

The OS executes multiple applications more efficiently by splitting the different applications, or processes, between the separate CPU cores. The computer can spread the work - each core is managing and switching through half as many applications as before - and deliver better overall throughput and performance. In effect, the applications are running in parallel.

Multithreading

Multithreading extends the idea of multitasking into applications, so you can subdivide specific operations within a single application into individual threads. Each of the threads can run in parallel. The OS divides processing time not only among different applications, but also among each thread within an application.
In a multithreaded National Instruments LabVIEW program, an example application might be divided into four threads - a user interface thread, a data acquisition thread, network communication, and a logging thread. You can prioritize each of these so that they operate independently. Thus, in multithreaded applications, multiple tasks can progress in parallel with other applications that are running on the system.
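LabVIEW is graphical, so the following is not LabVIEW code. It is a rough Java sketch of the same idea: one application split into four named threads with different priorities. The task names and the priority values chosen are assumptions for illustration, and each task body is a placeholder that only records that it ran.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class FourThreadsDemo {
    // Starts one thread per task name and returns the names of the tasks that ran.
    static List<String> runTasks() {
        String[] names = {"user-interface", "data-acquisition", "network", "logging"};
        int[] priorities = {Thread.MAX_PRIORITY, Thread.NORM_PRIORITY,
                            Thread.NORM_PRIORITY, Thread.MIN_PRIORITY};
        List<String> finished = new CopyOnWriteArrayList<>();
        Thread[] threads = new Thread[names.length];
        for (int i = 0; i < names.length; i++) {
            String name = names[i];
            // Placeholder task body: a real task would do its work here.
            threads[i] = new Thread(() -> finished.add(name), name);
            threads[i].setPriority(priorities[i]); // only a hint to the scheduler
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join(); // wait for every task to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return finished;
    }

    public static void main(String[] args) {
        System.out.println(runTasks().size() + " tasks finished");
    }
}
```

The order in which the four tasks finish is up to the scheduler, which is why the sketch only reports how many completed.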


Figure 3. Dual-core system enables multithreading
Applications that take advantage of multithreading have numerous benefits, including the following:
  • More efficient CPU use
  • Better system reliability
  • Improved performance on multiprocessor computers
In many applications, you make synchronous calls to resources, such as instruments. These instrument calls often take a long time to complete. In a single-threaded application, a synchronous call effectively blocks, or prevents, any other task within the application from executing until the operation completes. Multithreading prevents this blocking.
While the synchronous call runs on one thread, other parts of the program that do not depend on this call run on different threads. Execution of the application progresses instead of stalling until the synchronous call completes. In this way, a multithreaded application maximizes the efficiency of the CPU because it does not idle if any thread of the application is ready to run.
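The effect can be sketched in Java (not LabVIEW) as follows. The readInstrument method is a hypothetical stand-in for a slow synchronous instrument call; it sleeps and returns a made-up value. The slow call runs on another thread while the calling thread carries on with unrelated work.

```java
import java.util.concurrent.CompletableFuture;

class NonBlockingCallDemo {
    // Hypothetical stand-in for a slow synchronous instrument call.
    static double readInstrument() {
        try {
            Thread.sleep(200); // simulate the time the instrument takes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 42.0; // made-up reading
    }

    static String run() {
        StringBuilder log = new StringBuilder();
        // Start the slow call on another thread...
        CompletableFuture<Double> reading =
                CompletableFuture.supplyAsync(NonBlockingCallDemo::readInstrument);
        // ...while this thread continues with work that does not depend on it.
        log.append("ui updated; ");
        // Wait for the reading only at the point where it is needed.
        log.append("reading=").append(reading.join());
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "ui updated; reading=42.0"
    }
}
```

In a single-threaded version, the "ui updated" step could not start until readInstrument() had returned.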




Multithreading with LabVIEW

LabVIEW automatically divides each application into multiple execution threads. The complex tasks of thread management are transparently built into the LabVIEW execution system.


Figure 4. LabVIEW uses multiple execution threads

Multitasking in LabVIEW

LabVIEW uses preemptive multithreading on OSs that offer this feature. LabVIEW also uses cooperative multithreading. OSs and processors with preemptive multithreading employ a limited number of threads, so in certain cases, these systems return to using cooperative multithreading.
The execution system preemptively multitasks VIs using threads. However, a limited number of threads are available. For highly parallel applications, the execution system uses cooperative multitasking when available threads are busy. Also, the OS handles preemptive multitasking between the application and other tasks.

Difference between Interfaces and Abstract Classes


We’ll first discuss what Interfaces and Abstract Classes are all about to understand the differences between the two more clearly and completely.

Interface: Java Interfaces are equivalent to protocols. They basically represent an agreed-upon behavior to facilitate interaction between unrelated objects. For example, the buttons on a remote controller form the interface through which the outside world interacts with a TV. How this interface is implemented by different vendors is not specified, and you are hardly aware of, or bothered about, how these buttons work internally. An interface is plainly a contract between the producer and the consumer. How the producer implements the exposed behavior is normally of no concern to the consumer.

In Java, an Interface is normally a group of method declarations without bodies. You can have constant declarations in a Java Interface as well. A class that implements the interface agrees to the exposed behavior by implementing all the methods of the interface.

interface TVRemoteController{
void power();
void setChannel(int channelNumber);
void upChannel();
void downChannel();
void upVolume();
void downVolume();
……
}

A sample implementation of this interface by a vendor, say Sony:

public class SonyTVRemoteController implements TVRemoteController{
/*…this class can have other methods, properties as well …*/
……
//interface methods are implicitly public, so the implementations must be declared public
public void power(){
//implementation of power() method of the interface
}
public void setChannel(int channelNumber){
//implementation of setChannel(int) method of the interface
}
//similarly, implementation of other methods of the interface
……
}

Implementing an interface means the class will support at least the exposed behavior. It can definitely add any number of extra behaviors/properties for its clients. That’s why a few remote controllers have a hell of a lot of buttons :-)
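The listings above leave parts out, so here is a smaller, complete sketch of the same idea that can be compiled and run. RemoteController, BasicRemote and status() are hypothetical names for this example, not part of the listings above.

```java
// A cut-down contract with just two buttons.
interface RemoteController {
    void power();
    void setChannel(int channelNumber);
}

// One vendor's implementation of the contract.
class BasicRemote implements RemoteController {
    private boolean on;
    private int channel = 1;

    @Override
    public void power() {
        on = !on; // toggles between on and off
    }

    @Override
    public void setChannel(int channelNumber) {
        if (on) {
            channel = channelNumber; // ignored while the TV is off
        }
    }

    // Extra behavior beyond the contract, as any implementer may add.
    public String status() {
        return (on ? "on" : "off") + ", channel " + channel;
    }
}

class RemoteDemo {
    public static void main(String[] args) {
        BasicRemote remote = new BasicRemote();
        remote.power();
        remote.setChannel(7);
        System.out.println(remote.status()); // prints "on, channel 7"
    }
}
```

Code written against RemoteController can call only power() and setChannel(); status() is visible only to code that knows it is dealing with a BasicRemote.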

Abstract Class: In Java, an abstract class is a class which has been declared ‘abstract’. Declaring it ‘abstract’ ensures that the class can’t be instantiated. Why have such a class then? Because you may not have the implementation of all the methods in that class, and you need to leave it to the subclass to decide how to implement them. In this case, there is no point in instantiating an incomplete class.

An abstract method is a method which doesn’t have any implementation. If a class has even a single abstract method, then you have to declare the class ‘abstract’. The reverse is not required, though: a class can be declared ‘abstract’ without having any abstract method. You can declare a complete class ‘abstract’ as well, although this practice is seldom used. One possible reason is that you never want your clients to instantiate the class directly, even though you’ve already provided a default implementation of all the methods. Strange! Yeah… it is. The designer of such a class may provide the default implementation of at least one method just to serve as a template (and not the actual implementation) for the client, which makes the class incomplete in practice. So, a client first needs to subclass it and implement the method(s) by overriding them. The subclass is then a concrete/complete class.

Does it make some sense? Okay… Let me try to give another example. Think of a hypothetical situation where you need to design a class with ‘n’ methods and ‘n’ clients, where every client wants the default implementation of ‘n-1’ methods and needs to implement only one method itself (a different one for every client). In such a situation, you may not want to declare any of the methods ‘abstract’, because each method is incomplete for only one of the clients and a complete implementation for the other ‘n-1’ clients. If you declare it ‘abstract’, then every client has to implement it and you end up with ‘n-1’ copies of the same code. If you don’t declare it ‘abstract’, the one client that needs something different simply overrides the method in its subclass. The base class is nevertheless incomplete for every one of the ‘n’ clients, so, assuming the class has only these forms of usage, you’ll never need an instance of it. That’s why you would declare it ‘abstract’. Confused? Read it once more :-)

public abstract class SampleAbstractClass{
//…fields
……
//…non-abstract methods, if any
……
//…abstract method, if any :-)
abstract void sampleAbstractMethod(); //… ends with ‘;’
}

public class SubClassOfSampleAbstractClass extends SampleAbstractClass{
//… fields, and non-abstract methods (if any)
……
//…implementation of the abstract method
void sampleAbstractMethod(){
……
}
}
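For a version that can actually be compiled, here is a minimal sketch of a partially implemented class. Shape, Square, area() and describe() are hypothetical names chosen for this example.

```java
// The base class knows how to describe a shape but not how to compute its area.
abstract class Shape {
    private final String name;

    Shape(String name) {   // an abstract class can have a constructor
        this.name = name;
    }

    abstract double area(); // left to the subclass

    String describe() {     // implemented once, shared by all subclasses
        return name + " with area " + area();
    }
}

class Square extends Shape {
    private final double side;

    Square(double side) {
        super("Square");
        this.side = side;
    }

    @Override
    double area() {
        return side * side;
    }
}

class ShapeDemo {
    public static void main(String[] args) {
        Shape s = new Square(2);          // "new Shape(...)" would not compile
        System.out.println(s.describe()); // prints "Square with area 4.0"
    }
}
```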

Difference between Interfaces and Abstract Classes: From the language perspective, there are several differences; a few of them are:

  • An abstract class may contain fields that are not ‘static’ and ‘final’, whereas every field of an interface is implicitly ‘public static final’.
  • It may have a few (or all) implemented methods as well, whereas Interfaces could not have any implementation code before Java 8, which introduced default and static methods. Interface methods declared without a body are by default ‘abstract’. Methods/Members of an abstract class may have any visibility: public, protected, private, none (package). Those of an interface are public (private interface methods were only added in Java 9).
  • An abstract class automatically inherits the Object class and thereby includes methods like clone(), equals(), etc. There is no such thing with an interface. Likewise, an abstract class can have a constructor, but an interface can’t have one…
  • Another very famous difference is that Interfaces are used to implement multiple inheritance in Java, as a class in Java can explicitly have only one super class, but it can implement any number of interfaces… blah blah… :-)
From the performance perspective, the difference is that Interfaces may be a little slower, as they require extra indirection to find the corresponding method in the actual class. Modern JVMs, though, have already made that difference very small.

If you want to add a new method to an interface, then you either need to track down all the classes implementing that interface or extend that interface to make a new interface having the extra method(s). In the case of an abstract class, you simply add the default implementation of that method and all the code will continue to work.
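Since Java 8, an interface can soften this versioning problem with a default method: a method added to the interface together with a body, so existing implementers keep compiling. A minimal sketch, with Greeter and EnglishGreeter as hypothetical names:

```java
interface Greeter {
    String name();

    // Imagine this method was added in a later version of the interface.
    // Because it has a body, classes written earlier need no change.
    default String greet() {
        return "Hello, " + name();
    }
}

// Written against the original interface; it knows nothing about greet().
class EnglishGreeter implements Greeter {
    @Override
    public String name() {
        return "world";
    }
}

class DefaultMethodDemo {
    public static void main(String[] args) {
        System.out.println(new EnglishGreeter().greet()); // prints "Hello, world"
    }
}
```

On Java versions before 8, the two options described above remain the only ones.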

Many differences are listed already, but the main difference lies in the usage of the two. They are not rivals; in most cases they are complementary. We need to understand when to use what.

When to use an Interface: it asks you to start everything from scratch. You need to provide implementation of all the methods. So, you should use it to define the contract, which you’re unsure of how the different vendors/producers will implement. So, you can say that Interfaces can be used to enforce certain standards.

When to use an Abstract Class: it is used mostly when you have a partial implementation ready, but not a complete one. You may declare the incomplete methods ‘abstract’ and leave it to the clients to implement them the way they actually want. Not all the details can be concrete at the base class level, or different clients may like to implement the method differently.


When to use both: if you want to implement multiple inheritance where you have the luxury of providing partial implementation as well. You’ll then put all that code in an abstract class (this can be a concrete class as well… but here we assume that the class is also only partially implemented and hence an abstract class), extend that class, and implement as many interfaces as you want.
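A minimal sketch of the combined case, with made-up names (Movable, Describable, Vehicle, Car): the shared code sits in an abstract class, and the concrete class extends it while also implementing a further interface.

```java
interface Movable {
    String move();
}

interface Describable {
    String describe();
}

// Partial implementation: describe() is done, wheels() is left open.
abstract class Vehicle implements Describable {
    protected final String model;

    Vehicle(String model) {
        this.model = model;
    }

    @Override
    public String describe() {
        return "Vehicle " + model;
    }

    abstract int wheels();
}

// One superclass, plus as many interfaces as needed.
class Car extends Vehicle implements Movable {
    Car(String model) {
        super(model);
    }

    @Override
    int wheels() {
        return 4;
    }

    @Override
    public String move() {
        return model + " drives on " + wheels() + " wheels";
    }
}

class BothDemo {
    public static void main(String[] args) {
        Car car = new Car("Mini");
        System.out.println(car.describe()); // prints "Vehicle Mini"
        System.out.println(car.move());     // prints "Mini drives on 4 wheels"
    }
}
```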

Update[June 25, 2008]: Do interfaces in Java really inherit the Object class? If NOT then how do we manage to call Object methods on the references of an interface type? Read more in this article - Do Interfaces inherit the Object class in Java?


Introduction

In this article along with the demo project I will discuss Interfaces versus Abstract classes. The concept of Abstract classes and Interfaces is a bit confusing for beginners of Object Oriented programming. Therefore, I am trying to discuss the theoretical aspects of both the concepts and compare their usage. And finally I will demonstrate how to use them with C#.

Background

An Abstract class without any implementation just looks like an Interface; however, there are more differences than similarities between an Abstract class and an Interface. Let's explain both concepts and compare their similarities and differences.

What is an Abstract Class?

An abstract class is a special kind of class that cannot be instantiated. So the question is why we need a class that cannot be instantiated? An abstract class is only to be sub-classed (inherited from). In other words, it only allows other classes to inherit from it but cannot be instantiated. The advantage is that it enforces certain hierarchies for all the subclasses. In simple words, it is a kind of contract that forces all the subclasses to carry on the same hierarchies or standards.

What is an Interface?

An interface is not a class. It is an entity that is defined with the interface keyword. An interface has no implementation; it only has the signature or, in other words, just the definition of the methods without the body. As one of the similarities to an Abstract class, it is a contract that is used to define hierarchies for all subclasses, or it defines a specific set of methods and their arguments. The main difference between them is that a class can implement more than one interface but can only inherit from one abstract class. Since C# doesn't support multiple inheritance, interfaces are used to implement multiple inheritance.

Both Together

When we create an interface, we are basically creating a set of methods without any implementation that must be implemented by the implementing classes. The advantage is that it provides a way for a class to be a part of two hierarchies: one from the inheritance hierarchy and one from the interface.
When we create an abstract class, we are creating a base class that might have one or more completed methods but at least one or more methods are left uncompleted and declared abstract. If all the methods of an abstract class are uncompleted, then it looks much the same as an interface. The purpose of an abstract class is to provide a base class definition for how a set of derived classes will work and then allow the programmers to fill in the implementation in the derived classes.
There are some similarities and differences between an interface and an abstract class that I have arranged in a table for easier comparison:
| Feature | Interface | Abstract class |
|---|---|---|
| Multiple inheritance | A class may inherit several interfaces. | A class may inherit only one abstract class. |
| Default implementation | An interface cannot provide any code, just the signature. | An abstract class can provide complete, default code and/or just the details that have to be overridden. |
| Access modifiers | An interface cannot have access modifiers for the subs, functions, properties, etc.; everything is assumed to be public. | An abstract class can contain access modifiers for the subs, functions, properties. |
| Core vs. peripheral | Interfaces are used to define the peripheral abilities of a class. In other words, both Human and Vehicle can inherit from an IMovable interface. | An abstract class defines the core identity of a class and therefore it is used for objects of the same type. |
| Homogeneity | If various implementations only share method signatures, then it is better to use Interfaces. | If various implementations are of the same kind and use common behaviour or status, then an abstract class is better to use. |
| Speed | Requires more time to find the actual method in the corresponding classes. | Fast |
| Adding functionality (versioning) | If we add a new method to an Interface, then we have to track down all the implementations of the interface and define an implementation for the new method. | If we add a new method to an abstract class, then we have the option of providing a default implementation, and therefore all the existing code might work properly. |
| Fields and constants | No fields can be defined in interfaces. | An abstract class can have fields and constants defined. |

Using the Code

Let me explain the code to make it a bit easier. There is an Employee abstract class and an IEmployee interface. Within the abstract class and the interface, I comment on the differences between the two.
I test both the abstract class and the interface by deriving a class from each. From the Employee abstract class, one class is inherited: Emp_Fulltime. Similarly, one class implements IEmployee: Emp_fulltime2.
In the test code under the GUI, I create instances of both Emp_Fulltime and Emp_fulltime2, set their attributes, and finally call the CalculateWage method of the objects.

Abstract Class Employee

using System;

namespace AbstractsANDInterfaces
{
    /// <summary>
    /// Summary description for Employee.
    /// </summary>
    public abstract class Employee
    {
        //we can have fields and properties
        //in the Abstract class
        protected String id;
        protected String lname;
        protected String fname;

        //properties
        public abstract String ID
        {
            get;
            set;
        }

        public abstract String FirstName
        {
            get;
            set;
        }

        public abstract String LastName
        {
            get;
            set;
        }

        //completed methods
        public String Update()
        {
            return "Employee " + id + " " +
                      lname + " " + fname +
                      " updated";
        }

        //completed methods
        public String Add()
        {
            return "Employee " + id + " " +
                      lname + " " + fname +
                      " added";
        }

        //completed methods
        public String Delete()
        {
            return "Employee " + id + " " +
                      lname + " " + fname +
                      " deleted";
        }

        //completed methods
        public String Search()
        {
            return "Employee " + id + " " +
                      lname + " " + fname +
                      " found";
        }

        //abstract method that is different
        //from Fulltime and Contractor
        //therefore i keep it uncompleted and
        //let each implementation
        //complete it the way they calculate the wage.
        public abstract String CalculateWage();
    }
}

Interface Employee

using System;

namespace AbstractsANDInterfaces
{
    /// <summary>
    /// Summary description for IEmployee.
    /// </summary>
    public interface IEmployee
    {
        //cannot have fields. uncommenting
        //will raise error!
        //        protected String id;
        //        protected String lname;
        //        protected String fname;

        //just signature of the properties
        //and methods.
        //setting a rule or contract to be
        //followed by implementations.
        String ID
        {
            get;
            set;
        }

        String FirstName
        {
            get;
            set;
        }

        String LastName
        {
            get;
            set;
        }

        // cannot have implementation
        // cannot have modifiers public
        // etc all are assumed public
        // cannot have virtual
        String Update();

        String Add();

        String Delete();

        String Search();

        String CalculateWage();
    }
}

Inherited Objects

Emp_Fulltime:
using System;

namespace AbstractsANDInterfaces
{
    /// <summary>
    /// Summary description for Emp_Fulltime.
    /// </summary>
    //Inheriting from the Abstract class
    public class Emp_Fulltime : Employee
    {
        //uses all the properties of the
        //Abstract class therefore no
        //properties or fields here!
        public Emp_Fulltime()
        {
        }

        public override String ID
        {
            get
            {
                return id;
            }
            set
            {
                id = value;
            }
        }

        public override String FirstName
        {
            get
            {
                return fname;
            }
            set
            {
                fname = value;
            }
        }

        public override String LastName
        {
            get
            {
                return lname;
            }
            set
            {
                lname = value;
            }
        }

        //common methods that are
        //implemented in the abstract class
        public new String Add()
        {
            return base.Add();
        }

        //common methods that are implemented
        //in the abstract class
        public new String Delete()
        {
            return base.Delete();
        }

        //common methods that are implemented
        //in the abstract class
        public new String Search()
        {
            return base.Search();
        }

        //common methods that are implemented
        //in the abstract class
        public new String Update()
        {
            return base.Update();
        }

        //abstract method that is different
        //from Fulltime and Contractor
        //therefore I override it here.
        public override String CalculateWage()
        {
            return "Full time employee " +
                  base.fname + " is calculated " +
                  "using the Abstract class...";
        }
    }
}
Emp_fulltime2:
using System;

namespace AbstractsANDInterfaces
{
    /// <summary>
    /// Summary description for Emp_fulltime2.
    /// </summary>
    //Implementing the interface
    public class Emp_fulltime2 : IEmployee
    {
        //All the properties and
        //fields are defined here!
        protected String id;
        protected String lname;
        protected String fname;

        public Emp_fulltime2()
        {
            //
            // TODO: Add constructor logic here
            //
        }

        public String ID
        {
            get
            {
                return id;
            }
            set
            {
                id = value;
            }
        }

        public String FirstName
        {
            get
            {
                return fname;
            }
            set
            {
                fname = value;
            }
        }

        public String LastName
        {
            get
            {
                return lname;
            }
            set
            {
                lname = value;
            }
        }

        //all the manipulations including Add, Delete,
        //Search, Update, Calculate are done
        //within the object as there is no
        //implementation in the Interface entity.
        public String Add()
        {
            return "Fulltime Employee " +
                          fname + " added.";
        }

        public String Delete()
        {
            return "Fulltime Employee " +
                        fname + " deleted.";
        }

        public String Search()
        {
            return "Fulltime Employee " +
                       fname + " searched.";
        }

        public String Update()
        {
            return "Fulltime Employee " +
                        fname + " updated.";
        }

        //if you change to Calculatewage().
        //Just small 'w' it will raise
        //error as in interface
        //it is CalculateWage() with capital 'W'.
        public String CalculateWage()
        {
            return "Full time employee " +
                  fname + " calculated using " +
                  "Interface.";
        }
    }
}

Code for Testing

//This is the sub that tests both
//implementations using Interface and Abstract
private void InterfaceExample_Click(object sender,
                                System.EventArgs e)
{
    try
    {
        IEmployee emp;

        Emp_fulltime2 emp1 = new Emp_fulltime2();

        emp = emp1;
        emp.ID = "2234";
        emp.FirstName = "Rahman";
        emp.LastName = "Mahmoodi";

        //call the Add method of the object
        MessageBox.Show(emp.Add().ToString());

        //call the CalculateWage method
        MessageBox.Show(emp.CalculateWage().ToString());
    }
    catch(Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}

private void cmdAbstractExample_Click(object sender,
                                   System.EventArgs e)
{
    Employee emp;

    emp = new Emp_Fulltime();

    emp.ID = "2244";
    emp.FirstName = "Maria";
    emp.LastName = "Robinlius";
    MessageBox.Show(emp.Add().ToString());

    //call the CalculateWage method
    MessageBox.Show(emp.CalculateWage().ToString());
}

Conclusion

In the above examples, I have explained the differences between an abstract class and an interface. I have also implemented a demo project which uses both abstract class and interface and shows the differences in their implementation.

XML Interview Questions with Answers


1. What is XML?
XML is the Extensible Markup Language. It improves the functionality
of the Web by letting you identify your information in a more accurate,
flexible, and adaptable way. It is extensible because it is not
a fixed format like HTML (which is a single, predefined markup language).
Instead, XML is actually a meta language—a language for describing
other languages—which lets you design your own markup languages
for limitless different types of documents. XML can do this because
it’s written in SGML, the international standard meta language for
text document markup (ISO 8879).

2. What is a markup language?
A markup language is a set of words and symbols for describing
the identity of pieces of a document (for example ‘this is
a paragraph’, ‘this is a heading’, ‘this
is a list’, ‘this is the caption of this figure’,
etc). Programs can use this with a style sheet to create output
for screen, print, audio, video, Braille, etc.

Some markup languages (eg those used in word processors) only describe
appearances (‘this is italics’, ‘this is bold’),
but this method can only be used for display, and is not normally
re-usable for anything else.

3. Where should I use XML?
Its goal is to enable generic SGML to be served, received, and
processed on the Web in the way that is now possible with HTML.
XML has been designed for ease of implementation and for interoperability
with both SGML and HTML.
Despite early attempts, browsers never allowed other SGML, only
HTML (although there were plugins), and they allowed it (even encouraged
it) to be corrupted or broken, which held development back for over
a decade by making it impossible to program for it reliably. XML
fixes that by making it compulsory to stick to the rules, and by
making the rules much simpler than SGML.

But XML is not just for Web pages: in fact it’s very rarely used
for Web pages on its own because browsers still don’t provide reliable
support for formatting and transforming it. Common uses for XML
include:
* Information identification: because you can define your own markup,
you can define meaningful names for all your information items.
* Information storage: because XML is portable and non-proprietary,
it can be used to store textual information across any platform.
Because it is backed by an international standard, it will remain
accessible and processable as a data format.
* Information structure: XML can therefore be used to store and
identify any kind of (hierarchical) information structure, especially
for long, deep, or complex document sets or data sources, making it
ideal for an information-management back-end to serving the Web. This
is its most common Web application, with a transformation system to
serve it as HTML until such time as browsers are able to handle XML
consistently.
* Publishing: the original goal of XML as defined in the quotation at
the start of this section. Combining the three previous topics
(identity, storage, structure) means it is possible to get all the
benefits of robust document management and control (with XML) and
publish to the Web (as HTML) as well as to paper (as PDF) and to other
formats (eg Braille, Audio, etc) from a single source document by
using the appropriate stylesheets.
* Messaging and data transfer: XML is also very heavily used for
enclosing or encapsulating information in order to pass it between
different computing systems which would otherwise be unable to
communicate. By providing a lingua franca for data identity and
structure, it provides a common envelope for inter-process
communication (messaging).
* Web services: building on all of these, as well as its use in
browsers, machine-processable data can be exchanged between consenting
systems, where before it was only comprehensible by humans (HTML).
Weather services, e-commerce sites, blog newsfeeds, AJaX sites, and
thousands of other data-exchange services use XML for data management
and transmission, and the web browser for display and interaction.

4. Why is XML such an important development?
It removes two constraints which were holding back Web developments:
1. dependence on a single, inflexible document type (HTML) which
was being much abused for tasks it was never designed for;

2. the complexity of full SGML, whose syntax allows many powerful
but hard-to-program options.
XML allows the flexible development of user-defined document types.
It provides a robust, non-proprietary, persistent, and verifiable
file format for the storage and transmission of text and data both
on and off the Web; and it removes the more complex options of SGML,
making it easier to program for.

5. Describe the differences between XML and HTML.
It’s amazing how many developers claim to be proficient programming
with XML, yet do not understand the basic differences between XML
and HTML. Anyone with a fundamental grasp of XML should be able
to describe some of the main differences outlined in the table below.

XML                                           HTML
User-definable tags                           Defined set of tags designed for web display
Content driven                                Format driven
End tags required for well-formed documents   End tags not required
Quotes required around attribute values       Quotes not required
Slash required in empty tags                  Slash not required

6. Describe the role that XSL can play when dynamically
generating HTML pages from a relational database.

Even if candidates have never participated in a project involving
this type of architecture, they should recognize it as one of the
common uses of XML. Querying a database and then formatting the
result set so that it can be validated as an XML document allows
developers to translate the data into an HTML table using XSLT rules.
Consequently, the format of the resulting HTML table can be modified
without changing the database query or application code since the
document rendering logic is isolated to the XSLT rules.
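
A minimal Java sketch of this architecture, assuming the query result has already been serialized as XML. The class name ResultSetToHtml, the employees/employee/id/name element names, and the sample rows are all hypothetical; only the standard javax.xml.transform API is used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ResultSetToHtml {
    // Hypothetical XML form of a query result; element names are assumptions.
    static final String ROWS_XML =
        "<employees>"
        + "<employee><id>2234</id><name>Rahman</name></employee>"
        + "<employee><id>2244</id><name>Maria</name></employee>"
        + "</employees>";

    // All rendering logic lives in the stylesheet, so the table layout
    // can change without touching the query or the Java code.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/employees'>"
        + "<table><xsl:apply-templates select='employee'/></table>"
        + "</xsl:template>"
        + "<xsl:template match='employee'>"
        + "<tr><td><xsl:value-of select='id'/></td>"
        + "<td><xsl:value-of select='name'/></td></tr>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the row data and returns the HTML table markup.
    static String toHtmlTable(String rowsXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter html = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(rowsXml)),
                new StreamResult(html));
            return html.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtmlTable(ROWS_XML));
    }
}
```

Changing the column order or adding a header row here would mean editing only the XSLT string, which is the isolation the answer describes.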

7. What is SGML?
SGML is the Standard Generalized Markup Language (ISO 8879:1986),
the international standard for defining descriptions of the structure
of different types of electronic document. There is an SGML FAQ
from David Megginson at http://math.albany.edu:8800/hm/sgml/cts-faq.html;
and Robin Cover’s SGML Web pages are at http://www.oasis-open.org/cover/general.html.
For a little light relief, try Joe English’s ‘Not the SGML
FAQ’ at http://www.flightlab.com/~joe/sgml/faq-not.txt.

SGML is very large, powerful, and complex. It has been in heavy
industrial and commercial use for nearly two decades, and there
is a significant body of expertise and software to go with it.
XML is a lightweight cut-down version of SGML which keeps enough
of its functionality to make it useful but removes all the optional
features which made SGML too complex to program for in a Web environment.

8. Aren’t XML, SGML, and HTML all the same thing?
Not quite; SGML is the mother tongue, and has been used for describing
thousands of different document types in many fields of human activity,
from transcriptions of ancient Irish manuscripts to the technical
documentation for stealth bombers, and from patients’ clinical records
to musical notation. SGML is very large and complex, however, and
probably overkill for most common office desktop applications.

XML is an abbreviated version of SGML, to make it easier to use
over the Web, easier for you to define your own document types,
and easier for programmers to write programs to handle them. It
omits all the complex and less-used options of SGML in return for
the benefits of being easier to write applications for, easier to
understand, and more suited to delivery and interoperability over
the Web. But it is still SGML, and XML files may still be processed
in the same way as any other SGML file (see the question on XML
software).
HTML is just one of many SGML or XML applications—the one
most frequently used on the Web.
Technical readers may find it more useful to think of XML as being
SGML-- rather than HTML++.

9. Who is responsible for XML?
XML is a project of the World Wide Web Consortium (W3C), and the
development of the specification is supervised by an XML Working
Group. A Special Interest Group of co-opted contributors and experts
from various fields contributed comments and reviews by email.
XML is a public format: it is not a proprietary development of any
company, although the membership of the WG and the SIG represented
companies as well as research and academic institutions. The v1.0
specification was accepted by the W3C as a Recommendation on Feb
10, 1998.

10. Why is XML such an important development?
It removes two constraints which were holding back Web developments:

1. dependence on a single, inflexible document type (HTML) which
was being much abused for tasks it was never designed for;
2. the complexity of full SGML, whose syntax allows
many powerful but hard-to-program options.
XML allows the flexible development of user-defined document types.
It provides a robust, non-proprietary, persistent, and verifiable
file format for the storage and transmission of text and data both
on and off the Web; and it removes the more complex options of SGML,
making it easier to program for.

11. Give a few examples of types of applications that can
benefit from using XML.

There are literally thousands of applications that can benefit
from XML technologies. The point of this question is not to have
the candidate rattle off a laundry list of projects that they have
worked on, but, rather, to allow the candidate to explain the rationale
for choosing XML by citing a few real world examples. For instance,
one appropriate answer is that XML allows content management systems
to store documents independently of their format, which thereby
reduces data redundancy. Another answer relates to B2B exchanges
or supply chain management systems. In these instances, XML provides
a mechanism for multiple companies to exchange data according to
an agreed upon set of rules. A third common response involves wireless
applications that require WML to render data on hand held devices.

12. What is DOM and how does it relate to XML?
The Document Object Model (DOM) is an interface specification maintained
by the W3C DOM Workgroup that defines an application independent
mechanism to access, parse, or update XML data. In simple terms
it is a hierarchical model that allows developers to manipulate
XML documents easily. Any developer who has worked extensively with
XML should be able to discuss the concept and use of DOM objects
freely. Additionally, it is not unreasonable to expect advanced
candidates to thoroughly understand its internal workings and be
able to explain how DOM differs from an event-based interface like
SAX.
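
The contrast between DOM and SAX can be sketched in Java with the standard JAXP classes. This is only an illustration: the class name DomVersusSax and the sample List/Item document are assumptions, not part of any candidate's expected answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomVersusSax {
    // Hypothetical sample document, used only for this illustration.
    static final String XML =
        "<List><Item>Chocolate</Item><Item>Music</Item><Item>Surfing</Item></List>";

    // DOM: the whole document is loaded into a tree that is queried afterwards.
    static List<String> itemsWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("Item");
            List<String> items = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                items.add(nodes.item(i).getTextContent());
            }
            return items;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // SAX: the parser pushes events to a handler; no tree is kept in memory.
    static List<String> itemsWithSax(String xml) {
        List<String> items = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside an Item

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (qName.equals("Item")) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("Item")) {
                    items.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return items;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(itemsWithDom(XML));
        System.out.println(itemsWithSax(XML));
    }
}
```

Both methods return the same list; the difference is that the DOM version can revisit any node after parsing, while the SAX version sees each event once and must keep its own state.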

13. What is SOAP and how does it relate to XML?
The Simple Object Access Protocol (SOAP) uses XML to define a protocol
for the exchange of information in distributed computing environments.
SOAP consists of three components: an envelope, a set of encoding
rules, and a convention for representing remote procedure calls.
Unless experience with SOAP is a direct requirement for the open
position, knowing the specifics of the protocol, or how it can be
used in conjunction with HTTP, is not as important as identifying
it as a natural application of XML.

14. Why not just carry on extending HTML?
HTML was already overburdened with dozens of interesting but incompatible
inventions from different manufacturers, because it provides only
one way of describing your information.
XML allows groups of people or organizations to create
their own customized markup applications for exchanging information
in their domain (music, chemistry, electronics, hill-walking, finance,
surfing, petroleum geology, linguistics, cooking, knitting, stellar
cartography, history, engineering, rabbit-keeping,
mathematics, genealogy, etc).
HTML is now well beyond the limit of its usefulness as a way of
describing information, and while it will continue to play an important
role for the content it currently represents, many new applications
require a more robust and flexible infrastructure.

15. Why should I use XML?
Here are a few reasons for using XML (in no particular order).
Not all of these will apply to your own requirements, and you may
have additional reasons not mentioned here (if so, please let the
editor of the FAQ know!).
* XML can be used to describe and identify information accurately
and unambiguously, in a way that computers can be programmed to
‘understand’ (well, at least manipulate as if they could
understand).

* XML allows documents which are all the same type to be created
consistently and without structural errors, because it provides
a standardized way of describing, controlling, or allowing/disallowing
particular types of document structure. [Note that this has absolutely
nothing whatever to do with formatting, appearance, or the actual
text content of your documents, only the structure of them.]
* XML provides a robust and durable format for information storage
and transmission. Robust because it is based on a proven standard,
and can thus be tested and verified; durable because it uses plain-text
file formats which will outlast proprietary binary ones.
* XML provides a common syntax for messaging systems for the exchange
of information between applications. Previously, each messaging
system had its own format and all were different, which made inter-system
messaging unnecessarily messy, complex, and expensive. If everyone
uses the same syntax it makes writing these systems much faster
and more reliable.
* XML is free. Not just free of charge (free as in beer) but free
of legal encumbrances (free as in speech). It doesn’t belong to
anyone, so it can’t be hijacked or pirated. And you don’t have to
pay a fee to use it (you can of course choose to use commercial
software to deal with it, for lots of good reasons, but you don’t
pay for XML itself).
* XML information can be manipulated programmatically (under machine
control), so XML documents can be pieced together from disparate
sources, or taken apart and re-used in different ways. They can
be converted into almost any other format with no loss of information.
* XML lets you separate form from content. Your XML file contains
your document information (text, data) and identifies its structure:
your formatting and other processing needs are identified separately
in a style sheet or processing system. The two are combined at output
time to apply the required formatting to the text or data identified
by its structure (location, position, rank, order, or whatever).

16. Can you walk us through the steps necessary to parse
XML documents?

Superficially, this is a fairly basic question. However, the point
is not to determine whether candidates understand the concept of
a parser but rather have them walk through the process of parsing
XML documents step-by-step. Determining whether a non-validating
or validating parser is needed, choosing the appropriate parser,
and handling errors are all important aspects to this process that
should be included in the candidate’s response.
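
One way a candidate might walk through these steps in Java is sketched below, assuming a non-validating DOM parser is sufficient. The class name ParseSteps and the error-message format are hypothetical choices for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ParseSteps {
    // Returns the root element name, or a short message if parsing fails.
    static String parse(String xml) {
        try {
            // Step 1: decide on a non-validating parser
            // (it checks well-formedness only, not a DTD).
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);

            // Step 2: create the parser and install an error handler.
            // DefaultHandler rethrows fatal (well-formedness) errors.
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler());

            // Step 3: parse the input and use the resulting tree.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (SAXParseException e) {
            // Step 4: handle the error, reporting where parsing stopped.
            return "Not well-formed at line " + e.getLineNumber();
        } catch (Exception e) {
            throw new IllegalStateException("Parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<conversation><greeting>Hello</greeting></conversation>"));
        System.out.println(parse("<a><b></a>"));
    }
}
```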

17. Give some examples of XML DTDs or schemas that you
have worked with.

Although XML does not require data to be validated against a DTD,
many of the benefits of using the technology are derived from being
able to validate XML documents against business or technical architecture
rules. Polling for the list of DTDs that developers have worked
with provides insight to their general exposure to the technology.
The ideal candidate will have knowledge of several of the commonly
used DTDs such as FpML, DocBook, HRML, and RDF, as well as experience
designing a custom DTD for a particular project where no standard
existed.

18. Using XSLT, how would you extract a specific attribute
from an element in an XML document?

Successful candidates should recognize this as one of the most
basic applications of XSLT. If they are not able to construct a
reply similar to the example below, they should at least be able
to identify the components necessary for this operation: xsl:template
to match the appropriate XML element, xsl:value-of to select the
attribute value, and the optional xsl:apply-templates to continue
processing the document.

Extract Attributes from XML Data
Example 1.
<xsl:template match="element-name">
  Attribute Value:
  <xsl:value-of select="@attribute"/>
  <xsl:apply-templates/>
</xsl:template>

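
A template of this kind can be exercised from Java with the standard javax.xml.transform API. The sketch below is an assumption-laden illustration: the class name AttributeExtract, the part element, and its num attribute are placeholders chosen for the example, and the stylesheet uses text output so that only the attribute value is returned.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AttributeExtract {
    // Stylesheet that matches <part> and emits the value of its num attribute.
    // The element and attribute names are hypothetical.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='part'>"
        + "<xsl:value-of select='@num'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over the given XML and returns the text output.
    static String extractNum(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractNum("<part num=\"DA42\"><name>Circlip</name></part>"));
    }
}
```

Because the template does not call xsl:apply-templates, the text of the child elements is not copied to the output.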
19. When constructing an XML DTD, how do you create an
external entity reference in an attribute value?

Every interview session should have at least one trick question.
Although possible when using SGML, XML DTDs don’t support defining
external entity references in attribute values. It’s more important
for the candidate to respond to this question in a logical way than
for the candidate to know the somewhat obscure answer.

20. How would you build a search engine for large volumes
of XML data?

The way candidates answer this question may provide insight into
their view of XML data. For those who view XML primarily as a way
to denote structure for text files, a common answer is to build
a full-text search and handle the data similarly to the way Internet
portals handle HTML pages. Others consider XML as a standard way
of transferring structured data between disparate systems. These
candidates often describe some scheme of importing XML into a relational
or object database and relying on the database’s engine for searching.
Lastly, candidates that have worked with vendors specializing in
this area often say that the best way to handle this situation
is to use a third party software package optimized for XML data.

21. What is the difference between XML and C or C++ or
Java?

C and C++ (and other languages like FORTRAN, or Pascal, or Visual
Basic, or Java or hundreds more) are programming languages with
which you specify calculations, actions, and decisions to be carried
out in order:
mod curconfig[if left(date,6) = "01-Apr",
    t.put "April Fool!",
    f.put days('31102005','DDMMYYYY') -
          days(sdate,'DDMMYYYY')
    " more shopping days to Samhain"];

XML is a markup specification language with which you can design
ways of describing information (text or data), usually for storage,
transmission, or processing by a program. It says nothing about
what you should do with the data (although your choice of element
names may hint at what they are for):
<part num="DA42" models="LS AR DF HG KJ"
      update="2001-11-22">
  <name>Camshaft end bearing retention circlip</name>
  <image drawing="RR98-dh37" type="SVG" x="476"
         y="226"/>
  <maker id="RQ778">Ringtown Fasteners Ltd</maker>
  <notes>Angle-nosed insertion tool <tool
    id="GH25"/> is required for the removal
    and replacement of this part.</notes>
</part>
On its own, an SGML or XML file (including HTML) doesn’t do anything.
It’s a data format which just sits there until you run a program
which does something with it.

22. Does XML replace HTML?
No. XML itself does not replace HTML. Instead, it provides an alternative
which allows you to define your own set of markup elements. HTML
is expected to remain in common use for some time to come, and the
current version of HTML is in XML syntax. XML is designed to make
the writing of DTDs much simpler than with full SGML. (See the question
on DTDs for what one is and why you might want one.)

23. Do I have to know HTML or SGML before I learn XML?
No, although it’s useful because a lot of XML terminology and practice
derives from two decades’ experience of SGML.
Be aware that ‘knowing HTML’ is not the same as ‘understanding
SGML’. Although HTML was written as an SGML application, browsers
ignore most of it (which is why so many useful things don’t work),
so just because something is done a certain way in HTML browsers
does not mean it’s correct, least of all in XML.

24. What does an XML document actually look like (inside)?
The basic structure of XML is similar to other applications of
SGML, including HTML. The basic components can be seen in the following
examples. An XML document starts with a Prolog:
1. The XML Declaration which specifies that this is an XML document;
2. Optionally a Document Type Declaration which identifies the type
of document and says where the Document Type Definition (DTD) is
stored;

The Prolog is followed by the document instance:
1. A root element, which is the outermost (top level) element (start-tag
plus end-tag) which encloses everything else: in the examples below
the root elements are conversation and titlepage;
2. A structured mix of descriptive or prescriptive elements enclosing
the character data content (text), and optionally any attributes
(‘name=value’ pairs) inside some start-tags.
XML documents can be very simple, with straightforward nested markup
of your own design:
<?xml version="1.0" standalone="yes"?>
<conversation>
  <greeting>Hello, world!</greeting>
  <response>Stop the planet, I want to get
    off!</response>
</conversation>
Or they can be more complicated, with a Schema or
Document Type Definition (DTD) or internal subset (local DTD changes
in [square brackets]), and an arbitrarily complex nested structure:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE titlepage
  SYSTEM "http://www.google.bar/dtds/typo.dtd"
  [<!ENTITY % active.links "INCLUDE">]>
<titlepage id="BG12273624">
  <white-space type="vertical" amount="36"/>
  <title font="Baskerville" alignment="centered"
         size="24/30">Hello, world!</title>
  <white-space type="vertical" amount="12"/>
  <!-- In some copies the following
       decoration is hand-colored, presumably
       by the author -->
  <image location="http://www.google.bar/fleuron.eps"
         type="URI" alignment="centered"/>
  <white-space type="vertical" amount="24"/>
  <author font="Baskerville" size="18/22"
          style="italic">Vitam capias</author>
  <white-space type="vertical" role="filler"/>
</titlepage>

Or they can be anywhere between: a lot will depend on how you want
to define your document type (or whose you use) and what it will
be used for. Database-generated or program-generated XML documents
used in e-commerce are usually unformatted (not for human reading)
and may use very long names or values, with multiple redundancy
and sometimes no character data content at all, just values in attributes:
<?xml version="1.0"?>
<ORDER-UPDATE AUTHMD5="4baf7d7cff5faa3ce67acf66ccda8248"
  ORDER-UPDATE-ISSUE="193E22C2-EAF3-11D9-9736-CAFC705A30B3"
  ORDER-UPDATE-DATE="2005-07-01T15:34:22.46"
  ORDER-UPDATE-DESTINATION="6B197E02-EAF3-11D9-85D5-997710D9978F"
  ORDER-UPDATE-ORDERNO="8316ADEA-EAF3-11D9-9955-D289ECBC99F3">
  <ORDER-UPDATE-DELTA-MODIFICATION-DETAIL ORDER-UPDATE-ID="BAC352437484">
    <ORDER-UPDATE-DELTA-MODIFICATION-VALUE ORDER-UPDATE-ITEM="56"
      ORDER-UPDATE-QUANTITY="2000"/>
  </ORDER-UPDATE-DELTA-MODIFICATION-DETAIL>
</ORDER-UPDATE>

25. How does XML handle white-space in my documents?
All white-space, including linebreaks, TAB characters, and normal
spaces, even between ‘structural’ elements where no
text can ever appear, is passed by the parser unchanged to the application
(browser, formatter, viewer, converter, etc), identifying the context
in which the white-space was found (element content, data content,
or mixed content, if this information is available to the parser,
eg from a DTD or Schema). This means it is the application’s responsibility
to decide what to do with such space, not the parser’s:
* insignificant white-space between structural elements (space which
occurs where only element content is allowed, ie between other elements,
where text data never occurs) will get passed to the application
(in SGML this white-space gets suppressed, which is why you can
put all that extra space in HTML documents and not worry about it)
* significant white-space (space which occurs within elements which
can contain text and markup mixed together, usually mixed content
or PCDATA) will still get passed to the application exactly as under
SGML. It is the application’s responsibility to handle it correctly.

The parser must inform the application that white-space has occurred
in element content, if it can detect it. (Users of SGML will recognize
that this information is not in the ESIS, but it is in the Grove.)

<chapter>
  <title>
    My title for
    Chapter 1.
  </title>
  <para>
    text
  </para>
</chapter>

In the example above, the application will receive all the pretty-printing
linebreaks, TABs, and spaces between the elements as well as those
embedded in the chapter title. It is the function of the application,
not the parser, to decide which type of white-space to discard and
which to retain. Many XML applications have configurable options
to allow programmers or users to control how such white-space is
handled.
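
The effect is easy to observe with a DOM parser. The following Java sketch counts the whitespace-only text nodes that a parser without a DTD hands to the application; the class name WhitespaceDemo and the sample documents are assumptions made for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WhitespaceDemo {
    // Counts whitespace-only text nodes that are direct children of the root.
    static int whitespaceOnlyChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                if (node.getNodeType() == Node.TEXT_NODE
                        && node.getNodeValue().trim().isEmpty()) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String pretty =
            "<chapter>\n  <title>My title</title>\n  <para>text</para>\n</chapter>";
        String compact =
            "<chapter><title>My title</title><para>text</para></chapter>";
        // The pretty-printed form carries extra text nodes made of line breaks
        // and spaces; the compact form carries none.
        System.out.println(whitespaceOnlyChildren(pretty));
        System.out.println(whitespaceOnlyChildren(compact));
    }
}
```

Whether to drop those nodes is then a decision for the application code, for example by skipping them in the loop as the counter above detects them.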

26. Which parts of an XML document are case-sensitive?
All of it, both markup and text. This is significantly different
from HTML and most other SGML applications. It was done to allow
markup in non-Latin-alphabet languages, and to obviate problems
with case-folding in writing systems which are caseless.
* Element type names are case-sensitive: you must follow whatever
combination of upper- or lower-case you use to define them (either
by first usage or in a DTD or Schema). So you can’t say <BODY>…</body>:
upper- and lower-case must match; thus <Img/>, <IMG/>,
and <img/> are three different element types;

* For well-formed XML documents with no DTD, the first occurrence
of an element type name defines the casing;
* Attribute names are also case-sensitive, for example the two width
attributes in <PIC width="7in"/> and <PIC WIDTH="6in"/>
(if they occurred in the same file) are separate attributes, because
of the different case of width and WIDTH;
* Attribute values are also case-sensitive. CDATA values (eg Url="MyFile.SGML")
always have been, but NAME types (ID and IDREF attributes, and token
list attributes) are now case-sensitive as well;
* All general and parameter entity names (eg A), and your
data content (text), are case-sensitive as always.
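
These rules can be checked with a few lines of Java. This is a sketch only: the class name CaseSensitivity and its helper methods are hypothetical, and the sample markup reuses the element names from the list above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class CaseSensitivity {
    // Parses the XML, returning null if it is not well-formed.
    static Document parseOrNull(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return null; // not well-formed
        } catch (Exception e) {
            throw new IllegalStateException("Parser setup or I/O failed", e);
        }
    }

    static boolean isWellFormed(String xml) {
        return parseOrNull(xml) != null;
    }

    // Counts elements whose name matches exactly, including case.
    static int countElements(String xml, String name) {
        Document doc = parseOrNull(xml);
        return doc == null ? 0 : doc.getElementsByTagName(name).getLength();
    }

    public static void main(String[] args) {
        // Start-tag and end-tag must match in case.
        System.out.println(isWellFormed("<BODY>text</BODY>"));
        System.out.println(isWellFormed("<BODY>text</body>"));
        // Img, IMG and img are three different element types.
        System.out.println(countElements("<doc><Img/><IMG/><img/></doc>", "img"));
        // width and WIDTH are different attributes, so both may appear together.
        System.out.println(isWellFormed("<PIC width=\"7in\" WIDTH=\"6in\"/>"));
    }
}
```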

27. How can I make my existing HTML files work in XML?
Either convert them to conform to some new document type (with
or without a DTD or Schema) and write a stylesheet to go with them;
or edit them to conform to XHTML. It is necessary to convert existing
HTML files because XML does not permit end-tag minimisation (missing
</p>, etc), unquoted attribute values, and a number of other SGML shortcuts
which have been normal in most HTML DTDs. However, many HTML authoring
tools already produce almost (but not quite) well-formed XML.
You may be able to convert HTML to XHTML using Dave Raggett’s
HTML Tidy program, which can clean up some of the formatting mess
left behind by inadequate HTML editors, and even separate out some
of the formatting to a stylesheet, but there is usually still some
hand-editing to do.

28. Is there an XML version of HTML?
Yes, the W3C recommends using XHTML which is ‘a reformulation
of HTML 4 in XML 1.0’. This specification defines HTML as
an XML application, and provides three DTDs corresponding to the
ones defined by HTML 4.* (Strict, Transitional, and Frameset). The
semantics of the elements and their attributes are as defined in
the W3C Recommendation for HTML 4. These semantics provide the foundation
for future extensibility of XHTML. Compatibility with existing HTML
browsers is possible by following a small set of guidelines (see
the W3C site).

29. If XML is just a subset of SGML, can I use XML files
directly with existing SGML tools?

Yes, provided you use up-to-date SGML software which knows about
the WebSGML Adaptations TC to ISO 8879 (the features needed to support
XML, such as the variant form for EMPTY elements; some aspects of
the SGML Declaration such as NAMECASE GENERAL NO; multiple attribute
token list declarations, etc).
An alternative is to use an SGML DTD to let you create a fully-normalised
SGML file, but one which does not use empty elements; and then remove
the DocType Declaration so it becomes a well-formed DTDless XML
file. Most SGML tools now handle XML files well, and provide an
option switch between the two standards.

30. Can XML use non-Latin characters?
Yes, the XML Specification explicitly says XML uses ISO 10646,
the international standard character repertoire which covers most
known languages. Unicode is an identical repertoire, and the two
standards track each other. The spec says (2.2): ‘All XML
processors must accept the UTF-8 and UTF-16 encodings of ISO 10646…’.
There is a Unicode FAQ at http://www.unicode.org/faq/.
UTF-8 is an encoding of Unicode into 8-bit characters: the first
128 are the same as ASCII, and higher-order characters are used
to encode anything else from Unicode into sequences of between 2
and 4 bytes. UTF-8 in its single-octet form is therefore the same
as ISO 646 IRV (ASCII), so you can continue to use ASCII for English
or other languages using the Latin alphabet without diacritics.
Note that UTF-8 is incompatible with ISO 8859-1 (ISO Latin-1) after
code point 127 decimal (the end of ASCII).
UTF-16 is an encoding of Unicode into 16-bit characters, which lets
it represent 17 planes (the Basic Multilingual Plane plus 16
supplementary planes). UTF-16 is incompatible with ASCII because
it uses two 8-bit bytes per character (four bytes above U+FFFF).
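
The byte counts can be verified with a short Java sketch. The class name EncodingLengths and the sample characters are arbitrary choices for this illustration; UTF_16BE is used so that no byte-order mark is added to the count.

```java
import java.nio.charset.StandardCharsets;

class EncodingLengths {
    // Number of bytes needed to store the string in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed in UTF-16 (big-endian, without a byte-order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, ASCII letter
            "\u00e9",       // U+00E9, e with acute accent
            "\u20ac",       // U+20AC, euro sign
            "\uD83D\uDE00"  // U+1F600, a character above U+FFFF
        };
        for (String s : samples) {
            System.out.println(
                "U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                + ": UTF-8 " + utf8Length(s)
                + " bytes, UTF-16 " + utf16Length(s) + " bytes");
        }
    }
}
```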

31. What’s a Document Type Definition (DTD) and where do
I get one?

A DTD is a description in XML Declaration Syntax of a particular
type or class of document. It sets out what names are to be used
for the different types of element, where they may occur, and how
they all fit together. (A Schema does the same thing
in XML Document Syntax, and allows more extensive data-checking.)

For example, if you want a document type to be able to describe
Lists which contain Items, the relevant part of your DTD might contain
something like this:
<!ELEMENT List (Item)+>
<!ELEMENT Item (#PCDATA)>

This defines a list as an element type containing one or more items
(that’s the plus sign); and it defines items as element types containing
just plain text (Parsed Character Data or PCDATA). Validators read
the DTD before they read your document so that they can identify
where every element type ought to come and how each relates to the
other, so that applications which need to know this in advance (most
editors, search engines, navigators, and databases) can set themselves
up correctly. The example above lets you create lists like:

<List>
  <Item>Chocolate</Item>
  <Item>Music</Item>
  <Item>Surfing</Item>
</List>
(The indentation in the example is just for legibility while editing:
it is not required by XML.)
A DTD provides applications with advance notice of what names and
structures can be used in a particular document type. Using a DTD
and a validating editor means you can be certain that all documents
of that particular type will be constructed and named in a consistent
and conformant manner.
DTDs are not required for processing well-formed
documents, but they are needed if you want to take advantage of
XML’s special attribute types like the built-in ID/IDREF cross-reference
mechanism; or the use of default attribute values; or references
to external non-XML files (‘Notations’); or if you simply
want a check on document validity before processing.
There are thousands of DTDs already in existence in all kinds of
areas (see the SGML/XML Web pages for pointers). Many of them can
be downloaded and used freely; or you can write your own (see the
question on creating your own DTD). Old SGML DTDs need to be converted
to XML for use with XML systems: read the question on converting
SGML DTDs to XML, but most popular SGML DTDs are already available
in XML form.
The alternatives to a DTD are various forms of Schema (see question C.16).
These provide more extensive validation features than DTDs, including
character data content validation.

32. Does XML let me make up my own tags?
No, it lets you make up names for your own element types. If you
think tags and elements are the same thing you are already in considerable
trouble: read the rest of this question carefully.

33. How do I create my own document type?
Document types usually need a formal description, either a DTD
or a Schema. Whilst it is possible to process well-formed XML documents
without any such description, trying to create them without one
is asking for trouble. A DTD or Schema is used with an XML editor
or API interface to guide and control the construction of the document,
making sure the right elements go in the right places.
Creating your own document type therefore begins with an analysis
of the class of documents you want to describe: reports, invoices,
letters, configuration files, credit-card verification requests,
or whatever. Once you have the structure correct, you write code
to express this formally, using DTD or Schema syntax.

34. How do I write my own DTD?
You need to use the XML Declaration Syntax (very simple: declaration
keywords begin with <!). For example:
<!ELEMENT Shopping-List (Item)+>
<!ELEMENT Item (#PCDATA)>

It says that there shall be an element called Shopping-List and
that it shall contain elements called Item: there must be at least
one Item (that’s the plus sign) but there may be more than one.
It also says that the Item element may contain only parsed character
data (PCDATA, ie text: no further markup).
Because there is no other element which contains Shopping-List,
that element is assumed to be the ‘root’ element, which
encloses everything else in the document. You can now use it to
create an XML file: give your editor the declarations:
<?xml version="1.0"?>
<!DOCTYPE Shopping-List SYSTEM "shoplist.dtd">
(assuming you put the DTD in that file). Now your editor will let
you create files according to the pattern:
<Shopping-List>
  <Item>Chocolate</Item>
  <Item>Sugar</Item>
  <Item>Butter</Item>
</Shopping-List>
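
To make the two declarations concrete, here is a minimal sketch in Python of what they require. It is a hand-rolled check written for this one DTD, not a validating parser: Python's standard xml.etree.ElementTree module only checks well-formedness, so the function name conforms_to_shopping_list and its logic are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET


def conforms_to_shopping_list(xml_text: str) -> bool:
    """Mirror the two declarations by hand; this is not a real DTD validator."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    if root.tag != "Shopping-List":
        return False
    children = list(root)
    # (Item)+ : at least one child, and every child must be an Item
    if not children or any(child.tag != "Item" for child in children):
        return False
    # Element content: only white space is allowed between the Items
    stray_text = [root.text] + [child.tail for child in children]
    if any(text and text.strip() for text in stray_text):
        return False
    # (#PCDATA) : an Item holds text only, no further markup
    return all(len(item) == 0 for item in children)


GOOD = """<Shopping-List>
  <Item>Chocolate</Item>
  <Item>Sugar</Item>
  <Item>Butter</Item>
</Shopping-List>"""

print(conforms_to_shopping_list(GOOD))
print(conforms_to_shopping_list("<Shopping-List/>"))
```

For real work, use a validating editor or parser that reads the DTD itself rather than re-coding its rules like this.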

It is possible to develop complex and powerful DTDs of great subtlety,
but for any significant use you should learn more about document
systems analysis and document type design. See for example Developing
SGML DTDs: From Text to Model to Markup (Maler and el Andaloussi,
1995): this was written for SGML but perhaps 95% of it applies to
XML as well, as XML is much simpler than full SGML—see the
list of restrictions which shows what has been cut out.
Warning
Incidentally, a DTD file never has a DOCTYPE Declaration in it:
that only occurs in an XML document instance (it’s what references
the DTD). And a DTD file also never has an XML Declaration at the
top either. Unfortunately there is still software around which inserts
one or both of these.

35. Can a root element type be explicitly declared in the DTD?

No. This is done in the document’s Document Type Declaration, not
in the DTD.

36. I keep hearing about alternatives to DTDs. What’s a Schema?

The W3C XML Schema recommendation provides a means of specifying
formal data typing and validation of element content in terms of
data types, so that document type designers can provide criteria
for checking the data content of elements as well as the markup
itself. Schemas are written in XML Document Syntax, like XML documents
are, avoiding the need for processing software to be able to read
XML Declaration Syntax (used for DTDs).
There is a separate Schema FAQ at http://www.schemavalid.com.
The term ‘vocabulary’ is sometimes used to refer to
DTDs and Schemas together. Schemas are aimed at e-commerce, data
control, and database-style applications where character data content
requires validation and where stricter data control is needed than
is possible with DTDs; or where strong data typing is required.
They are usually unnecessary for traditional text document publishing
applications.
Unlike DTDs, Schemas cannot be specified in an XML Document Type
Declaration. They can be specified in a Namespace, where Schema-aware
software should pick them up, but this is optional:

<invoice id="abc123"
xmlns="http://example.org/ns/books/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://example.org/ns/books/ http://acme.wilycoyote.org/xsd/invoice.xsd">

</invoice>

More commonly, you specify the Schema in your processing software,
which should record separately which Schema is used by which XML
document instance.
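
As a rough illustration of how Schema-aware software might pick up such a hint, the sketch below reads the xsi:schemaLocation attribute with Python's standard xml.etree.ElementTree module. It assumes the attribute is written as white-space-separated pairs of namespace name and Schema location, which is the form the W3C Schema recommendation defines. It only extracts the hint: nothing here fetches the Schema or validates against it.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

DOC = """<invoice id="abc123"
  xmlns="http://example.org/ns/books/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://example.org/ns/books/ http://acme.wilycoyote.org/xsd/invoice.xsd">
</invoice>"""

root = ET.fromstring(DOC)

# ElementTree spells namespaced attribute names as {namespace-uri}localname
hint = root.get("{%s}schemaLocation" % XSI_NS, "")

# Split the value into (namespace name, schema location) pairs
tokens = hint.split()
schema_for = dict(zip(tokens[0::2], tokens[1::2]))

for namespace, location in schema_for.items():
    print(namespace, "->", location)
```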
In contrast to the complexity of the W3C Schema model, Relax NG
is a lightweight, easy-to-use XML schema language devised by James
Clark (see http://relaxng.org/) with development hosted by OASIS.
It allows similar richness of expression and the use of XML as its
syntax, but it provides an additional, simplified, syntax which
is easier to use for those accustomed to DTDs.

37. How do I get XML into or out of a database?
Ask your database manufacturer: they all provide XML import and
export modules to connect XML applications with databases. In some
trivial cases there will be a 1:1 match between field names in the
database table and element type names in the XML Schema or DTD,
but in most cases some programming will be required to establish
the desired match. This can usually be stored as a procedure so
that subsequent uses are simply commands or calls with the relevant
parameters.
In less trivial, but still simple, cases, you could export by writing
a report routine that formats the output as an XML document, and
you could import by writing an XSLT transformation that formatted
the XML data as a load file.
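
A report routine of that kind can be very short. The following is a minimal sketch in Python using the standard sqlite3 and xml.etree.ElementTree modules; the table books, its columns, and the function name export_table are all invented for the example, and it assumes the simple 1:1 case where every column name is also a legal XML name.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_table(conn, table, row_tag):
    """Write each row of `table` as a <row_tag> element, one child per column."""
    # The table name is interpolated directly, so it must come from trusted code
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table)
    for row in cursor:
        record = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            # NULLs become empty elements; everything else is written as text
            ET.SubElement(record, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical data, held in memory so the example is self-contained
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, price REAL)")
conn.execute("INSERT INTO books VALUES ('XML Basics', 12.5)")

xml_out = export_table(conn, "books", "book")
print(xml_out)
```

ElementTree escapes markup characters in the data for you, which is the main reason to build the output this way rather than by pasting strings together.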

38. Can I encode mathematics using XML?
Yes, if the document type you use provides for math, and your users’
browsers are capable of rendering it. The mathematics-using community
has developed the MathML Recommendation at the W3C, which is a native
XML application suitable for embedding in other DTDs and Schemas.

It is also possible to make XML fragments from other DTDs, such
as ISO 12083 Math, or OpenMath, or one of your own making. Browsers
which display math embedded in SGML existed for many years (eg DynaText,
Panorama, Multidoc Pro), and mainstream browsers are now rendering
MathML. David Carlisle has produced a set of stylesheets for rendering
MathML in browsers. It is also possible to use XSLT to convert XML
math markup to LaTeX for print (PDF) rendering, or to use XSL:FO.

Please note that XML is not itself a programming language, so concepts
such as arithmetic and if-statements (if-then-else logic) are not
meaningful in XML documents.

39. How will XML affect my document links?
The linking abilities of XML systems are potentially much more
powerful than those of HTML, so you’ll be able to do much more with
them. Existing href-style links will remain usable, but the new
linking technology is based on the lessons learned in the development
of other standards involving hypertext, such as TEI and HyTime,
which let you manage bidirectional and multi-way links, as well
as links to a whole element or span of text (within your own or
other documents) rather than to a single point. These features have
been available to SGML users for many years, so there is considerable
experience and expertise available in using them. Currently only
Mozilla Firefox implements XLink.
The XML Linking Specification (XLink) and the XML Extended Pointer
Specification (XPointer) documents contain the details. An XLink
can be either a URI or a TEI-style Extended Pointer (XPointer),
or both. A URI on its own is assumed to be a resource; if an XPointer
follows it, it is assumed to be a sub-resource of that URI; an XPointer
on its own is assumed to apply to the current document (all exactly
as with HTML).

An XLink may use one of #, ?, or |. The # and ? mean the same as
in HTML applications; the | means the sub-resource can be found
by applying the link to the resource, but the method of doing this
is left to the application. An XPointer can only follow a #.
The TEI Extended Pointer Notation (EPN) is much more powerful than
the fragment address on the end of some URIs, as it allows you to
specify the location of a link end using the structure of the document
as well as (or in addition to) known, fixed points like IDs. For
example, the linked second occurrence of the word ‘XPointer’
two paragraphs back could be referred to with the URI (shown here
with linebreaks and spaces for clarity: in practice it would of
course be all one long string):

http://xml.silmaril.ie/faq.xml#ID(hypertext)
.child(1,#element,'answer')
.child(2,#element,'para')
.child(1,#element,'link')

This means the first link element within the second paragraph within
the answer in the element whose ID is hypertext (this question).
Count the objects from the start of this question (which has the
ID hypertext) in the XML source:
1. the first child object is the element containing the question;
2. the second child object is the answer (the answer element);
3. within this element go to the second paragraph;
4. find the first link element.
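
The same walk can be sketched in a few lines of Python. This is only an imitation of the .child() steps using the standard xml.etree.ElementTree module, not an XPointer implementation; the sample document and its element names faq, entry and question are made up for the example (only answer, para and link come from the pointer above).

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the FAQ source
SAMPLE = """<faq>
  <entry id="hypertext">
    <question>How will XML affect my document links?</question>
    <answer>
      <para>The linking abilities of XML systems are powerful.</para>
      <para>See the <link>XPointer</link> and <link>XLink</link> documents.</para>
    </answer>
  </entry>
</faq>"""


def child(parent, n, name):
    """Imitate .child(n,#element,'name'): the n-th child element with that name."""
    matches = [element for element in parent if element.tag == name]
    return matches[n - 1]  # the pointer counts from 1


root = ET.fromstring(SAMPLE)
start = root.find(".//*[@id='hypertext']")  # ID(hypertext)
target = child(child(child(start, 1, "answer"), 2, "para"), 1, "link")
print(target.text)
```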
Eve Maler explained the relationship of XLink and XPointer as follows:

XLink governs how you insert links into your XML document, where
the link might point to anything (eg a GIF file); XPointer governs
the fragment identifier that can go on a URL when you’re linking
to an XML document, from anywhere (eg from an HTML file).
[Or indeed from an XML file, a URI in a mail message, etc…Ed.]
David Megginson has produced an xpointer function for Emacs/psgml
which will deduce an XPointer for any location in an XML document.
XML Spy has a similar function.

40. How does XML handle metadata?
Because XML lets you define your own markup languages, you can
make full use of the extended hypertext features of XML (see the
question on Links) to store or link to metadata in any format (eg
using ISO 11179, as a Topic Maps Published Subject, with Dublin
Core, Warwick Framework, or with Resource Description Framework
(RDF), or even Platform for Internet Content Selection (PICS)).

There are no predefined elements in XML, because it is an architecture,
not an application, so it is not part of XML’s job to specify how
or if authors should or should not implement metadata. You are therefore
free to use any suitable method. Browser makers may also have their
own architectural recommendations or methods to propose.

41. Can I use JavaScript, ActiveX, etc in XML files?
This will depend on what facilities your users’ browsers implement.
XML is about describing information; scripting languages and languages
for embedded functionality are software which enables the information
to be manipulated at the user’s end, so these languages do not normally
have any place in an XML file itself, but in stylesheets like XSL
and CSS where they can be added to generated HTML.
XML itself provides a way to define the markup needed to implement
scripting languages: as a neutral standard it neither encourages
nor discourages their use, and does not favour one language over
another, so it is possible to use XML markup to store the program
code, from where it can be retrieved by (for example) XSLT and re-expressed
in an HTML script element.
Server-side script embedding, like PHP or ASP, can be used with
the relevant server to modify the XML code on the fly, as the document
is served, just as they can with HTML. Authors should be aware,
however, that embedding server-side scripting may mean the file
as stored is not valid XML: it only becomes valid when processed
and served, so care must be taken when using validating editors
or other software to handle or manage such files. A better solution
may be to use an XML serving solution like Cocoon, AxKit, or PropelX.

42. Can I use Java to create or manage XML files?
Yes, any programming language can be used to output data from any
source in XML format. There is a growing number of front-ends and
back-ends for programming environments and data management environments
to automate this. Java is just the most popular one at the moment.

There is a large body of middleware (APIs) written in Java and other
languages for managing data either in XML or with XML input or output.

43. How do I execute or run an XML file?
You can’t and you don’t. XML itself is not a programming language,
so XML files don’t ‘run’ or ‘execute’. XML
is a markup specification language and XML files are just data:
they sit there until you run a program which displays them (like
a browser) or does some work with them (like a converter which writes
the data in another format, or a database which reads the data),
or modifies them (like an editor).
If you want to view or display an XML file, open it with an XML
editor or an XML browser (see question B.3).
The water is muddied by XSL (both XSLT and XSL:FO) which use XML
syntax to implement a declarative programming language. In these
cases it is arguable that you can ‘execute’ XML code,
by running a processing application like Saxon, which compiles the
directives specified in XSLT files into Java bytecode to process
XML.

44. How do I control formatting and appearance?
In HTML, default styling was built into the browsers because the
tagset of HTML was predefined and hardwired into browsers. In XML,
where you can define your own tagset, browsers cannot possibly be
expected to guess or know in advance what names you are going to
use and what they will mean, so you need a stylesheet if you want
to display formatted text.
Browsers which read XML will accept and use a CSS stylesheet at
a minimum, but you can also use the more powerful XSLT stylesheet
language to transform your XML into HTML—which browsers, of
course, already know how to display (and that HTML can still use
a CSS stylesheet). This way you get all the document management
benefits of using XML, but you don’t have to worry about your readers
needing XML smarts in their browsers.

45. How do I use graphics in XML?
Graphics have traditionally just been links which happen to have
a picture file at the end rather than another piece of text. They
can therefore be implemented in any way supported by the XLink and
XPointer specifications (see question C.18, ‘How will XML
affect my document links?’), including using similar syntax
to existing HTML images. They can also be referenced using XML’s
built-in NOTATION and ENTITY mechanism in a similar way to standard
SGML, as external unparsed entities.
However, the SVG specification (see the tip below, by Peter Murray-Rust)
lets you use XML markup to draw vector graphics objects directly
in your XML file. This provides enormous power for the inclusion
of portable graphics, especially interactive or animated sequences,
and it is now slowly becoming supported in browsers.
The XML linking specifications for external images give you much
better control over the traversal and activation of links, so an
author can specify, for example, whether or not to have an image
appear when the page is loaded, or on a click from the user, or
in a separate window, without having to resort to scripting.

XML itself doesn’t predicate or restrict graphic file formats: GIF,
JPG, TIFF, PNG, CGM, EPS, and SVG at a minimum would seem to make
sense; however, vector formats (EPS, SVG) are normally essential
for non-photographic images (diagrams).
You cannot embed a raw binary graphics file (or any other binary
[non-text] data) directly into an XML file because any bytes happening
to resemble markup would get misinterpreted: you must refer to it
by linking (see below). It is, however, possible to include a text-encoded
transformation of a binary file as a CDATA Marked Section, using
something like UUencode with the markup characters ], &, and >
removed from the map so that they could not occur as an erroneous
CDATA termination sequence and be misinterpreted. You could even
use simple hexadecimal encoding as used in PostScript. For vector
graphics, however, the solution is to use SVG (see the tip below,
by Peter Murray-Rust).
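
As an illustration of the text-encoding approach, here is a minimal Python sketch. Note that it swaps in Base64 for the modified UUencode described above: the Base64 alphabet contains none of the markup characters, so no adjustment of the map is needed. The element name image and its encoding attribute are invented for the example.

```python
import base64
import xml.etree.ElementTree as ET

# A few arbitrary bytes, deliberately including <, & and ]]> as raw data
raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x3C, 0x26, 0x5D, 0x5D, 0x3E])

# Encode the bytes as text and store them as ordinary character data
image = ET.Element("image", {"encoding": "base64"})
image.text = base64.b64encode(raw).decode("ascii")
xml_text = ET.tostring(image, encoding="unicode")
print(xml_text)

# Reading it back: parse the XML, then decode the text to recover the bytes
restored = base64.b64decode(ET.fromstring(xml_text).text)
print(restored == raw)
```

The encoded form is roughly a third larger than the original file, so linking to an external file is still the better choice for anything sizeable.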
Sound files are binary objects in the same way that external graphics
are, so they can only be referenced externally (using the same techniques
as for graphics). Music files written in MusiXML or an XML variant
of SMDL could however be embedded in the same way as for SVG.
The point about using entities to manage your graphics is that you
can keep the list of entity declarations separate from the rest
of the document, so you can re-use the names if an image is needed
more than once, but only store the physical file specification in
a single place. This is available only when using a DTD, not a Schema.

46. How do I include one XML file in another?
This works exactly the same as for SGML. First you declare the
entity you want to include, and then you reference it by name:
<?xml version="1.0"?>
<!DOCTYPE novel SYSTEM "/dtd/novel.dtd" [
<!ENTITY chap1 SYSTEM "mydocs/chapter1.xml">
<!ENTITY chap2 SYSTEM "mydocs/chapter2.xml">
<!ENTITY chap3 SYSTEM "mydocs/chapter3.xml">
<!ENTITY chap4 SYSTEM "mydocs/chapter4.xml">
<!ENTITY chap5 SYSTEM "mydocs/chapter5.xml">
]>
<novel>
<header>
…blah blah…
</header>
&chap1;
&chap2;
&chap3;
&chap4;
&chap5;
</novel>
The difference between this method and the one used for including
a DTD fragment (see question D.15, ‘How do I include one DTD
(or fragment) in another?’) is that this uses an external
general (file) entity which is referenced in the same way as for
a character entity (with an ampersand).
The one thing to make sure of is that the included file must not
have an XML or DOCTYPE Declaration on it. If you’ve been using one
for editing the fragment, remove it before using the file in this
way. Yes, this is a pain in the butt, but if you have lots of inclusions
like this, write a script to strip off the declaration (and paste
it back on again for editing).
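
Such a script need not be elaborate. The following Python sketch strips a leading XML Declaration and DOCTYPE Declaration from a fragment using regular expressions; it assumes both sit at the very top of the file with no comments in front of them, and the function name strip_prolog and the file name chapter.dtd are invented for the example. Pasting the declarations back on for editing is left out.

```python
import re

# An XML Declaration at the very start of the text (an optional BOM is allowed)
XML_DECL = re.compile(r"^\ufeff?\s*<\?xml\s[^?]*\?>\s*")
# A DOCTYPE Declaration, with or without an internal subset in [ ... ]
DOCTYPE = re.compile(r"^<!DOCTYPE\s+[^\[>]*(\[.*?\])?\s*>\s*", re.DOTALL)


def strip_prolog(text):
    """Return the text without its leading XML and DOCTYPE Declarations."""
    text = XML_DECL.sub("", text, count=1)
    text = DOCTYPE.sub("", text, count=1)
    return text


sample = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE chapter SYSTEM "chapter.dtd">\n'
    "<chapter>Text</chapter>\n"
)
print(strip_prolog(sample))
```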

47. What is parsing and how do I do it in XML?
Parsing is the act of splitting up information into its component
parts (schools used to teach this in language classes until the
teaching profession collectively caught the anti-grammar disease).

‘Mary feeds Spot’ parses as

1. Subject = Mary, proper noun, nominative case
2. Verb = feeds, transitive, third person singular, present tense
3. Object = Spot, proper noun, accusative case
In computing, a parser is a program (or a piece of code or API that
you can reference inside your own programs) which analyses files
to identify the component parts. All applications that read input
have a parser of some kind, otherwise they’d never be able to figure
out what the information means. Microsoft Word contains a parser
which runs when you open a .doc file and checks that it can identify
all the hidden codes. Give it a corrupted file and you’ll get an
error message.
XML applications are just the same: they contain a parser which
reads XML and identifies the function of each of the pieces of the
document, and it then makes that information available in memory
to the rest of the program.
While reading an XML file, a parser checks the syntax (pointy brackets,
matching quotes, etc) for well-formedness, and reports any violations
(reportable errors). The XML Specification lists what these are.
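
To see this in action, here is a minimal sketch using the parser in Python's standard xml.etree.ElementTree module, which checks well-formedness only (it does not validate). The function name check_well_formed is an assumption made for the example, and the wording of the error message depends on the underlying parser.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if the text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None


good = "<List><Item>Chocolate</Item></List>"
bad = "<List><Item>Chocolate</List>"  # the Item end-tag is missing

print(check_well_formed(good))
print(check_well_formed(bad))
```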

Validation is another stage beyond parsing. As the component parts
of the document are identified, a validating parser can compare them
with the pattern laid down by a DTD or a Schema, to check that they
conform. In the process, default values and datatypes (if specified)