Friday, August 15, 2008

Thing to Know Before You Code, Part Three: In the Body

Right now, with the code we've gone over, you should understand what the code below does.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>My Page</title>
</head>
<body>

</body>
</html>

That's all well and good, but that doesn't show a single bit of anything in your web browser. We're going to start by building a plain (ugly) page and then I'll show you how to come back and fancy it up.

The Document Structure

The document structure in a web page is based on the typographical structure of printed documents. That's not really surprising since the impetus for HTML was to electronically reproduce printed documents for academic consumption. HTML was created for sharing annotated reports, not for hyping the next best buy. Frankly, I expect that most writers have at least a rudimentary understanding of the difference between a heading, body text, block quotes, data tables, and citations. If you're not exactly sure what I'm talking about, try reading some introductions to Semantic Typography for web designers.


So, all that to say, don't worry about the uglies of the document: tell us what the structure is. We can make it pretty with Cascading Style Sheets, but we have to know what we're doing. And on that note, here are your basic content tags.


Tag
    Denotes

<h1>...<h6>
    These are your heading tags. H1 is your top level heading--like the title of your document. H2 works for sections, H3 for sub-sections, etc. It's really, really rare to get down to H5, let alone H6, but they're there just in case.
<p>
    Paragraphs! Need I say more?
<q>
    Inline quotes, like <q>No man is an island...</q> (which looks like: "No man is an island...")
<blockquote>
    A long quotation set apart from the surrounding text. Text contained by a <blockquote> tag pair will be indented left and right, as most style manuals indicate to do for long quotations.
<em>
    Emphasized text, usually shown italicized or oblique
<b>
    Bolded text
<span>
    The span tag almost needs a tutorial all on its own. Pretty much, it's a way to introduce new structure elements that are not covered in the HTML vocabulary. When you use it, you need to add an attribute. We'll start with the class attribute, because we'll use that a lot with CSS. The tag pair will look like:
    <span class=""></span>
    We define (name and describe) classes inside our style sheets (later article).
<a>
    A stands for anchor, and it has one unique attribute that we care about: href. href is short for Hypertext Reference. In use, it looks like:
    <a href="http://some.webpage.com/">Linked Text</a>
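
As a rough sketch of how the heading levels stack up, here is the outline of a made-up book page. Every title and sentence in it is a placeholder of my own invention, and the fragment is meant to sit between the body tags.

```html
<!-- Hypothetical outline: every title here is a placeholder -->
<h1>My Novel</h1>
<p>A short introduction to the whole document.</p>

<h2>Part One</h2>
<p>Text that belongs to this section.</p>

<h3>Chapter One</h3>
<p>Text that belongs to this sub-section.</p>

<h2>Part Two</h2>
<p>A new section at the same level as Part One.</p>
```

Notice that the levels step down one at a time (H1, then H2, then H3) and that going back up to H2 starts a new section.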

Now, armed with these tags, let's go show off our opus.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>My Page</title>
</head>
<body>
<h1>My Ugly Page</h1>
<p><b>Thank you!</b> ... for visiting my ugly page. Well, it's not <em>really</em> ugly; it just hasn't put on its <q>war paint</q> yet, but we'll get to that in a bit. Read on for an excerpt from my upcoming novel, <span class="bookTitle">What I Wrote, No, Seriously</span>.</p>
<blockquote>
<p>It wasn't dark and stormy, or some other tired cliche. It was Caterday, and I read the <a href="http://www.lolcat.com/">lolcat's blog</a> with glee. The fur ball otherwise known as Cat decided my keyboard would make a great place to lay down his weary carcass. After all, those fingers poking keys belonged to his human, and they had better things to do than entertain their owner. Or maybe I was reading too much into the cat's actions. Maybe all he really wanted was another set of flying lessons.</p>
</blockquote>
<p><a href="#">Buy it now!</a></p>
</body>
</html>


That should now look like:

Want to know more of the nifty tags you can use? Go browse through SitePoint's beta HTML Reference or the W3 Schools tag lists.

Thing to Know Before You Code, Part Two: HTML


Now, we dive into the part that scares the bejeebers out of most folks: the coding.

Open up Notepad (or slide on over to the Real-Time HTML Editor).

Copy the text below and paste it into Notepad. Save the file as "index.html".

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>my first page</title>
</head>
<body>
<h1>Hey!</h1>
<p>Well, look at me! I made a web page!</p>
</body>
</html>

Go to the directory in your computer where you saved "index.html" and double click on it--it should now open up in your browser and look something like:

Now that you've done the hard part and gotten started, let's go over what you did.

HTML stands for Hyper Text Markup Language. "Hyper" means it bounces, via links, from one page to another. "Text" is thrown in because this is all written in text, instead of like, ick, machine code (binary is *soooo* not my friend). "Markup" comes into play because you're using <tags> to "mark up" the stuff you're writing. "Language" just means that there is a method to the madness, a structure and a syntax. On the plus side, the vocabulary here is really limited, so there's not much you have to learn to get started "talking" in HTML.

So, on to the syntax!

Basic rules of HTML

  1. Use the vocabulary
    W3 Schools is a handy reference, though despite the name it is not run by the W3C, the group that writes the web development standards. I prefer the SitePoint HTML (beta) reference because the writing is clearer and there are comments for each article, which allow for peer review (and extra "how to use it" tips).
  2. Almost every "markup" has an opening tag and a closing tag, each framed by less-than and greater-than signs (angle brackets).
    <tag>stuff the tag wraps around</tag>, or in actual practice:
    <p>Everything between the "p" tag at the front and the "/p" tag at the back is part of one paragraph. If you wanted to separate this into two paragraphs, you would need to close the first "p" tag (with a "/p") and open up a new one.</p>
  3. "Markup" should be properly "nested"
    RIGHT: <b><em>Bold and Emphasized Text</em></b>
    WRONG: <b><em>Bold and Emphasized Text</b></em>

That "should be" hits a key point in HTML. So long as you're using the right tags, most browsers--where your web pages are interpreted--will understand the wrong example and show it the way the author intended--well, for text formatting markup. When you start nesting tags that tell the browser about the framework of your page, getting it wrong screws up the page's format.
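
To see the difference, here is a small sketch of the same mistake made with framework tags instead of text formatting tags. The quote text is a placeholder, and exactly what a browser does with the wrong version varies from browser to browser.

```html
<!-- RIGHT: the paragraph opens and closes inside the blockquote -->
<blockquote>
  <p>A quoted paragraph.</p>
</blockquote>

<!-- WRONG: the closing tags are out of order, so the leftover </p>
     lands outside the blockquote and no longer matches anything -->
<blockquote>
  <p>A quoted paragraph.
</blockquote></p>
```

A browser may quietly patch this up, or it may leave you with a stray gap on the page; either way, it's the kind of slip that is easy to miss by eye.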

There are also these things called validators that are rather like automated proofreaders for your HTML coding. They're great for helping you figure out where the problems are in your pages--and, hey, everyone typos at some point, right?

One last point to make before we go back to explanations: Cascading Style Sheets (CSS) are for beautifying the page; HTML is for telling it like it is. Headings are headings, paragraphs are paragraphs, bulleted lists and block quotes and inline quotes and ... well, you get the picture. There are still some presentational tags in HTML, but these are being weeded out. Using these tags--font, strike-through, underline, et al.--means setting yourself up to re-write your entire site when the support for these tags finally gets dropped. I'm going to keep it simple and show you the stuff off the latest implemented HTML standard--version 4.01.
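
A small sketch of the difference: the first paragraph bakes the look into the markup with a presentational tag, while the second only says what the text is and leaves the look to a style sheet. The class name "warning" and the sample text are made up for this example, and the rule that colors it would live in the style sheet, not in the page.

```html
<!-- Presentational: the color is baked into the page
     (font is deprecated in HTML 4.01 and not allowed by the strict DTD) -->
<p><font color="red">Sale ends Friday!</font></p>

<!-- Structural: the hypothetical class "warning" names what the text is;
     a style sheet rule decides how it looks -->
<p><span class="warning">Sale ends Friday!</span></p>
```

With the second version, changing every warning on the site from red to orange means editing one style sheet rule rather than every page.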

So, back to what you did.


Every page should start with a DOCTYPE declaration. A DOCTYPE tells the browser, "Here are the rules this page plays by." Without the DOCTYPE, most modern browsers are going to assume your page was written in a free-for-all style, which means it'll treat the page like it came out of the 1990s. It's called "Quirks Mode" because each browser developed then had its own quirky way of interpreting HTML. It was icky--not as bad as machine code, but still icky.

Let's take a closer look:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

The first part, "!DOCTYPE HTML PUBLIC", tells the browser that this is the document type declaration, that the page is an HTML page, and that the Document Type Definition (DTD) it points to is a public one--not something proprietary (that would be SYSTEM and the next part would be omitted). The second part, "-//W3C//DTD HTML 4.01//EN", tells it that the W3C (World Wide Web Consortium) drafted the definition, that it's a definition for the 4.01 version of HTML, and that the definition is written in English. The last part ("http://www.w3.org/TR/html4/strict.dtd") is the URL pointing to where the machine-readable DTD specifications are hosted.

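By the by, the DOCTYPE declaration is one of the few things you don't close in proper HTML; the others are a handful of empty elements, such as <br> and <img>, that never wrap any content.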
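
For illustration, here is how two of HTML 4.01's empty elements look in use; because they never wrap any content, they are written without a closing tag. The address text and the file name are placeholders.

```html
<!-- br forces a line break; it has no content, so there is no </br> -->
<p>First line of an address<br>
Second line of an address</p>

<!-- img embeds a picture; photo.jpg is a hypothetical file name -->
<p><img src="photo.jpg" alt="A description of the picture"></p>
```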

Next up are the <html> tags. The opening "html" tag goes just under the DOCTYPE declaration and the closing "/html" goes at the very end of the web page. No exceptions (especially if you're still learning =D). These tags tell the browser that everything in the page is web page stuff.

The "head" tags come next. Inside the head, you should always have at least a title for the page, contained in opening and closing "title" tags. You also want to drop your "meta" tags in here. Meta data is data about data. More will be made of these tags in a later article. Not least, links to accessory files, like your style sheet and any web scripts, get planted in the head. The "/head" tag closes before the "body" tag opens -- rather like the chin coming before the torso.
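
As a sketch, a fuller head section might look something like this. The description text and the file names style.css and script.js are placeholders I made up; swap in your own.

```html
<head>
  <title>My Page</title>
  <!-- Meta tags: data about the page itself -->
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <meta name="description" content="A short summary of the page.">
  <!-- Accessory files: style.css and script.js are hypothetical names -->
  <link rel="stylesheet" type="text/css" href="style.css">
  <script type="text/javascript" src="script.js"></script>
</head>
```

None of these extra lines show up in the page itself; they only tell the browser (and search engines) about the page and where to find its accessory files.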

All of your content, all the beautiful, wonderful, BUY-MY-BOOKS-NOW stuff goes inside your "body" tags. If you want the content to show up in the web browser, you put it between the body tags, which means your page structure is also going to go in here.

And that means, time for the next article.