Content Out Web

Lesson 1: From Content to Mark-Up

In the first lesson, you start out by writing an article in plain text (using good typographical practices). You then start a local web server to test your article. As we experience pain points, we introduce HTML to mark up our text so that the browser knows more about what different elements in our content mean. By the end of the lesson, you will have a valid HTML5 document on your own static web site hosted on Amazon S3 that anyone can view.

Exercise 1: Write the first post

Throughout the process, we will be creating a blog (or a personal site with articles). Our aim is to get something online as quickly as possible that we can then iterate on and improve. We want to create the minimum viable product and put it online. For a blog, that’s the first article.

Let’s get started:

  1. In Finder, create a new folder to hold your web site. I called my folder contentoutweb.com and Natalie called hers ndkane.com. We both placed the folders inside our Dropbox folder so that our work will be backed up in case something goes wrong. Dropbox is a good first step to safeguarding your work. Later, we will learn about other tools that help you compare versions of your work, go back in time to previous versions, and collaborate with other designers and developers.
  2. Open up your favourite text editor (we will be using Sublime Text 2, which is available on OS X, Windows, and Linux) and create an empty file in your work folder called index.html.

Write your first post in the index.html file as plain text. Don’t worry about marking it up in any special way. Just write. Here are a few points to keep in mind when writing your first post:

Good typographical practices

It’s important to get into using good typographical practices as you write your content. It’s not hard to do and it will immediately separate your work from those of others. Here are some useful examples:

For a more thorough discussion of typographical style while writing, see my article On Practicality on Breaking Things.

Sublime text screenshot with the initial content

Now that your post is ready, it’s time to test it in the browser. ‘But wait,’ I can hear you yell, ‘we haven’t even marked it up using HTML. There’s no CSS! We haven’t written any JavaScript!’ Exactly. To get started, we don’t need any of that.

Exercise 2: Test locally

First off, we want to see how our post looks in a browser. Sure, we can just double‐click on the index.html file and it will pop up in the browser, but it’s not the same as having it served via a web server. Don’t worry, you don’t have to install and configure a web server: your Mac comes with several. We’re going to use the simplest possible one.

  1. Open up Terminal (yes, the command line; don’t worry, this isn’t going to hurt). You can find it in /Applications/Utilities/Terminal.app or just search for it by name in Spotlight. (You can also use a wonderful little application launcher called Alfred. It’s much faster than Spotlight for finding and launching apps and does a whole bunch of other stuff, like letting you do quick arithmetic and search Google. Personally, I wouldn’t want to use a Mac without Alfred.)
  2. In Terminal, type cd and leave a space after it. This is a command. It stands for ‘change directory’. Don’t press return yet. Which directory (folder) do you want to change to? The one that we’re using to store the files for our web site, of course. We could type in the folder name manually but that’s laborious and we’re lazy. Instead, drag the folder from Finder into your Terminal window and you will see the path to it magically appear next to your change directory command. On my machine, it looks like this:
    cd /Users/aral/Dropbox/Projects/contentoutweb.com
    Now press return and you will change to the folder that holds your web site. It currently only has the index.html file that you created earlier.
  3. Now, let’s start our web server to serve files from this folder by using a little web server that comes with Python. Python is a programming language. To run apps written in Python you use the Python runtime. This is an executable file called — you guessed it — python. To start the server, simply type the following command in Terminal and press return:
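    python -m SimpleHTTPServer
    (That command is for Python 2. If your Mac has Python 3 instead, the equivalent is python3 -m http.server.)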
    You should see a message similar to the following:
    Serving HTTP on 0.0.0.0 port 8000 ...
Screenshot of the results

Congratulations, you are now running your own (albeit very simple) web server!
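If you’re curious what that one-line command is doing, here is a rough sketch of the same idea written out in Python 3, using only modules that ship with it. It is an illustration rather than the exact code Python runs for you, and make_server is a name made up for this example.

```python
# A minimal sketch of a static file server (Python 3.7 or later).
# `make_server` is a hypothetical helper name, not part of Python itself.
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_server(directory=".", port=8000):
    """Build a server that serves the files found in `directory`."""
    # SimpleHTTPRequestHandler maps request paths to files on disk and
    # returns index.html when a folder is requested.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    # 0.0.0.0 means "listen on every network interface of this machine".
    return HTTPServer(("0.0.0.0", port), handler)


# To serve the current folder until you press Ctrl-C, you would run:
#     make_server().serve_forever()
```

Port 8000 is used here only because it matches the message the lesson’s command prints; any free port would do.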

Now, let’s fire up a browser and check out our awesome creation. In the address bar of your favourite browser (I use Safari), type the address of the web server. Where do you get the address? Well, the server already told you when you started it: remember, it said it was ‘Serving HTTP on 0.0.0.0 port 8000’. You can enter either http://0.0.0.0:8000 in your browser’s address bar or the easier‐to‐type http://localhost:8000. (Protip: other developers will think you’re cooler if you enter the latter, as it’s the standard alias for a local development server.)

Screenshot of the article in browser. It displays but it’s not pretty.

And, it works! Well, kind of. It’s not much to look at as all the paragraphs are stuck together and what’s up with all those weird characters where our proper punctuation is meant to be?

Don’t worry, we can fix all that. But to do so, we have to learn a little HTML (HyperText Markup Language).

HTML

When you tested the post in the browser, did you notice that you didn’t get any errors? Why was that? Sure, it didn’t come out looking great, but the browser didn’t just throw up its (metaphorical) hands and exclaim ‘nope, I’m not rendering this mess, come back when you learn how to make a proper HTML page.’ It rendered your post to the best of its abilities because HTML is very forgiving. Browsers being lenient and forgiving was an important factor in the early success of the web. It meant that anyone with a text editor could throw any old trash at a browser and the poor sod would chug along, rendering the mess to the best of its abilities. This meant that anyone and their uncle could create a web site and, as is often the case in situations like this, everyone and their uncle did. Sure, occasionally bad things would happen and things wouldn’t look right but it still meant that you didn’t need to be strict in your syntax for a web page to work. Compare that to programming languages where even a character out of place can lead a compiler to throw a hissy fit.

So, although your post renders, it isn’t pretty. And that’s all right for now. Because before we make it any prettier, we’re going to put it live on the Internet.

‘What! No way! I can’t have the world seeing my site in such a state!’, I hear you cry in sheer horror. Don’t worry, the world isn’t going to flock to your site en masse just because you happened to publish it on the Internets. There is, however, a very important reason to get your site up there as soon as possible. And that’s to light a fire underneath you. Once it’s live, there’s every chance that someone might see it. If it’s not something you’re proud of, you have all the more reason to fix it. If it’s just lying there on your computer, it’s all the easier to watch an episode of your favourite TV series instead. (I’d like to make it clear that I’ve never been guilty of this ever.)

So, we’re going to put your jumble of words online and then we’re going to iteratively make it better.

Another advantage of getting your work online as quickly as possible is that you can share it with friends. The moment it starts looking a bit better, you can brag about it to your best pal by sending her the address. That’s much harder to do (although not impossible) if your site is locked away on your computer.

Exercise 3: Upload your site to the web

You’re probably aware that there is a plethora of web hosting companies out there. Their offers range from pennies a month to thousands. Our needs are humble. Our site is going to be a static web site. That is, it is going to be made up of assets (HTML files, images, etc.) that don’t need to be dynamically generated. Some sites, like social networks, do need the content they serve to be dynamically generated. Imagine a social network with millions of users, each of whom has a profile page and a timeline. They can’t possibly have people creating those pages by hand and saving them in files. Instead, the pages are generated on request from data stored in databases. Such functionality is usually written in programming languages like Python, Ruby, PHP, etc. Since our simple site doesn’t need such functionality, our hosting needs are simpler. We just need a service that can serve files.
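To make the difference concrete, here is a toy sketch of dynamic generation in Python. The users, their details, and the names USERS and render_profile are all invented for illustration; a real site would read from a database and most likely use a proper templating library.

```python
# One template, many pages: the HTML is built when a page is asked for.
from html import escape
from string import Template

PROFILE_TEMPLATE = Template("<h1>$name</h1>\n<p>$bio</p>")

# Stand-in for a database table of users (entirely made-up data).
USERS = {
    "alice": {"name": "Alice", "bio": "Likes typography."},
    "bob": {"name": "Bob", "bio": "Reads & writes."},
}


def render_profile(username):
    """Build the HTML for one user's profile page on request."""
    user = USERS[username]
    # Escape the data so characters like & and < cannot break the markup.
    return PROFILE_TEMPLATE.substitute(
        name=escape(user["name"]), bio=escape(user["bio"])
    )
```

Our site needs none of this: every page already exists as a file, so the server only has to hand it over.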

Amazon S3 (Simple Storage Service) lets you store and retrieve files. It can also act as a web server for a static site. It’s free to start out with and is a perfect fit for what we want to do. So we will be uploading our web site (currently, just the index.html file) to S3.

Thankfully, the folks at Amazon have written an easy‐to‐follow walk‐through that shows you how to set up a static site using their services. (Why they haven’t made this a simple one‐click process is beyond me.) Follow the instructions in that article. It might seem daunting but the actual process is easy if you stick to the letter of what they tell you. (Just remember to put your-actual-domain.com wherever you see example.com in their instructions, including in the bucket policy code that you copy and paste in step 4.) In step 3, upload the index.html file you created.

When registering a domain name, I recommend using iwantmyname.com. They’re easy to use and, in my experience, lovely folks who are active on Twitter. At the time of writing, if you search for your domain name using Domai.nr, you get a 5% discount when you buy it from iwantmyname.com.

Once you’re done with the instructions, you should have your web site online. Initially, you’ll be able to reach it from the Amazon S3 endpoint URLs and, once your DNS information has propagated, you can reach it via your domain name too.

Now that your site is on the web, let’s start improving it.

Making it readable

Like we said, although your post renders, it’s not pretty. For a start, all your paragraphs are crumpled up into a single one. But why? After all, you left empty lines between them while writing them. Why didn’t the browser render them as paragraphs? It didn’t because extra whitespace is generally ignored by HTML: any run of spaces and line breaks is treated as a single space. You can leave a hundred empty lines between paragraphs if you like; it’s not going to make any difference. In order to get paragraphs to render as paragraphs, we need to tell the browser that they are paragraphs. We can do this by marking them up with a pair of special tags. The opening tag, <p>, tells the browser, ‘here begins a paragraph’ and the closing tag, </p>, tells the browser, ‘here ends a paragraph’. With very few exceptions, elements in HTML have both an opening and a closing tag that mark up (or add meaning to) the text that they enclose:

<p>This is a paragraph.</p>
<p>This is another paragraph.</p>
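
The rule the browser needs from us can be sketched in a few lines of Python: treat every blank line as a paragraph break and wrap what lies between in <p> tags. This is only an illustration of the idea (mark_up_paragraphs is a made-up name), and it is still worth doing the mark-up by hand so that the tags become second nature.

```python
import re
from html import escape


def mark_up_paragraphs(text):
    """Wrap each blank-line-separated chunk of plain text in <p> tags."""
    # One or more blank lines (possibly containing spaces) end a paragraph.
    chunks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for chunk in chunks:
        # Collapse line breaks and runs of spaces, as a browser would.
        body = " ".join(chunk.split())
        if body:
            # Escape &, < and > so the text is not mistaken for markup.
            paragraphs.append("<p>" + escape(body, quote=False) + "</p>")
    return "\n".join(paragraphs)
```

Feeding it two chunks of text separated by any number of empty lines gives back two paragraph elements, one per line.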

Exercise 3: Mark up your paragraphs

  1. Surround each paragraph with an opening <p> tag and a closing </p> tag.
  2. Test your work locally by refreshing the browser (remember that the server is running on http://localhost:8000).
Screenshot of the article in browser with the paragraphs marked up and displaying separately.

Your post should already start looking better. You will see your paragraphs separated from each other by a margin. But why? How does the browser know exactly how much to separate them? You didn’t set that margin anywhere.

The thing is, you didn’t need to. The browser comes with what is known as a default style sheet to make things easier. These styles are written in a language called CSS (Cascading Style Sheets). We will define our own styles when we get to customising the appearance of our web site but, at this point, it is worth understanding a very important distinction: by surrounding our paragraphs with paragraph tags, we didn’t tell the browser to leave some space between them. We told the browser that they were paragraphs. The reason the browser renders them the way that it does is that it has a default style sheet that specifies that anything that is a paragraph should have a top and bottom margin. We do not mark up text for presentational purposes; we mark it up to add meaning to it. You will often hear this referred to as semantics and this is what people mean when they talk about semantic HTML. If you get confused, just remember that you are trying to explain to the browser what certain bits of your page are, not how they should appear. In developer parlance, we say that it is important for us to separate content from presentation.

You can see the default style sheet in action by using the web inspector in your browser. In Safari, you have to go into Preferences, then the Advanced tab, and check the Show Develop menu in the menu bar option first. Then, from the Develop menu, select the Show Web Inspector option (or press ⌘⌥I). Make sure that the Style tab is showing (⇧⌃3) and click on one of the paragraphs in the Web Inspector to see the style rules for it. The bit that reads -webkit-margin-before: 1em; and -webkit-margin-after: 1em; is what tells the browser, ‘leave some space above and below paragraphs’. We don’t need to worry about the syntax right now and we won’t ever be using those actual properties ourselves (we’ll be using the simpler standard margin property) but you just got a glimpse of CSS. The Web Inspector will also be one of your best friends during web development so make sure to learn its keyboard shortcut.

Exercise 4: Mark up your headings

Did you have titles or subtitles in your post? If not, add some now. Notice how they are not rendered any differently from the body text. Again, this is because we haven’t told the browser that they are special in any way. Let’s mark them up so that the browser knows that our headings are, indeed, headings.

  1. Surround the topmost heading in your post with first‐level heading tags like this:
    <h1>This is the title</h1>
  2. Surround subheadings with second‐level heading tags:
    <h2>This is a second‐level heading</h2>
  3. If you have other sections nested under second‐level headings with their own headings, mark them up as h3. You can go all the way to h6 but you should really reconsider the structure of your content if you ever find yourself nested that deep.
  4. Test out your post and see how the headings are rendered differently based on their level. Again, exactly how the headings are presented is determined by the built‐in default style sheet. We can override those settings later with our own custom styles if we want to.
Screenshot of the article in browser with a hierarchy of headings.
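
One way to check that your headings form a sensible hierarchy is to list them as an outline. Here is a small sketch that does so with the HTML parser from Python’s standard library; HeadingOutline and outline are names made up for this example, and it is a convenience for the curious rather than part of the exercise.

```python
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for every heading in a document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])  # "h2" -> 2
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline(html):
    """Return the headings as indented lines, two spaces per level."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return "\n".join("  " * (level - 1) + text for level, text in parser.headings)
```

If the outline of your article reads like a sensible table of contents, your heading levels are probably in good shape.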

Exercise 5: On heads, titles, and character sets

Things are starting to look much better but there’s still the issue with those crazy‐looking characters. Why are our proper punctuation marks coming up in such a weird manner? The reason is that they are not in the default character set that the browser assumes your article was written in. You see, even before personal computers came on the scene, those lovely Americans came up with a character set called ASCII to codify the letters in the English language for teleprinters and other digital devices used for written communication. It also contained certain non‐printing characters used as control codes for those early devices. ASCII didn’t include any of those funny letters with lines and dots all over them that we foreign folks love, and its extension ISO 8859-1 added only the ones used in Western European languages. Even so, the two remained in widespread usage on the web until 2007, at which point they were surpassed by UTF-8. This encoding can represent every character in the Unicode character set. It’s backwards compatible with ASCII but includes basically every glyph available under the sun (and a few that may have been imported from Mars, by the look of them).
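
You can reproduce the garbling yourself with a few lines of Python. The sketch below assumes two things that the lesson doesn’t state outright: that your editor saved the file as UTF-8, and that the browser fell back to Windows-1252 (a common default for Western European languages, and what browsers generally use for pages labelled ISO 8859-1).

```python
# The same bytes, read with two different character encodings.
apostrophe = "\u2019"  # the typographic apostrophe

# UTF-8 stores this one character as three bytes: E2 80 99.
utf8_bytes = apostrophe.encode("utf-8")
assert utf8_bytes == b"\xe2\x80\x99"

# Read those bytes as Windows-1252 and each byte becomes its own
# character: a-circumflex, the euro sign and the trade mark sign.
garbled = utf8_bytes.decode("cp1252")
assert garbled == "\u00e2\u20ac\u2122"

# Plain ASCII text is byte-for-byte identical in UTF-8, which is why
# the ordinary letters in the article came through unharmed.
assert "Content Out Web".encode("utf-8") == "Content Out Web".encode("ascii")
```

That run of three odd characters where a single apostrophe should be is the tell-tale sign of UTF-8 text being read with the wrong encoding.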

So, all this to say that our punctuation looks funny because it’s not in the default character set of the browser. To fix it, we need to tell the browser to read our article as UTF-8 instead. We can do that by using a meta element to provide that piece of metadata for our article.

  1. Add the following element to the top of your document:

    <meta charset='utf-8'>

  2. Refresh your browser.

Screenshot of the article in browser using the utf-8 character set.

Ah, that’s better. No more funny characters.

But do you see how the title of the browser window says ‘localhost:8000’? That’s not pretty, is it? It should really state the title of our article (or a combination of our site name and the title of our article). Let’s fix that:

  1. Add the following element under the meta element you added earlier, replacing the text between the opening and closing tags with the title of your article:

    <title>Content Out Web</title>

  2. Refresh your browser.

Screenshot of the article in browser with the title set.

Ah, nice. Things are looking much better. But there is still a problem.

Did you notice how the last two elements we added (the meta element and the title element) are different to the other elements we used to mark up our content? We used paragraph tags and heading tags to add meaning and structure to our content — to the body of our page. The meta element and the title element, however, gave the browser information about the page in general, in other words metadata. It would be nice if we separated the content (the body) from the metadata (the head).

  1. Nest the meta element and the title element in the head element.

    <head>
      <meta charset='utf-8'>
      <title>Content Out Web</title>
    </head>
  2. Nest the content in the body element.

    <body>
      <h1>Content Out Web</h1>
      <p>I’m teaching my friend Natalie how to design and develop a web site…</p>
      …
    </body>
    

Make sure that you indent your markup properly, using the tab key. Properly indented markup is easier to read. That, in turn, makes it easier for you to spot mistakes and fix them (in geekier parlance, ‘it makes it easier to debug’).

If you refresh your browser now, you should not notice any difference. That’s because the changes we’ve made have not had any visual effects. We are just tidying up our markup.

Valid HTML5

You might be wondering why, apart from the goodness of our hearts, we decided to separate the head and the body. In the HTML5 specification, those elements are actually optional and browsers will try to do the best they can to understand and render your pages if you leave them out. However, some browsers (especially older ones) may not do a great job of it. So it’s good practice to include them. Plus, other developers will give you strange looks and talk behind your back if you don’t.

There is still one issue with our article, though. It’s not valid HTML.

Valid HTML, in a nutshell, means HTML that is valid according to the specification. Now, if you’ve looked through the specification via the link above, you’re probably scared and need a drink. Go ahead, have a drink. Back? Great! Instead of having to manually check if your pages are valid, you can run them through an HTML5 validator. Do it now and see what errors you get.

In order to run your page through the validator, you’ll have to upload the latest version of it to Amazon S3. If the S3 upload form is getting tedious, try using a file transfer application like Cyberduck. It’s a much nicer workflow and you’ll really appreciate it when you start uploading larger quantities of files.

Unfortunately, the validator is not happy with the page:

Validation failed: Error: Start tag seen without seeing a doctype first. Expected <!doctype html>

It’s asking us to specify the document type. If our page is to be valid HTML5, we must tell browsers that it is an HTML5 document. So, let’s add the document type declaration:

  1. Add the document type declaration to the top of your page:

    <!doctype html>
    <head>
    …

  2. Upload your changes to the web and run the validator again.

Success message: This document is valid HTML5

Ah, success!

Just because the document is valid HTML5, however, it doesn’t mean that it is elegant HTML. Remember those developers who would snigger behind your back if you left out the <head> and <body> elements, even though they’re optional? These same folks will no doubt point and stare if they ever see you leave out another optional but traditionally expected element that goes by the name of <html>.

  1. After the document type declaration, add the opening html tag. The closing html tag goes at the very end of your page, after the closing body tag. When you’re done, the general structure of your page should look like this:

    <!doctype html>
    <html>
      <head>
        <meta charset='utf-8'>
        <title>Your title</title>
      </head>
      <body>
        <h1>Your title</h1>
        <p>Your first paragraph…</p>
        …
      </body>
    </html>

  2. Run it through the validator again and make sure that it validates.
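
Since each validator run means another upload, a rough local check can catch the mistakes this lesson covered before you bother. The sketch below is no substitute for the validator: it only looks for the doctype, the charset and the title, it does not check that the doctype comes first, and the names StructureCheck and check_page are made up for this example.

```python
from html.parser import HTMLParser


class StructureCheck(HTMLParser):
    """Record whether a page has a doctype, a charset and a title."""

    def __init__(self):
        super().__init__()
        self.has_doctype = False
        self.charset = None
        self.has_title = False

    def handle_decl(self, decl):
        # For <!doctype html>, decl is the text "doctype html".
        if decl.lower().split() == ["doctype", "html"]:
            self.has_doctype = True

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            # attrs is a list of (name, value) pairs.
            self.charset = dict(attrs).get("charset", self.charset)
        elif tag == "title":
            self.has_title = True


def check_page(html):
    """Return a list of problems found; an empty list means none were."""
    checker = StructureCheck()
    checker.feed(html)
    checker.close()
    problems = []
    if not checker.has_doctype:
        problems.append("missing <!doctype html>")
    if (checker.charset or "").lower() != "utf-8":
        problems.append("missing <meta charset='utf-8'>")
    if not checker.has_title:
        problems.append("missing <title>")
    return problems
```

An empty list only means that none of these three problems turned up; the validator remains the final word on whether the page is valid HTML5.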

Well, that’s it! Congratulations, you just created your first HTML5 document. And not only that, but it’s up on the web for everyone to see. Give yourself a pat on the back.

For next time…

If we don’t limit the measure, lines can get too long to read comfortably.

Although we’ve improved the page considerably since its humble plain text beginnings, it still suffers from readability issues. The most apparent of these is that if you make your browser as wide as possible, the text becomes unreadable. The lines are just too long to be read comfortably. We call the width of a body of type the measure. In the next lesson, we’ll see how to limit the measure to a comfortable width for reading using CSS (Cascading Style Sheets). We will also see how we can leave a bit more space between the lines (‘leading’ in typography), again using CSS. We will also create another page and learn how to link from one page to another.

Homework

  1. Write a new article. This time, start marking it up with HTML from the beginning.
  2. Skim the Wikipedia page on HTML and spend some time on the history section to get an idea of how HTML and the web came to be.
  3. Read about measure from Mark Boulton.
  4. Look at some web sites you normally visit and see if you can point out any issues with their typography that we covered in this lesson. If you want to see how they implemented a certain feature, just select the Develop → Show Page Source menu item in Safari (or press ⌥⌘U). You can also use the Web Inspector as we saw earlier. Remember that many of the best web designers and developers learned their craft not from books but from peeking at other people’s code.
