UNIX: The Complete Reference (2007)

Part II: User Networking

Chapter 10: The Internet

The Internet is a vast worldwide network of computers, which has grown and continues to grow at a fantastic rate, both in number of users and amount of traffic. You may not know that the Internet was originally designed to connect UNIX computers. Today it spans all types of operating systems. The major reason for its explosive growth has been the tremendous success of the World Wide Web (WWW), a vast collection of “pages” located on computers throughout the world that are connected over the Internet. The software that makes it possible to use the web effectively, the web browser, has evolved into a powerful and easy-to-use application.

This chapter describes the Internet and introduces several different Internet services, including netnews (a bulletin board service), the Internet Relay Chat (IRC), Instant Messaging (IM), and the World Wide Web (WWW). We will concentrate on the most important of these services and provide enough information to help you get started using them. You will also get pointers on where to go to obtain detailed information about using Internet services, including many sources of information available on the Internet itself.

What Is the Internet?

The Internet is a network of computers that use common conventions for naming and addressing systems. It is a collection of interconnected independent networks; no one owns or runs the entire Internet. The computers that compose the Internet run variants of UNIX, as well as a variety of other operating systems, including Windows. Using the TCP/IP and related protocols, computers on the Internet can carry out a wide range of networking tasks. For example, people with Internet access can send electronic mail messages to other people on the Internet, as described in Chapter 8. People can log in to remote computers on the Internet using the telnet command as well as copy files on remote computers on the Internet using the ftp command (discussed in Chapter 9). And they can do many other things through web browsers, discussed later in this chapter.

Accessing the Internet

If you are a user on a multiuser system or if your computer is part of a larger network at a company, an educational institution, or some other organization, you may already be connected to the Internet. If this is the case, you can access Internet services using the appropriate commands described later in this chapter and in Chapter 9. However, if you want to access the Internet from your own computer, you have two choices: You can connect to the Internet directly, or you can use a public-access provider. Directly connecting to the Internet is complicated and beyond the scope of this book. Unless you plan to become heavily involved with the Internet, a better option is to use one of the many public-access providers.

Using a Public-Access Provider

To use a public-access provider, you need to connect to a public-service provider, called an Internet service provider (ISP). ISPs generally charge a fee for using their system to access the Internet. These providers offer a wide range of Internet services. In many places you have several options for how you connect to the Internet. You can connect using a modem over a standard telephone line. You can also take advantage of a high-speed Internet connectivity option, including connections made using a cable modem over cable lines and connections made using a digital subscriber line (DSL) modem over regular telephone lines. If you select a connection using a modem over a regular telephone line, you should find a local (in the sense of a local phone call) Internet access provider or one with a toll-free number, to keep your phone bills low. You may want to select a provider that charges a flat monthly fee that allows use for a large block of hours per month rather than a usage-based fee, because it is very easy to find yourself connected to the Internet for hours at a time. If one or more high-speed connections are available at your location and you are willing to pay the extra cost, you will pay a flat monthly service fee that will provide you a permanent connection to the Internet. Check with your local cable company and DSL providers for details.

Internet Addresses

Each computer on the Internet has an official Internet address, known as an IP address, together with a name that uniquely identifies that computer. Because people prefer using names for computers, whereas networking software uses IP addresses, a way is needed to translate the name of a system to an Internet address. This mapping is provided by the Domain Name Service (DNS). When a program encounters the name of a computer, the program uses the DNS to translate this name into its IP address. See further discussion in this chapter, as well as in Chapter 17, for more information.

IP Addresses

Every computer on the Internet has an IP address made up of four integers, each between 0 and 255, separated by dots, such as 135.19.47.203. (Some values are reserved; for example, 255 appears in the subnet masks used to divide the host part of an address into two or more subnets.) The first part of the address specifies a particular network that is part of the Internet, and the second part of the address specifies a particular host on that network. The rules for assigning these numbers lie beyond the scope of this book.
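The network and host parts of an address can be illustrated with a short Python sketch built on the standard ipaddress module. The 16-bit prefix length used in the note below is only an assumption for illustration; where the boundary really falls depends on how the network was assigned. The function names are hypothetical.

```python
import ipaddress


def split_address(dotted, prefix_len):
    """Return (network address, host number) for a dotted IPv4 address.

    prefix_len is the number of leading bits that name the network. It is
    supplied by the caller; the address alone does not encode it.
    """
    iface = ipaddress.ip_interface(f"{dotted}/{prefix_len}")
    network = iface.network
    # Keep only the bits that lie outside the network prefix.
    host_number = int(iface.ip) & int(network.hostmask)
    return str(network.network_address), host_number


def is_valid_ipv4(dotted):
    """True if dotted is four integers in the range 0-255 joined by dots."""
    try:
        ipaddress.IPv4Address(dotted)
        return True
    except ValueError:
        return False
```

Calling split_address("135.19.47.203", 16) treats the first two integers as the network and the last two as the host, while is_valid_ipv4 rejects a string such as 135.19.47.256 because one of its integers is out of range.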

Internet System Names

Names of systems on the Internet (known as fully qualified domain names) consist of alphanumeric strings separated by dots. For example, zeus.cs.unj.edu is the full Internet name of a particular host; here zeus is the name of the system, cs represents the group of all systems in the computer science department, unj contains all systems at the (fictional) University of New Jersey, and edu contains all systems at educational institutions in the United States. Here, edu is one of several possible top-level domains.

There are two varieties of top-level domains. The first, organizational domains, are designed to be used inside the United States. A top-level domain in the United States indicates the type of organization that owns the computer. For example, the domain edu is used for computers at educational institutions. It is important to note that organizational domains were devised before the Internet became an international network. This has led to a different type of top-level domain being used outside of the United States. Instead of using the type of organization, computers on the Internet outside of the United States use geographical top-level domains where two letters are used to represent each country of the world. For example, the domain nz represents the top-level domain of computers in New Zealand.
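To make the naming scheme concrete, here is a minimal Python sketch that splits a fully qualified domain name into its parts. It applies the rule of thumb from the text, treating a two-letter top-level domain as geographical and anything else as organizational; the function name and the returned dictionary layout are illustrative assumptions, not part of any standard tool.

```python
def describe_host(fqdn):
    """Split a fully qualified domain name into host, domains, and top level."""
    labels = fqdn.lower().rstrip(".").split(".")
    if len(labels) < 2 or any(not label for label in labels):
        raise ValueError(f"not a fully qualified domain name: {fqdn!r}")
    top_level = labels[-1]
    # Rule of thumb from the text: two letters stand for a country.
    kind = "geographical" if len(top_level) == 2 else "organizational"
    return {
        "host": labels[0],
        "domains": labels[1:],
        "top_level": top_level,
        "kind": kind,
    }
```

For zeus.cs.unj.edu this reports zeus as the host, cs, unj, and edu as the enclosing domains, and edu as an organizational top-level domain.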

Domain Name Service (DNS)

As mentioned previously, a machine node name is linked to a particular IP address through the DNS service. To use this service, you must register your machine with a server that runs the translation software, called a Domain Name Server. The only two pieces of information that are needed are the machine node name, or DNS name (such as stu.att.com), and the current IP address (such as 135.19.47.203). These two items are stored in a DNS table. Once you are registered, all users can access your machine via its DNS name. This capability is especially useful when your internal IP network changes its addresses frequently. Your local DNS administrator can change the IP address in the DNS table to point to the new address without requiring the people that want to reach you to do anything, since the node (DNS) name that they use to contact you will remain the same. We will discuss DNS in more detail in Chapter 17.
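The idea of a DNS table can be sketched in a few lines of Python. This is a toy model, not how a real name server stores its data: the class name DnsTable and the second address (135.19.50.7) are hypothetical, chosen only to show that a name keeps working after its address changes.

```python
class DnsTable:
    """A toy name-to-address table, standing in for a Domain Name Server."""

    def __init__(self):
        self._records = {}

    def register(self, name, address):
        """Add a machine, or update the address of one already registered."""
        self._records[name.lower()] = address

    def resolve(self, name):
        """Return the address for name; raises KeyError if it is unknown."""
        return self._records[name.lower()]


table = DnsTable()
table.register("stu.att.com", "135.19.47.203")
first_answer = table.resolve("stu.att.com")

# The administrator renumbers the network (hypothetical new address).
table.register("stu.att.com", "135.19.50.7")
second_answer = table.resolve("stu.att.com")
```

Users keep asking for stu.att.com in both cases; only the answer changes. A real program would normally ask the system resolver instead, for example through Python's socket.gethostbyname.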

The Usenet

One of the older and still popular services available on the Internet is a bulletin board service known as netnews. Netnews is transmitted over the Internet using the Network News Transfer Protocol (NNTP). The network of computers that share netnews, over the Internet or otherwise, is known as the Usenet (User’s network). Computers at schools, companies, government agencies, and research laboratories in countries throughout the world participate in Usenet.

The collection of programs used to share information is called netnews, and messages are known as news articles. Netnews software is freely distributed to anyone who wants it. News articles containing information on a common topic are posted to one or more newsgroups.

Usenet Background

The original netnews software was developed in 1979 by Truscott and Ellis to exchange information via an old networking program called uucp, between Duke University and the University of North Carolina, Chapel Hill. Interest in netnews spread after a 1980 USENIX talk, with many other sites joining the network soon afterward. Versions of netnews software were developed at Berkeley making it easier to read and post articles and to organize newsgroups, and making it possible to handle many sites.

Traditionally, netnews articles were read using one of several different software programs called newsreaders designed especially for this task. However, with the advent of web browsers (the software designed for displaying web pages), new ways of accessing netnews are now possible. For example, a web browser can be used to receive and to post netnews articles. Furthermore, news articles are now commonly available as web pages, making it possible to access them in exactly the same way as any web page. There is also a web site called Google Groups that you can access to read archived netnews articles; see later in this chapter for details.

How Usenet Articles Are Distributed

In the past, systems used dial-up connections and UUCP software to exchange netnews. However, in recent years more and more systems use existing networks and their communications protocols, such as the Internet with TCP/IP, for news exchange; the Network News Transfer Protocol (NNTP) is used in this case. A group of backbone sites forward netnews articles to each other and to many other sites. Individual sites may also forward the netnews they receive to one or more other sites. Eventually, the news reaches all the machines on the Usenet. Often news has to travel through many different intermediate systems to reach a particular machine.
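The forwarding process described above can be modeled, very roughly, as a breadth-first walk over the sites that feed one another. The following Python sketch is a simplification under stated assumptions: the site names and feed links are invented, and real netnews software also has to recognize articles it has already received, which this model reduces to a simple "already reached" check.

```python
from collections import deque


def propagate(feeds, origin):
    """Return {site: hops}, the number of forwarding steps to reach each site.

    feeds maps a site to the list of sites it forwards news to.
    """
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        site = queue.popleft()
        for neighbor in feeds.get(site, ()):
            if neighbor not in hops:  # each site accepts an article once
                hops[neighbor] = hops[site] + 1
                queue.append(neighbor)
    return hops


# Hypothetical topology: two backbone sites and a few leaf sites.
feeds = {
    "duke": ["unc", "backbone1"],
    "backbone1": ["backbone2", "site_a"],
    "backbone2": ["site_b"],
    "site_b": ["site_c"],
}
reach = propagate(feeds, "duke")
```

In this made-up topology an article posted at duke passes through three intermediate systems before it arrives at site_c, which matches the observation that news often travels through many systems to reach a particular machine.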

How Newsgroups Are Organized

Netnews articles are organized into newsgroups. There are literally thousands of different newsgroups, organized into main categories. These categories are either topic areas, institutions, or geographical areas. The names of all newsgroups in a category begin with the same prefix. Some articles on the Internet are distributed worldwide, while others are only distributed in limited geographical areas or within certain institutions or companies. Table 10–1 shows some of the prefixes for the largest categories used for posting netnews worldwide.

Table 10–1: Some Popular Newsgroup Prefixes

Classification   Content
comp             Computing
news             Netnews and the Usenet itself
rec              Recreation
sci              The sciences
soc              Social issues
humanities       Humanities issues
talk             Discussions (talk)
alt              Alternative topics (wide topic area)
misc             Miscellaneous (everything not fitting elsewhere)

An example of a prefix used for newsgroups within a particular institution is att, which is used by AT&T for its internal newsgroups. Examples of prefixes used for newsgroups for specific geographical areas include nj, for articles of local interest in New Jersey, ca, for articles of local interest in California, and ba, for articles of local interest in the San Francisco Bay Area. Hundreds of different prefixes are used for local and special-purpose newsgroups.

Individual newsgroups are identified by their category, a period, and their topic, which is optionally followed by a period and their subtopic, and so on. For instance, comp.text contains articles on computer text processing, comp.unix.questions contains articles posing questions on the UNIX System, and rec.arts.movies.reviews contains movie reviews.
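The naming rule is simple enough to express in a short Python sketch. The function names here are hypothetical; the point is only that a newsgroup name is a category followed by any number of period-separated topics, and that membership in a hierarchy is a prefix test on whole components.

```python
def parse_newsgroup(name):
    """Split a newsgroup name into its category and its list of topics."""
    parts = name.split(".")
    if any(not part for part in parts):
        raise ValueError(f"malformed newsgroup name: {name!r}")
    return {"category": parts[0], "topics": parts[1:]}


def in_hierarchy(name, prefix):
    """True if name is prefix itself or lies below it in the hierarchy."""
    return name == prefix or name.startswith(prefix + ".")
```

Comparing whole components matters: comp.unix.questions lies under comp.unix, but a name that merely begins with the same letters, such as the made-up comp.unixfoo, does not.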

Identifying Available Newsgroups

You will be able to read netnews only in those newsgroups your machine knows about (unless you access netnews via the web, which is discussed later). To get a list of newsgroups your machine knows about, look for the file /usr/lib/news/newsgroups and print it out. Table 10–2 includes some of the most popular newsgroups (other than those devoted to sex!), other representative newsgroups, and newsgroups with wide distribution, along with a description of their topics.

Table 10–2: Some Popular Newsgroups and Their Topics

Newsgroup                      Topic
comp.ai                        Artificial intelligence
comp.databases                 Database issues
comp.graphics.animation        Animated computer graphics
comp.lang.c                    The C programming language
comp.misc                      Miscellaneous articles on computers
comp.sources.unix              Source code of UNIX System software packages
comp.text                      Text processing
comp.unix.questions            Questions on the UNIX System
misc.consumers                 Consumer interests
misc.forsale.non-computer      Want ads of items other than computers for sale
misc.misc                      Miscellaneous articles not fitting elsewhere
misc.wanted                    Requests for things needed
news.announce.conferences      Announcements on conferences
news.announce.newusers         Postings with information for new users
news.answers                   FAQs for different newsgroups
news.lists                     Statistics on Usenet use
rec.arts.movies.current-films  Discussions on recent movies
rec.audio.marketplace          High-fidelity equipment want ads
rec.autos.tech                 Technical aspects of cars
rec.birds                      Bird watching
rec.gardens                    Gardening topics
rec.humor                      Jokes
rec.photo.misc                 Photography and cameras, other than want ads or darkroom topics
rec.travel.europe              Traveling throughout Europe
sci.crypt                      The use and analysis of cipher systems
sci.math                       Mathematical topics
sci.math.symbolic              Symbolic computation systems
sci.misc                       Miscellaneous articles on science
sci.physics                    Physics, including new discoveries
soc.singles                    Single life
soc.women                      Women’s issues

Reading Netnews

Netnews can be read using a traditional newsreader or in combination with the newsreader software built into most popular web browsers. You can also read netnews using a web browser by accessing certain web sites. There are many traditional newsreaders; among those that have been or are widely used are readnews, vnews (visual news), rn (read news), trn (threaded read news), and tin (threaded Internet newsreader). These newsreading commands are described in the following discussion.

The .newsrc File

The programs for reading netnews use the .newsrc file in your home directory, which keeps track of which articles you have already read. In particular, the .newsrc file keeps a list of the ID numbers of the articles in each newsgroup that you have read. When you use one of the programs for reading news, you see only articles you have not read, unless you supply an option to the command to tell it to show you all articles. Ranges of articles are specified using hyphens (to indicate groupings of consecutive articles) and commas. The following is a sample .newsrc:

$ cat .newsrc

misc.consumers: 1-16777

news.misc: 1-3534, 3536-3542, 3545-3551

rec.arts.movies: 1-22161

sci.crypt: 1-2132

sci.math: 1-7442, 7444-7445, 7449, 7455

sci.math.symbolic: 1-782

rec.birds: 1-1147

rec.travel: 1-8549

comp.ai: 1-4512

comp.graphics: 1-5695

comp.text: 1-4690

comp.unix.aix!

comp.unix.questions: 1-16142

comp.unix.wizards: 1-17924

misc.misc: 133, 160-162

misc.wanted: 1-8119, 8125, 8131

news.announce.conferences: 1-699

You can edit your .newsrc file if you want to reread articles you have already seen. To do this, use your editor of choice to change the range of articles listed in the file so that it does not include the numbers of articles that you want to read. You can also tell netnews that you are not interested in a particular newsgroup by replacing the colon in the line for this newsgroup with an exclamation point. This “unsubscribes” you from the newsgroup: the exclamation point tells the netnews program to skip this newsgroup when you read news. (In the preceding example, note the exclamation point after the newsgroup comp.unix.aix.)
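The .newsrc format described above is easy to read with a small program. The following Python sketch parses one line at a time, assuming only what the sample shows: a newsgroup name, then a colon (subscribed) or an exclamation point (unsubscribed), then an optional comma-separated list of article numbers and hyphenated ranges. The function names are hypothetical.

```python
def parse_newsrc_line(line):
    """Return (newsgroup, subscribed, ranges) for one .newsrc line.

    ranges is a list of (first, last) pairs; a single article number n
    is stored as (n, n).
    """
    line = line.strip()
    for index, char in enumerate(line):
        if char in ":!":
            break
    else:
        raise ValueError(f"no ':' or '!' in .newsrc line: {line!r}")
    group = line[:index]
    subscribed = char == ":"
    ranges = []
    for item in line[index + 1:].split(","):
        item = item.strip()
        if not item:
            continue
        first, _, last = item.partition("-")
        ranges.append((int(first), int(last or first)))
    return group, subscribed, ranges


def has_read(ranges, article):
    """True if the article number falls inside any recorded range."""
    return any(first <= article <= last for first, last in ranges)
```

Applied to the sci.math line in the sample, the parser reports articles 1 through 7442 as read but not article 7443, so a newsreader would offer 7443 as unread. Removing a number from the ranges, as described above, is what makes an article show up again.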

Using readnews

Although readnews is the oldest program for reading netnews and primarily uses a line-oriented interface, some people still use it. When you enter the readnews command, you see the heading of the first unread article in the first newsgroup in your .newsrc. For example,

$ readnews

------------------

Newsgroup sci.math

------------------

Article 3313 of 3459 Mar 29 19:22.

Subject: New Largest Prime Found

From: galois@paris.UUCP (E. Galois@Univ Paris FRANCE)

(110 lines) More? [ynq]

The header in the preceding example tells you that this is article number 3313 of 3459 in the newsgroup sci.math. You see the date and time the article was posted, and the subject as provided by the author. The electronic mail address of the author and the author’s name and affiliation are displayed. Finally, you are told that the article contains 110 lines. You are then given a prompt. At this point, you can enter y to read the article, n not to read it and to move to the next unread article (if there is any), or q to quit, updating your .newsrc to indicate which new articles you have read. Besides these three possible responses, there are many others. The most important of these other commands is x, which is used to quit without updating your .newsrc. Some of the other available readnews commands are listed in Table 10–3.

Table 10–3: Some readnews Commands

Command        Action
r              Reply to the article’s author via mail
N [newsgroup]  Go to the next newsgroup or the newsgroup named
U              Unsubscribe from this newsgroup
s [file]       Save article by appending it to the file named; default is file Articles in your home directory
s | program    Run program given with the article as standard input
!              Escape to shell
<number>       Go to message with number given in current newsgroup
-              Go back to last article displayed in this newsgroup (toggles)
b              Go back one article in this newsgroup
l              List all unread articles in current newsgroup
L              List all articles in current newsgroup
?              Display help message

You can use the -n option to tell readnews which newsgroup to begin with. For instance, to begin with articles in comp.text, type

$ readnews -n comp.text

You may also want to print all unread articles in the newsgroups that you subscribe to. You can do so using this:

$ readnews -h -p > articles

$ lp articles

The -h option tells readnews to use short article headers. The -p option sends all articles to the standard output. Thus, the file articles you print using lp contains all articles, with short headers.

Using vnews

In the same way that many users prefer using a screen-oriented editor, such as vi, to a line-oriented editor, such as ed, many users prefer using a screen-oriented netnews interface. The vnews program provides such an interface. The vnews program uses your screen to display article headers, articles, and information about the current newsgroup along with the article you choose to read.

When you type vnews, you begin reading news starting with the newsgroup found first in your .newsrc file if this group has unread news. (If you do not have a .newsrc file, vnews creates one for you.) You can specify a particular newsgroup by using the -n option. For instance, the command

$ vnews -n comp.text

can be used to read articles in the newsgroup comp.text.

You will be shown a screen containing the header of the first unread article in this group, as well as a display on the bottom that shows the prompt, the newsgroup, the number of the current article, the number of the last article, and the current date and time. (The format of the header depends on the particular netnews software being used.) An example of what you will see is shown in Figure 10–1.

Newsgroup comp.text (Text processing issues and methods)

Article <2332@jersey.ATT.COM> May 31 13:18

Subject: special logic symbols in troff

Keywords: troff, logic

From: khr@ATT.COM (k.h. rosen@AT&T Laboratories)

(23 lines)

more? comp.text 484/587 Oct 1 17:13

Figure 10–1: Using the vnews command

You can see a list of vnews commands by typing a question mark at the prompt. Some commonly used commands are listed in Table 10–4.

Table 10–4: Some Commonly Used vnews Commands

Command        Action
ENTER          Display next page of article, or go to next article if last page
n              Go to next article
r              Reply to article
f              Post follow-up article
CTRL-L         Redraw screen
N [newsgroup]  Go to next newsgroup or newsgroup named
D              Decrypt an encrypted article
A              Go to article numbered
q              Quit and update .newsrc
x              Quit without updating .newsrc
s [file]       Save the article in file in home directory; default is file Articles
h              Display the article header
-              Go to previous article displayed
b              Go back one article in current newsgroup
!              Escape to shell

For instance, to read the current article, press either the SPACEBAR or the ENTER key. The contents of the article will be displayed, and the prompt “next?” will appear.

Using rn

The rn program for reading netnews articles has many more features than either readnews or vnews. For instance, rn allows you to search through newsgroups or articles within a newsgroup for specific patterns using regular expressions. Only basic features of rn will be introduced here; for a more complete treatment, see one of the references described at the end of this chapter.

To read news using rn, enter this command, optionally supplying the first newsgroup to be used:

$ rn comp.unix

Unread news in comp.unix 23 articles

Unread news in comp.unix.aux 3 articles

Unread news in comp.unix.cray 12 articles

Unread news in comp.unix.questions 435 articles

Unread news in comp.unix.wizards 89 articles

and so forth

******** 23 unread articles in comp.unix--read now? [ynq]

If you enter y or press the SPACEBAR, you begin reading articles in this newsgroup. However, you can move to another newsgroup in many different ways, including the commands displayed in Table 10–5.

For instance, to search for the next newsgroup with the pattern “wizards,” use the following:

******** 23 unread articles in comp.unix--read now? [ynq] /wizards

Searching...

******** 89 unread articles in comp.unix.wizards--read now? [ynq]

Table 10–5: Some Newsgroup-Level rn Commands

Command        Action
n              Go to next newsgroup with unread news
p              Go to previous newsgroup with unread news
-              Go to previously displayed newsgroup (toggle)
1              Go to first newsgroup
$              Go to the last newsgroup
g newsgroup    Go to the newsgroup named
/pattern       Scan forward for next newsgroup with name matching pattern
?pattern       Scan backward for previous newsgroup with name matching pattern
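The forward and backward pattern searches in Table 10–5 can be sketched in Python as scans over the list of newsgroup names, starting from the current position. Python's re module is used here as a stand-in; the pattern syntax that rn itself accepts may differ in detail, and the function names are hypothetical.

```python
import re


def next_matching(groups, current, pattern):
    """Index of the next newsgroup after current whose name matches, or None."""
    regex = re.compile(pattern)
    for index in range(current + 1, len(groups)):
        if regex.search(groups[index]):
            return index
    return None


def previous_matching(groups, current, pattern):
    """Index of the nearest newsgroup before current that matches, or None."""
    regex = re.compile(pattern)
    for index in range(current - 1, -1, -1):
        if regex.search(groups[index]):
            return index
    return None


groups = [
    "comp.unix",
    "comp.unix.aux",
    "comp.unix.cray",
    "comp.unix.questions",
    "comp.unix.wizards",
]
```

Starting at comp.unix (index 0), next_matching(groups, 0, "wizards") lands on comp.unix.wizards, which mirrors the /wizards search shown earlier.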

Once you have found the newsgroup you want, you start reading articles by entering y. You can also enter = to get a listing of the subjects of all articles in the newsgroup. After you enter y, the header of the first unread article in the newsgroup selected is displayed as follows:

******** 89 unread articles in comp.unix.wizards--read now? [ynq] y

You obtain the first article, which will look something like this:

Article 5422 (88 more) in comp.unix.wizards

From: fred@jersey.att.com (Fred Diffmark AT&T Laboratories)

Newsgroups: comp.unix.wizards,comp.unix.questions

Subject: new Solaris real time features

Keywords: Solaris, real time

Message-ID:

Date: 2 Mar 99

Lines: 38

--MORE--(19%)

You enter your command after the last line. Some of the many choices are displayed in Table 10–6. The commands in Table 10–6 let you read the current article, find another article containing a given pattern, or perform one of dozens of other possible actions.

Table 10–6: Some Article-Level rn Commands

Command        Action
SPACEBAR       Read next page of article
ENTER          Display next line of article
CTRL-L         Redraw the screen
CTRL-X         Decrypt screen
n              Go to next unread article in newsgroup
P              Go to previous unread article in newsgroup
q              Go to end of article
-              Go to previously displayed article (toggle)
^              Go to first unread article in newsgroup
g pattern      Search forward in article for pattern specified
s file         Save article to file specified
number         Go to article with number specified
$              Go to end of newsgroup
/pattern       Go to next article with pattern in its subject line
/pattern/a     Go to next article with pattern anywhere in the article
/pattern/h     Go to next article with pattern in header
?pattern       Go to first article with pattern, scanning backward
/              Repeat previous search, moving forward
?              Repeat previous search, moving backward

The rn program has many more sophisticated capabilities, such as macros, news filtering with kill files, and batch processing.

Using trn

Instead of using rn to read netnews, you may want to use trn, a threaded version of rn developed by Wayne Davison. This newsreader is called threaded because it interconnects articles in reply order. Within a newsgroup, each discussion thread is represented as a tree where reply articles branch off from the respective originating article that they are a reply to. A representation of this tree, or part of it if it is too large, is displayed in the article header when you read articles.
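The tree structure described above can be sketched in Python. This is only an illustration of the idea, not trn's actual implementation: it assumes each article is already known by an identifier together with the identifier of the article it replies to, and the reply relationships in the sample data are invented.

```python
from collections import defaultdict


def build_threads(articles):
    """Return the authors as indented lines, one thread tree after another.

    articles is a list of (article_id, parent_id, author) tuples, where
    parent_id is None for an article that starts a thread.
    """
    children = defaultdict(list)
    authors = {}
    for article_id, parent_id, author in articles:
        authors[article_id] = author
        children[parent_id].append(article_id)

    lines = []

    def walk(article_id, depth):
        # An article is listed before its replies, which are indented.
        lines.append("  " * depth + authors[article_id])
        for reply in children.get(article_id, ()):
            walk(reply, depth + 1)

    for root in children.get(None, ()):
        walk(root, 0)
    return lines


# Hypothetical reply relationships, for illustration only.
articles = [
    (1, None, "Carl Gauss"),
    (2, 1, "Lenny Euler"),
    (3, 2, "Adrian L."),
    (4, None, "Al Einstein"),
    (5, 4, "P.W. Herman"),
]
thread_lines = build_threads(articles)
```

Each level of indentation in the result is one step further down a chain of replies, which is the ordering a threaded newsreader follows when it walks you through a discussion.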

Many people prefer using trn because it lets them work through trees of threaded articles, reading an article, replies to this article, replies to these replies, and so on. If you typically use rn, you may want to try trn (keep the manual pages for trn at your side when you first begin using it). Because trn is an extension of rn, we will not cover it in detail here, but we will briefly describe how articles in a newsgroup are presented and organized when you use this newsreader.

When you tell trn you want to read the articles in a particular newsgroup, you are presented with the overview file for this newsgroup one page at a time, showing threads of articles from that newsgroup, as you’ll see in Figure 10–2.

sci.math 818 articles

a+ Carl Gauss 3 Quadratic reciprocity

Lenny Euler

Adrian L.

b Al Einstein 2 Relativity theory

P.W. Herman

d D. Hilbert 1 >1+1=0

e+ Sonya K. Klein bottles

Mr. Mobius

Gwendolyn G.

Deborah Z.

- - Select threads (date order) - - (Top 1%) [>Z]

Figure 10–2: An example file overview trn screen

This screen shows us that the newsgroup sci.math has been selected and that there are 818 unread articles in this newsgroup. We see four threads displayed, identified by the letters a, b, d, and e (c is skipped because it is a trn command). To select threads, you type the letter of the thread. For instance, here the letter a was entered, which caused the first thread to be selected. Similarly, the letter e was typed, selecting the fourth thread of the screen. (Also note that at the bottom of the screen, we’re told that we have seen the top 1 percent of articles.) For more information about trn, enter

$ man trn

Using tin

Another widely used threaded newsreader is tin, short for threaded Internet newsreader. When you start tin, you are presented with a list of the newsgroups you are subscribed to, with the number of articles that you have not yet read in a newsgroup displayed to the left of the newsgroup name. At this stage, you are at the newsgroup-selection level. To read the articles in one of these newsgroups, move your cursor to the name of this newsgroup and press the ENTER key. You will then see a list of subjects, each representing a thread consisting of one or more articles, as well as responses to these articles. At this stage, you are at the subject-selection level. To see a summary of the articles in a subject, move your cursor to the thread and press the L key (short for list). This brings you to the article-selection level.

To return to the list of subjects, press the Q key. To begin reading the articles in a subject, position the cursor over this subject and press the ENTER key, or type the number of this subject, which you will see to the left of the subject name, and press the ENTER key. Use the TAB key to move to the next article in a thread, and press the ENTER key to display the initial article in the next subject area. To respond to an article and add to its thread, press the F key. This will give you the opportunity to create your response. Once you have read all the articles that you want in a newsgroup, pressing the c key will mark all articles in that newsgroup as read; these articles will not be displayed the next time you read netnews. At any point, you can press the H key to see all commands available to you at that point. For more information about tin, go to http://www.tin.org or consult its manual page by entering

$ man tin

Using a Web Browser to Read Netnews

Instead of using a newsreader to read netnews, you can use a program that comes with your web browser. Web browsers usually provide an easy-to-use interface for reading netnews. Using your browser, you can find and subscribe to newsgroups, read and select messages, thread messages, filter messages, reply to messages and post new ones, and do many other things.

Among the benefits to using a web browser for netnews is that newslists and related lists can be categorized logically on a web page, with other linked pages containing similar items accessed as a hot link. Another benefit is that the display of netnews lists can be altered to be more appealing than the normal lists generated for a newsgroup. For details see Internet: The Complete Reference, second edition, listed at the end of this chapter.

Using Google Groups to Read Netnews Articles

Another way to access netnews articles is to use Google Groups at http://groups.google.com/. Google Groups maintains an incredibly large and comprehensive archive of more than 1 billion Usenet postings dating back to 1981. Using Google Groups, you can locate newsgroups whose names and/or descriptions match keyword searches and search for articles in newsgroups that contain a particular word or phrase. You can also browse all the articles in a particular newsgroup. Google Groups also allows you to create your own new newsgroup. Google Groups incorporates the original archive of Usenet articles previously supported by the defunct Deja News service.

Posting News

Many programs that are used to read netnews can also be used to post news articles. For example, you can post news articles when using tin to read netnews by pressing w (write). To create your article, enter the subject and text of your article. You can edit the text using the Pico editor. You post your article by typing CTRL-X. If you want to post a news article that begins a new thread while reading netnews with trn, type f and then type y when you are prompted to answer the question “Are you starting an unrelated topic?” You can then create your news article using the emacs editor.

Another way to post news articles to netnews is to use one of several different netnews programs that can be used to write news articles and to send them to the Usenet. Two of these are Pnews and postnews. We will describe how to use Pnews next.

Using Pnews

To use Pnews, type this:

$ Pnews

You will be prompted for the answers to a series of questions. After providing the answers, you write your article and post it.

The first thing that Pnews asks you is to which newsgroup or newsgroups you want to post your article. You should include only relevant newsgroups, with the most relevant listed first. Some articles clearly belong in a specific newsgroup. For instance, if you have a question on computer graphics, you probably should only post it to comp.graphics. Other articles should be posted to more than one newsgroup. For instance, if you have a question on graphics in text processing, you may want to post this to comp.unix.questions, comp.text, and comp.graphics. Be sure not to post your article to inappropriate newsgroups.

Choosing a Distribution After specifying the newsgroups for your article, Pnews asks you how wide distribution should be. There are some messages you would like all Usenet users in a particular group to receive. For instance, you may really want to ask Usenet users in Sweden, Australia, and Korea for responses to a question on computer graphics. However, if you are selling your car, it is quite unlikely that you want to send your netnews article to these countries. (If you post such an ad worldwide, someone in Sweden may sarcastically ask you to drive the car by for a look!) How widely your article is distributed depends on the response you give when the Pnews program prompts you for a distribution. The possibilities depend on your site and are displayed by the program.

After specifying the newsgroups, you are prompted for the Title/Subject and then asked whether you want to include an existing file in your posting. When you respond, you are then placed in your editor (specified by the value of your shell variable VISUAL or, if this is not set, EDITOR). The first lines of the file are in a particular format. There are lines for the newsgroup, the subject, a summary, a Followup-To line, a distribution line, an organization line, a keywords line, and a Cc: line. You can edit each of these lines and then edit your article. When you are finished editing the file, you can then send the article to the Usenet.
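The layout of the header lines described above can be pictured with a short Python sketch. It is only an illustration: the function name build_article and the sample values are hypothetical, the header names are written in their usual Usenet spelling, and nothing here is taken from Pnews itself or posts anything.

```python
# Order of the header lines, following the description in the text.
HEADER_ORDER = ["Newsgroups", "Subject", "Summary", "Followup-To",
                "Distribution", "Organization", "Keywords", "Cc"]

def build_article(headers, body):
    """Return header lines, a blank separator line, and then the body."""
    lines = [f"{name}: {headers.get(name, '')}" for name in HEADER_ORDER]
    return "\n".join(lines) + "\n\n" + body.rstrip("\n") + "\n"

# Sample values are made up for illustration.
article = build_article(
    {"Newsgroups": "comp.graphics,comp.text",
     "Subject": "Graphics in text processing",
     "Distribution": "world"},
    "Which tools handle figures well?\n",
)
print(article)
```

Headers you leave empty still appear as lines you can fill in with your editor before the article is sent.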

Including a Signature You can have a block of lines automatically included at the end of every article you post. To do this, create a file called .signature in your home directory containing the lines you want to include at the end of your articles. (On some systems, no more than four lines are allowed in a netnews signature. This varies from system to system.) Be sure to change the permissions on this file to make sure it is readable by everyone. Besides putting your name, e-mail address, and phone number in your signature, you may want to put in your favorite saying. For example,

$ cat .signature

Oscar O. Orez

ooo@jersey.ATT.COM (201) 555–1234

************************ Life is a Dream! ************************

To avoid irritating fellow netnews readers, do not use lengthy or offensive signatures.
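The steps for setting up a signature (create the file, keep it short, make it readable by everyone) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper write_signature is hypothetical, the four-line limit is only the figure mentioned above and varies by system, and the demonstration writes to a scratch directory rather than your real home directory.

```python
import os
import stat
import tempfile

MAX_LINES = 4  # assumed limit; the text notes that this varies by system

def write_signature(home, lines):
    """Write lines to <home>/.signature and make the file world-readable."""
    if len(lines) > MAX_LINES:
        raise ValueError(f"signature has {len(lines)} lines; limit is {MAX_LINES}")
    path = os.path.join(home, ".signature")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    os.chmod(path, 0o644)  # owner may write; everyone may read
    return path

# Demonstrate in a scratch directory instead of the real home directory.
demo_home = tempfile.mkdtemp()
sig_path = write_signature(demo_home, ["Oscar O. Orez", "Life is a Dream!"])
mode = stat.S_IMODE(os.stat(sig_path).st_mode)
print(oct(mode))
```

From the shell, the same permission change is made with chmod 644 .signature.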

Moderated Newsgroups

Not all newsgroups accept every article posted to them. Instead, some newsgroups, such as rec.humor.funny, have moderators who screen postings and decide which articles get posted. Moderators decide which articles to post by considering the appropriateness, tastefulness, or relative merit of postings. When you read articles with current versions of netnews software, moderated newsgroups are identified in the group heading of articles. When you post an article to a moderated group, your article will be sent directly to the moderator of the group for consideration.

Internet Mailing Lists

A mailing list is a distribution of electronic mail messages to a set list of recipients from a central point. A mailing list manager maintains a subscriber list. A list may or may not be moderated. If not, when a subscriber (or for some lists, when anyone) sends a message to the mailing list manager, this message is posted to everyone on the subscriber list. If the list is moderated, the moderator decides whether to approve messages sent to the mailing list manager. Subscriptions to a mailing list may also be open to everyone, or subscriptions may be restricted by the mailing list manager. All this is accomplished via a mailing list management program, such as LISTSERV (see http://www.lsoft.com/listserv.stm) or Majordomo (see http://www.greatcircle.com/majordomo/). Each mailing list also has an administrative address. Messages are sent to this address when someone wants to subscribe to or unsubscribe from the list, or make other changes to their subscription.
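The rules just described (moderated versus unmoderated lists, and lists that accept messages only from subscribers versus from anyone) can be modeled with a toy Python class. This is a sketch for illustration only; the class and its method names are hypothetical and do not reflect how LISTSERV or Majordomo are implemented.

```python
class MailingList:
    """Toy model of a mailing list manager's delivery rules."""

    def __init__(self, moderated=False, open_posting=False):
        self.subscribers = set()
        self.moderated = moderated
        self.open_posting = open_posting  # True: nonsubscribers may post
        self.pending = []                 # messages awaiting the moderator

    def subscribe(self, address):
        self.subscribers.add(address)

    def unsubscribe(self, address):
        self.subscribers.discard(address)

    def submit(self, sender, text):
        """Return the recipients the message is delivered to right away."""
        if not self.open_posting and sender not in self.subscribers:
            return []                     # rejected: sender may not post
        if self.moderated:
            self.pending.append((sender, text))
            return []                     # held until the moderator approves
        return sorted(self.subscribers)

    def approve(self, index=0):
        """Moderator approves a held message; it goes to all subscribers."""
        self.pending.pop(index)
        return sorted(self.subscribers)
```

For example, on an unmoderated list a subscriber's message is returned for delivery to every subscriber at once, while on a moderated list submit returns an empty list until approve is called.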

Mailing lists number in the tens, and perhaps hundreds, of thousands, and they exist on a tremendous variety of subjects. With so many mailing lists, you may wonder how you might find those that could be of interest to you. One excellent way to find mailing lists on particular subjects is to use CataList, the mailing list directory web site, at http://www.lsoft.com/lists/listref.html. Using CataList, you can do keyword searches to find mailing lists or browse through lists by category. CataList knows about more than 74,000 different public mailing lists.

Subscribing and Unsubscribing to a Mailing List

Once you find a mailing list that might include messages of interest to you, you can subscribe to it. To subscribe to a mailing list, you send a command to the administrative address for that mailing list, putting this command as a line in an e-mail message. For mailing lists that use the LISTSERV mailing list software, this command needs to be in the form

subscribe listname your name

where listname is the name of the list and your name is your actual name, not your e-mail address. For mailing lists that use Majordomo, you do not include your name on this line.

Often you will find that you are not as interested in the messages posted to a particular mailing list as you thought you might be, or you just find yourself swamped with messages. If this is the case, you might decide to unsubscribe from the mailing list. To do this, on the first line of an e-mail message send the command

signoff listname

if the mailing list uses LISTSERV software or the command

unsubscribe listname

if the mailing list uses Majordomo list management software.

Caution

Many people try to subscribe or unsubscribe to a mailing list by sending a message to the list address rather than the administrative address. Never do this, because all this does is post your message (subscribe or unsubscribe) to everyone on the list!
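The difference between the LISTSERV and Majordomo command forms can be summarized in a small Python sketch. The function list_command and the list name used in the example are hypothetical; the command words themselves (subscribe, signoff, unsubscribe) are the ones given above. The resulting line belongs in a message to the administrative address, never the list address.

```python
def list_command(action, listname, software, full_name=None):
    """Build the one-line command to send to a list's administrative address."""
    software = software.lower()
    if software not in ("listserv", "majordomo"):
        raise ValueError("unknown list software: " + software)
    if action == "subscribe":
        if software == "listserv":
            if not full_name:
                raise ValueError("LISTSERV expects your name after the list name")
            return f"subscribe {listname} {full_name}"
        return f"subscribe {listname}"        # Majordomo: no name on the line
    if action == "unsubscribe":
        if software == "listserv":
            return f"signoff {listname}"      # LISTSERV uses signoff
        return f"unsubscribe {listname}"
    raise ValueError("unknown action: " + action)

# "unix-users" is a made-up list name.
print(list_command("subscribe", "unix-users", "LISTSERV", "Oscar O. Orez"))
print(list_command("unsubscribe", "unix-users", "Majordomo"))
```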

If you cannot find a mailing list that meets your needs, because the subject of interest is not addressed, because of a clutter of too many messages, or for some other reason, you may want to start your own mailing list. There are several ways to start and run your own mailing list. You can install a mailing list management program on your computer. (Refer to the web sites for LISTSERV and Majordomo to find out more about this option.) If this option does not appeal to you, you might want to use a mailing list hosting service that charges a fee. For more information about this option, consult Internet: The Complete Reference, listed at the end of this chapter.

Internet Relay Chat

The Internet Relay Chat (IRC), developed by Jarkko Oikarinen in Finland in 1988, provides a way for people on the Internet to carry out a conversation with many different participants, similar to how a telephone chat line operates. The IRC was designed as a major advancement over the talk command, which allows two users to carry on an electronic conversation. Unlike talk, the IRC supports multiple users and multiple simultaneous channels, and it has many additional features. Because of its rich set of features and capabilities and because people like to chat, the Internet Relay Chat has become an extremely popular part of the Internet in the past few years.

To use the Internet Relay Chat, you must have an IRC client program installed on your machine. The standard UNIX IRC client program is called ircII. (There are many other IRC client programs; see http://en.wikipedia.org/wiki/List_of_IRC_clients for a list.) You must also be connected to a network, such as the Internet, that provides a TCP/IP connection to an IRC server. On the Internet, IRC servers are grouped together into IRC networks. Each network can support many different chat rooms, which on IRC are known as channels. There are many different IRC networks; some of the largest of these are EFnet, Undernet, IRCnet, and Galaxynet. (See http://www.irchelp.org/irchelp/networks/ for a list of IRC networks.) The number of channels on a particular IRC network can be quite large. For example, on the large and widely used EFnet there are often more than 35,000 active channels.

An excellent way to find a particular channel that may be of interest to you is to use the SearchIRC site at http://searchirc.com, where you can search through more than 650,000 active channels on approximately 3,500 different IRC networks.

As mentioned previously, each conversation using the IRC (on a particular IRC network) takes place on a particular channel. There are some channels that are present on most IRC networks, such as EFnet. For example, the channel #hottub is a general meeting place for people to talk about every possible subject. (Note that the names of IRC channels generally begin with the pound sign, #.) Other general chat channels are #talk and #chat. There are also channels devoted to discussions of technical topics, such as #unix, #perl, and #linux. And there are channels dedicated to the discussion of particular countries and their cultures, such as #england and #korea. Some channels have chat sessions in languages other than English. For example, #francais has discussions in French and #espanol has discussions in Spanish. You will also encounter channels with discussions in Japanese where Kanji characters are used; you won’t be able to participate in these unless your system supports Kanji characters (and unless you understand Japanese!).

Getting Started with the IRC

If an IRC client, such as ircII, is installed on your system, you can start your IRC session by entering the command

$ irc

This will connect you to a default IRC server. If ircII is not installed on your system, please see the web site http://www.irchelp.org/irchelp/ircii/ for information and instructions about downloading and installing it.

You will now be in an IRC session. You continue by entering IRC commands, each of which begins with a slash (/). If you want to connect to a different IRC server than your default server, use the /server command. For example, to connect to the EFnet server irc.colorado.edu, you enter

/server irc.colorado.edu

If your connection is successful, you will get a message back from the server to that effect.

*** Connecting to port 6667 of server irc.colorado.edu

After connecting to a particular IRC server, you are not automatically connected to any channel. The first thing you may want to do is to list all the available channels. When you use the IRC command this way,

/list

you will see a list of all channels, the number of people currently on each channel, and the topic of the channel (for channels where a topic has been set). Note that you might want to run the command

/set hold_mode on

before you run the /list command so that only one screen of information will be presented to you at a time.

You can also see who is currently participating in a particular channel using the /who command. For example, to see who is currently taking part in #hottub, you type this:

/who #hottub

To join a channel, you use the IRC /join command. For example, to join the channel #hottub, you use this command:

/join #hottub

You can see who is joined to your current channel using the command

/who *

To exit from a channel, you use the /leave command. For example, when you want to leave #hottub, you use this command:

/leave #hottub

It is possible to participate in more than one channel. To do so, you must first run the command

/set novice off

and then use the /join command to join each of the channels you want to participate in.

You can get a brief introduction to the Internet Relay Chat using this command:

/help intro

Summary of IRC Commands

Table 10–7 lists some of the most important ircII commands and describes what each does. You can find a comprehensive list of ircII commands and their actions at http://www.irchelp.org/irchelp/ircii/commands/.

Table 10–7: Some Useful ircII Commands

Command           Action
/help             Lists all IRC commands
/join channel     Joins you to the channel given
/leave channel    Leaves the channel given
/list             Displays information about all channels
/list -max m      Lists channels with no more than m participants
/list -min n      Lists channels with at least n participants
/nick nickname    Sets your nickname to the given nickname
/quit             Ends your IRC session
/who channel      Displays current participants in channel given
/who *            Displays who is a participant in your current channel
/whois *          Displays information about all participants
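Behind the scenes, an IRC client turns slash commands into the plain-text messages of the IRC protocol (RFC 1459) and sends them to the server, one per line. The following Python sketch shows that translation for a few of the commands discussed in this section. The function translate is hypothetical and is not part of ircII; it ignores client-side options such as -min and -max, and it treats * as the current channel in the way described above for /who.

```python
# Slash command -> IRC protocol command name (per RFC 1459).
PROTOCOL_NAMES = {
    "join": "JOIN",
    "leave": "PART",   # the protocol calls leaving a channel PART
    "nick": "NICK",
    "list": "LIST",
    "who": "WHO",
    "quit": "QUIT",
}

def translate(command_line, current_channel=None):
    """Turn a slash command into the line a client would send to the server."""
    if not command_line.startswith("/"):
        raise ValueError("IRC commands begin with a slash")
    parts = command_line[1:].split()
    if not parts or parts[0].lower() not in PROTOCOL_NAMES:
        raise ValueError("command not covered by this sketch")
    name, args = parts[0].lower(), parts[1:]
    if any(arg.startswith("-") for arg in args):
        raise ValueError("client-side options are not handled here")
    if name == "who" and args == ["*"] and current_channel:
        args = [current_channel]   # "*" stands for the current channel
    # Protocol messages end with a carriage return and line feed.
    return " ".join([PROTOCOL_NAMES[name]] + args) + "\r\n"

print(repr(translate("/join #hottub")))
print(repr(translate("/who *", current_channel="#hottub")))
```

A real client also registers with the server when it connects and interprets the numeric replies that come back; none of that is shown here.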

Running an IRC Server

To set up your machine as an IRC server, you must run the ircd (IRC daemon) program. Doing this is beyond the scope of this book. We recommend you consult the IRC Daemon: IRC Server Software web page at http://www.irchelp.org/irchelp/ircd/. You will find useful links for learning how to set up, configure, and maintain an IRC server, as well as links for downloading the necessary software.

Finding Out More about the IRC

Some excellent sources are available to you to find helpful information about the Internet Relay Chat. The Internet Relay Chat (IRC) Help web page at http://www.irchelp.org is a great starting point for useful web resources. You’ll find FAQs, tutorials, help pages, primers, IRC client information, and many other related things at this site. Several books are devoted to the Internet Relay Chat, such as IRC & Online Chat by Powers, The IRC Survival Guide by Harris, and Learn Advanced Internet Relay Chat by Toyer. You can also consult the book Internet: The Complete Reference, Second Edition.

Instant Messaging (IM)

Another way to carry out a conversation with someone over the Internet is to use Instant Messaging, or IM, for short. To use instant messaging, you need to have an instant messaging client program installed on your system. There are a number of instant messaging clients for one or more variants of UNIX, including Gaim (http://gaim.sourceforge.net/) (for Linux, BSD, and Mac OS X), Kopete (the KDE Instant Messenger) (http://kopete.kde.org/), Ayttm (http://ayttm.sourceforge.net/) (for Linux and BSD), Eb-Lite (http://linux.softpedia.com/get/Communications/Chat/EE-lite-225.shtml) (for Linux), and Sun Java System Instant Messaging (http://www.sun.com/software/products/instant_messaging/index.xml) (for Solaris, Mac OS X, Linux, and HP-UX). These programs are available for free downloading. Using a client program, you can set up a connection to one or more instant messaging services. IM clients generally support a large number of different IM services. We will briefly discuss one of these clients, Gaim, here.

Gaim

A versatile, multiplatform instant messaging client program called Gaim (named after a fictional alien race from Babylon 5) is available for free download. Gaim was originally written by Mark Spencer; it now runs on Linux, BSD, and Mac OS X, as well as on Windows. Gaim is compatible with many different instant messaging systems, including the AOL Instant Messenger, MSN Messenger, Yahoo! Messenger, and Jabber, as well as IRC. Using Gaim, you can log in to multiple accounts on multiple IM networks at the same time. For example, using Gaim you can simultaneously chat with a friend on AOL Instant Messenger and talk with a different friend on Yahoo! Messenger, while you participate in an IRC channel.

Gaim provides tabbed message windows for switching among different conversations. It supports a wide array of features, including many features of different IM services. In particular, Gaim supports file transfer, buddy icons, away messages, typing notification, and MSN window closing notification. Using the Buddy Pounces feature of Gaim, when a particular buddy signs on, goes away, or returns from idle, you can have the program notify you, send you a message, play a sound, or run a program.

For more information about Gaim, go to the official Gaim home page at http://gaim.sourceforge.net/.

The World Wide Web

Hundreds of millions of people use the World Wide Web every day. But what is the web? The web, short for the World Wide Web, is a global network connecting millions of documents, called web pages, stored as files on computers called web servers. Web servers often contain groups of web pages that together make up a web site. Web pages are formatted using a special language, HTML (Hypertext Markup Language), discussed later in this book. Web users view these files using a client program called a web browser, which has become a crucial software program for personal computers.

Browsers

The web is based on a client/server model. The client runs browser software that allows a user to request information on the web and to browse and navigate through it to pick out useful information. The information that you request is stored on a machine called a web server. The function of the web server is to provide (serve up) web documents, pages, and applications to multiple simultaneous browser clients. We will discuss web servers further in Chapter 16.

To view information on the web, you use a program known as a browser, which is a program running on a client machine. Your browser is your user interface as you navigate through the World Wide Web. You provide your browser with the address of a web site (in the form of a Uniform Resource Locator (URL) described later in this chapter). The browser then tries to obtain the web page you requested over the Internet. If the browser successfully fetches this page, you can then view information on that page and navigate to locations both on the page and those linked to other pages, through what is referred to as a hot link or hyperlink.

Web Browsers

To access the web, you will need a web browser. The browser is the program you use to display web pages and to navigate between web pages. Web browsers either are stand-alone products or are bundled with other Internet applications. We will discuss several of the many available web browsers for UNIX, but before we do so, we will briefly describe some of the fascinating history behind the development of web browsers.

Browser History

The first widely used graphical web browser, Mosaic, was developed by Marc Andreessen and Eric Bina at the National Center for Supercomputing Applications (NCSA) and was introduced in 1993. The first release of Mosaic was developed to run on the UNIX X Window System. Later Andreessen left NCSA and became one of the founders of the Mosaic Communications Corporation, which later changed its name to Netscape Communications Corporation. The web browser produced by the Netscape Communications Corporation, Netscape Navigator (often simply called Netscape), was the first commercial browser subsequent to the release of Mosaic. Later, Netscape introduced its Netscape Communicator, an Internet applications package which included an enhanced version of their Navigator web browser bundled with an e-mail client, a tool for reading netnews, a program for composing web pages, groupware software, and several other programs. Netscape Navigator and Communicator became extremely popular programs.

To counter the success of the Netscape web browser, Microsoft developed a competing web browser called the Internet Explorer, based on the original Mosaic browser. The Netscape Communications Corporation and Microsoft fought a three-year war, called the browser war, in which each introduced new features into its browser and matched features the other introduced. Furthermore, Microsoft bundled their Internet Explorer with their Windows operating system; they also produced a UNIX version of Internet Explorer. The Microsoft strategy, especially the inclusion of the browser at no extra charge with the operating system, made it impossible for the Netscape Communications Corporation to compete. In 1998, the Netscape Communications Corporation, realizing that Microsoft had succeeded in making the Internet Explorer the dominant web browser, decided to start the open-source Mozilla project, with the goal of developing the next generation of the Netscape Internet applications package.

America Online (AOL) purchased the Netscape Communications Corporation in 1998 and developed several releases of Netscape. In the following years, AOL lost interest in Netscape and in 2003, AOL disbanded the remnants of the Netscape Communications Corporation. Their final release of Netscape was made in 2004. Meanwhile, Internet Explorer was available until 2002 for two UNIX platforms, HP-UX and Solaris; after 2002 the Internet Explorer was not ported to UNIX platforms.

The Mozilla organization developed their Mozilla Applications Suite, including a browser, an e-mail client, a netnews client, a web page editor, and an IRC client. Release 1 of the Mozilla Applications Suite was introduced in 2002. In 2003, the Mozilla Foundation was created to continue the open-source development work of Internet applications based on Mozilla software. The Mozilla Foundation, using the Mozilla Applications Suite as a base, has developed its Firefox web browser and a variety of other Internet applications. The final version of the Mozilla Applications Suite has been released; it is being superseded by a new offering, called SeaMonkey, which is under development by the Mozilla community.

Using UNIX Browsers

There are many browsers available for UNIX platforms; some of these browsers are parts of larger Internet application suites. Noteworthy browsers for one or more UNIX variants include the Mozilla browser, which is part of the Mozilla Applications Suite; Firefox, a browser also developed under the auspices of the Mozilla Foundation; Epiphany, a web browser for the GNOME desktop; and Konqueror, a web browser for the KDE desktop. For a list of web browsers and links you can use to determine which platforms they run on, go to http://en.wikipedia.org/wiki/List_of_web_browsers.

We will briefly introduce some features of the Firefox browser. Other browsers share many features with Firefox, although the particulars on using and configuring each browser will vary. Using a browser is relatively straightforward. Most of the functions are apparent from the button or menu titles. All you really need to know is that hyperlinks are displayed in a distinctive style, either by text color, by underlining, or by a colored border around an image, and that you follow hyperlinks by clicking the distinctive text or image.

Besides following links, you may want to use a number of other functions which most browsers provide. You can bookmark (save) the link to a site, so that you don’t have to retype the entire URL the next time you want to access it. If you want to refer to the content on the page later, you can save it to a file, using the File | Save As menu, or you can print it by using the Print button. If you get lost, you can go back to your home page (the default one when you start your browser) by selecting the Home button. As you become more experienced, you can try to customize your environment to make browsing easy. Before you do this, though, you should understand the effects of the changes you make. You may find it helpful to read the online help information for your browser by selecting the Help button.

Your Initial Home Page

When you first invoke your browser, it will probably attempt to display the browser vendor’s home page, or it might just display a local file. If your machine has direct access to the Internet, you might be pleasantly surprised to see a home page appear after a few seconds. If you don’t have direct access to the Internet, because your organization either is not connected or is connected through an unfriendly firewall, the connection attempt will time out and you will be greeted with a message giving some indication of the problem.

You will probably want to configure your browser to display a home page of your choice rather than the vendor’s default page. You may be better off specifying a local file for your initial home page. That way, if the network beyond your machine is down, you won’t have an annoying timeout when you bring up the browser.

Using Firefox

With Firefox, the initial home page is indicated in an option menu, as shown in Figure 10–3. To begin, click Tools on the menu bar, and then click Options (note that this procedure is done by clicking Edit and then Preferences in some Firefox distributions).

Figure 10–3: Firefox initial home page setting

The Options window displays a group of five categories. You begin with the General category.

Firefox General Settings Configure your initial home page here by entering your favorite URL in the text box in the Home Page area in the Location dialog box. Alternatively, you can select your current page, a page that you have previously bookmarked, or a blank page as your home page. In the General category, you can also choose whether Firefox should be your default browser and specify how Firefox should connect to the Internet. You need to check with your system administrator to see if your organization runs a “caching proxy server.” If so, enter the name of the machine and port number in the appropriate fields in the Manual Proxies form. Normally, the caching proxy is set by your ISP. However, corporations typically use a firewall proxy to keep people from going to unwanted sites. In addition to caching information, firewalls provide a strong degree of security by monitoring outgoing requests as well as incoming ones. Failure to set up adequate protection with a firewall can make your browser environment unsafe; that is, web users that are not on your network can access your information and possibly change things.

Firefox Privacy Setting By selecting the Privacy category and then selecting the History tab, you can specify the length of time that pages you visit should be kept in the History file. By selecting each of the other tabs, you can tell Firefox to save information you enter in forms for later use, you can have Firefox remember login information for different web pages, you can tell Firefox when to remove downloaded files, you can manage cookies (which are pieces of information that web sites store on your computer), and you can specify the size of the cache used by Firefox.

Other Firefox Settings You can select the other tabs (Content, Tabs, Downloads, and Advanced) to further configure your Firefox browser. For example, using the Content tab you can block pop-up windows and enable or disable Java and JavaScript. Using the Tabs tab, you can determine how links from other applications are opened, such as in a new window or on a new tab in the most recent window. Using the Downloads tab, you can specify the folder where files should be saved. Using the Advanced tab and selecting the Security tab that follows, you can specify the security protocol or protocols to use.

Helper Applications and Plug-Ins

Documents on the web come in many different media flavors, including text, images, audio, and movies, and each of those media flavors comes in many different formats. For example, a text page may be expressed in many different formats, including HTML, PostScript, LaTeX, Word for Windows, or unstructured text. Browsers differ in their capabilities to display various forms of each type of media. Depending on your browser version, certain file extensions may be handled by a helper application, which associates the file extension with a specific program (for instance, .doc files are opened by MS Word).

With most browsers, you can also integrate new content types using software programs called plug-ins. Although both helper applications and plug-ins enable a browser to expand the number of file types that it can handle, plug-ins are more closely integrated with the browser’s environment. Plug-ins can be loaded and unloaded from memory, whereas helper applications usually remain active even after you have left the web page you were viewing and even after you have closed your browser down.

Helper Applications

Browsers invoke external programs called “Helper Applications” or “Viewers” to deal with document formats that the browsers themselves do not understand. The format of a document is indicated by the last part of the name of the document, sometimes called the “extension” of the document name. For example, several common formats and the corresponding extensions are displayed in Table 10–8.

Table 10–8: Some Common Internet File Formats and Their File Extensions

Format                        Extension
HTML                          .html
PostScript                    .ps
GIF                           .gif
JPEG                          .jpg, .jpeg
Wave (audio)                  .wav
MP3                           .mp3
Real Audio                    .rpm, .ra
PDF (Adobe Acrobat Reader)    .pdf
AVI                           .avi
MPEG                          .mpeg

Table 10–9 shows some popular helper applications.

Table 10–9: Some Popular Helper Applications

Format        Viewer
Graphics      Xv
Audio         Showaudio
PostScript    gs
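The way a browser picks a helper application from a document's extension can be sketched in Python with two lookup tables drawn from Tables 10–8 and 10–9. This is an illustration only: the function helper_for is hypothetical, the assignment of GIF and JPEG to the graphics viewer and of Wave files to the audio viewer is an assumption based on those tables, and real browsers keep this binding in their own configuration.

```python
import os

# Extension -> format, taken from Table 10-8 (a subset).
FORMATS = {
    ".html": "HTML", ".ps": "PostScript", ".gif": "GIF",
    ".jpg": "JPEG", ".jpeg": "JPEG", ".wav": "Wave (audio)",
    ".mp3": "MP3", ".pdf": "PDF", ".avi": "AVI", ".mpeg": "MPEG",
}

# Format -> helper program, based on Table 10-9 (assumed groupings).
VIEWERS = {
    "GIF": "xv", "JPEG": "xv",       # graphics
    "Wave (audio)": "showaudio",     # audio
    "PostScript": "gs",
}

def helper_for(filename):
    """Return (format, viewer); viewer is None when no helper is configured."""
    extension = os.path.splitext(filename)[1].lower()
    fmt = FORMATS.get(extension)
    return fmt, VIEWERS.get(fmt)

print(helper_for("report.ps"))
print(helper_for("index.html"))
```

A viewer of None for a known format, as with HTML, corresponds to a document the browser displays itself.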

Plug-ins

Another way to view different media types with Firefox, Mozilla, and many other browsers is to use a plug-in. Plug-ins can be used to seamlessly integrate content of different media types in web pages. When a file is accessed from a web page, the browser determines from the file type whether it has a plug-in for playing the file; if it does, the plug-in starts automatically. For example, if you access a file called starspangle.wav that is a wave file, this audio file will begin playing over your attached speakers. To find Firefox and Mozilla plug-ins, go to https://addons.mozilla.org/firefox/plugins/. Some of the plug-ins you will find there are the Acrobat Reader for viewing PDF files, a Flash Player for delivering Macromedia Flash content, the Java Runtime Environment that runs applications and applets written in Java, and the Real Player, which can play streaming audio and video.

Reading RSS Feeds You can use Firefox to access Really Simple Syndication (RSS) feeds, such as blogs and news headlines, using the Live Bookmarks feature. Live Bookmarks automatically keep track of updates for you. To learn more about Live Bookmarks, go to https://addons.mozilla.org/firefox/extensions/. There are also add-on programs to Firefox that can be used to read RSS feeds. Go to https://addons.mozilla.org/firefox/extensions/ to find these.

Web Documents

After you have learned the basic operations of your web browser, you may want to understand more about how information is organized and structured on the web. The most common unit of information on the web is the document. Most documents consist of text and images and are called pages. However, documents may come in a variety of other forms including audio and video and a wide selection of image files. Browsers may display documents directly, or they may invoke another program called a “helper application” or a “viewer.” All browsers can display text, and most can display some image formats. For sound or movies, however, they need to call a viewer. The binding between a document type and a viewer may be configured by the user, making it possible to reference document types unknown to the browser. This is especially helpful for newer types of audio and video applications.

Each document on the web has a unique address known as a Uniform Resource Locator (URL). A document’s URL indicates the Internet protocol needed to access the document (e.g., HTTP, FTP, and so on), the Internet address of the machine serving the document, the filename of the document on the machine relative to a server-specific root, and an optional port number for specialized server configurations.
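The parts of a URL listed above can be pulled apart with Python's standard urllib.parse module, as in the following sketch. The URL itself is a made-up example; www.example.com and the path are placeholders.

```python
from urllib.parse import urlsplit

# A made-up URL with an explicit port number.
url = "http://www.example.com:8080/docs/index.html"
parts = urlsplit(url)

print(parts.scheme)    # protocol used to access the document
print(parts.hostname)  # Internet address of the machine serving it
print(parts.port)      # optional port number (None when omitted)
print(parts.path)      # filename relative to the server's root

# Without an explicit port, the browser uses the protocol's default.
default = urlsplit("http://www.example.com/docs/index.html")
print(default.port)
```

When the port is omitted, parts.port is None and the client falls back on the protocol's standard port, which is 80 for HTTP.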

Although most documents are static files, a document may be generated by executing a program at the server, making it possible to serve dynamic data such as weather, dates, and times, which may change from one reference to the next. These programs are often called CGI-BIN scripts. We will talk about them in Chapter 15, as well as some client-side tools used for presenting dynamic data.
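To give a feel for a document generated by a program, here is a minimal sketch of a CGI-style script in Python. It assumes the usual CGI convention that the program writes a Content-Type header, a blank line, and then the document to its standard output; the function name render_page and the page wording are hypothetical, and how the script is installed depends on the web server.

```python
import datetime

def render_page(now):
    """Return a complete CGI response: header, blank line, HTML body."""
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    body = ("<html><body><p>The time at the server is "
            + stamp + ".</p></body></html>")
    return "Content-Type: text/html\n\n" + body + "\n"

if __name__ == "__main__":
    # Each request runs the program again, so the page changes every time.
    print(render_page(datetime.datetime.now()), end="")
```

Because the page is built at the moment of the request, two visits a minute apart produce different documents, which is exactly what a static file cannot do.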

Links

Perhaps the single most significant factor contributing to the phenomenal popularity and growth of the web is the hypertext link or hyperlink. Any document, anywhere on the web, can refer to any other document, anywhere on the web, with a hypertext link. The browser displays a link in a document with some form of highlighting such as a contrasting color or an underline, or in the case of links associated with images, with a distinctive border.

The user follows the link by moving the mouse over the highlighted text or image and clicking. This instructs the browser to display the document indicated by the URL associated with the highlighted text. The new document in turn may include links to other documents, which contain links to yet other documents. The web is not hierarchical or tree-structured like a computer’s file system. In other words, after following a thread of links through several pages it is not necessary to make your way back up the first thread before another thread can be started. Instead, any document can link to any other document or documents in a web-like structure; hence the term “web” to describe the collection of all hypertext servers.

Addressing

To send traditional mail (in computer circles sometimes called smail for snail mail) to a person, it is necessary to know that person’s house number, street, and city or town (and perhaps postal code and country). To call a person on the telephone requires a phone number. A phone number and a number/street/city/town are both forms of an address, an identifier that is unique in a given context such as the phone or postal systems. On the web each document also has a unique address, known as a Uniform Resource Locator, or more commonly, a URL.

A URL is embedded in a document by the author when a hypertext link is created and is accessed by a browser when the link is followed. Browsers display the URL of the current page and usually also display the URL of a link when the cursor is moved over the hypertext reference. You will usually reference URLs by clicking links, not by typing them in explicitly. However, you may see URLs outside the context of a browser, for example, in a netnews article or e-mail. URLs are starting to appear in the nontechnical press as well. A quick review of a recent issue of Time magazine revealed URLs mentioned in two ads. (In these cases you will have to enter the URL into your browser to view the referenced document.)

A key strength of the web is the integration of access to many dissimilar resources from a common browser. The addresses of those resources are likewise integrated into the common syntax of the URL. We will describe the URL structure of several web protocols.

HTTP

Let’s take a look at an http URL. All of the examples here refer to a fictitious company, Foobar Sales, so don’t try to use them. The web is changing very rapidly and many URLs quickly become stale; that is, the documents they refer to may have been moved or deleted, or perhaps the machine serving a document has been upgraded and has a new name. We prefer to use a contrived URL that will never work rather than a real one that may be stale by the time you read this. A typical URL looks like this:

· http://www.foobar.com/marketing/brochures/overview.html

This URL tells the browser to use the HTTP protocol (Hypertext Transfer Protocol; yes, the word protocol is redundant, but we won’t get into that) to contact a machine named www.foobar.com and to retrieve a document identified by marketing/brochures/overview.html. Often you will see a URL given without any document specified.

· http://www.foobar.com

This URL tells the browser to contact machine www.foobar.com and fetch a default document. By default, this document is a file named index.html; however, the name of the default file can be configured at the server.
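The default-document rule can be sketched in a few lines. This is only an illustration, not how any particular server is implemented: the function name requested_document is hypothetical, and treating a path that ends in a slash the same way as an empty path is our own assumption.

```python
from urllib.parse import urlsplit


def requested_document(url, default_document="index.html"):
    """Return the document name a server would look up for an http URL."""
    path = urlsplit(url).path
    if path == "" or path.endswith("/"):
        # No document named: fall back to the server's configured default.
        path = path + default_document
    # Document names are relative to the server-specific root.
    return path.lstrip("/")
```

With this sketch, http://www.foobar.com resolves to index.html, while a URL that names a document is passed through unchanged. Passing a different default_document mimics a server configured with another default filename.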

FTP

HTTP is the most common protocol used on the web, but it is not the only one. Many other protocols, including FTP and telnet, are supported by web browsers. Traditionally FTP was invoked from the UNIX System command line and was used by interactively entering a series of commands such as ls or dir to display directories, cd to move around the directory hierarchy, and get and put to transfer files. Because of the convenience of the point-and-click interface of web browsers, many people have completely abandoned the command-line interface and use only browsers for access to anonymous FTP servers.

This is an example of a URL for an anonymous FTP reference:

· ftp://ftp.foobar.com

This instructs the browser to contact machine ftp.foobar.com using the FTP protocol. The browser logs in with the login name anonymous and supplies the user’s login name and machine name in the form of a mailing address as a password. Because the preceding reference does not indicate a specific resource, the home directory for anonymous transfers is displayed. This usually looks like this:

bin

etc

incoming

pub

All of these entries are directory names. All but pub are used for administrative purposes for anonymous FTP service. The pub directory contains the files offered for anonymous access by foobar, or additional directories that lead to them. Clicking “pub” will display the contents of that directory. Clicking any directory is equivalent to the UNIX or Windows cd (change directory) command followed by an ls (UNIX) or dir (DOS) to display the contents of the directory. After you have located the name of the desired file, click the filename to transfer it to your machine. Depending on the browser and file type, the file may be displayed directly by the browser. Otherwise, the browser may invoke a helper application or viewer to display the file, or you may be prompted to confirm that the file should be saved and to supply the filename or an alternate filename.

A URL can also supply a full description of a file resource as shown here:

· ftp://ftp.foobar.com/pub/drivers/prod1.tar.Z

When selected (clicked), the file prod1.tar.Z will be transferred immediately without any intermediate directory display or further file selection from a directory list.

Browsers may also use the FTP protocol for nonanonymous FTP service, although this is much less common and is generally a bad idea. A URL of the form

· ftp://bill:letmein@foobar.com/work/src/proj1/pl.c

causes the browser to log in to foobar.com using the name “bill” and supplying the password “letmein.” This is not a good idea because anyone reading your page can obtain your password (the rest should be obvious). Alternatively, you could omit the password as shown here:

· ftp://bill@foobar.com/work/src/proj1/pl.c

The browser will prompt the user for a password, which must be correctly supplied before the server will return the document. This may be handy for quickly viewing one of your own files from a remote location but is of dubious value for general, public use.
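If you maintain pages that contain FTP links, a small check can catch URLs that give away a password. The sketch below uses Python’s standard urllib.parse module; the function name strip_password is our own, and the approach is only one possible way to do this.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_password(url):
    """Return the URL with any embedded password removed."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing to remove
    # Rebuild the user@host:port part without the password.
    netloc = parts.hostname or ""
    if parts.username:
        netloc = parts.username + "@" + netloc
    if parts.port is not None:
        netloc = netloc + ":" + str(parts.port)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Applied to the example with the password “letmein,” this yields the password-free form of the same URL, which leaves the browser to prompt for the password instead.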

Telnet

There are circumstances where the author of a page may wish to indicate a link to an interactive, character-based service. Entering the URL

telnet://foobar.com

instructs the browser to invoke a telnet helper application (if one is available and the browser is configured to use it) and pass to it the machine name so that a telnet session is established with foobar.com. Although this accomplishes nothing more than the user would by invoking telnet directly, it does simplify the process by passing the machine name to telnet and by making telnet available with a single mouse click from within the browser.

The username and even a password may also be included as shown here:

telnet://bill@foobar.com

telnet://bill:letmein@foobar.com

Netnews

A link to a netnews group or article is specified in your browser using a URL of the form

news:group_name

news:article_number

Using NNTP (Network News Transfer Protocol), the browser contacts an NNTP server and obtains some or all of the articles in the newsgroup “group_name” or just one article indicated by the numeric “article_number.”

The NNTP server supporting netnews in your organization is identified to the browser using an option menu or environment variable.

Mailto

On the web most information flows out from the servers to the users. Although most servers maintain a log of requests that shows who accessed what pages, there is rarely any other feedback from users. The mailto URL, shown here, makes it easy for users to communicate back to the authors of web pages:

mailto:bill@foobar.com

When this URL is selected, the browser will display a mail dialog box. The user types a message, which is sent to the mail address indicated by the URL. It is a thoughtful touch to include a mailto URL on your home page to make it convenient for your readers to send you comments. These can be a source of valuable feedback.

Personal URLs

Web pages are stored on the server in a directory hierarchy that is rooted in a directory indicated to the server in a configuration file. For security reasons this tree is usually writable only by the system administrator. On systems shared by multiple users where the users do not have administrator privileges, this makes it difficult for individual users to create and maintain their own web pages. Administrators quickly tire of requests to update files in the browser page database. The solution for this problem is the personal URL, shown here:

· http://www.foobar.com/~wrw/my_home_page.html

This instructs the server to obtain a document named my_home_page.html from a directory associated with user wrw. This directory is usually named public_html and is located in the user’s home directory. We will have more to say about this later.
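The mapping from a personal URL to a file might be sketched as follows. The function name personal_document_path is hypothetical, and the assumption that home directories live under /home is ours; on a real system the server would look up the user’s actual home directory.

```python
import posixpath


def personal_document_path(url_path, home_root="/home", web_dir="public_html"):
    """Map a url-path such as /~wrw/page.html to a file under the user's home."""
    if not url_path.startswith("/~"):
        return None  # not a personal URL
    # Split "wrw/my_home_page.html" into the user name and the remainder.
    user, _, rest = url_path[2:].partition("/")
    return posixpath.join(home_root, user, web_dir, rest)
```

The point of the scheme is that each user can write to his or her own public_html directory without needing any access to the server’s main document tree.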

An Abstract Look at URLs

In the abstract, a URL is defined this way:

scheme:scheme-specific-data

where scheme is one of these:

scheme:

http

https (secure http)

ftp

gopher

mailto

news

telnet

wais

(Scheme can also be one of several others that are not frequently encountered.) The scheme-specific-data is a description of a resource or action to perform that is specific to the named scheme such as http or ftp.

Although the syntax for the rest of the URL may vary depending on the particular scheme selected, URL schemes that involve the direct use of an IP-based protocol to a specified host on the Internet use a common syntax for the initial part of the scheme-specific-data:

scheme-specific-data:

//user:password@host:port

//user:password@host:port/url-path

This initial part starts with a double slash (//) to indicate its presence and continues until the following slash (/), if any. The elements are

§ user An optional user name. Some schemes (e.g., FTP) allow the specification of a user name.

§ password An optional password. If present, it follows the user name, separated from it by a colon.

§ host The fully qualified domain name of a network host, or its IP address as four sets of decimal digits separated by periods.

§ port The optional port number to connect to. Most schemes designate protocols that have a default port number. Another port number may optionally be supplied, in decimal, separated from the host by a colon.

§ url-path The rest of the URL consists of data specific to the scheme, and is known as the url-path. It supplies the details of how the specified resource can be accessed. The slash (/) between the host (or port) and the url-path is not part of the url-path.
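The common syntax can be explored with Python’s standard urllib.parse module. The sketch below is illustrative only: the function name describe and the port number 2121 in the usage note are our own, and the table lists the well-known default ports for just a few schemes.

```python
from urllib.parse import urlsplit

# Well-known default ports for some of the schemes listed earlier.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "telnet": 23, "gopher": 70}


def describe(url):
    """Break a URL into the elements of the common syntax."""
    parts = urlsplit(url)
    port = parts.port
    if port is None:
        # No explicit port: use the scheme's default, if we know it.
        port = DEFAULT_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,
        "user": parts.username,      # None when omitted
        "password": parts.password,  # None when omitted
        "host": parts.hostname,
        "port": port,
        # The slash after the host (or port) is not part of the url-path.
        "url-path": parts.path[1:],
    }
```

For example, describe("ftp://bill:letmein@foobar.com:2121/work/src/proj1/pl.c") reports the user bill, the password letmein, the host foobar.com, the port 2121, and the url-path work/src/proj1/pl.c.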

A Few Formalities

We have not been particularly rigorous in this chapter regarding the term URL or the distinction between a URL path and ordinary files. You should at least be aware of some details in the formal definition of the structure of URLs.

We have used the term URL loosely to mean any identifier for resources on the web. You may encounter two other terms, URI and URN, which, together with URL, have more formal definitions. URI, for Uniform Resource Identifier, is the general term encompassing both URLs and URNs. URL, for Uniform Resource Locator, specifies the “address” of a resource, whereas URN, for Uniform Resource Name, specifies the “name” of a resource. The distinction between them relates to the notion of persistence. A URN has greater persistence than a URL; that is, the URN identifying a document will remain constant even though the physical location, as described by the URL, changes. Through an as-yet-unspecified mechanism, a URN is automatically mapped to a URL.

Because the original implementations of web software were developed on the UNIX system, it is not surprising that the “url-path” looks like a UNIX file path specification. It is particularly easy for UNIX users to fall into the trap of thinking of the “url-path” as a filename. This is not always the case, for three reasons. First, some web servers (see Chapter 16) have a mechanism for mapping an arbitrary “url-path” to an arbitrary file. This is useful when server administrators wish to present a document structure or hierarchy to the public that differs from the actual structure as stored on disk. It also makes it possible to maintain a fixed public structure while the internal structure changes (for any of the reasons things change on computer systems). Second, the machine running the web server may not even be running a UNIX variant, and the file structure and naming syntax may be quite different from that of UNIX. The obvious example is a web server running on a Windows operating system. At a minimum, the server must translate the forward slashes in the URL to the backslashes used by DOS and add a drive specification such as “C:” or “D:”. Finally, the link may be to an application, not a document page at all. An example is a web link that points to a CGI-BIN script (see Chapter 27).
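The slash translation described for a Windows server can be sketched with Python’s pathlib module. The function name to_windows_path and the document root C:/webroot are hypothetical, and this is not a description of how any particular server does it.

```python
from pathlib import PureWindowsPath


def to_windows_path(url_path, document_root="C:/webroot"):
    """Translate a url-path into a Windows file path under the document root."""
    # Split on forward slashes and drop empty segments (such as the leading one).
    # A real server would also have to reject ".." segments here.
    segments = [s for s in url_path.split("/") if s]
    # PureWindowsPath joins the segments with backslashes and keeps the drive.
    return str(PureWindowsPath(document_root).joinpath(*segments))
```

PureWindowsPath only manipulates path text, so the sketch runs on a UNIX system as well as on Windows.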

Summary

In this chapter you learned about the Internet and many different Internet services for finding information and communicating with others. In particular, you learned about Internet addresses, the common naming convention for computers on the Internet. You learned how to read netnews articles and how to post news articles to netnews, the heavily used electronic bulletin board on the Internet. You learned about mailing lists, including how to subscribe to them. You also learned about the Internet Relay Chat, which is a text-based chat line on the Internet, and about Instant Messaging. You learned about the World Wide Web, that vast network of documents distributed around the world. You learned how to use and configure a web browser to help you get started. Finally, you learned about documents on the web.

How to Find Out More

You may find the following books particularly useful in understanding the Internet:

· Bird, Linda. The Complete Guide to Understanding and Using the Internet. Upper Saddle River, NJ: Prentice Hall, 2003.

· Comer, Douglas E. The Internet Book. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 2000.

· Gralla, Preston. How the Internet Works. 2nd ed. Indianapolis: Que, 2001.

· Underdahl, Brian, and Edward C. Willett. Internet Bible. New York: John Wiley & Sons, 2000.

· Young, Margaret Levine. Internet: The Complete Reference. 2nd ed. Berkeley, CA: McGraw-Hill/Osborne, 2002.

To learn more about Usenet and Netnews, consult

· Spencer, Henry, and David Lawrence. Managing Usenet. Newton, MA: O’Reilly and Associates, 1998.

There are some books about the Internet Relay Chat that you may want to consult:

· Harris, Stuart. The IRC Survival Guide. Reading, MA: Addison-Wesley, 1995.

· Powers, James. IRC & Online Chat. Grand Rapids, MI: Abacus, 1997.

· Toyer, Kathryn. Learn Advanced Internet Relay Chat. Plano, TX: Wordware, 1998.

· The IRC Help web page at http://www.irchelp.org/irchelp/ is a useful web site for resources about the Internet Relay Chat. The newsgroups alt.irc and alt.irc.ircii may also be of interest.