Program examples were compiled with the Visual C++ 6.0 compiler on a Windows XP Pro machine with Service Pack 2, and some of the screen snapshots were taken on Windows 2000 Server.
TCP/IP, Winsock, and WinInet
As a C++ programmer, you're going to be asked to do more than create Web pages. You'll be the one who makes the Internet reach its true potential and who creates distributed applications that haven't even been imagined yet. To be successful, you'll have to understand how the Internet works and how to write programs that can access other computers on the Internet.
In this section, you'll start with a primer on the Transmission Control Protocol/Internet Protocol (TCP/IP) that's used throughout the Internet, and then you'll move up one level to see the workings of the Hypertext Transfer Protocol (HTTP). Then it's time to get something running. You'll assemble your own intranet (a local version of the Internet) and study an HTTP client-server program based on Winsock, the fundamental API for TCP/IP in Windows. Finally, you'll move on to WinInet, which is a higher-level API than Winsock and part of Microsoft's ActiveX technology.
To COM or Not to COM
Surely you've read about ActiveX Controls for the Internet. You've probably encountered concepts such as composite monikers and anti-monikers, which are part of the Microsoft Component Object Model (COM). If you were overwhelmed, don't worry: it's possible to program for the Internet without COM, and that's a good place to start. This module and the next are mostly COM-free. In Module 34, you'll write a COM-based ActiveX document server, but MFC effectively hides the COM details so you can concentrate on Winsock and WinInet programming. It's not that ActiveX controls aren't important, but we can't do them justice in this book. We'll defer to Adam Denning's book on this subject, ActiveX Controls Inside Out (Microsoft Press, 1997). Your study of this book's COM and Internet material will prepare you well for Adam's book.
You can't write a good Winsock program without understanding the concept of a socket, which is used to send and receive packets of data across the network. To fully understand sockets, you need a thorough knowledge of the underlying Internet protocols. This section contains a concentrated dose of Internet theory. It should be enough to get you going, but for more solid theory you might want to consult a TCP/IP textbook or the Linux Socket (C code) tutorial at Tenouk.com, which tells the complete story of TCP/IP and the OSI model down to the packet level, including raw sockets, with working program examples. Socket programming in .NET (with C#, VB .NET, and C++/CLI code samples) is covered in Windows Networking using the .NET, and native Winsock programming on Windows platforms is covered in Win32 Winsock (C code).
Network Protocols: Layering
All networks use layering for their transmission protocols, and the collection of layers is often called a stack. The application program talks to the top layer and the bottom layer talks to the network. Figure 1 shows you the stack for a local area network (LAN) running TCP/IP. Each layer is logically connected to the corresponding layer at the other end of the communications channel. The server program, as shown at the right in Figure 1, continuously listens on one end of the channel, while the client program, as shown on the left, periodically connects with the server to exchange data. Think of the server as an HTTP-based World Wide Web server, and think of the client as a browser program running on your computer.
Figure 1: The stack for a LAN running TCP/IP.
You can find more information on the Internet Protocol suite and the other standards discussed in this module at the RFC Editor.
The Internet Protocol
The Internet Protocol (IP) layer is the best place to start in your quest to understand TCP/IP. IP defines packets called datagrams, which are the fundamental units of Internet communication. These packets, typically no more than 1,500 bytes long on Ethernet networks, go bouncing all over the world when you open a Web page, download a file, or send e-mail. Figure 2 shows a simplified layout of an IP datagram.
Notice that the IP datagram contains 32-bit addresses for both the source and destination computers. These IP addresses uniquely identify computers on the Internet and are used by routers (specialized computers that act like telephone switches and work at the IP layer, Layer 3 in OSI terms) to direct the individual datagrams to their destinations. Routers don't care what's inside the datagrams; they're interested only in each datagram's destination address and total length. Their job is to forward the datagram as quickly as possible.
The IP layer doesn't tell the sending program whether a datagram has successfully reached its destination. That's a job for the next layer up the stack, the Transmission Control Protocol (TCP). At the IP level, the receiver can check only the header checksum, which tells it whether the datagram's header (not its data) was corrupted.
Figure 2: A simple IP datagram layout.
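The header checksum mentioned above is the standard Internet checksum defined in RFC 1071: the one's-complement of the one's-complement sum of the header's 16-bit words. The following is a minimal sketch in portable standard C++ rather than Winsock code; it assumes the header is already in a byte buffer in network byte order, and the function name internetChecksum is our own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Computes the Internet checksum (RFC 1071) over a buffer whose 16-bit
// words are stored in network (big-endian) byte order, as in an IPv4 header.
std::uint16_t internetChecksum(const std::vector<std::uint8_t>& bytes) {
    std::uint32_t sum = 0;
    std::size_t i = 0;
    for (; i + 1 < bytes.size(); i += 2) {
        // Combine two bytes into one big-endian 16-bit word and add it.
        sum += static_cast<std::uint32_t>((bytes[i] << 8) | bytes[i + 1]);
    }
    if (i < bytes.size()) {
        // Odd length: treat the last byte as the high byte of a padded word.
        sum += static_cast<std::uint32_t>(bytes[i] << 8);
    }
    // Fold carries out of the top 16 bits back into the low 16 bits.
    while (sum >> 16) {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    // The checksum is the one's-complement of the folded sum.
    return static_cast<std::uint16_t>(~sum & 0xFFFF);
}
```

A sender computes the value with the checksum field set to zero and stores it in that field. To verify a received header, run the same function over all of the header bytes, including the stored checksum; a result of 0 indicates that the header arrived intact.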
The User Datagram Protocol - UDP
The TCP/IP protocol should really be called TCP/UDP/IP because it includes the User Datagram Protocol (UDP), which is a peer of TCP. All IP-based transport protocols store their own headers and data inside the IP data block. First let's look at the UDP layout in Figure 3.
Figure 3: A simple UDP layout.
A complete UDP/IP datagram is shown in Figure 4.
Figure 4: The relationship between the IP datagram and the UDP datagram.
UDP is only a small step up from IP, but applications rarely use IP directly. Like IP, UDP doesn't tell the sender when the datagram has arrived; that's up to the application. The sender could, for example, require that the receiver send a response, and the sender could retransmit the datagram if the response didn't arrive within, say, 20 seconds. UDP is good for simple one-shot messages and is used by the Internet Domain Name System (DNS), which is explained later in this module. UDP is also used for transmitting live audio and video, for which some lost or out-of-sequence data is not a big problem.
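The request-and-retransmit scheme just described can be sketched as a simple retry loop. This is a hypothetical illustration in standard C++, not Winsock code: sendWithRetry and its sendAndWait callback are invented names, and the callback stands in for sending a datagram (for example with sendto()) and then waiting up to the timeout for the receiver's reply.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical application-level reliability on top of UDP. The callback
// 'sendAndWait' stands in for "send the datagram, then wait up to 'timeout'
// for the receiver's reply" and returns true if a reply arrived in time.
// Returns the number of attempts it took, or -1 if every attempt timed out.
int sendWithRetry(const std::function<bool(std::chrono::seconds)>& sendAndWait,
                  int maxAttempts,
                  std::chrono::seconds timeout = std::chrono::seconds(20)) {
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (sendAndWait(timeout)) {
            return attempt;  // the receiver responded
        }
        // No response within the timeout: loop around and retransmit.
    }
    return -1;  // give up and let the application decide what to do
}
```

Keeping the network calls behind a callback lets the retry policy be exercised without a network; for instance, a callback that fails twice and then succeeds makes sendWithRetry return 3.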
Figure 3 shows that the UDP header does convey some additional information, namely the source and destination port numbers. The application programs on each end use these 16-bit numbers. For example, a client program might send a datagram addressed to port 1700 on the server. The server program is listening for any datagram that includes 1700 in its destination port number, and when the server finds one, it can respond by sending another datagram back to the client, which is listening for a datagram that includes 1701 in its destination port number.
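To make the header fields concrete, here is a sketch that packs the 8-byte UDP header for the client's datagram in the example above (source port 1701, destination port 1700). makeUdpHeader is a hypothetical helper written in portable C++; in a Winsock program the TCP/IP stack builds this header for you when you call sendto().

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Packs the 8-byte UDP header: source port, destination port, length
// (header plus data, in bytes), and checksum. Each field is 16 bits and
// is written in network (big-endian) byte order.
std::array<std::uint8_t, 8> makeUdpHeader(std::uint16_t srcPort,
                                          std::uint16_t dstPort,
                                          std::uint16_t payloadLength) {
    const std::uint16_t fields[4] = {
        srcPort,
        dstPort,
        static_cast<std::uint16_t>(8 + payloadLength),
        0  // checksum left at 0; in IPv4 this means "no checksum computed"
    };
    std::array<std::uint8_t, 8> header{};
    for (int i = 0; i < 4; ++i) {
        header[2 * i] = static_cast<std::uint8_t>(fields[i] >> 8);        // high byte first
        header[2 * i + 1] = static_cast<std::uint8_t>(fields[i] & 0xFF);  // then low byte
    }
    return header;
}
```

For a 12-byte payload, the header starts with 0x06 0xA5 (1701) and 0x06 0xA4 (1700), followed by a length of 20.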
IP Address Format: Network Byte Order
You know that IP addresses are 32 bits long. You might think that 2^32 (more than 4 billion) uniquely addressed computers could exist on the Internet, but that's not true. Part of the address identifies the LAN on which the host computer is located, and part of it identifies the host computer within that network. Most IPv4 addresses are Class C addresses, which are formatted as shown in Figure 5.
Figure 5: The layout of a Class C IP address.
This means that slightly more than 2 million networks can exist, and each of those networks can have 2^8 (256) addressable host computers. The Class A and Class B IP addresses, which allow more host computers on a network, are all used up. The Internet "powers-that-be" have recognized the shortage of IP addresses, so they have defined a new standard, the IP Next Generation (IPng) protocol, or IPv6. IPng defines a new IP datagram format that uses 128-bit addresses instead of 32-bit addresses. With IPng, you'll be able, for example, to assign a unique Internet address to each light switch in your house, so you can switch off your bedroom light from your portable computer from anywhere in the world. IPv6 is already implemented in many new computers, network devices, and other electronic devices.
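The Class C arithmetic above can be checked with a few lines of portable C++. The sketch assumes the classful scheme described in this module; addressClass, classCNetwork, and classCHost are our own illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Classful IPv4 addressing (historical): the class is determined by the
// leading bits of the address's first byte.
char addressClass(std::uint8_t firstByte) {
    if ((firstByte & 0x80) == 0x00) return 'A';  // 0xxxxxxx: 0-127
    if ((firstByte & 0xC0) == 0x80) return 'B';  // 10xxxxxx: 128-191
    if ((firstByte & 0xE0) == 0xC0) return 'C';  // 110xxxxx: 192-223
    return '?';                                  // classes D and E
}

// A Class C address = 3 fixed prefix bits (110) + 21-bit network + 8-bit host.
constexpr std::uint32_t kClassCNetworks = 1u << 21;  // 2,097,152 networks
constexpr std::uint32_t kClassCHosts = 1u << 8;      // 256 host numbers per network
                                                     // (all-0s and all-1s are reserved)

// 'addr' holds the address with its first dotted-decimal byte in the top 8 bits.
std::uint32_t classCNetwork(std::uint32_t addr) { return (addr >> 8) & 0x1FFFFF; }
std::uint8_t classCHost(std::uint32_t addr) { return static_cast<std::uint8_t>(addr & 0xFF); }
```

For 192.168.1.10, addressClass(192) yields 'C', the host number is 10, and the 21-bit network number is 0x00A801 (the three prefix bits are masked off).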
By convention, IP addresses are written in dotted-decimal format, in which each of the four parts of the address is an individual byte value. For example, 126.96.36.199 is an address written this way; note that a Class C address always begins with a byte between 192 and 223. In a computer with an Intel CPU, the address bytes are stored low-order-to-the-left, in so-called little-endian order. In most other computers, including the UNIX machines that first supported the Internet, bytes are stored high-order-to-the-left, in big-endian order. Because the Internet imposes a machine-independent standard for data interchange, all multibyte numbers must be transmitted in big-endian order. This means that programs running on Intel-based machines must convert between network byte order (big-endian) and host byte order (little-endian). This rule applies to 2-byte port numbers as well as to 4-byte IP addresses.
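Winsock supplies functions such as inet_addr(), htonl(), and htons() for these conversions. The sketch below shows the underlying idea in portable C++17 instead: it parses a dotted-decimal string and writes the value out most significant byte first, which produces network byte order whatever the host CPU's endianness. The function names are our own.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Parses "a.b.c.d" into a 32-bit value whose most significant byte is 'a'.
std::optional<std::uint32_t> parseDotted(const std::string& text) {
    std::istringstream in(text);
    std::uint32_t result = 0;
    for (int part = 0; part < 4; ++part) {
        int value = -1;
        if (!(in >> value) || value < 0 || value > 255) return std::nullopt;
        result = (result << 8) | static_cast<std::uint32_t>(value);
        if (part < 3) {
            char dot = 0;
            if (!(in >> dot) || dot != '.') return std::nullopt;
        }
    }
    char extra;
    if (in >> extra) return std::nullopt;  // reject trailing characters
    return result;
}

// Writes a 32-bit value in network (big-endian) order on any host,
// comparable to storing the result of Winsock's htonl().
void toNetworkOrder(std::uint32_t value, unsigned char out[4]) {
    out[0] = static_cast<unsigned char>(value >> 24);
    out[1] = static_cast<unsigned char>(value >> 16);
    out[2] = static_cast<unsigned char>(value >> 8);
    out[3] = static_cast<unsigned char>(value);
}

// The same rule for 2-byte port numbers (compare htons()).
void toNetworkOrder16(std::uint16_t value, unsigned char out[2]) {
    out[0] = static_cast<unsigned char>(value >> 8);
    out[1] = static_cast<unsigned char>(value);
}
```

Because the bytes are produced with shifts rather than by copying the integer's memory, the code never needs to know whether it runs on a little-endian or a big-endian CPU.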
The Transmission Control Protocol - TCP
You've learned about the limitations of UDP. What you really need is a protocol that supports error-free transmission of large blocks of data. Obviously, you want the receiving program to be able to reassemble the bytes in the exact sequence in which they are transmitted, even though the individual datagrams might arrive in the wrong sequence. TCP is that protocol, and it's the principal transport protocol for all Internet applications, including HTTP and File Transfer Protocol (FTP). Figure 6 shows the layout of a TCP segment. (It's not called a datagram.) The TCP segment fits inside an IP datagram, as shown in Figure 7.
Figure 6: A simple layout of a TCP segment.
Figure 7: The relationship between an IP datagram and a TCP segment.
The TCP protocol establishes a full-duplex, point-to-point connection between two computers, and a program at each end of this connection uses its own port. The combination of an IP address and a port number is called a socket. The connection is first established with a three-way handshake. The initiating program sends a segment with the SYN flag set, the responding program sends a segment with both the SYN and ACK flags set, and then the initiating program sends a segment with the ACK flag set.
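The sequence and acknowledgment numbers exchanged during the handshake follow a fixed pattern, because each SYN consumes one sequence number. This simplified model in standard C++ records only the flags and numbers of the three segments; Segment and threeWayHandshake are illustrative names, and the initial sequence numbers are arbitrary inputs (a real TCP stack chooses them itself).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The few TCP header fields that matter for the handshake.
struct Segment {
    bool syn;
    bool ack;
    std::uint32_t seq;     // sender's sequence number
    std::uint32_t ackNum;  // next sequence number the sender expects (if ack)
};

// Returns the three handshake segments in the order they are sent.
// Unsigned arithmetic wraps modulo 2^32, as TCP sequence numbers do.
std::vector<Segment> threeWayHandshake(std::uint32_t clientIsn, std::uint32_t serverIsn) {
    std::vector<Segment> s;
    s.push_back({true, false, clientIsn, 0});                  // client -> server: SYN
    s.push_back({true, true, serverIsn, clientIsn + 1});       // server -> client: SYN+ACK
    s.push_back({false, true, clientIsn + 1, serverIsn + 1});  // client -> server: ACK
    return s;
}
```

With initial sequence numbers 1000 and 5000, the server acknowledges 1001 and the client acknowledges 5001, so each side's first data byte will carry the number just after its SYN.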
After the connection is established, each program can send a stream of bytes to the other program. TCP uses the sequence number fields together with ACK flags to control this flow of bytes. The sending program doesn't wait for each segment to be acknowledged but instead sends a number of segments together and then waits for the first acknowledgment. If the receiving program has data to send back to the sending program, it can piggyback its acknowledgment and outbound data together in the same segments.
The sending program's sequence numbers are not segment indexes but rather indexes into the byte stream. The receiving program sends back an acknowledgment number, the sequence number of the next byte it expects, which tells the sender which bytes have been received and assembled in sequence. The sending program resends segments that remain unacknowledged.
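This byte-stream bookkeeping can be modeled with a small receiver that buffers out-of-order segments by sequence number and always acknowledges the next byte it expects. The following is a simplified, hypothetical sketch in standard C++ (Reassembler is our own name); it ignores sequence-number wraparound, overlapping segments, and window limits.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Simplified TCP receiver: segments are tagged with the stream offset
// (sequence number) of their first byte and may arrive in any order.
class Reassembler {
public:
    explicit Reassembler(std::uint32_t firstSeq) : nextExpected_(firstSeq) {}

    // Accepts one segment and returns the acknowledgment number to send back.
    std::uint32_t receive(std::uint32_t seq, const std::string& data) {
        if (seq >= nextExpected_) {
            pending_[seq] = data;  // buffer it, even if a gap precedes it
        }                          // (older duplicates are simply dropped)
        // Deliver every segment that is now contiguous with the stream.
        auto it = pending_.find(nextExpected_);
        while (it != pending_.end()) {
            delivered_ += it->second;
            nextExpected_ += static_cast<std::uint32_t>(it->second.size());
            pending_.erase(it);
            it = pending_.find(nextExpected_);
        }
        return nextExpected_;
    }

    const std::string& delivered() const { return delivered_; }

private:
    std::uint32_t nextExpected_;                    // next in-order byte wanted
    std::map<std::uint32_t, std::string> pending_;  // out-of-order segments
    std::string delivered_;                         // bytes handed to the program
};
```

For example, if segments carrying "Hel" (sequence 0) and "World" (sequence 6) arrive before "lo " (sequence 3), the receiver keeps acknowledging 3 until the gap is filled, then acknowledges 11 and delivers "Hello World".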
Each program closes its end of the TCP connection by sending a segment with the FIN flag set, which must be acknowledged by the program on the other end. A program can no longer receive bytes on a connection that has been closed by the program on the other end.
Don't worry about the complexity of the TCP protocol. The Winsock and WinInet APIs hide most of the details, so you don't have to worry about ACK flags and sequence numbers. Your program calls a function to transmit a block of data, and Windows takes care of splitting the block into segments and stuffing them inside IP datagrams. Windows also takes care of delivering the bytes on the receiving end, but that gets tricky, as you'll see later in this module.
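Part of what makes the receiving end tricky is that TCP delivers a stream of bytes, not a sequence of messages: one block passed to send() can come back from recv() in several pieces. A common pattern is to keep receiving until enough bytes have accumulated. In this standard C++ sketch, receiveAtLeast is our own name and receiveSome is a hypothetical callback standing in for a recv() call; it returns an empty string when the connection closes, much as recv() returns 0 when the peer has closed the connection.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Appends incoming chunks to 'buffer' until it holds at least 'count' bytes.
// 'receiveSome' returns whatever bytes have arrived, or an empty string once
// the peer has closed the connection. Bytes beyond 'count' stay in 'buffer'
// for the next read. Returns false if the connection closed first.
bool receiveAtLeast(std::string& buffer, std::size_t count,
                    const std::function<std::string()>& receiveSome) {
    while (buffer.size() < count) {
        std::string chunk = receiveSome();
        if (chunk.empty()) {
            return false;  // connection closed before enough bytes arrived
        }
        buffer += chunk;
    }
    return true;
}
```

If the sender's "HELLO WORLD" arrives as "HEL", "LO W", and "ORLD", a request for 5 bytes returns after the second chunk with "HELLO W" in the buffer; the extra bytes are kept rather than lost.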
The Domain Name System
When you surf the Web, you don't use IP addresses. Instead, you use human-friendly names like microsoft.com or www.cnn.com. A significant portion of Internet resources is consumed when host names (such as microsoft.com) are translated into IP addresses that TCP/IP can use. A distributed network of name server (domain server) computers performs this translation by processing DNS queries. The entire Internet namespace is organized into domains, starting with an unnamed root domain. Under the root is a series of top-level domains (TLDs) such as com, edu, gov, biz, info and org. Do not confuse Internet domains with Microsoft Windows NT domains. The latter are logical groups of networked computers that share a common security database.
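The domain hierarchy is read from right to left: the last label of a host name is its top-level domain, just below the unnamed root. This short C++ helper (labelsFromRoot is our own, hypothetical name) lists the labels in that root-down order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits a host name into labels ordered from the root downward:
// "www.slowsoft.com" -> {"com", "slowsoft", "www"}.
// A trailing dot, which explicitly names the root, is tolerated.
std::vector<std::string> labelsFromRoot(const std::string& host) {
    std::vector<std::string> labels;
    std::string current;
    for (char c : host) {
        if (c == '.') {
            if (!current.empty()) labels.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) labels.push_back(current);
    // Reverse so the top-level domain comes first.
    return std::vector<std::string>(labels.rbegin(), labels.rend());
}
```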
Servers and Domain Names
Let's look at the server end first. Suppose a company named SlowSoft has two host computers connected to the Internet, one for World Wide Web (WWW) service and the other for FTP service. By convention, these host computers are named www.slowsoft.com and ftp.slowsoft.com, respectively, and both are members of the second-level domain slowsoft, which SlowSoft has registered with InterNIC or another accredited domain name registrar.
Now SlowSoft must designate two (or more) host computers as its name servers. The name servers for the com domain each have a database entry (zone record) for the slowsoft domain, and that entry contains the names and IP addresses of SlowSoft's two name servers. Each of the two slowsoft name servers has database entries for both of SlowSoft's host computers. These servers might also have database entries for hosts in other domains, and they might have entries for name servers in third-level domains. Thus, if a name server can't provide a host's IP address directly, it can redirect the query to a lower-level name server. Figure 9 illustrates SlowSoft's domain configuration.
A top-level name server runs on its own host computer. The root domain is served by 13 named root server identities (A through M), operated by about a dozen organizations (see root-servers.org), and separate name servers handle each top-level domain. Lower-level name servers could be programs running on host computers anywhere on the Net. SlowSoft's Internet service provider (ISP), ExpensiveNet, can furnish one of SlowSoft's name servers. If the ISP is running Windows NT Server, the name server is usually the DNS program that comes bundled with the operating system. That name server might be designated ns1.expensivenet.com. Unix and Linux systems normally use the BIND program as their name server.
Clients and Domain Names
Now for the client side. A user types http://www.slowsoft.com in the browser. The http:// prefix tells the browser to use the HTTP protocol when it eventually finds the host computer. The browser must then resolve www.slowsoft.com into an IP address, so it sends a DNS query (normally a UDP datagram) to the DNS server address configured in the client machine's TCP/IP settings (see Figure 8).
Figure 8: The TCP/IP settings of the network card.
This DNS server address identifies a local name server, which might have the needed host IP address in its cache. If not, the local name server relays the DNS query up to one of the root name servers. The root server refers the query to a com name server, which in turn refers it down to one of SlowSoft's designated name servers. In the process, the IP address for www.slowsoft.com will be cached for later use if it was not cached already. If you want to go the other way, name servers are also capable of converting an IP address to a name.
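The query path can be modeled as a chain of referrals that ends in an answer, which is then cached. In this toy C++17 sketch every name server is just an in-memory table, and all names in it (Answer, Zone, inDomain, resolve, and the server names and addresses in the usage below) are made up; real resolvers exchange DNS messages over the network, usually as UDP datagrams.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// One table entry: either a final answer or a referral to another server.
struct Answer {
    std::string address;   // non-empty: the IP address for the name
    std::string referral;  // non-empty: the server to ask next
};
using Zone = std::map<std::string, Answer>;  // domain name -> entry

// True if 'host' equals 'domain' or lies inside it.
bool inDomain(const std::string& host, const std::string& domain) {
    if (host == domain) return true;
    const std::string suffix = "." + domain;
    return host.size() > suffix.size() &&
           host.compare(host.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Resolves 'host' starting at the server named 'server', following
// referrals and caching the final answer.
std::optional<std::string> resolve(const std::string& host,
                                   const std::map<std::string, Zone>& servers,
                                   std::map<std::string, std::string>& cache,
                                   std::string server = "root") {
    if (auto hit = cache.find(host); hit != cache.end()) {
        return hit->second;  // answered from the local cache
    }
    for (int hops = 0; hops < 8; ++hops) {  // guard against referral loops
        auto zone = servers.find(server);
        if (zone == servers.end()) return std::nullopt;
        // Pick the most specific entry that covers the host name.
        const Answer* best = nullptr;
        std::size_t bestLength = 0;
        for (const auto& [domain, answer] : zone->second) {
            if (inDomain(host, domain) && (!best || domain.size() > bestLength)) {
                best = &answer;
                bestLength = domain.size();
            }
        }
        if (!best) return std::nullopt;  // this server knows nothing useful
        if (!best->address.empty()) {
            cache[host] = best->address;  // remember it for later queries
            return best->address;
        }
        server = best->referral;  // ask the next server down the tree
    }
    return std::nullopt;
}
```

With a root table that refers com to a com server, a com table that refers slowsoft.com to ns1.slowsoft.com, and a SlowSoft table that holds the address of www.slowsoft.com, the first lookup follows both referrals and any repeat lookup is answered from the cache.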
Figure 9: SlowSoft's domain configuration.
HTTP Basics
You're going to be doing some Winsock programming soon, but just sending raw byte streams back and forth isn't very interesting. You need to use a higher-level protocol in order to be compatible with existing Internet servers and browsers. HTTP is a good place to start because it's the protocol of the popular World Wide Web and it's relatively simple.
HTTP is built on TCP, and this is the way it works: First a server program listens on the default port 80. Then some client program (typically a browser) connects to the server (www.slowsoft.com, in this case) after receiving the server's IP address from a name server. Using its own port number, the client sets up a two-way TCP connection to the server. As soon as the connection is established, the client sends a request to the server, which might look something like this:
GET /customers/newproducts.html HTTP/1.0
The server identifies the request as a GET, the most common type, and it concludes that the client wants a file named newproducts.html that's located in a server directory known as /customers (which might or might not be \customers on the server's hard disk). Immediately following are request headers, which mostly describe the client's capabilities.
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/x
UA-OS: Windows NT
User-Agent: Mozilla/2.0 (compatible; MSIE 3.0; AK; Windows NT)
If-Modified-Since: Wed, 26 Mar 2005 20:23:04 GMT
The If-Modified-Since header tells the server not to bother to transmit newproducts.html unless the file has been modified since March 26, 2005. This implies that the browser already has a dated copy of this file stored in its cache. The blank line at the end of the request is crucial; it provides the only way for the server to tell that it is time to stop receiving and start transmitting, and that's because the TCP connection stays open. Now the server springs into action. It sends newproducts.html, but first it sends an OK response:
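A client has to produce exactly this shape: a request line, header lines, and then the empty line, with every line terminated by a carriage return and line feed (CR LF). The server, for its part, keeps reading until it sees that empty line. Both halves are sketched below in portable C++17; buildRequest and headerComplete are illustrative names, not WinInet or Winsock functions.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Builds an HTTP/1.0 GET request. Every line ends with CR LF, and an
// empty line (a bare CR LF) marks the end of the request headers.
std::string buildRequest(const std::string& path,
                         const std::vector<std::pair<std::string, std::string>>& headers) {
    std::string request = "GET " + path + " HTTP/1.0\r\n";
    for (const auto& [name, value] : headers) {
        request += name + ": " + value + "\r\n";
    }
    request += "\r\n";  // the crucial blank line
    return request;
}

// Server side: true once the bytes received so far include the blank line,
// meaning the complete request header has arrived.
bool headerComplete(const std::string& received) {
    return received.find("\r\n\r\n") != std::string::npos;
}
```

A server would append each chunk it receives to a buffer and call headerComplete after every append, starting its response only when it returns true.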
HTTP/1.0 200 OK
This is immediately followed by some response header lines:
Date: Thu, 03 Mar 2005 17:33:12 GMT
Last-Modified: Wed, 26 Mar 2005 20:23:04 GMT
The contents of newproducts.html immediately follow the blank line:
<head><title>SlowSoft's New Products</title></head>
<h1><center>Welcome to SlowSoft's New Products List</center></h1>
Unfortunately, budget constraints have prevented SlowSoft from
introducing any new products this year. We suggest you keep
enjoying the old products.<p>
<a href="default.htm">SlowSoft's Home Page</a><p>
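Before a browser can render a page like this, it has to separate the response into the status line, the response headers, and the body that follows the blank line. This sketch does that in portable C++ for a response already held in memory; Response and parseResponse are hypothetical names, and a real client would also have to cope with a response that arrives in pieces.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct Response {
    int status = 0;       // e.g. 200
    std::string headers;  // header lines, still separated by CR LF
    std::string body;     // everything after the blank line
};

// Returns false if the status line or the terminating blank line is missing.
bool parseResponse(const std::string& raw, Response& out) {
    const std::size_t lineEnd = raw.find("\r\n");
    const std::size_t headerEnd = raw.find("\r\n\r\n");
    if (lineEnd == std::string::npos || headerEnd == std::string::npos) {
        return false;
    }
    // Status line, e.g. "HTTP/1.0 200 OK": a 3-digit code follows the first space.
    const std::string statusLine = raw.substr(0, lineEnd);
    const std::size_t space = statusLine.find(' ');
    if (space == std::string::npos || statusLine.size() < space + 4) return false;
    for (std::size_t i = space + 1; i <= space + 3; ++i) {
        if (!std::isdigit(static_cast<unsigned char>(statusLine[i]))) return false;
    }
    out.status = std::stoi(statusLine.substr(space + 1, 3));
    // Header lines sit between the status line and the blank line (maybe none).
    out.headers = (headerEnd > lineEnd)
                      ? raw.substr(lineEnd + 2, headerEnd - lineEnd - 2)
                      : std::string();
    out.body = raw.substr(headerEnd + 4);
    return true;
}
```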
You're looking at elementary HyperText Markup Language (HTML) code here, and the resulting Web page won't win any prizes. We won't go into details because dozens of HTML books are already available. From these books, you'll learn that HTML tags are contained in angle brackets and that there's often an "end" tag (with a / character) for every "start" tag. Some tags, such as <a> (hypertext anchor), have attributes. In the example above, the line:
<a href="default.htm">SlowSoft's Home Page</a><p>
creates a link to another HTML file. The user clicks on "SlowSoft's Home Page," and the browser requests default.htm from the original server.
Actually, newproducts.html references two server files, default.htm and /images/clouds.jpg. The clouds.jpg file is a JPEG file that contains a background picture for the page. The browser downloads each of these files as a separate transaction, establishing and closing a separate TCP connection each time. The server just dishes out files to any client that asks for them. In this case, the server doesn't know or care whether the same client requested newproducts.html and clouds.jpg. To the server, clients are simply IP addresses and port numbers, and the client's port number is different for each request. Nor does an IP address identify a single user: if ten of your company's programmers are surfing the Web via your company's proxy server (more on proxy servers later), the server sees the same IP address for all of them.
Web pages use two dominant graphics formats, GIF and JPEG. GIF files are compressed images that retain all the detail of the original uncompressed image but are usually limited to 256 colors. They support transparent regions and animation. JPEG files are smaller, but they don't carry all the detail of the original file. GIF files are often used for small images such as buttons, and JPEG files are often used for photographic images for which detail is not critical. Other formats, such as PNG, are also used. Visual C++ can read, write, and convert both GIF and JPEG files, but the Win32 API cannot handle these formats unless you supply a special compression/decompression module.
The HTTP standard includes a PUT request type (alongside GET and POST) that enables a client program to upload a file to the server. Client programs and server programs seldom implement PUT.
FTP Basics
The File Transfer Protocol handles the uploading and downloading of server files plus directory navigation and browsing. A Windows command-line program called ftp (it doesn't work through a Web proxy server) lets you connect to an FTP server using UNIX-like keyboard commands. Browser programs also usually support the FTP protocol in a more user-friendly manner. Normally, you can protect an FTP server's directories with a user-name/password combination, but both strings are passed over the Internet as clear text. Nowadays there are also dedicated FTP client programs, such as gFTP and CuteFTP, and the connection can be secured with encryption (for example, FTP over TLS). FTP is based on TCP. Two separate connections are established between the client and server, one for control and one for data.
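One way the two connections cooperate: over the control connection, the client can send the PASV command, and the server replies with the address and port on which it will accept the data connection, as in "227 Entering Passive Mode (192,168,0,10,19,137)". The portable C++ sketch below, with the hypothetical name parsePasvReply, extracts that address. The port is encoded as two bytes, so 19 and 137 mean 19 * 256 + 137 = 5001.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Parses a passive-mode reply of the form "227 ... (h1,h2,h3,h4,p1,p2)".
// On success, 'ip' receives "h1.h2.h3.h4" and 'port' receives p1 * 256 + p2.
bool parsePasvReply(const std::string& reply, std::string& ip, int& port) {
    const std::size_t open = reply.find('(');
    if (reply.compare(0, 3, "227") != 0 || open == std::string::npos) return false;
    int h1, h2, h3, h4, p1, p2;
    if (std::sscanf(reply.c_str() + open, "(%d,%d,%d,%d,%d,%d)",
                    &h1, &h2, &h3, &h4, &p1, &p2) != 6) {
        return false;
    }
    auto isByte = [](int v) { return v >= 0 && v <= 255; };
    if (!isByte(h1) || !isByte(h2) || !isByte(h3) || !isByte(h4) ||
        !isByte(p1) || !isByte(p2)) {
        return false;
    }
    ip = std::to_string(h1) + "." + std::to_string(h2) + "." +
         std::to_string(h3) + "." + std::to_string(h4);
    port = p1 * 256 + p2;  // port sent as high byte, then low byte
    return true;
}
```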
Internet vs. Intranet
Up to now, we've been assuming that client and server computers were connected to the worldwide Internet. The fact is you can run exactly the same client and server software on a local intranet. An intranet is often implemented on a company's LAN and is used for distributed applications. Users see the familiar browser interface at their client computers, and server computers supply simple Web-like pages or do complex data processing in response to user input.
An intranet offers a lot of flexibility. If, for example, you know that all your computers are Intel-based, you can use ActiveX controls and ActiveX document servers that provide ActiveX document support. If necessary, your server and client computers can run custom TCP/IP software that allows communication beyond HTTP and FTP. To secure your company's data, you can separate your intranet completely from the Internet, or you can connect it through a firewall, which is often combined with a proxy server.