* This article is really just a collection of notes for my own use, made semi-readable and summarised for other people.
How does the Internet work? It's a pretty straightforward question, and one that I stumbled over in an interview a few months ago, so I'm starting from scratch to put it together. Warning: this post will be remedial to some; it's pretty much just some notes I put together.
Objective
The question I want to answer is:
What happens when I enter a URL in my browser and hit enter?
Part 1
Building Blocks
The internet is basically many computers communicating with each other using a set of languages and protocols. These languages, protocols and structures have been put in place to make communication between computers efficient and, among other things, secure.
The question of finding the correct computer to communicate with is the most immediate problem facing your browser/computer as soon as you enter that URL. But even the request to find that computer has to be sent over the network somehow. So, forgetting the DNS lookup for now and diving straight into the TCP/IP stack…
How do computers communicate on the internet?
Your computer has a stack of protocols, and the computer you want to talk to has a matching stack. Each layer in the stack takes care of a separate function, independently of the layers above and below it.
When you want to describe the stack in a broad sense, refer to the TCP/IP model (4 or 5 layers, depending on who is describing it).
If you want finer detail about the layers in the stack, refer to the OSI model (7 layers).
TCP/IP Stack
Using the 4 layer model:
Application Protocol Layer [Application]
Transmission Control Protocol Layer (TCP) [Transport]
Internet Protocol Layer (IP) [Internet]
Hardware Layer [Network]
The message that your browser sends, say a request for a web page, will propagate down through this stack on your computer, across the internet, and up an identical looking stack on another computer.
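The trip down one stack and up the other can be sketched as a toy model. This is only an illustration: the bracketed text "headers" and the function names `send_down` and `receive_up` are made up here, and real headers are binary structures rather than labels.

```python
# Toy model of encapsulation: each layer wraps the payload from the layer
# above with its own header, and the receiver unwraps in reverse order.
# The bracketed labels are hypothetical stand-ins for real binary headers.
LAYERS = ["TCP", "IP", "LINK"]  # order in which headers are added going down


def send_down(message: str) -> str:
    """Wrap an application message as it moves down the sender's stack."""
    frame = message
    for layer in LAYERS:
        frame = f"[{layer}]{frame}"
    return frame


def receive_up(frame: str) -> str:
    """Strip headers, outermost first, as the frame moves up the receiver's stack."""
    for layer in reversed(LAYERS):
        prefix = f"[{layer}]"
        if not frame.startswith(prefix):
            raise ValueError(f"expected a {layer} header")
        frame = frame[len(prefix):]
    return frame


request = "GET / HTTP/1.1"
on_the_wire = send_down(request)
print(on_the_wire)              # [LINK][IP][TCP]GET / HTTP/1.1
print(receive_up(on_the_wire))  # GET / HTTP/1.1
```

The point of the sketch is that each layer only looks at its own header and treats everything inside it as opaque payload.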
Refer to:
Diagram
http://www.theshulers.com/whitepapers/internet_whitepaper/index.html#stack
http://en.wikipedia.org/wiki/TCP/IP_model#Layers_in_the_TCP.2FIP_model
Each layer in the stack performs a different function:
Application Layer:
Handles the protocols specific to applications on your computer, such as your web browser (HTTP), your email client (SMTP) or your FTP program (FTP, obviously).
HTTP has standards of communication which need to be adhered to, and the application layer (your web browser) takes care to form its messages according to those standards. An easy way to see this in action is to install the Live HTTP Headers plugin for Firefox, which lets you view the request headers the browser sends and the response headers from the computer hosting the site you requested.
e.g:
http://www.last.fm/

GET / HTTP/1.1
Host: www.last.fm
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.9.0.5) Gecko/2008121621 Ubuntu/8.04 (hardy) Firefox/3.0.5 Ubiquity/0.1.4
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Date: Sat, 24 Jan 2009 15:50:21 GMT
Server: Apache/1.3.39 (Unix)
X-Proxy-Fix-Up: headers fixed up
Set-Cookie: AnonTrack=f02a12391c7d1bf56a65972f61b33ae7; expires=Tue, 19-Jan-2010 15:50:21 GMT; path=/; domain=.last.fm
X-Web-Node: www64
X-Cache: HIT
Content-Encoding: gzip
Vary: Accept-Encoding
Connection: close
Content-Type: text/html; charset=utf-8
After typing www.last.fm into my browser you can see the browser has made a GET request and the server hosting last.fm has returned a 200 OK response. The application layers on the two computers constructed this request and response.
You can even see the actual applications involved: Firefox on my machine (the User-Agent header) and Apache on the server (the Server header).
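To make the request and response format concrete, here is a minimal sketch of what the application layer does with text like the above. The helper names `build_get_request` and `parse_status_line` are my own, and a real browser sends many more headers than this.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a bare-bones HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple:
    """Split the first line of a response into (version, status code, reason)."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


print(build_get_request("www.last.fm").decode("ascii"))
print(parse_status_line(b"HTTP/1.1 200 OK\r\nServer: Apache\r\n\r\n"))
```

Nothing here touches the network. The bytes returned by `build_get_request` are what would be handed down to the transport layer.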
Once the request or response is made it is passed down the stack to the TCP layer.
TCP / Transport Layer
The TCP (or Transport) layer manages the transport of the messages constructed by the application layer. It ensures that the data sent between the application layers on the two communicating machines:
- arrives in order
- does not contain errors
- does not include duplicates
- contains all the information sent
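As a rough sketch of the ordering, duplicate and completeness guarantees in the list above, the receiving side can be modelled as reassembling numbered segments. The `reassemble` function and its `(seq, data)` input format are assumptions made for illustration, and error detection (the TCP checksum) and retransmission are left out.

```python
def reassemble(segments):
    """Rebuild a byte stream from (seq, data) pairs.

    seq is the byte offset of data within the stream, loosely modelled on
    TCP sequence numbers. Returns the contiguous bytes received so far and
    the next offset the receiver is still waiting for.
    """
    received = {}
    for seq, data in segments:
        if not data:
            continue                    # nothing to store
        received.setdefault(seq, data)  # a duplicate segment is ignored

    stream = b""
    next_seq = 0
    while next_seq in received:         # stop at the first gap
        chunk = received[next_seq]
        stream += chunk
        next_seq += len(chunk)
    return stream, next_seq


# Segments arrive out of order, with one duplicate.
arrived = [(5, b" worl"), (0, b"hello"), (5, b" worl"), (10, b"d")]
print(reassemble(arrived))                      # (b'hello world', 11)

# A missing segment leaves a gap, so only the first part is delivered.
print(reassemble([(0, b"ab"), (4, b"ef")]))     # (b'ab', 2)
```

In the second call the receiver would keep acknowledging offset 2 until the missing bytes turn up, which is what prompts the sender to retransmit.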
Not only does TCP manage the data transfer, it also specifies which port on each computer the application is listening on or sending from.
TCP manages a lot and is important for ensuring that the data transmitted gets to the destination without errors. TCP is a connection-oriented protocol, which means that it establishes a connection with the destination server before sending data.
To establish this connection a 3-way handshake is used, which starts with the TCP layer on your computer sending a SYN packet:
SYN ->
<-SYN/ACK
ACK->
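After each packet is acknowledged and the connection is established, the data is sent. Once all the data is transmitted there is a similar exchange to terminate the connection, this time using FIN and ACK packets. Each side closes its own direction, so it normally takes four packets rather than three.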
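The three steps of the opening handshake can be sketched by tracking the sequence and acknowledgement numbers each side sends. This is a simplified model: `three_way_handshake` is a made-up function, the packets are plain dictionaries, and timeouts, retransmission and options are ignored.

```python
import random

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits and wrap around


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three packets of a TCP handshake as simple dicts.

    Each side picks an initial sequence number (ISN). A SYN consumes one
    sequence number, so each ACK is the peer's ISN plus one.
    """
    if client_isn is None:
        client_isn = random.randrange(SEQ_SPACE)
    if server_isn is None:
        server_isn = random.randrange(SEQ_SPACE)

    # 1. client -> server: SYN
    syn = {"flags": "SYN", "seq": client_isn}
    # 2. server -> client: SYN/ACK, acknowledging the client's SYN
    syn_ack = {"flags": "SYN,ACK", "seq": server_isn,
               "ack": (syn["seq"] + 1) % SEQ_SPACE}
    # 3. client -> server: ACK, acknowledging the server's SYN
    ack = {"flags": "ACK", "seq": syn_ack["ack"],
           "ack": (syn_ack["seq"] + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]


for packet in three_way_handshake(client_isn=100, server_isn=300):
    print(packet)
```

With the fixed starting numbers above, the server acknowledges 101 and the client acknowledges 301, after which both sides agree on where each byte stream begins.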
To achieve all of the above, the transport layer (TCP in this instance) adds a header of at least 20 bytes to the message that the application layer put together.
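To show where those 20 bytes go, here is a sketch that packs the fixed fields of a TCP header with Python's `struct` module. The function name and the port and sequence values are illustrative assumptions, and the checksum is left at zero because computing it properly also needs the IP addresses.

```python
import struct

# Fixed part of a TCP header, in network byte order:
# source port, destination port, sequence number, acknowledgement number,
# data offset byte, flags byte, window size, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIBBHHH")

FLAG_SYN = 0x02
FLAG_ACK = 0x10


def build_tcp_header(src_port, dst_port, seq, ack, flags, window=65535):
    """Pack a minimal TCP header with no options (checksum left as 0)."""
    data_offset = 5 << 4  # header length: 5 words of 32 bits = 20 bytes
    return TCP_HEADER.pack(src_port, dst_port, seq, ack,
                           data_offset, flags, window, 0, 0)


# Hypothetical SYN from an ephemeral port to a web server on port 80.
header = build_tcp_header(src_port=49152, dst_port=80, seq=100, ack=0,
                          flags=FLAG_SYN)
print(len(header))   # 20
print(header.hex())
```

The two port fields at the front are how the receiving machine knows which application the segment belongs to.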
Obviously this procedure is a lot more complicated than what I have covered here.
Refer To:
http://www.faqs.org/docs/iptables/tcpconnections.html
http://www.winlab.rutgers.edu/~hongbol/tcpWeb/tcpTutorialNotes.html
http://en.wikipedia.org/wiki/SYN_(TCP)
IP Protocol / Internet layer
IP is simply in charge of addressing and routing packets to the remote computer, in whatever order they happen to arrive; its job is not to track the packets and rearrange them as TCP does.
IP is unreliable and connectionless, unlike the more reliable TCP. There is no handshake.
To ensure the packet is directed to the correct location, IP adds its own header (at least 20 bytes for IPv4), much as TCP does.
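The same kind of sketch works for the IPv4 header. The addresses below come from ranges reserved for documentation and `build_ipv4_header` is a made-up helper. It fills in only the basic fields, with no options or fragmentation.

```python
import socket
import struct

# IPv4 header without options, in network byte order:
# version/IHL, type of service, total length, identification,
# flags/fragment offset, TTL, protocol, header checksum, source, destination.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")
PROTO_TCP = 6


def internet_checksum(data: bytes) -> int:
    """Ones' complement sum of 16-bit words, as used by the IPv4 header."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src, dst, payload_len, ttl=64, ident=0):
    """Pack a 20-byte IPv4 header carrying a TCP payload."""
    def pack(checksum):
        return IPV4_HEADER.pack((4 << 4) | 5, 0, 20 + payload_len, ident, 0,
                                ttl, PROTO_TCP, checksum,
                                socket.inet_aton(src), socket.inet_aton(dst))
    # The checksum is computed over the header with its checksum field zeroed.
    return pack(internet_checksum(pack(0)))


ip_header = build_ipv4_header("192.0.2.1", "198.51.100.7", payload_len=20)
print(len(ip_header))                # 20
print(internet_checksum(ip_header))  # 0 when the header is intact
```

Routers along the path read the destination address, decrement the TTL and recompute the checksum. They never look at the TCP segment inside.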

References:
Excellent diagram 1
Excellent diagram 2
Hardware
The final hardware layer is the lowest layer in the stack and is interchangeable with practically anything, as the TCP/IP stack is designed to be hardware independent. My toaster runs a TCP/IP stack for instance.
The hardware layer (also called the link or network access layer) is basically responsible solely for transmitting the data over a physical medium. It moves packets between the corresponding hardware layer interfaces on two machines on the same link. The process of transmitting and receiving packets on a given link can be controlled both in the software device driver for the network card and in firmware or specialized chipsets.
The TCP/IP model includes specifications for translating the network addressing used by the Internet Protocol to data link addressing, such as a Media Access Control (MAC) address. If you log in to your router and have a look at the attached devices you will probably see MAC addresses listed next to the name of each device and the IP address assigned to it. A MAC address is meant to be unique to each network interface, so basically every piece of hardware that can connect to a network has its own.
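On a typical local network the IP-to-MAC translation is done with ARP, and the result is cached in a table much like the router's attached-devices page. A toy version of that table might look like this. The addresses, the `arp_table` dictionary and both helper functions are invented for illustration.

```python
def format_mac(raw: bytes) -> str:
    """Render a 6-byte MAC address in the usual colon-separated hex form."""
    if len(raw) != 6:
        raise ValueError("a MAC address is 6 bytes long")
    return ":".join(f"{byte:02x}" for byte in raw)


# Hypothetical cache mapping local IP addresses to MAC addresses.
arp_table = {
    "192.168.0.10": bytes.fromhex("001a2b3c4d5e"),
    "192.168.0.11": bytes.fromhex("001a2b3c4d5f"),
}


def resolve(ip: str):
    """Look up the MAC address for a local IP, or None if it is not cached."""
    raw = arp_table.get(ip)
    return None if raw is None else format_mac(raw)


print(resolve("192.168.0.10"))  # 00:1a:2b:3c:4d:5e
print(resolve("192.168.0.99"))  # None
```

When the lookup misses, a real machine broadcasts an ARP request on the local link and waits for the owner of that IP address to reply with its MAC address.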
References:
Pulled a few sentences from here sorry wikipedia
Roundup
This was the first part of my notes on this subject and investigation into the question: What happens when I enter a URL in my browser and hit enter?
In the next posts I need to cover the DNS lookup, which is the first thing the browser does when you hit enter on that URL, and how messages traverse the interconnected routers and computers to find the correct machine that holds the answers. I may also cover the OSI model, which includes a bit more detail about the individual layers.
One Response
This information was really useful from the point of view of how data is transmitted over the TCP/IP stack.
I appreciate the effort taken to detail the steps in a precise manner. Looking forward to other topics to enhance my understanding of the subject.