Mathemathinking: 2013

Friday, July 5, 2013

Oceanic algae blooms

Check out these photos of the overgrowth of algae on a beach in China. There are two main probable causes.

Pollution. Algae are autotrophs, which means that they synthesize their own carbohydrates, fats, and proteins from carbon dioxide and more basic chemical substances in their environment. In general, an algae population will keep growing until its resources become limited or a predator keeps it in check. Under normal conditions, the ocean water does not have enough nutrients, such as phosphorus, to sustain a large growth rate of algae. The resources are thus limited.

The fertilizers that farmers use for crops, which are high in phosphorus content, can "run off" into a river or directly into the ocean. Similarly, a chemical plant may pollute a river or the ocean with a substance high in phosphorus. In both cases, the pollution serves as a source of the limiting nutrient that the algae needed to grow, and an algae bloom unfolds.

Higher water temperatures. There are optimal temperatures at which algae grow. In colder waters, the rate of algal growth is limited. At warmer water temperatures, however, the conditions are favorable for higher algae growth rates.

Trees come from air and environmental implications

"People look at a tree and think it comes out of the ground," but "trees come from air."
-Richard Feynman

A tree is around 50% carbon atoms by mass, and these carbon atoms come from carbon dioxide (CO₂) in the air. Carbohydrates, which form most of the substance of the tree, are formed by breaking a carbon-oxygen bond in CO₂ and combining it with water, which condensed from clouds in the air to form rainfall.

Since, energetically, carbon atoms would prefer to stay as CO₂ rather than reside in carbohydrates, it takes energy to rip apart a C-O bond. Trees get the energy for this synthesis of carbohydrates from sunlight (photons)-- hence photosynthesis. In the process, oxygen (O₂) is released.

When we put a log in the fireplace and provide heat to kickstart the reverse reaction, the oxygen from the air grabs the carbon atoms back from the log to make CO₂ and water again (combustion). Because carbon loves to be in CO₂ the fire spontaneously carries on. The extra energy it took to break the C-O bonds in the first place is released as light and heat. In a sense, the sunlight is being emitted back out to complete the balanced cycle.
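As a rough sketch of the two reactions described above, taking glucose as a representative carbohydrate (an assumption for illustration; wood is mostly cellulose and lignin, which are more complex molecules), photosynthesis and combustion are the same balanced equation run in opposite directions:
$6\,\mathrm{CO_2} + 6\,\mathrm{H_2O} + \text{light} \rightarrow \mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2}$ [photosynthesis]
$\mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2} \rightarrow 6\,\mathrm{CO_2} + 6\,\mathrm{H_2O} + \text{heat and light}$ [combustion]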

Famous physicist Richard Feynman explains this in such a riveting way:

What does this mean for the environment? Around 45% of our CO₂ emissions are from burning fossil fuels. But a sizeable portion, 17%, is from deforestation. [1] When we clear land for agriculture or for buildings and burn the trees, CO₂ that was once stored in the trees gets released back into the atmosphere to cause global warming. One action we can take is to minimize deforestation to prevent further release of CO₂ from the incumbent trees.

A rotting tree releases CO₂ back into the atmosphere.

So carbon dioxide is food for a growing tree, and, as a tree grows, it acts as a carbon sink since it takes carbon dioxide out of the air and stores it in its trunk. Planting a new tree can thus offset some of our CO₂ emissions. But, given that we plant a new forest on a piece of land, how much of an impact can we make? A square meter of tree cover can sequester 0.306 kg of carbon per year. [2] The average passenger vehicle in the US emits roughly 1300 kg of carbon per year. [3] This means that, to offset the CO₂ emissions from one vehicle, one would need to maintain about 4,250 m² of growing trees. For comparison, an American football field is 5,300 m².
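The arithmetic behind that estimate is a single division. Here is a minimal sketch in Python using only the figures quoted above; the variable names are mine.

```python
# Figures quoted in the text; see references [2] and [3].
sequestration_rate = 0.306     # kg of carbon per m^2 of tree cover per year
vehicle_carbon = 1300.0        # kg of carbon per passenger vehicle per year
football_field_area = 5300.0   # m^2, the figure used for comparison

# Area of growing trees needed to offset one vehicle
area_needed = vehicle_carbon / sequestration_rate
fraction_of_field = area_needed / football_field_area

print(f"Tree cover needed per vehicle: {area_needed:.0f} m^2")
print(f"Fraction of a football field: {fraction_of_field:.2f}")
```

So each car needs roughly four fifths of a football field of growing trees.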

I added the word 'growing' in front of 'trees' in my discussion above. Actually, a mature forest does not absorb much CO₂. When trees die, fall over, and rot-- a natural process in a mature forest-- micro-organisms decompose the rotting tree, releasing the CO₂ once stored in the tree trunk back into the atmosphere. A mature forest is in a kind of equilibrium, where new trees can grow and sequester CO₂ only to take the place of an older, fallen tree which is emitting CO₂.

Therefore, to make a substantial impact on offsetting anthropogenic CO₂ emissions, we must plant new forests while maintaining the ones we have. That is, we can't count on the mature forest land we have today to keep working hard to eventually absorb all of the CO₂ that we emit. One way to retain the structure of a tree that has been cut down is by turning it into lumber. This helps perpetuate it as a carbon sink. However, keep in mind that a tree must be transported and processed to be turned into lumber. This takes energy and releases more carbon. Only if this released carbon is less than the carbon stored in the lumber does the process reduce CO₂ on net.

[1] http://www.epa.gov/climatechange/ghgemissions/global.html
[2] Nowak et al. Carbon storage and sequestration by trees in urban and community areas of the United States. 2013.
[3] http://www.epa.gov/cleanenergy/energy-resources/refs.html

Monday, May 20, 2013

How do airlines choose by how many customers to overbook flights?

A few years ago, I was returning home from a trip to the Florida Keys, which required two layovers. After my first flight, the airline announced that the next flight was overbooked. A \$500 voucher would be awarded to the customer who relinquished his or her seat. Since this was the beginning of my lazy summer before I started graduate school, I jumped at the opportunity and took the \$500 voucher and free hotel room for the night.

Overselling or overbooking is the sale of a volatile good or service in excess of actual capacity. -Wikipedia.

For the next year, the voucher rotted in my inbox until it expired, as I didn't take the opportunity to fly with that airline again. While airlines likely count on a fraction of the vouchers to expire, overbooking can maximize profits even when customers are paid off with these pricey vouchers and hotel rooms.

Consider that a fraction of flyers do not show up in time for their flights due to a delay in their preceding connection flight or to personal circumstances. In anticipation of this, airlines overbook the plane (sell more tickets than capacity) and hope that just the right number of customers show up to get a full plane.

Let's assume that an airline gives full refunds for flights missed due to personal circumstances, or equivalently for the math, that all missed flights are due to delays in preceding connection flights. Of course, airlines do not charge twice when a customer misses a connection because of a preceding delay and takes the next flight out. With this, an airline receives revenue from a passenger equal to the ticket price only when he or she actually boards the flight. Here, each empty seat is clearly lost revenue: if the seat is empty, the airline does not receive the revenue from the ticket sale.

Overbooking makes it likely that a flight is full of passengers so the airline receives the most income (seat capacity * ticket price). But, if the airline overbooks too much, it must fork out costly vouchers and hotel rooms to the passengers that get bumped from the flight and give them a seat on another plane, potentially perpetuating the cycle and, most importantly, decreasing the revenue. Obviously, if the airline overbooks by far too many seats, it is just giving out vouchers. Somewhere in between is the sweet spot that maximizes revenue.

Let's put ourselves in the place of the airline and say the cost (airline voucher + hotel room + ticket for next available flight + lost customer loyalty) of bumping a passenger is \$800, and we have a 100-seat plane that flies from SFO --> ORD at a ticket price of \$250**. By how many seats should we overbook the plane on this route?

Data-driven decisions. Out of the thousands of SFO --> ORD flights over the past ten years, our airline company knows:
the total number of airline seat tickets sold: A
the number of these A customers that actually showed up on time for the flight: B
Given a random customer, the probability that he or she will show up for the flight is thus $p=B/A$. We will use $p=0.9$, close to this source that reports 7-8% of customers are no-shows.

We can treat the event that a customer boards the flight as being independent of the other passengers boarding*** and occurring with probability $p$. Our goal is to find the number of tickets beyond capacity that we should sell, which we call $x$. The number of customers $N$ that show up for their flight on the 100-seat plane is thus a binomial random variable with $100+x$ trials and probability of success $p$:
$P(N=n)=\binom{100+x}{n} p^{n}(1-p)^{100+x-n}$.
The term $p^{n}(1-p)^{100+x-n}$ is the probability of a specific sequence of $n$ out of $100+x$ customers boarding their flight, whereas the term $\binom{100+x}{n}$ gives the number of combinations of such sequences (we don't care which of the customers show up-- just whether they do or not!).
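To make the formula concrete, here is a minimal sketch that evaluates this binomial probability directly. The function name and the choice of $x=8$ are hypothetical, picked only for illustration.

```python
from math import comb

def show_up_pmf(n, tickets, p=0.9):
    """P(N = n): exactly n of `tickets` independent customers show up."""
    return comb(tickets, n) * p**n * (1 - p)**(tickets - n)

tickets = 100 + 8  # hypothetical policy: overbook by x = 8

# The probabilities over all possible n should add up to 1
total = sum(show_up_pmf(n, tickets) for n in range(tickets + 1))

# Probability that more customers show up than there are seats
p_over = sum(show_up_pmf(n, tickets) for n in range(101, tickets + 1))

print(f"probabilities sum to {total:.6f}")
print(f"P(more than 100 show up) = {p_over:.3f}")
```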

One approach might be to choose $x$ such that the expected value of $N$ is equal to the number of seats so that just the right number of customers show up in the long run:
$E(N)=(100+x)p=100$.
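Solving this for $x$ with $p=0.9$ gives
$x = \dfrac{100}{p} - 100 = \dfrac{100}{0.9} - 100 \approx 11.1$,
so this rule would suggest overbooking by about 11 seats, whatever the ticket price or the cost of bumping a passenger.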

This approach is short-sighted since it does not take into account the cost of the airline ticket or the voucher award. For example, if the airline gives out \$1 million vouchers to overbooked customers, the airline wouldn't overbook at all.

A better approach is to find a formula for the expected value of the revenue of this flight with our policy of overbooking by $x$ customers and plot the expected revenue as a function of $x$ to see which $x$ maximizes revenue. The revenue $r=r(n)$ depends on the number of passengers $n$ out of $100+x$ ticket purchasers that actually show up. We get income from each person that boards the plane and lose income from each person we bump off of the plane in the case that we are over capacity ($n>100$):
$r(n) = 250n$ if $n<100$ [if fewer than 100 show, we get \$250 for each passenger that shows, and we don't lose any revenue since no customers were bumped.]
$r(n) = (250)(100) - 800(n-100)$ if $n\ge 100$ [if 100 or more show, we get \$250 only for the first 100 passengers, and we lose \$800 for each of the $(n-100)$ customers that were bumped.]
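The piecewise revenue translates directly into a small function. This is a sketch with the numbers from this example hard-coded as constants; the names are my own.

```python
SEATS = 100          # plane capacity
TICKET_PRICE = 250   # dollars received per boarded passenger
BUMP_COST = 800      # dollars lost per bumped passenger

def revenue(n):
    """Revenue r(n) when n ticket holders show up for the flight."""
    if n < SEATS:
        return TICKET_PRICE * n
    # Full plane: income is capped at capacity, minus the cost of each bump
    return TICKET_PRICE * SEATS - BUMP_COST * (n - SEATS)

for n in (90, 100, 105):
    print(n, revenue(n))
```

Both branches agree at $n=100$, so it does not matter which case the boundary is assigned to.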

Now, the revenue that we expect to make, given an overbooking policy:
$E(\text{revenue} \mid x)=\displaystyle \sum_{n=0}^{100+x} P(N=n \mid x)\, r(n)$.
The $P(N=n)$ is given by the binomial$(100+x,p)$ distribution given a few lines above. Since we are more likely to get a full plane with increasing overbooking $x$, we get more and more likely to get the maximum possible income \$(250)(100) from the flight as $x$ increases. On the other hand, we are more and more likely to go over a full plane as $x$ increases, and the \$800 cost of bumping passengers starts to erode our revenue stream.
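The sum above can also be evaluated exactly, without any approximation. The sketch below does so for a range of overbooking policies; the constants repeat the numbers of this example, the function names are my own, and the scan range of 0 to 50 is an arbitrary choice.

```python
from math import comb

SEATS, TICKET_PRICE, BUMP_COST, P_SHOW = 100, 250, 800, 0.9

def revenue(n):
    """Revenue r(n) when n ticket holders show up."""
    if n < SEATS:
        return TICKET_PRICE * n
    return TICKET_PRICE * SEATS - BUMP_COST * (n - SEATS)

def expected_revenue(x):
    """E(revenue | x): sum of binomial P(N = n | x) times r(n)."""
    tickets = SEATS + x
    return sum(
        comb(tickets, n) * P_SHOW**n * (1 - P_SHOW)**(tickets - n) * revenue(n)
        for n in range(tickets + 1)
    )

# Scan overbooking policies and pick the one with the highest expected revenue
curve = {x: expected_revenue(x) for x in range(0, 51)}
best_x = max(curve, key=curve.get)
print(f"best x = {best_x}, expected revenue = {curve[best_x]:.2f}")
```

With no overbooking ($x=0$), nobody is ever bumped, so the expected revenue is simply $250 \cdot 100p$.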

Using the normal approximation to the binomial distribution (with a continuity correction), I plot the expected revenue as a function of overbooking $x$ in the graph below. There are a number of remarks from this plot that aid our intuition.

During a full flight, the revenue would be \$250(100 seats)=\$25000, the upper y-limit on this graph. Note that, in the long-run, we cannot expect to fill every airplane seat-- even if we choose a good $x$.
Selling 100 tickets for 100 seats ($x=0$) does not maximize the revenue. The maximum expected revenue occurs when we sell 109 tickets! That is, revenue is maximized when we oversell the flight 9 seats beyond capacity. [$x=9$ maximizes revenue, and is therefore the best choice.]
Beyond 109 tickets, the revenue decreases because the cost of bumping customers (vouchers, a seat on the next flight, and the risk that the customer flies with a different airline in the future) outweighs the higher certainty of getting a full plane and getting income from 100 full seats. Eventually, when we overbook the plane by 46, the airline is expected to pay more for bumping passengers than it receives in ticket sales!
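For readers who want to recompute the curve, here is one way the normal approximation with a continuity correction could be set up. Treating $P(N=n)$ as the normal probability of the interval $[n-0.5, n+0.5]$ is my assumption about the details, not necessarily how the original plot was produced; the function names are hypothetical.

```python
from math import erf, sqrt

SEATS, TICKET_PRICE, BUMP_COST, P_SHOW = 100, 250, 800, 0.9

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def revenue(n):
    """Revenue r(n) when n ticket holders show up."""
    if n < SEATS:
        return TICKET_PRICE * n
    return TICKET_PRICE * SEATS - BUMP_COST * (n - SEATS)

def approx_expected_revenue(x):
    """Expected revenue with binomial(100+x, p) replaced by a normal."""
    tickets = SEATS + x
    mu = tickets * P_SHOW
    sigma = sqrt(tickets * P_SHOW * (1 - P_SHOW))
    total = 0.0
    for n in range(tickets + 1):
        # Continuity correction: P(N = n) ~ P(n - 0.5 < Z < n + 0.5)
        prob = (normal_cdf((n + 0.5 - mu) / sigma)
                - normal_cdf((n - 0.5 - mu) / sigma))
        total += prob * revenue(n)
    return total

approx_curve = {x: approx_expected_revenue(x) for x in range(0, 51)}
approx_best_x = max(approx_curve, key=approx_curve.get)
print(f"approximate best x = {approx_best_x}")
```

Because the curve is fairly flat near its peak, the approximate optimum may differ by a seat from the one obtained with exact binomial probabilities.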

It should now be clear why and how airlines choose to overbook flights to maximize their profits. Each empty seat is lost money, but the airline must weigh this against the risk of paying for vouchers and hotels for customers that couldn't fit on the full flight-- and the lost customer loyalty that ensues*.

This analysis considers only the revenue of the airline. However, there is an externality associated with bumping passengers. Think about how this passenger may lose out on one day of pay, how his or her employer loses out on one day of valuable work, and how the local ice cream shop loses out on one customer that would have otherwise taken his or her family out for ice cream that day.

* Lost customer loyalty was theoretically included in the "cost of bumping a customer" and the analysis holds.

** Ticket price changes with season! We can see how complicated this gets.

*** Realistically, airlines will have models that take into account customer demographics. Perhaps even customer-specific data: one with a history of missing flights can be assumed to be more likely to miss a flight again. Further, tickets sold in a group may be treated differently: e.g., a whole family buying a set of tickets vs. a single businessman. See this article for how complicated airline models realistically may be. An interesting factor is the airport from which one is flying. Think about it: leaving Las Vegas vs. Cleveland-- who is more likely to miss their flight?

Thursday, January 3, 2013

Basemap: Toolkit in Python for plotting data on maps

I found a cool package in Python for plotting data on maps. It's called Basemap, a toolkit under Matplotlib. The example gallery shows off some of its capabilities for meteorology/climatology data and the images you see on the small screen on the back of the seat in front of you on an airplane.

As another example, I plotted the location of the 15 most populous cities in the United States and made the size of the solid circle proportional to the population. Interestingly, more than half of the 15 most populous cities are in California and Texas. The output:

The code is below. I put the data (population, latitude, longitude) for each of the cities in a dictionary. Using some basemap commands, I plotted a map of the US. The scatter object then plots each city on the map with a red, solid circle whose size is proportional to the population of the city.

states.py

import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
plt.close('all')

# City data: population, latitude (degrees north), longitude (degrees west)
pop = {'New York': 8244910,
       'Los Angeles': 3819702,
       'Chicago': 2707120,
       'Houston': 2145146,
       'Philadelphia': 1536471,
       'Phoenix': 1469471,
       'San Antonio': 1359758,
       'San Diego': 1326179,
       'Dallas': 1223229,
       'San Jose': 967487,
       'Jacksonville': 827908,
       'Indianapolis': 827908,
       'Austin': 820611,
       'San Francisco': 812826,
       'Columbus': 797434}  # dictionary of the populations of each city

lat = {'New York': 40.6643,
       'Los Angeles': 34.0194,
       'Chicago': 41.8376,
       'Houston': 29.7805,
       'Philadelphia': 40.0094,
       'Phoenix': 33.5722,
       'San Antonio': 29.4724,
       'San Diego': 32.8153,
       'Dallas': 32.7942,
       'San Jose': 37.2969,
       'Jacksonville': 30.3370,
       'Indianapolis': 39.7767,
       'Austin': 30.3072,
       'San Francisco': 37.7750,
       'Columbus': 39.9848}  # dictionary of the latitudes of each city

lon = {'New York': 73.9385,
       'Los Angeles': 118.4108,
       'Chicago': 87.6818,
       'Houston': 95.3863,
       'Philadelphia': 75.1333,
       'Phoenix': 112.0880,
       'San Antonio': 98.5251,
       'San Diego': 117.1350,
       'Dallas': 96.7655,
       'San Jose': 121.8193,
       'Jacksonville': 81.6613,
       'Indianapolis': 86.1459,
       'Austin': 97.7560,
       'San Francisco': 122.4183,
       'Columbus': 82.9850}  # dictionary of the longitudes of each city

# Lambert conformal conic map covering the contiguous United States
m = Basemap(llcrnrlon=-119, llcrnrlat=22, urcrnrlon=-64, urcrnrlat=49,
            projection='lcc', lat_1=33, lat_2=45, lon_0=-95, resolution='c')
m.drawcoastlines()
m.drawstates()
m.drawcountries()

max_size = 80  # marker size for the most populous city (New York)
for city in lon:
    # Longitudes are stored as positive degrees west, so negate them
    x, y = m(-lon[city], lat[city])
    # float() keeps the ratio fractional under Python 2 integer division
    size = max_size * pop[city] / float(pop['New York'])
    m.scatter(x, y, size, marker='o', color='r')
plt.show()
