Sunday, December 30, 2007

Carbon footprint of a banner ad? What the heck?

If somebody walks up to you this holiday season and asks, “How would you calculate the carbon footprint of a banner ad?”, you should immediately check what he has been drinking. And if you are one of those adventurous types, order one of those for yourself, because it has to be some really strong stuff that makes one’s mind wander in such directions.

But this is precisely what Don Carli of Sustainable Advertising Partnership is trying to figure out. Carbon footprint, as most of you would be aware, is a measure of the impact of human activities on the environment, usually expressed as the amount of greenhouse gases they produce. Don’s organization is trying to bring advertisers and the supply chain (for all kinds of media, such as print, online and TV) together to come up with best practices for advertising: practices that would address the challenges of sustainability and make ads more “green”, so to speak.

Don has been involved in such initiatives for over four years now (at least, that’s what his LinkedIn profile tells me), and since it’s highly unlikely that he has been constantly inebriated all these years, you have to take his question a bit more seriously. Personally, I don’t think it’s important. If I had to make a list of all the stuff for which we need to reduce the carbon footprint, “banner ads” perhaps wouldn’t figure among the first hundred billion entries on the list. But Don apparently belongs to the group of folks who believe in the age-old saying that ‘every bit saved makes it a bit more’ (or words to that effect). So let’s give this question a shot.

To calculate the carbon footprint of a banner ad, we have to measure the energy consumed at each stage of the supply chain for creating, storing and serving the ad. So what exactly is the “supply chain” for serving banner ads? Let’s see who the players involved are:
1. The Advertiser
2. The creative agency (these folks design and develop the “creatives” or display ads)
3. The Ad agency (they work as the intermediary between the advertisers and the publishers or ad networks)
4. The Ad network (someone like Yahoo or DoubleClick, who has got a pool of publishers)
5. The Publisher
6. The Content delivery network (someone like Akamai, who stores these creatives closer to the end user’s location)
7. The ISP (and the entire internet infrastructure that brings the data to you)
8. The end user

Wow! It’s kind of mind-boggling to think of all the systems that are used to serve the ad. For example, take the case of only one of the players in the supply chain, the ad network. It should have systems that help advertisers manage campaigns, book ads and view reports. Additionally, it should have systems to forecast traffic, run pricing models and log events, plus software to rotate banner ads, if required. All these applications would most probably be running on their own separate servers (sometimes multiple servers for one application). So how exactly do we track the energy consumed by one ad through all these systems?

But hang on a minute. We can make our task much simpler. The end objective is to see if we can reduce the carbon footprint of the banner ad by some optimization in the creative i.e. either by reducing the size of the ad or by optimizing the way it’s served. So we should only consider the systems that are affected by the type or size of the ad. For example, the pricing server would be unaffected by the size of the ad and so would a lot of other systems in the entire supply chain.

Most of the systems in the supply chain would only have a reference to the creative. The actual creative is perhaps used by only a handful of systems. So let’s see in which systems the creative actually “consumes energy”:
1. Systems used by creative agency to create the ads
2. Data centers of the content delivery network that store the ads
3. Network bandwidth for serving the ads
4. Processing power consumed on the user’s machine when the ad is displayed

So if we know the carbon footprint of each machine, we can allocate a share of it to the banner ad in proportion to how much of the machine’s processing power or bandwidth the ad consumes. In other words:

Carbon footprint of banner ad = (carbon footprint of the server or PC) * (processing power or bandwidth consumed by the banner ad) / (total processing power or bandwidth of the server or PC).

Calculate this for all the systems mentioned above and feel free to use the expression 'Voila' once you get there.
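Here is a minimal sketch of that allocation in Python, just to make the arithmetic concrete. The stage names follow the four systems listed above, but every number in EXAMPLE_STAGES is made up for illustration (I have no real measurements), and the Stage type and function names are my own invention.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    name: str
    machine_footprint: float  # footprint of the server or PC (say, kg CO2e)
    used: float               # processing power or bandwidth consumed by the ad
    capacity: float           # total processing power or bandwidth of that machine


def allocated_footprint(stage: Stage) -> float:
    """Share of one machine's footprint attributed to the ad (the formula above)."""
    if stage.capacity <= 0:
        raise ValueError("capacity must be positive")
    return stage.machine_footprint * stage.used / stage.capacity


def total_footprint(stages) -> float:
    """Sum the allocated share over every system the creative touches."""
    return sum(allocated_footprint(stage) for stage in stages)


# Hypothetical numbers, chosen only so the arithmetic is easy to follow.
EXAMPLE_STAGES = [
    Stage("creative agency workstation", 100.0, 1.0, 4.0),
    Stage("CDN data center", 200.0, 1.0, 8.0),
    Stage("network bandwidth", 50.0, 1.0, 2.0),
    Stage("end user's machine", 80.0, 1.0, 16.0),
]

if __name__ == "__main__":
    for stage in EXAMPLE_STAGES:
        print(stage.name, allocated_footprint(stage))
    print("total", total_footprint(EXAMPLE_STAGES))
```

The same caveat applies as for the formula itself: this is plain proportional allocation, so the answer is only as good as the footprint and capacity figures you feed in.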

And here comes a disclaimer. Those who have read about cost accounting methods would quickly point out that the “allocation” method described above is perhaps not the optimal way to measure something. But looking at the abstract nature of the problem, this is the best I could think of.

Thursday, December 6, 2007

How does LinkedIn generate the “people you may know” list?

My Google Analytics reports tell me that quite a significant number of people land on my blog by searching for this phrase in Google: “How does LinkedIn generate the ‘people you may know’ list?” Well, I just mentioned this in passing in one of my earlier posts. That post, I don’t think, would have satisfied the Google searchers, and in all probability it would have left them with a serious urge to kick my and Google’s posterior.

So in order to save my ass (I don’t care what you do with Google’s), let me give this question a shot. Just to clarify, I don’t have any inside information about the algorithm LinkedIn uses; this is merely my speculation.

Some time back, one of my fellow networkers on LinkedIn asked the exact same question in the Q&A section. I, just like the United States Marines, came to the rescue on that occasion. Here is my response to that question, reproduced verbatim for your benefit.

“If I had to develop this, I would use the following criteria to determine a potential connection for you

1) Attended same school/ universities (higher weight if you graduated in the same year),
2) Worked in the same firm (worked during the same period),
3) Number of shared connections,
4) Linkedin history of contacts between you and them
5) Contact list imported from outlook/ gmail etc. (These people were perhaps not part of linkedin when you imported the contact list; but they have joined the network since then).

Not sure what's the exact algorithm linkedin uses though.”


Let me add a couple of clarifications, since I’ve taken the answer out of context. On point 4, I was referring to people who might have applied to your job posting on LinkedIn in the past, or who privately replied to your questions and got a response in return.

Although I’m not sure what algorithm LinkedIn uses, I won’t be surprised if the only thing it does is the last point I mentioned. This is a source of confusion for many people, when LinkedIn shows names from their Gmail or Outlook address books in the “people you may know” section.


Updated:


Looking back at my answer, I feel I can add a few more factors to the algorithm.

6) Look for members who have the same zip/area code as you. This has to be used in combination with other factors, such as employment history. For example, I definitely don't know everyone in Bangalore and may not know a lot of people at Yahoo!. But when you use my location and employment history in conjunction, chances are I would know the person who works at Yahoo in Bangalore.

7) This is similar to point #4. LinkedIn should also look at other members' activities to see if they would be a potential contact for you. If someone is visiting your profile frequently, chances are he might know you and can be considered a potential contact for you (this is kind of backtracking). LinkedIn has recently started tracking this activity, although they don't always display the name of the person who visited your profile.

So we have all these factors that would help LinkedIn determine a potential connection for you. Again, we can make a guess as to how LinkedIn would be doing the processing. They would perhaps start with your first-level contacts and then traverse your network graph, calculating the "homophily" score between you and other members. The homophily (not to be confused with homophile) score would be a weighted sum of all these factors.

Once LinkedIn has identified all the factors, all they have to do is to keep tweaking the weights assigned to these for calculating the score. The obvious way to measure the effectiveness of this algorithm is to track how many times members follow these links and add those suggested "people you may know" to their network.
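To make the "weighted sum" idea concrete, here is a small Python sketch. It is purely my guess dressed up as code, not anything LinkedIn has published: the factor names mirror the seven criteria above, the weights are arbitrary placeholders (exactly the numbers one would keep tweaking), and it covers only the scoring step, not the graph traversal.

```python
# Hypothetical weights for the seven factors; the values are placeholders.
WEIGHTS = {
    "same_school": 1.0,
    "same_employer": 1.5,
    "shared_connections": 0.5,   # applied per shared connection
    "contact_history": 2.0,
    "imported_contact": 3.0,
    "same_location": 0.5,
    "profile_visits": 0.25,      # applied per visit
}


def homophily_score(signals, weights=WEIGHTS):
    """Weighted sum of the factor values; missing factors count as zero."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


def rank_candidates(candidates, weights=WEIGHTS, top_n=5):
    """Return the names of the top_n candidates, highest score first."""
    ranked = sorted(
        candidates,
        key=lambda name: homophily_score(candidates[name], weights),
        reverse=True,
    )
    return ranked[:top_n]


if __name__ == "__main__":
    # Made-up members and signal values, for illustration only.
    candidates = {
        "alice": {"same_school": 1, "shared_connections": 6},
        "bob": {"imported_contact": 1},
        "carol": {"same_employer": 1, "same_location": 1},
    }
    print(rank_candidates(candidates))
```

Tuning then amounts to changing the values in WEIGHTS and checking whether members add more of the suggested people.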

One subtle point: to create a "wow" experience, members who are more degrees away from you might be ranked higher than someone who is already a 2nd or 3rd degree contact.

And before I close this topic, let me remind you that this is just my idea of how LinkedIn might be doing it. The actual implementation could be completely different.

So now that I’ve quenched your thirst for knowledge, you can perhaps go back to doing more productive work and searching for other important things in Google; say, for example, Lindsay Lohan pictures.

Bubble 2.0?

For quite some time now, a few industry experts have been predicting another dot-com bubble. So it was very interesting to come across this optimistic piece, “What Bubble?”, by Harry Gold.

Harry looks at the positive growth trend of internet ad revenue and concludes that there is actually no bubble. To quote Harry, “Wow! If that doesn't say it all and validate the hot air we've been blowing all year, I don't know what does….Just look at this growth trend:.”

It’s refreshing to see such a positive outlook, but I’m still a bit skeptical about his conclusion. Let’s reproduce the image from his article:




[Image: internet ad revenue growth trend chart]
-- Source: Interactive Advertising Bureau, 2007 --

Harry seems to have got it right, hasn’t he? There is indeed a very significant positive growth trend for the past few quarters. But before we start jumping for joy, let’s take a quick look at the ad revenue growth trend in 2000. Do you remember when exactly the first dot-com bubble burst? It was March 13, 2000, i.e. the 1st quarter of 2000. What was the ad revenue trend before that? Well, again a very significant upward trend. Although it’s difficult to get the numbers from the graph, the slope looks much steeper in 1999 than it is now. In fact, ad spending didn’t go down until the 3rd quarter of 2000. So the downward trend in ad spending is actually an aftermath of the bubble bursting, not the cause. Harry seems to have got the causality reversed in his equation. Perhaps he would have written a similar article in 2000 saying “What Bubble?”.

Another source of Harry’s positive outlook is an eMarketer report predicting future ad revenue. We, of course, need to take such predictions with a pinch of salt. One of the methods commonly used in such predictions is extrapolation from historical data, and as I’ve mentioned above, it’s difficult to predict a drastic event just by looking at history. One wonders why these folks were not able to predict the first bubble burst.

So what is my view on this? I feel the industry is again heading towards a major shake-up. The major players like Google and Yahoo would definitely make it through, but some of the new players might find it difficult to survive. The warning signs are nowhere more evident than in the social networking space. MySpace was valued at $15bn in 2006. Facebook was similarly valued at $15bn by Microsoft recently (Yeah, I know the common argument that it was not a valuation, but a strategic investment. Although I’m not sure what kind of strategy it was to flush money down the drain). Again we are going back to the days when firms were valued just by page views or user base. At last count there were around 137 million social networking sites (give or take a few), all of them relying on the “network externality” effect to win the space. One wonders how many of them are going to make it through.

And by the way, while we are on this subject, look at Harry’s article again to see the optimistic predictions by eMarketer about this space: “Social-networking advertising numbers, currently being revised by eMarketer, are expected to increase from $900 million in 2007 to $2.5 billion in 2011.” So $2.5bn in annual revenue is to be shared between all these players, some of which are valued at $15bn.. hmm.. interesting.
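For anyone who wants to check the back-of-the-envelope numbers, here is a quick Python sketch. It uses only the figures quoted above ($900 million in 2007, $2.5 billion in 2011, a $15bn valuation); the function and variable names are my own, and comparing one company's valuation with the whole segment's forecast revenue is a deliberately crude yardstick.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


# Figures from the eMarketer forecast quoted above, in billions of dollars.
REVENUE_2007 = 0.9
REVENUE_2011 = 2.5
VALUATION = 15.0  # the $15bn valuation mentioned earlier

growth = cagr(REVENUE_2007, REVENUE_2011, 2011 - 2007)
multiple = VALUATION / REVENUE_2011

if __name__ == "__main__":
    print(f"implied annual growth: {growth:.1%}")
    print(f"valuation / total 2011 segment revenue: {multiple:.1f}x")
```

So the forecast implies growth of roughly 29% a year, and a single $15bn valuation would be six times the revenue forecast for the entire segment in 2011.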

Having said all that, I hope Harry is right and I’m wrong. I was not part of the industry when the first bubble burst (if memory serves me right, I was learning to write “hello world” programs in Java). But this time I earn my bread and butter here. So nobody will be happier than me if Harry’s optimistic views turn out to be right.