Yahoo – Hadoop and Green Computing

Executive Summary:
Yahoo, an American internet corporation, currently uses a global cloud computing infrastructure that relies heavily on a technology called Apache Hadoop. Yahoo’s ability to process enormous amounts of data in order to create increasingly relevant experiences for its users rests on Hadoop, which works with the cloud to process and analyze the data Yahoo collects efficiently.
Because Yahoo places high importance on customizing results and stories for its users, Hadoop is crucial: it enables levels of efficiency and speed that were previously unattainable. One area of IT infrastructure that complements Yahoo’s cloud computing capabilities especially well is green computing, in the form of green server farms. I recommend that Yahoo increase its use of these new green server farms to maximize operational efficiency.

Scenario Description:
Yahoo was founded in Santa Clara, California in January 1994. It is best known for its web portal, search engine, advertising, mail and news clients, online mapping, and many other services. It prides itself on being able to ‘cut through the noise’ and help people find what they want in a curated environment. Yahoo’s web portal provides current content such as news, entertainment, sports, and weather, and connects users to other Yahoo services such as Yahoo! Mail, Yahoo! Maps, and Yahoo! Finance.
Yahoo collects massive amounts of data from its users through its web services, as well as from advertisers. Although Yahoo receives fewer page hits than Google, it actually collects more data than its competitors: as many as 2,500 records a month from each of its visitors (6). This data is valuable to Yahoo in two ways: it lets Yahoo add value for users by showing them personalized results, and it lets Yahoo promise advertisers a wealth of information about those users.
Because Yahoo, like Google, generates most of its revenue from advertising, supplemented by its subscription services, any technology that improves its ability to process and analyze the information it collects adds tremendous value. Apache Hadoop is precisely this technology.

The Technology in Detail:
Apache Hadoop is a software framework that allows multiple computers (or large servers, in Yahoo’s case) to share complex, massive data processing tasks, enabling applications to work with petabytes of data across thousands of nodes.
Simply put, Hadoop maps the information spread across thousands of computers and creates an easier means to dig into queries (8). It is an open-source Apache project, both built and used by its contributors. Hadoop is written in Java and was created by Doug Cutting (7); it was named after his son’s toy elephant (8). Hadoop is based on MapReduce computing. According to Google, “MapReduce is a programming model and an associated implementation for processing and generating large data sets… Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines” (9). The MapReduce framework is inspired by the map and reduce functions commonly used in functional programming: a map function turns each input record into intermediate key/value pairs, and a reduce function merges all the values that share the same key. Yahoo has been the largest contributor to Hadoop (10) and uses it across a wide range of its businesses. Apache Hadoop has many benefits and a few downsides. One of the biggest advantages of focusing on Hadoop is that Yahoo can tap into a generation of developers who know MapReduce (11).
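To make the map and reduce steps concrete, here is a minimal single-machine sketch in Python of the classic word-count example. It only simulates the three phases (map, shuffle/group-by-key, reduce) inside one process; it is not Hadoop’s Java API, and the function names are hypothetical. In a real Hadoop cluster these phases would be spread across many nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values that share the same key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: merge all values for one key into a single result."""
    return (key, sum(values))


def word_count(documents: list[str]) -> dict[str, int]:
    # In Hadoop, map tasks would run on the nodes holding each block of input;
    # here every phase runs sequentially in a single process.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Yahoo uses Hadoop", "Hadoop uses MapReduce"]
    print(word_count(docs))  # {'yahoo': 1, 'uses': 2, 'hadoop': 2, 'mapreduce': 1}
```

Because each map call depends only on its own record and each reduce call only on one key’s values, both phases can run in parallel on different machines, which is what lets the framework scale across a cluster.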
Because Hadoop is open source, Yahoo can draw on a vast reservoir of experience and contribute its own advancements to the global community. Unfortunately, the current version of the Apache Hadoop MapReduce framework has a scalability limit of around 4,000 machines (12). The framework also needs upgrades to improve scalability, memory consumption, reliability, and performance, and Yahoo is working with the Apache community to develop this “Next Generation” of Apache Hadoop MapReduce (12).
Many other companies use Apache Hadoop, and more specifically the MapReduce framework. Well-known users include Facebook, which claimed in 2010 to have the largest Hadoop cluster in the world and reported that cluster at 30 PB as of July 2011 (13), as well as eBay, Amazon, The New York Times, Twitter, and many others.

Strategic Benefits of the Technology and Recommendations:
Two of Yahoo’s strategic values are Innovation and Customer Fixation. First, regarding Innovation, Yahoo has said, “We anticipate market trends and move quickly to embrace them. We are not afraid to take informed, responsible risk” (14). Apache Hadoop plays directly to this goal, allowing Yahoo to map out, analyze, and predict market trends quickly. Vast amounts of user information can now be sorted and analyzed to give people exactly what they are looking for. Second, regarding Customer Fixation, Yahoo has said, “We respect our customers above all else and never forget that they come to us by choice”.
Because customers in this dynamic online era are free to choose where they go on the web, Yahoo must keep giving them reasons to return, and this is where Apache Hadoop shines: it makes personalized, relevant content possible at Yahoo’s scale. Beyond these values, Hadoop’s efficiency in data processing also produces cost savings. To maximize efficiency across the board, minimize costs, and remain competitive in this ever-changing market, I recommend that Yahoo expand its use of green technology.
Yahoo has begun maximizing efficiency through its implementation and development of Apache Hadoop, and it can take this to the next level with green server farms. The two technologies work hand in hand: because Hadoop spreads work across many interchangeable machines, that work can be shifted to whichever servers use the least energy. If Yahoo were to build more green data centers, it could re-route work to the most energy-efficient servers when traffic is low, potentially adding up to massive energy savings. Yahoo has already developed one state-of-the-art data center (15).
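The re-routing idea can be sketched as a simple greedy placement in Python. This is a hypothetical illustration, not a description of Yahoo’s actual scheduler: it assumes each server’s capacity and energy cost per request are known, and the `Server` fields, server names, and numbers are all made up. It fills the most energy-efficient servers first, so at low traffic the least efficient machines receive no work and could be idled.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    capacity: int             # hypothetical: max requests/sec this server can handle
    watts_per_request: float  # hypothetical: energy cost per request (lower is greener)


def route_load(servers: list[Server], load: int) -> dict[str, int]:
    """Greedily assign `load` requests/sec to the most energy-efficient servers first.

    Servers missing from the result receive no work and could be powered down.
    Raises ValueError if the fleet's total capacity cannot absorb the load.
    """
    assignment: dict[str, int] = {}
    remaining = load
    for server in sorted(servers, key=lambda s: s.watts_per_request):
        if remaining == 0:
            break
        share = min(server.capacity, remaining)
        assignment[server.name] = share
        remaining -= share
    if remaining > 0:
        raise ValueError(f"insufficient capacity: {remaining} requests/sec unassigned")
    return assignment


if __name__ == "__main__":
    # Hypothetical fleet: two green servers and one older, less efficient one.
    fleet = [
        Server("green-a", capacity=500, watts_per_request=0.8),
        Server("legacy-b", capacity=1000, watts_per_request=2.0),
        Server("green-c", capacity=400, watts_per_request=0.9),
    ]
    print(route_load(fleet, 600))   # off-peak: {'green-a': 500, 'green-c': 100}
    print(route_load(fleet, 1500))  # peak: legacy-b is used only for the overflow
```

A greedy fill is the simplest possible policy; a production system would also weigh latency, data locality, and the cost of powering machines up and down.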
Works Cited
6 – “They Know More Than You Think.” The New York Times, 10 Mar. 2008. Web. 03 Oct. 2011. <http://www.nytimes.com/imagepages/2008/03/10/technology/20080310_PRIVACY_GRAPHIC.html>
7 – http://www.sdtimes.com/blog/post/2009/08/10/Hadoop-creator-goes-to-Cloudera.aspx
8 –