|
Web 2.0: How High-Volume eBay Manages Its Storage
2006-10-27
http://www.eweek.com/c/a/Storage ... anages-Its-Storage/
The ultrapopular auction/sales Web site continues its exponential growth and finds itself adding 10 terabytes of new storage every week. That's a lot of data.
eBay, like Xerox and Google, is fast becoming a generic verb for what it does ("Oh, just eBay it"). And when a company's name becomes shorthand for what it does, a certain level of success has been reached.
Make that a very high level of success. Among Web 2.0 companies, San Jose, Calif.-based eBay is up there with Google, Amazon, Yahoo, eHarmony, Digg.com, and social networking sites MySpace.com and Facebook.com as far as traffic, popularity and profitability are concerned.
Some of the facts and figures -- according to eBay itself -- about the world's largest auction business and most popular commercial Web site are downright staggering:
The site averages more than 1 billion page views per day.
Users trade about $1,700 worth of goods on the site every second.
The site handles 26 billion SQL queries per day.
A vehicle sells every minute.
A motor part or accessory sells every second.
Diamond jewelry sells every 2 minutes.
The site currently posts about 600 million listings per quarter and has about 204 million registered users.
This one, in particular, is striking: 1.3 million people make all or part of their living selling on eBay.
To put this into a little perspective, FDR's two largest New Deal jobs programs, the Civilian Conservation Corps and the Civil Works Administration, employed a total of 6.5 million workers in the 1930s.
eBay handles all those transactions, Web site page views and money changing hands on a near-zero-latency, 24/7, international basis. The site's availability has been charted at 99.94 percent per day (a hiccup of about 50 seconds per day). When it was charted in June 1999, the site was down an average of 43 minutes per day.
How eBay got to this level of technical efficiency and success is a long story. What we can do here is offer an overview of how the company approaches its storage strategy -- something it hasn't talked about with the media before.
Not many Web-based businesses have run into the kind of traffic and server-availability issues that eBay has experienced.
"Our growth has just been exponential for 11 years," eBay Research Labs Distinguished Engineer Paul Strong told eWEEK. "And since our job is to provide available, efficient, low-latency, 24/7 performance, we know we have a difficult job to do every day to keep the site running as perfectly as we need it to run."
eBay's storage engineering team ("Eleven people," Strong said) utilizes 2 petabytes of raw digital space on a daily basis to run the site and store its data, yet has to add about 10 terabytes (or 75 volumes) of new storage every week to cover new transactions, Strong said.
That parallels the eHarmony story: the highly successful matchmaking site has to purchase additional storage about every 90 days.
eBay said it uses a traditional grid computing system with the following features to build the site:
about 170 Win2000/Win2003 servers
about 170 Linux (RHES3) servers
three Solaris servers: build and deploy eBay.com to QA; compile Java & C++; consolidate/optimize/compress XSL, JS and HTML (a rough sketch of such a consolidation step appears below)
time to build site: was once 10 hours; now only 30 minutes
in the last 2.5 years, there have been 2 million builds.
Then, the content is deployed to a system of about 15,000 servers.
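For illustration only, here is a minimal Java sketch of the kind of consolidate-and-compress step described above, concatenating a few static assets into a single gzip-compressed bundle. The class, file names and paths are hypothetical and do not represent eBay's actual build code.

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Illustrative only: consolidates several static assets (JS/XSL/HTML)
// into one gzip-compressed bundle. File names and paths are hypothetical.
public class AssetBundler {

    static void bundle(List<Path> sources, Path target) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            for (Path source : sources) {
                out.write(Files.readAllBytes(source)); // consolidate: append each file
                out.write('\n');                       // keep concatenated files separated
            }
        } // closing the GZIP stream finishes the compressed bundle
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical inputs and output; any small text files would do.
        bundle(List.of(Paths.get("search.js"), Paths.get("listing.js")),
               Paths.get("bundle.js.gz"));
    }
}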
eBay uses a number of different products in its storage setup, including switches from Brocade, software framework from IBM Tivoli, NAS (network-attached storage) hardware from NetApp (5 percent of the system) and large arrays from Hitachi Data Systems (95 percent of the system), Strong said. It also runs Oracle DB, he said.
"Oh, Im sure Im leaving somebody out. Theres probably something from each of the major storage manufacturers somewhere in our system," Strong said.
eBay maintains four copies of most of its databases, according to Strong.
eBay's main data centers are spread out over the continental United States, and it also has co-locations all around the world, he said.
Becoming a trusted eBay supplier is not an easy task, according to Strong. "It takes a long time for a company to prove itself enough for us to use them," Strong said. "There is a great deal of testing that goes on before we select a vendor, for anything."
The storage environment is modular in design, so adding incremental storage containers or servers is a difficult but not daunting exercise.
"However, we really hesitate to add new brands of software and hardware, if possible," Strong said. "Wed love to be in a position where everything appears homogenous, so that it minimizes the skill sets our engineers need."
eBay home-cooks some of its own software to use within its system, customized specifically for the online auction/sales environment and eBay's unique business needs, Strong said.
eBay's structure is, according to Strong:
highly distributed
the auction site is Java-based; the search infrastructure is written in C++
built by hundreds of developers, all working on the same code
To keep scaling with the growth of the business, Strong said eBay uses:
Centralized Application Logging: a scalable platform for logging fine-grained application information (a generic sketch of this pattern appears after the list)
Global billing: real-time integration with a third-party package
Business event streams: a unifying technology for efficient and reliable message queues. Cookie-cutter patterns are used within the system for optimal user experience
Reliable multicast infrastructure: allows for distributed analysis of massive amounts of data and keeps the company's growing search infrastructure up to date.
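As a generic sketch of the logging and event-stream pattern referenced above, the Java snippet below queues fine-grained application events for a background consumer to ship onward. The class name, event fields and in-memory queue are invented for illustration; this is not eBay's Centralized Application Logging or business-event code.

import java.time.Instant;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Generic illustration only: fine-grained application events pushed onto a
// central queue, with a background consumer standing in for shipment to a
// central log store. Names and fields are invented; this is not eBay's code.
public class EventStreamSketch {

    // One fine-grained application event, e.g. "bid placed on item X".
    record AppEvent(Instant timestamp, String source, String type, String detail) {}

    private final BlockingQueue<AppEvent> queue = new LinkedBlockingQueue<>();

    // Application code calls this wherever something worth recording happens.
    public void log(String source, String type, String detail) throws InterruptedException {
        queue.put(new AppEvent(Instant.now(), source, type, detail));
    }

    // A daemon thread drains the queue; in a real system it would batch and
    // forward events to centralized storage or a message bus.
    public void startConsumer() {
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    AppEvent event = queue.take();  // blocks until an event arrives
                    System.out.println(event);      // stand-in for central persistence
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // allow clean shutdown
            }
        });
        consumer.setDaemon(true);
        consumer.start();
    }

    public static void main(String[] args) throws InterruptedException {
        EventStreamSketch stream = new EventStreamSketch();
        stream.startConsumer();
        stream.log("item-service", "BID_PLACED", "item=12345 amount=17.00");
        Thread.sleep(100); // give the daemon consumer a moment to print
    }
}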
When it comes to handling all that traffic, Strong said the world's time zones provide a kind of natural "load-balancer."
"When were busiest here in the U.S., thats generally when Europe [the second-largest region using eBay] is asleep—and vice versa," Strong said. "Although we have surges now and then, the natural divide between the two continents works well for us." |
|