NSA in Utah: Mining a mountain of data
Bluffdale • In many ways, the new Utah Data Center is the quintessential black box.
Much as the $1.5 billion building's opaque exterior walls and dark windows obscure its contents, the powerful National Security Agency puts vast resources into hiding the gargantuan trove of secret intelligence and mega-computers to be stored inside.
The NSA's thousands of spies, analysts and support staff are the U.S. military's lead intelligence collectors. Created in 1952, the agency is charged with intercepting foreign communications, cracking codes, helping track down terrorists and defending U.S. interests against cyberattacks.
But a sharper picture of what is likely to go on within its walls has come into focus with recently leaked documents on NSA surveillance, combined with prior revelations, building specifics, information from defense contractors and hints dropped by top NSA brass.
The Utah Data Center spans 1 million square feet, with a 100,000-square-foot, raised-floor area divided into four separate data halls, each holding what the NSA calls "mission-critical'' computing servers and data-storage capacity. An additional 900,000 square feet will be devoted to technical support and administrative staff, amounting to fewer than 200 NSA employees.
The entire facility is heavily fortified by a robust security perimeter, its own police force, intrusion-detection systems, backup generators with three days of fuel storage and a structure designed to withstand major physical attack. It will need an average of 65 megawatts of electricity to run what several NSA equipment suppliers say will be among the most sophisticated supercomputers and largest reserves of data storage on the planet.
The Bluffdale site helps meet the NSA's computing demands well into the future, according to Harvey Davis, NSA director of installations and logistics.
"I always build everything expandable,'' Davis said in an interview with The Salt Lake Tribune. And thanks to Utah's relatively low utility rates and other favorable conditions, he said, "We're getting the biggest bang for our buck right there.''
Global power • The Bluffdale center comes online amid a historic multibillion dollar expansion of the NSA's capabilities worldwide, part of a sweeping and futuristic vision the U.S. military has chased for more than a decade.
Leaked documents dating back to 2005 show the Pentagon pouring immense effort, hundreds of acronyms and mind-boggling sums of money into modernizing its technology, networks, hardware and data sharing as it seeks to build a "global information grid.''
A 2008 internal Department of Defense report said such a grid "will be heavily reliant on end-to-end virtual networks to interconnect anyone, anywhere, at any time with any type of information through voice, video, images or text.'' It is described elsewhere as "analogous to a secured World Wide Web.''
Bluffdale fits tightly into that global vision, NSA officials confirmed in their Tribune interview, in that it gives analysts with appropriate security clearance at NSA headquarters in Fort Meade, Md., other U.S. locations and spy posts around the world access to its data and computers.
"We're basically running it for the [intelligence] community,'' Davis said.
The NSA also has pioneered so-called "cloud-centric'' technology to let outside agencies reach remotely into its enormous data pools. And the agency has spent years matching its computer architectures and systems with those of other intelligence gatherers, including those managing U.S. reconnaissance and geospatial satellite imagery, to ease information flows.
"The key is architecture,'' NSA's director of technology Lonny Anderson said in a 2011 interview. "There's no doubt in my mind that when we connect architectures, we'll never look back.''
Future is now • The NSA will use the Cray XC30 series of high-capacity computers to crunch at least some of its data, according to documents from several of the agency's private contractors. It's unclear, though, if these will be deployed at Bluffdale or elsewhere on the NSA's network, possibly at computing centers at Fort Meade.
Under a program nicknamed "Cascade,'' the Department of Defense has helped finance development of the XC30 series, which industry officials say can run up to 1 million Intel Xeon core processors simultaneously, enabling speeds of up to 100 petaflops. One petaflop is about one thousand trillion calculations per second.
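For a sense of what those figures imply per processor, here is a minimal back-of-the-envelope sketch in Python. It uses only the two numbers quoted above (1 million cores, 100 petaflops). The function name is hypothetical, and the result is a theoretical peak, not a measured speed.

```python
PETA = 10**15  # one petaflop = one thousand trillion calculations per second

def per_core_flops(total_petaflops: float, cores: int) -> float:
    """Average calculations per second each core must deliver to hit the total."""
    return total_petaflops * PETA / cores

# Figures quoted by industry officials for the XC30 series.
peak = per_core_flops(100, 1_000_000)
print(f"{peak / 1e9:.0f} gigaflops per core")
```

Spread evenly, the quoted peak works out to roughly 100 billion calculations per second for each core.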
According to sociobiologist E.O. Wilson and others, that kind of brute-force computing power could conceivably simulate the behavior of every active molecule in complex human cell mechanics. Or every cell in a human body. Or track the movements of every human on Earth, in real time.
It also could make short work of breaking advanced encryption methods, according to former NSA mathematician-turned-whistleblower William Binney.
The NSA will feed such "petacrunchers'' an even more astonishing collection of data. The magnitude of the agency's data storage reserves at Bluffdale all but defies comprehension. And it appears even that capacity will meet only part of the NSA's needs.
Some, including author and NSA watchdog James Bamford, estimate Bluffdale's storage in yottabytes, each equal to a thousand zettabytes or one thousand trillion gigabytes of data. By comparison, the U.S. computer network hardware maker Cisco projects that by 2016, the total volume of data moved globally by Internet protocol, or IP, will be 1.3 zettabytes, or the equivalent of 1.3 trillion gigabytes.
One example of military-generated data puts that into perspective.
Five years ago, an internal DoD report highlighted the need for "very large scale data storage, delivery and transmission technology,'' to index and store streaming video from an expanding swarm of unmanned surveillance vehicles, known as drones, "and other sensor networks.'' The report estimated the U.S. military's network-wide storage requirements as "exceeding exabytes and possibly yottabytes.''
Since then, use of airborne surveillance drones has surged around the globe.
Exact numbers on how much data the NSA is preparing to store at Bluffdale are, not surprisingly, a fiercely guarded secret. In interviews, NSA officials would only say they built the center's capacity with an eye on Moore's law, the notion that computing power and the data it yields double every 12 to 18 months.
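To see why a doubling period of 12 to 18 months matters for capacity planning, the short Python sketch below compounds that growth over a decade. The 10-year horizon is an assumption chosen purely for illustration; NSA officials gave no such planning figure.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """How many times larger a quantity becomes if it doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)

# Illustrative horizon only: 10 years at each end of the quoted 12-to-18-month range.
for months in (12, 18):
    factor = growth_factor(10, months)
    print(f"doubling every {months} months: about {factor:,.0f}-fold after 10 years")
```

Under those assumptions, demand grows about a hundredfold at the slow end of the range and about a thousandfold at the fast end.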
But in another sign, the NSA and DoD say publicly they favor purchasing off-the-shelf technology to lower costs. Stackable high-end storage drives now commercially available can pack as much as 50 petabytes of data space into about 1,000 square feet, by some rough calculations, though doing so creates tremendous cooling problems.
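The density figure can be scaled up to the data halls with simple multiplication. The Python sketch below does that, assuming (hypothetically) that the entire 100,000-square-foot raised floor were packed at the quoted density of 50 petabytes per 1,000 square feet. Nothing in the public record says the NSA has done so, and the result should not be read as an estimate of Bluffdale's actual capacity.

```python
PB_PER_1000_SQFT = 50        # rough commercial density quoted in the article
RAISED_FLOOR_SQFT = 100_000  # raised-floor area of the Utah Data Center

def floor_capacity_pb(area_sqft: float, pb_per_1000_sqft: float = PB_PER_1000_SQFT) -> float:
    """Petabytes that fit in `area_sqft` at the given density (assumes uniform packing)."""
    return area_sqft / 1000 * pb_per_1000_sqft

capacity_pb = floor_capacity_pb(RAISED_FLOOR_SQFT)
print(f"{capacity_pb:,.0f} petabytes, or {capacity_pb / 1000:,.0f} exabytes")
```

On those assumptions the raised floor would hold about 5 exabytes.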
According to the U.S. Army Corps of Engineers, the Utah Data Center's design includes huge air and water cooling arrays to keep machines from overheating, using up to 1.7 million gallons of water every day.
Revelations • Several experts said the sheer scale of the NSA's drive space in Utah offers strong clues as to the agency's intentions.
"That is far more storage than you would need to store what's on every hard drive owned by every American, much less any database anywhere,'' said Allan Friedman, a technology-policy specialist and fellow at the Brookings Institution in Washington, D.C.
"The only thing you would need this amount of storage for is if you had direct access to cables or large amounts of data coming from everywhere,'' Friedman said.
This month, an avalanche of documents leaked by former NSA contractor Edward J. Snowden, combined with past disclosures and other information already in the public domain, hardened that conclusion.
Snowden, 30, is now an international fugitive, thought to be in Moscow while he seeks asylum from U.S. charges of document theft and espionage.
His first leak to The Washington Post and London-based The Guardian newspapers: a secret FBI-sought court order requiring mobile-phone carrier Verizon to give the NSA so-called metadata files on all calls placed by Verizon's 121 million U.S. customers over a three-month period.
The storage required for mobile-phone metadata (index files on each call with originating and receiving phone numbers, call durations and details on relaying cellphone towers) is relatively small, on the order of terabytes or petabytes. And that would be true even if Verizon isn't the only carrier handing its metadata over to the NSA, and even if such court orders were renewed regularly.
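A rough Python sketch shows why call metadata lands in the terabyte range. Only the customer count (121 million) and the three-month window come from the court order as reported. The calls-per-day and bytes-per-record values are hypothetical placeholders, not figures from Verizon or the NSA.

```python
CUSTOMERS = 121_000_000   # Verizon U.S. customers cited in the leaked order
DAYS = 90                 # roughly the three-month period the order covered
CALLS_PER_DAY = 10        # ASSUMPTION: average calls per customer per day
BYTES_PER_RECORD = 200    # ASSUMPTION: numbers, duration and tower details per call

def metadata_bytes(customers: int, days: int, calls_per_day: int, bytes_per_record: int) -> int:
    """Total bytes needed to keep one metadata record for every call."""
    return customers * days * calls_per_day * bytes_per_record

total = metadata_bytes(CUSTOMERS, DAYS, CALLS_PER_DAY, BYTES_PER_RECORD)
print(f"{total / 10**12:.1f} terabytes")
```

Even if the assumed values are off by a factor of 10 or 100, the total stays within the terabyte-to-petabyte range.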
But then came documents on PRISM, indicating the NSA accesses large pools of American and foreign emails, posts on social-media sites such as Facebook, YouTube videos, Google search histories, as well as storage clouds maintained by Apple and Microsoft. Those data volumes reach easily into the petabyte range or beyond.
Leaked details on Boundless Informant, an NSA program for collecting and harvesting Internet and telephone-traffic data of foreigners, contained hard numbers. The agency collected nearly 97 billion pieces of intelligence in March alone, primarily in the Middle East, an NSA document released by The Guardian showed.
Other less reported potential streams of NSA data revealed by Snowden only multiplied the arithmetic. Dramatically.
Information given to the Hong Kong-based South China Morning Post indicates NSA spies hacked into the servers of major telecommunications companies in China, collecting and mining millions of private text messages.
Further leaks suggest the U.S. spy agency has access to immense data streams pulled by British intelligence, in an operation dubbed "Tempora,'' from up to 200 trans-Atlantic fiber optic communications cables, each carrying 10 gigabits per second, or more than 21 petabytes per day across all of the cables.
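The cable figures can be checked with a little arithmetic. The Python sketch below converts a line rate of 10 gigabits per second into bytes per day and multiplies by 200 cables. It assumes every cable runs at full capacity around the clock, which is a theoretical ceiling, not a measured volume.

```python
SECONDS_PER_DAY = 86_400
BITS_PER_BYTE = 8

def bytes_per_day(gigabits_per_second: int) -> int:
    """Bytes moved in one day by a link running flat out at the given rate."""
    return gigabits_per_second * 10**9 * SECONDS_PER_DAY // BITS_PER_BYTE

per_cable = bytes_per_day(10)   # one cable at 10 gigabits per second
all_cables = per_cable * 200    # up to 200 cables, per the leaked documents

print(f"per cable:  {per_cable / 10**12:.0f} terabytes per day")
print(f"200 cables: {all_cables / 10**15:.1f} petabytes per day")
```

One cable moves about 108 terabytes a day at that rate. The figure of roughly 21 petabytes a day corresponds to all 200 cables combined.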
Not all that data, of course, will or could be backed up to the Utah Data Center, although Bluffdale ranks as the NSA's largest data-storage site, Anderson said. Instead, the NSA will spread and duplicate storage across multiple locales, including new facilities in a huge expansion at Fort Meade.
But some of the data most likely will be kept in Bluffdale, raising the question of how NSA analysts might use the Utah center to find needles in such a gigantic haystack.
Big data, big problems • Beyond the challenges of storing and managing that much data, NSA analysts face the ever-growing task of sifting it and making it useful. Data that big risks outstripping the ability of humans to detect and analyze useful patterns, a process known as data mining.
Mining data in search of terrorist and cybersecurity threats takes two basic forms, experts say.
One involves tracking suspicious individuals and groups over time and space, based on the digital residue their actions and movements leave behind. The other deploys machine algorithms to detect patterns that might foreshadow attacks or point to more suspects.
While the first extends traditional law-enforcement sleuthing, the second method often compares a variety of signal patterns before and after terrorist attacks, an approach that, according to several studies, has a poor track record of yielding results.
A 2008 study by the National Research Council found that while data-mining methods worked in analyzing consumer patterns, "They are less helpful for counterterrorism precisely because so little is known about what patterns indicate terrorist activity.''
No two terrorist attacks are exactly alike. Compare the Boston Marathon bombings, for example, to the ones on Sept. 11, 2001.
As zeros and ones pile up, the NSA, other government agencies and private-sector scientists are racing to build new data-mining algorithms, drawing on a blend of mathematics, machine learning, artificial intelligence and database theory.
"We're learning, just like industry is, that in big data, machine analytics are critical to success,'' the NSA's Anderson said. "So we'll have machine analytics running on everything. We do today, and we'll continue to improve those.''
In March, President Barack Obama announced a $200 million research initiative across seven federal agencies to advance techniques for mining "knowledge and insights from large and complex collections of digital data," the White House said.
Similar DoD investments, meanwhile, dwarf that sum.
The military is now pumping $250 million a year into so-called "big data'' programs to achieve "a 100-fold increase in the ability of analysts to extract information from texts in any language.'' The U.S. military's research agency, the Defense Advanced Research Projects Agency (DARPA), spends another $25 million yearly on improving techniques for crunching mountains of metadata, text documents and message traffic.
These military efforts also target new ways of visualizing data, incorporating it with maps, satellite imagery, complex modeling and other display systems known as geospatial overlays.
Private defense contractors with large Utah presences, such as L-3 Communications-West and Science Applications International Corporation, or SAIC, are poised to play a key role at Bluffdale in these geospatial programs.
SAIC recently ran job ads in Utah for a team of logistics, network and operational specialists "for an exciting national security program supporting an Army Intelligence customer in Salt Lake City.''
Expertise in so-called "data viz'' technologies is crucial to a new generation of digital analysis, one industry observer said.
"When we add in geospatial data, we can start seeing stuff,'' said J.R. Reagan, a cybersecurity specialist with Deloitte & Touche. "And those visual cues are going to be integral to understanding what's in those billions and trillions of rows and columns.''
Reporter Thomas Burr contributed to this story.
A question of scale
One thousand megabytes is a gigabyte (10^9).
Capacity of home computer hard drives is typically measured in gigabytes, with one "gig'' holding roughly the equivalent of 678,000 pages of basic text, according to the online archiving company LexisNexis.
One thousand gigabytes is a terabyte (10^12).
One thousand terabytes is a petabyte (10^15).
One thousand petabytes is an exabyte (10^18).
(U.S. computer network hardware maker Cisco estimates the total volume of data moved globally by Internet protocol, or IP, in 2011 was approximately 369 exabytes.)
One thousand exabytes is a zettabyte (10^21).
One thousand zettabytes is a yottabyte (10^24), the scale of measure used to estimate data storage capacity at the National Security Agency's new Utah Data Center at Bluffdale.
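The ladder of units above lends itself to a small conversion helper. The Python sketch below encodes the same decimal (powers-of-1,000) definitions and reproduces two conversions that appear in this story. The function names are illustrative only.

```python
# Units in ascending order; each is 1,000 times the one before it.
UNITS = ["megabyte", "gigabyte", "terabyte", "petabyte",
         "exabyte", "zettabyte", "yottabyte"]

def bytes_in(unit: str) -> int:
    """Number of bytes in one of the given unit (a megabyte is 10^6 bytes)."""
    return 10 ** (6 + 3 * UNITS.index(unit))

def convert(value: float, src: str, dst: str) -> float:
    """Express `value` of unit `src` in unit `dst`."""
    return value * bytes_in(src) / bytes_in(dst)

# One yottabyte in gigabytes: one thousand trillion.
print(f"{convert(1, 'yottabyte', 'gigabyte'):,.0f} gigabytes")
# Cisco's 2011 estimate of 369 exabytes, expressed in zettabytes.
print(f"{convert(369, 'exabyte', 'zettabyte'):.3f} zettabytes")
```

By this measure, a single yottabyte would hold Cisco's estimate of all 2011 Internet traffic roughly 2,700 times over.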
Join us for a Trib Talk discussion
Tuesday at 11:30 a.m., Trib Talk's Jennifer Napier-Pearce will moderate a live video chat at sltrib.com with reporter Tony Semerad, the Brookings Institution's Allan Friedman and others about the NSA's Utah Data Center. You can join the discussion using a TribTalk hashtag on Twitter or Google+.
The National Security Agency in Utah
Today's stories conclude a three-part examination of the NSA in Utah. Read other installments, which look at the spy agency's history and address how and why the NSA's data center ended up in Bluffdale, at sltrib.com
Tune in to C-SPAN
The Tribune's Thomas Burr will talk about the NSA's Utah Data Center Tuesday at 7:15 a.m. on C-SPAN.