Commentary By Mark P. Mills

Energy and the Information Infrastructure: Part 4 — Data Is 'The New Oil'


A note about this series (Part 1, Part 2, and Part 3):

We live in the zettabyte era, a term coined by Cisco Systems in its annual digital traffic forecast. Nothing else in commerce is measured at such inconceivable scales. And no other daily activity involves a product as tiny and ethereal as a byte. Everything about the digital era, especially its aggregate energy appetite, is captured at the intersection of these extremes: the withering decline in the nanoscopic energy per byte and the blistering growth in bytes consumed.

The world creates some 40 zettabytes of useable data each year. So just how much does that much data weigh? After all, the physical realities—i.e., the hardware and the energy needed to power the equipment—are where the rubber meets the road.

The answer? The world’s annual data traffic weighs about ten million tons.

How do we know? Thank Amazon for helping to frame this aspect of the physics of information.

A few years ago, the Web giant launched its oddly named Snowmobile. It's a 33-ton, 45-foot-long truck, a semi-trailer chock-a-block with digital memory that can hold 100 petabytes (PB) of data and transport it from customers' premises to the Amazon Cloud. (100 PB is roughly the data in one million smartphones.)
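One way to arrive at the ten-million-ton figure is to count how many Snowmobile loads a year's data would fill; a minimal sketch using only the numbers in this article (40 ZB per year, 100 PB and 33 tons per truck):

```python
# Rough sketch: a year's worth of data expressed as Snowmobile truckloads.
ZETTABYTE = 1e21   # bytes
PETABYTE = 1e15    # bytes

annual_data = 40 * ZETTABYTE       # ~40 ZB of usable data created per year
truck_capacity = 100 * PETABYTE    # one Snowmobile holds 100 PB
truck_weight_tons = 33             # the loaded semi-trailer

trucks = annual_data / truck_capacity
print(f"{trucks:,.0f} trucks, ~{trucks * truck_weight_tons / 1e6:.0f} million tons")
# 400,000 trucks, ~13 million tons -- the same order as "about ten million tons"
```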

The service was launched to serve the rapid growth in organizations that create and store data at petabyte scales. In order to take advantage of the cost and performance efficiencies associated with storing and processing data in the Cloud, one has to transfer the data to those remote hyperscale datacenters. (For more about those behemoth buildings, see Part 1 in this series.) But it’s one of those ineluctable and annoying realities of physics that data transport speeds are roughly one thousand times slower over distances of miles compared to meters. So, transferring 100 PB to the remote Cloud on the best high-speed fiber network would take nearly 26 years.
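A quick sanity check on the multi-decade figure; the sustained 1-gigabit-per-second link speed here is an illustrative assumption, not a number from the article:

```python
# How long does moving 100 PB take over a long-haul network link?
data_bytes = 100 * 1e15     # 100 PB
link_bps = 1e9              # assumed sustained throughput: 1 Gbps

seconds = data_bytes * 8 / link_bps
print(f"{seconds / (3600 * 24 * 365):.0f} years")   # ~25 years, i.e. decades
```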

Thus the Snowmobile: parked at a customer site, it can upload 100 PB in about a week. Of note, during that week of uploading the Snowmobile's digital hardware consumes electricity equivalent to about 40 barrels of oil, and then another 10 barrels or so of actual oil are burned to drive the truck cross-country to a Cloud datacenter. Meanwhile, peak global Internet traffic is roughly one Snowmobile's worth of data every 30 seconds, and is forecast by Cisco to double in just three years, with no end in sight to exponential growth.
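For scale, the implied sustained upload rate during that week, computed only from the 100 PB and one-week figures above, is on the order of a terabit per second:

```python
# Implied throughput for uploading 100 PB in about a week.
data_bits = 100 * 1e15 * 8      # 100 PB expressed in bits
week_seconds = 7 * 24 * 3600
print(f"{data_bits / week_seconds / 1e12:.1f} Tbps")   # ~1.3 terabits per second
```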

Before exploring the future beyond three years, consider how far we’ve come. Global Internet traffic is some 10 million fold greater than when Amazon was founded in 1994. Could the next two decades see a data explosion as great as the past two? And what would we call that era? The answer to the first question is the single biggest wild card for future digital energy demand. And the answer to the second: the brontobyte era.

Numbering things may be humanity's oldest skill; it's more than informative, it enables commerce. Historians think numbering began with the Sumerians around 4000 BC. But the ancient Egyptians were the first to create a word (a hieroglyph) for the then-unimaginably large number of one million.

The scale of our modern society has brought familiarity with big numbers. Annual food and mineral production are counted in millions of tons; people and their devices in billions of units; airline and highway usage in trillions of collective miles; electricity and natural gas in trillions of kilowatt-hours or cubic feet; economies in trillions of dollars. But, at the rate of a trillion per year of anything, it would take a billion years to total a zetta. Data is the only domain in history where we’ve needed to use such enormous numbers to track the flow of commerce.

For those not steeped in numerical prefixes, each name represents a 1,000-fold jump: the terabyte (a trillion bytes), the petabyte (a thousand terabytes), the exabyte (a thousand petabytes), and then the zettabyte. A zetta stack of dollar bills would stretch from the earth to the sun -- 93 million miles away -- some 700 thousand times over. Such scales defy imagination.

After zetta there is only one remaining officially named number, the 1,000x bigger yottabyte. However, anticipating the relentless expansion of things digital, computer scientists have unofficially endorsed the brontobyte (1,000 yottas) and the geopbyte (1,000 brontos).
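For reference, the prefixes map onto powers of ten as follows (the last two are the informal extensions just described), and the arithmetic behind the earlier trillion-per-year observation falls out directly:

```python
# Powers of ten behind the byte prefixes discussed above.
prefixes = {
    "terabyte":   1e12,
    "petabyte":   1e15,
    "exabyte":    1e18,
    "zettabyte":  1e21,
    "yottabyte":  1e24,
    "brontobyte": 1e27,   # unofficial
    "geopbyte":   1e30,   # unofficial
}

# At a trillion units per year, reaching a zetta of anything takes a billion years.
print(f"{prefixes['zettabyte'] / 1e12:.0e} years")   # 1e+09
```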

Thus we come to the law of large numbers, and the point of our excursion into the arcana of numerology. Processing a single byte entails an energy cost measured in nanojoules, itself an impossibly small number. A single flea hop entails 100 nanojoules. But process or move a byte a billion times per second – the gigahertz CPU in every digital device – and transport those bytes a zetta times and you quickly aggregate to quantities of electricity greater than that consumed by a nation the size of Japan.
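A toy aggregation shows how nanojoule-scale costs reach national-scale electricity; the per-operation energy and the number of times each byte is touched over its life are illustrative assumptions chosen for the sketch, not measured values:

```python
# Illustrative only: aggregating nanojoule-scale costs at zettabyte scale.
bytes_per_year = 40e21     # ~40 ZB of data created annually (from the article)
joules_per_op = 10e-9      # assumed ~10 nJ per processing/transport operation
ops_per_byte = 1e4         # assumed operations touching each byte over its life

joules = bytes_per_year * ops_per_byte * joules_per_op
print(f"{joules / 3.6e15:,.0f} TWh per year")   # ~1,100 TWh -- roughly Japan's
                                                # annual electricity consumption
```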

Of course the purpose of generating data is not to consume energy, but to use software to "refine" it, turning a raw resource into useful information. Hence "data is the new oil," a clever turn of phrase generally credited to Clive Humby, a UK mathematician and data scientist, in 2006. Oil and data differ in the obvious respect that one produces energy while the other consumes it. But that's not the point of the analogy. Petroleum fueled the economic rise of the 20th century through the myriad products and services that emerged from refining what was then a radically new raw resource. Similarly, refining raw data leads to the even greater array of products and services that are fueling growth in the 21st century.

So the central issue of interest distills to estimating the prospective magnitude of the data universe, i.e., the size of the underlying "resource." In other words: what else will yet be digitalized? The answer is anything and everything.

The ancient Egyptians compiled data about grain storage and the like because it enabled planning. The Roman census added precision to one of humanity's least favorite purposes for counting: tax collection. Both forms of data collection -- for managing revenue collection or supply chains -- have been central to civilization for eons. Now digital technologies make it feasible to track materials from mine-mouth to factory to consumer, and not just quantities but real-time locations and features at each point in time (temperature, velocity, wear rates, etc.).

Modern data goes beyond, say, the number of people visiting a hospital or the number of cars transiting a city; it includes the real-time collection of data about, for instance, each individual's heart rate or specific location or activity (standing, sleeping, moving, breathing), or the location, speed and operational health of every car and, increasingly, of any machine or device.

Then there is data about the data. Think of this as analogous to financial derivatives. Just as there is information in the velocity and volume of financial transactions without regard to the specifics of any transaction (e.g., the direction and speed of changes in the Dow), so too is there useful information in the "digital exhaust" from the data-collecting and processing machines. This so-called metadata is increasingly valuable as digital systems expand in reach and granularity, reaching down into the interstices of everything.

We live in a time in which the very nature of data has itself transformed. Data now is as different from the pre-digital world as synthetic products (plastics, pharmaceuticals, gasoline) are different from natural products (wood, paper, and grain) in the world before chemical science. And we are on the precipice of an unprecedented expansion in both the scale and variety of data yet to be accessed.

Up until now all the excitement and disruptions to “old economy” businesses have largely centered on information-centric activities; the virtual worlds of telephony, television, mail, news, entertainment, advertising, finance and travel services. Meanwhile, most of the economy – in fact 90% of the GDP -- is associated with the physical world, such things as food, factories, houses, office buildings, hospitals, energy production, and vehicles. As many recent analyses from the likes of Goldman Sachs, McKinsey, BCG and others have mapped out, digitalization is still in early days in all hardware-related activities.

But that asymmetry won't last. Software, of course, began long ago to invade hardware domains. In fact, so-called firmware control systems for industrial automation predate the Internet. The reason for the lag is not that hardware companies are captained by troglodytes, but that it is far more difficult to make effective sensors, software and systems in the world of atoms, where failures (viruses, re-boots, 'jitter', frequent updates) have real-world physical consequences ranging from the destruction of expensive capital equipment to human injuries or fatalities.

Even so, as Silicon Valley legend Marc Andreessen provocatively put it, software is "eating the world." The more accurate metaphor, though, is that software is invading everything, like a virus. The data implications of that invasion are apparent in one simple fact: even though we're still in the early days of the digitalization of physical domains, the aggregate data generated (even if not yet "refined") in manufacturing, medicine, transportation, and infrastructure already exceeds all data associated with retail, media and entertainment.

So, to the future: two macro trends are now driving the blossoming of a new data era.

The first macro force should be obvious: it's the increasing automation of hardware, wherein all the sensors, software and control systems everywhere necessarily generate massive data streams. For example, long before we see the autonomous car, the "connected" car with all its attendant features and safety systems will generate terabytes per day per car. With over a billion cars in the world now, and far more in the future, that trend, arithmetically, eventually leads to zettabytes per day, not per year. And that says nothing about, to take just one more example, the terabytes generated to improve maintenance and safety by every aircraft engine on every one of the hundred thousand flights per day.
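The car arithmetic runs as follows; one terabyte per car per day is the illustrative figure used here:

```python
# Connected-car data streams, assuming ~1 TB per car per day (illustrative).
cars = 1e9                  # "over a billion cars in the world now"
tb_per_car_per_day = 1      # assumed
bytes_per_day = cars * tb_per_car_per_day * 1e12
print(bytes_per_day / 1e21, "zettabytes per day")   # 1.0
```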

Even more data will gush out of the rise of industrial "digital twins." Here a fully sensored factory, machine, or process is compared to its computer model, which simulates in real time, in the virtual world, the operation of its real-world twin. Such digital twins enable far better control, efficiency, productivity, and safety.

The second data-generating macro? Look to the inexorable engineering advances in our capacity to sense and measure things, i.e., to expand the underlying “resource.” Put differently, data is a resource that -- unlike its natural analogs -- humanity literally creates. And the sensors that create that resource are simultaneously collapsing in size and cost while expanding in scale of deployment.

We now collect data at astronomical scales, and we're just getting started. Not just in the study of astronomy itself (though that too now has instruments that continually generate thousands of petabytes of data), but about everyday things, such as the traffic patterns of entire cities and nations. The Internet of Things entails much more than home thermostats and smart refrigerators; it also includes such things as sensor "dust" that can be sprayed onto agricultural fields.

Progress in the devices that measure our world has been far more radical than the much-celebrated progress in logic devices. The physics of detection -- acquiring data -- is quite unlike the physics of information processing (or especially energy production). One can, in effect, 'trick' nature into giving up data by using the features of the phenomenon one is measuring.

So in sensing, engineers can literally chase dimensions down to "the bottom" of nature with capabilities that seem genuinely magical. One more example: scientists recently published results for a new class of sensor capable of measuring mechanical motion as small as a thousandth the diameter of an atomic nucleus. Such a sensor can, in effect, 'listen' to the motion of individual bacteria.

But let's use the easiest metric for finding the big kahuna in data generation: just follow the money. Healthcare is, as is widely noted, now the single biggest sector of the economy, surpassing manufacturing and agriculture. Few doubt that digitalization and software will bring enormous efficiencies and cost control to every aspect of medicine. Far more interesting, however, is how data will shape tomorrow's healthcare. Finding magical new cures is anchored in emerging sensors, digital tools and computing. The associated explosion in data collection will be epic. Genomics data alone is on track to generate as much as 40,000 PB a year.

Or, consider the implications of the cryo-EM, a radically new class of microscope (essentially a kind of sensor). The cryo-EM earned its three inventors the 2017 Nobel Prize in Chemistry and allows, for the first time, direct imaging of biological structures at molecular scales. It's as big a deal as the invention of the optical microscope four hundred years ago, and it will similarly open new frontiers in biology and thus innovations and cures. A single cryo-EM creates petabyte levels of data.

There are myriad other examples of emerging, digitally based, information-collecting medical devices that are right out of science fiction. See the recent Tricorder XPRIZE contest for shrinking a laboratory's diagnostic tools down to smartphone size (hence the Star Trek "tricorder" moniker). Then there's the concept of a virtual physiological human (VPH) -- the idea of a "digital twin" for us humans -- modeled on the same concept as for the mechanical machines and industrial processes noted earlier.

However, today's pursuit of a VPH is focused on facilitating research into human biology and the potential for undertaking virtual clinical trials for therapeutics within a supercomputer instead of within a human population. Once perfected, this will radically shorten the time between the discovery of a new therapeutic and its validation, and make that process both safer and more effective.

Such VPHs will necessarily entail hundreds and more likely thousands of petabytes of data. It's no more fantastical to think that every person in the developed world will have a personal digital twin in the Cloud than it would have been in 1980 to expect that one billion people would one day carry more compute power in their phones than 10,000 of the IBM mainframes of that day.

The VPH also highlights the symbiosis between sensors and data. A VPH will necessarily require means of collecting an individual’s real-time body chemistry and biological data for individual organ functions at levels of granularity not now possible. But such sensors are now emerging from the field of bio-compatible and “transient” electronics; these will include not only bio-compatible wearable computing (e.g., smart Band-Aid-like sensors), but also consumable or even vaccine-like injectable computers. A not-so-far future with a billion people each owning a petabyte-class VPH will generate yottabytes of data.
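The closing arithmetic works out directly, taking the paragraph's one-petabyte-per-person figure at face value:

```python
# A billion petabyte-class digital twins is yottabyte-scale data.
people = 1e9
bytes_per_vph = 1e15        # one petabyte per virtual physiological human
print(people * bytes_per_vph / 1e24, "yottabytes")   # 1.0
```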

Of course all trends eventually face saturation. But there is no prospect for peak data in the foreseeable future. We are much further away from saturation in data production than the world was in 1919 with oil production.

We will thus soon enter a third information era. The first information era had a long run, beginning in 1858 with transatlantic telegraphy, quickly taking the world into telephony and from zero to several megabytes per month of data. The second, which we are just exiting, began a half-century ago on October 29, 1969, when the Arpanet was lit up, taking us to the zettabyte era. The third is the path to the brontobyte era. As with each prior era, we will see the emergence of entirely new, yet unknown, or yet-to-be-founded companies.

But step back from the long run and consider just the next two decades. Cisco forecasts that the total amount of data generated and stored in devices will rise more than four-fold in just three years, approaching 1,000 zettabytes -- the yottabyte. The benefits we will derive from refining such a rich resource are almost unimaginable.

As for the thought experiment about the energy implications: consider one fact and one key question. The fact: It took 160 years to reach zettabytes of data generation and now we’ll add 1,000 times more than that before a decade is out. The question: how could aggregate energy demand go anywhere but up?

Of course the race continues, as it has for decades, to radically improve digital energy efficiencies. Much more is possible. (See Part 3 in this series.) But it’s hard to imagine that the amount of energy required to fuel digital domains will be anything less than double.

However, although energy is a good proxy for exploring the sheer scope of the future digital infrastructures, what’s truly exciting are the benefits we will derive from refining the bronto-flood of data.

Although the aphorism that “data is the new oil” has been repeated often in recent years, the real credit for that idea belongs to Steve Jobs. His phrasing was a little less pithy, but the analogy finds its origins in a prescient lecture delivered by Jobs at Sweden’s Lund University in 1985, when he said:

  • “[W]e’re living in the wake of the last revolution, which was a new source of free energy. That was the free energy of petrochemicals. It completely transformed society, and we’re products of this petrochemical revolution, which we’re still living in the wake of today. We are now entering another revolution of free energy. A Macintosh uses less power than a few of those lightbulbs, yet can save us a few hours a day or give us a whole new experience. It’s free ‘intellectual energy’.” [emphasis added]

Jobs was right that the “intellectual energy” extracted by refining data and democratizing computing was as revolutionary in economic and social terms as the migration from the age of steam to the age of petrochemicals. But in the universe we live in, the intellectual energy unleashed by silicon isn’t free; it has an energy cost. And always will.

This piece originally appeared at RealClearEnergy

______________________

Mark P. Mills is a senior fellow at the Manhattan Institute and a faculty fellow at Northwestern University’s McCormick School of Engineering. In 2016, he was named “Energy Writer of the Year” by the American Energy Society. Follow him on Twitter here.
