Datascape: Preparing for Data Surges

I have the distinct honor of being the namesake of Walter William Head, my great uncle, who was the president of the Boy Scouts of America® from 1926 to 1946. My mother dreamed that I would follow in Uncle Walter’s footsteps, but because scouting had no merit badge for golf, I found it difficult to accomplish… However, I did achieve Life Scout and have always strongly believed in the Scout motto: “Be Prepared™.” These days, the tech-savvy side of me believes that, with the cloud evolving as massively as it is today, preparation has never been more relevant.

In previous blog posts, “Datascape: Edge to Cloud” and “Datascape: The Trouble with Fog,” I introduced the challenges that next-generation data centers face from the data explosion caused by the sheer volume of IoT devices at the edge. Managing massive amounts of Big Data and Fast Data is no trivial matter. Data surges are what keep data center architects, data scientists, and IT managers up at night. While we cannot stop them from coming at us, we can take steps to prepare.

When I need advice about how things work, I turn to my trusted advisor, Earle Francis Philhower III, Sr. Manager, Technical Marketing (aka EFP III). I recently consulted him about how IoT operators, online retailers, and service providers can prepare for the inevitable bursts of traffic that come from massive IoT batch analytics jobs and e-commerce events like Amazon Prime™ Day, Alibaba’s Singles’ Day, and, of course, Black Friday.

What follows is a rigorous and entertaining email exchange between my technical conscience (Earle) and a marketeer with closet ambitions of the Sr. Golf Tour (yours truly). We discussed the IoT cloud (data center), plus how to manage massive fleets of autonomous things and massive web traffic events. We hope you enjoy it!


+++++++

To: EFP III

From: Walter Hinton

Re: Preparing for Data Surges

Dear Technical Conscience:

Dude, I woke up in a cold sweat last night. I never made Eagle Scout™ and disappointed my mother. “Be Prepared” kept running through my head. All I could think about was how to help our friends in enterprise and cloud data centers be better Boy Scouts™. I harkened back to my early days of engineering in the Stone Age of telephone networks, when “Be Prepared” meant designing networks for Mother’s Day, the one day of the year when the number of phone calls in America is 20% higher than on any other day. It was no simple task, but all you had to do was build a network infrastructure that could accommodate Mother’s Day within a few percentage points, and you were a rock star engineer.

But if you’re wasting extra capacity the other 364 days of the year, it might make you less than popular with your finance department.  Hmmm, that might be one reason I moved to marketing…

We have established that 4G and 5G are relatively slow but ubiquitous, allowing thousands of things (phones, sensors, etc.) to stream in parallel. Now, when IoT storms, Singles’ Day, or Black Friday hit a cloud, we must prepare data centers for elastic ingest and processing, just like the phone networks of the last century. We also need to keep an eye on costs and utilization so that we don’t leave expensive capacity idling.
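To put rough numbers on the finance department’s complaint, here is a minimal Python sketch of static peak provisioning. It assumes load sits at a flat baseline on ordinary days and runs 20% higher on a single peak day (the Mother’s Day figure above), with capacity fixed at the peak all year. The function name and the one-peak-day load model are hypothetical illustrations, not a real planning tool.

```python
"""Back-of-envelope: idle capacity left over by static peak provisioning.

All figures are illustrative assumptions, not measurements.
"""


def average_utilization(baseline_load: float, peak_load: float,
                        peak_days: int = 1, days: int = 365) -> float:
    """Average utilization of capacity that is sized for peak_load.

    Assumes load equals baseline_load on ordinary days and peak_load on
    peak_days days, while capacity stays fixed at peak_load all year.
    """
    if peak_load <= 0 or baseline_load < 0 or not 0 <= peak_days <= days:
        raise ValueError("invalid load or day counts")
    total_demand = baseline_load * (days - peak_days) + peak_load * peak_days
    total_capacity = peak_load * days
    return total_demand / total_capacity


# Mother's Day-style peak: 20% above an ordinary day, one day a year.
util = average_utilization(baseline_load=1.0, peak_load=1.2)
print(f"average utilization: {util:.1%}")
print(f"idle capacity:       {1 - util:.1%}")
```

Under these assumptions, roughly one-sixth of the capacity bought for that one day sits idle for the rest of the year. That idle slice is the cost that elastic ingest and processing is meant to avoid.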

We’ve been talking about the 3-D view (Devices, Data, and Data Centers), the role of the Device (Edge), the types of data (stream, burst, batch) and the anatomy of cloud (Data Center) workloads like analytics, processing, and archival. What’s the 3-D View for handling massive bursts of data?

Walt

++++++++++

To: Walter Hinton

From: EFPIII

Re: Preparing for Data Surges

Good morning Walt,

Email on Saturday morning?  I’m worried about you.  Why aren’t you on the golf course?  Remember: “all work and no play …” 

That said, here’s how I’d describe proper preparation under The 3-D View: Devices, Data and Data Centers.

It starts with an evolving storage technology that enables such elasticity (and, therefore, rock star engineer status). It already lives in many next-generation data centers near you, and it’s known as Non-Volatile Memory Express (NVMe™). NVMe is a standard interface and protocol for flash-based SSDs. Developed by an open industry consortium of leading storage, networking, and server vendors (NVMexpress.org), the NVMe interface increases non-volatile storage performance in PCIe-based servers and SSDs by eliminating the ATA and SCSI command stacks and the controller bottlenecks associated with traditional direct-attached storage (DAS) interfaces.

Compared with SATA or SAS, the NVMe standard delivers higher bandwidth and IOPS, plus lower latencies. It also lets primary storage scale without the cost or complexity of battery-backed RAID or HBA cards. Now, not all manufacturers and models can provide NVMe devices that hit these potential speeds. Here at Western Digital we have the SN200 series, in both U.2 and Add-In-Card (AIC) form factors, which comes very, very close to the theoretical bandwidths and latencies.
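Much of that theoretical bandwidth is set by the link itself. The minimal Python sketch below computes the raw data ceiling of each interface from its line rate and line encoding: 8b/10b for SATA 6 Gb/s and SAS 12 Gb/s, and 128b/130b at 8 GT/s per lane for PCIe Gen3. These are generic interface figures, not specifications of any particular drive. The sketch also ignores protocol overhead above the line encoding, so real devices always land somewhat below these numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    name: str
    gigatransfers_per_lane: float  # raw line rate per lane (GT/s, bits on the wire)
    payload_bits: int              # data bits per encoded block
    encoded_bits: int              # bits sent on the wire per encoded block
    lanes: int = 1

    def ceiling_mb_per_s(self) -> float:
        """Raw data ceiling after line encoding, in MB/s (decimal units).

        Protocol overhead (frame/packet headers, flow control) is ignored,
        so achievable throughput is always somewhat lower.
        """
        bits_per_s = (self.gigatransfers_per_lane * 1e9 * self.lanes
                      * self.payload_bits / self.encoded_bits)
        return bits_per_s / 8 / 1e6


LINKS = [
    Link("SATA 6 Gb/s", 6.0, 8, 10),
    Link("SAS 12 Gb/s", 12.0, 8, 10),
    Link("PCIe Gen3 x4", 8.0, 128, 130, lanes=4),
    Link("PCIe Gen3 x8", 8.0, 128, 130, lanes=8),
]

for link in LINKS:
    print(f"{link.name:<14} ~{link.ceiling_mb_per_s():>6.0f} MB/s")
```

The gap is mostly structural: a SATA link tops out around 600 MB/s, while a four-lane PCIe Gen3 link carries nearly 4 GB/s before protocol overhead.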

Check out these charts:

[Chart: Be Prepared for Data Surges]

EFPIII

++++++++++++++++++ 

To: EFPIII

From: Walter Hinton

Re: Preparing for Data Surges

Dude, I am using a new-fangled “edge” technology from the 12th fairway (okay, the rough) that lets me send email!  The course record is in no danger of going down today, so I’m stuck on this preparation thing.

I love NVMe, especially server-side, where we now have very dense servers that can hold anywhere from 12 to 48 2.5-inch NVMe drives. For e-commerce and streaming engines, such as several NoSQL and MySQL™ variants, you can scale and fail in place for easy management … nice! But what about massive-scale IoT use cases like manufacturing facilities, robotics, machine learning, and artificial intelligence (AI), where there is no fog layer and the networks are faster?

I know of a customer with a factory floor that has thousands of sensors, each sampling thousands of times per second and streaming to a NoSQL database. Each sensor puts out a relatively small stream of data, often only hundreds of kilobytes per second (KB/s), but when those streams come together, they create a torrent of gigabytes per second (GB/s), which adds up to terabytes every hour. Not only do we have to ingest, aggregate, and archive, but we also must perform real-time analytics. Can we make this big task easier?
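It’s easy to check how quickly small streams add up. The minimal Python sketch below assumes a hypothetical floor of 5,000 sensors, each producing 500 KB/s. Both figures are illustrative choices within the “thousands of sensors” and “hundreds of KB/s” ranges, not the customer’s actual numbers. It also uses decimal units throughout.

```python
def aggregate_ingest(sensor_count: int, kb_per_s_per_sensor: float) -> dict:
    """Combined ingest rate of many small sensor streams.

    Uses decimal units (1 KB = 1,000 bytes; 1 TB = 1e12 bytes).
    """
    bytes_per_s = sensor_count * kb_per_s_per_sensor * 1_000
    return {
        "GB_per_s": bytes_per_s / 1e9,
        "TB_per_hour": bytes_per_s * 3_600 / 1e12,
        "TB_per_day": bytes_per_s * 86_400 / 1e12,
    }


# Hypothetical floor: 5,000 sensors at 500 KB/s each.
rates = aggregate_ingest(sensor_count=5_000, kb_per_s_per_sensor=500)
for unit, value in rates.items():
    print(f"{unit:>12}: {value:,.1f}")
```

At these assumed rates, the floor produces a few gigabytes per second, which comes to several terabytes every hour. The total scales linearly with sensor count and with per-sensor rate.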

Walt

++++++++++++++++++

To: Walter Hinton

From: EFPIII

Re: Preparing for Data Surges

Sorry about the rough there, Walt.  I hear they call you “The Chop” for your ability to dig it out of the thick grass, but it also explains why every Monday you are whining about a sore left shoulder.  I’d aim for the fairway.

For data surges, you can deploy dozens or hundreds of servers with traditional SATA- or SAS-based SSDs, or you can deploy significantly fewer servers using NVMe. NVMe’s bandwidth advantage lets you aggregate these sources into the database and still leaves enough throughput and IOPS for near real-time analytics, without a large fleet of servers that sits idle most of the time.
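The minimal Python sketch below shows why per-drive bandwidth drives server count. It assumes a hypothetical 100 GB/s ingest target, 24 drives per server, roughly 0.5 GB/s per SATA SSD (just under the ~600 MB/s SATA link ceiling) and roughly 3 GB/s per PCIe Gen3 x4 NVMe SSD. These are illustrative assumptions, not product specifications. The model is deliberately naive: it assumes throughput scales linearly with drive count.

```python
import math


def servers_needed(target_gb_per_s: float, drive_gb_per_s: float,
                   drives_per_server: int) -> int:
    """Minimum servers whose drives together meet a target throughput.

    Deliberately simple: assumes throughput scales linearly with drive
    count and ignores CPU, memory, network, and database limits, all of
    which cap real servers.
    """
    if min(target_gb_per_s, drive_gb_per_s) <= 0 or drives_per_server < 1:
        raise ValueError("rates and drive count must be positive")
    drives = math.ceil(target_gb_per_s / drive_gb_per_s)
    return math.ceil(drives / drives_per_server)


# Illustrative assumptions only (see lead-in).
TARGET_GB_PER_S = 100.0
for label, per_drive in [("SATA SSD", 0.5), ("NVMe SSD", 3.0)]:
    n = servers_needed(TARGET_GB_PER_S, per_drive, drives_per_server=24)
    print(f"{label}: {n} server(s)")
```

Real deployments usually hit network and CPU limits before they hit drive limits, so treat the output as a lower bound. The useful signal is the ratio between the two rows, not the absolute counts.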

We’ve talked in previous blogs about how IoT devices (e.g., autonomous cars, drones, factory/farm machines and equipment, and surveillance cameras) collect massive amounts of data, as much as 1 GB per second in the case of an autonomous car. Latency here can literally mean life or death: the farmer’s autonomous tractor can’t send data to the cloud and get an answer back fast enough to avoid plowing over the cattle. Forgive the imagery, but it’s a real possibility. To handle these on-board data deluges and prevent bovine (and other) disasters, these devices will require NVMe for its higher bandwidth, better IOPS, and lower latency. Watch for new entries in this technology space, including specialized features for device lifespan and temperature range.
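In physical terms, latency means distance traveled before a response arrives. The minimal Python sketch below converts a round-trip delay into metres covered. It assumes hypothetical speeds (a 15 km/h tractor and a 100 km/h car) and hypothetical delays of 10, 100, and 500 ms. These values are for illustration only, not measurements.

```python
def distance_during_delay(speed_km_per_h: float, delay_ms: float) -> float:
    """Metres a vehicle covers while waiting delay_ms for a decision."""
    metres_per_s = speed_km_per_h * 1_000 / 3_600
    return metres_per_s * delay_ms / 1_000


# Hypothetical speeds and round-trip delays, for illustration only.
for label, speed in [("tractor", 15), ("car", 100)]:
    for delay in (10, 100, 500):
        d = distance_during_delay(speed, delay)
        print(f"{label} at {speed} km/h, {delay} ms delay: {d:.2f} m")
```

At 100 km/h, each additional 100 ms of delay adds nearly three metres of travel before a response arrives. That is why the decision loop has to stay on board.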

These on-board data collections are expected to generate nearly two petabytes (PB) of data per year. That’s just too much to keep on board. Even if we could store it locally, there is far more value in sharing that burden with the cloud. Think of existing real-time traffic congestion reporting, which can exist only if the data is shared with the “fog.”  Therefore, massive autonomous fleets will need to push processed data to the cloud for storage and processing, creating yet another data surge that requires the bandwidth, IOPS, and low latency that’s enabled through the streamlined NVMe stack in the cloud.
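The “nearly two petabytes” figure is easy to sanity-check against the 1 GB/s rate. The minimal Python sketch below assumes the vehicle generates data at 1 GB/s for about 1.5 hours a day, every day of the year, and uses decimal units (1 PB = 1,000,000 GB). The daily duty cycle is an illustrative assumption, not a figure from the source.

```python
def yearly_volume_pb(gb_per_s: float, hours_per_day: float,
                     days_per_year: int = 365) -> float:
    """Data generated per year, in petabytes (decimal: 1 PB = 1e6 GB)."""
    gb_per_day = gb_per_s * hours_per_day * 3_600
    return gb_per_day * days_per_year / 1e6


# Assumption: 1 GB/s while operating, ~1.5 hours of operation per day.
print(f"{yearly_volume_pb(1.0, 1.5):.2f} PB/year")
```

At about an hour and a half of operation per day, 1 GB/s comes to roughly 1.97 PB a year. A vehicle that runs longer each day generates proportionally more.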

EFPIII

+++++++

To: EFP III

From: Walter Hinton

Re: Preparing for Data Surges

Thanks as always, technical conscience!  I get it. Just like in the days of designing telephone networks, I realize that we must plan for peaks and valleys. While we do not want to stop the massive surge of activity from coming, we can take measures to prepare for the inevitable. The data surges caused by today’s IoT devices or big e-commerce promotions do not have to take us down! We can prepare and create even cooler IT rock stars as data centers evolve to adopt NVMe technology!

Walt

++++++++++++++++++

To learn more about NVMe and its increasing role in next-generation data centers, check out our recent blog post on the evolution of storage.

Also, stay tuned for our next installment in the Datascape blog series, where we will explore the Enterprise segment of our Datascape infographic.
