Storage

Whether at the drive, storage area network (SAN), or cloud level, Code On’s proprietary coding technology, Random Linear Network Coding (RLNC), provides innovative capabilities for data storage, retrieval, reliability, and repair.

Target Markets

RLNC’s storage solutions apply to all distributed storage applications, including:

  • Multi-drive storage,

  • Storage Area Networks (SAN),

  • Multi-cloud,

  • Edge caching,

  • Hybrid cloud applications.

As a consequence, RLNC storage solutions stand to play a crucial role in the following applications:

  • Peer-to-Peer (P2P) networks and applications,

  • Content Delivery Networks (CDN),

  • Software-Defined Networking (SDN) and Network Function Virtualization (NFV),

  • Cloud services,

  • Cloud security,

  • Streaming video and IPTV.

Multi-Cloud Services

Most of the world’s email and calendaring data is now stored ‘in the cloud’, with other applications quickly following the trend. However, the cloud is not yet reliable or secure enough for such a shift. Outages are growing in frequency and have affected most major cloud services in recent years, including cloud storage (Dropbox, Apple, Amazon, Microsoft, Cloudflare) and email (Yahoo, Gmail).

To ensure a level of reliability, service providers usually replicate user data across multiple cloud locations (data centers). If a cloud fails or is disconnected, requests are fulfilled through connections to mirror storage facilities. The duplication of both storage and connections is crucial for reliability. The Gmail failure of September 2013, for example, was reportedly due to “redundant network paths” failing “at the same time”.

Replication increases storage and energy costs significantly. Moreover, keeping copies at multiple remote locations reduces data security and further drives up costs, as each copy needs to be equally secure. Excessive replication and mirroring may also harm reliability by overloading storage and communication links, which in turn increases the number of outages.

What if an operator were to distribute a large number of partial file copies across different storage locations, none of which represents the complete original file? Oddly enough, this method has been shown to deliver data to a given location more rapidly. An experiment conducted at Aalborg University (Denmark) shows that storing less than 65% of a file in each of five commercial clouds yields reconstruction delays similar to those of storing the whole file in each cloud. Furthermore, storing partial copies is more secure.
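To put the Aalborg figures in perspective, the short calculation below compares the total storage footprint of the two schemes. It is only a back-of-the-envelope sketch: it takes 65% as an upper bound on the fraction stored per cloud, measures storage in multiples of the file size, and the variable names are illustrative.

```python
import math

clouds = 5
fraction_per_cloud = 0.65  # assumption: upper bound from the "less than 65%" figure

replicated_total = clouds * 1.0            # whole file stored in every cloud
coded_total = clouds * fraction_per_cloud  # partial copy stored in every cloud
saving = 1 - coded_total / replicated_total

# A reader needs at least one file's worth of data, so this is a lower
# bound on how many clouds must respond before reconstruction is possible.
min_clouds = math.ceil(1 / fraction_per_cloud)

print(f"full replication : {replicated_total:.2f} x file size")
print(f"partial copies   : {coded_total:.2f} x file size")
print(f"storage saved    : {saving:.0%}")
print(f"clouds needed    : at least {min_clouds}")
```

Under these assumptions the partial-copy scheme stores at most 3.25 file sizes instead of 5, and no single cloud holds enough data to rebuild the file on its own.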

But how can the transmission of file fragments from multiple clouds be managed, particularly in an increasingly dynamic storage environment?

By removing state distinctions between packets of the same file or drive sector, RLNC replaces duplicate files with smart data. This guarantees that coded packets arriving from all clouds contribute to the reconstruction of the original file, leading to significant speedups in average file reconstruction times.

Edge Caching

Edge caching brings content closer to the user. It improves download times and facilitates the distribution of popular content. However, despite the wide-scale deployment of edge caching solutions, failures still occur. For example, Cloudflare’s one-hour outage on March 3rd, 2013, was attributed to “systemwide failure of edge routers”.

RLNC realizes the potential of edge caching in a number of ways. First, it offloads Content Delivery Networks (CDNs) by implementing coded distributed storage. As in conventional uncoded caches, caching a small proportion of the coded files at edge nodes enables users to speed up their downloads. Unlike conventional caching solutions, however, coded caching requires fewer storage resources and simplifies download transactions.

Coded edge caching not only reduces server blocking but also enables coded caches to act as a peer-to-peer infrastructure, allowing them to scale naturally with local data demands. This unique RLNC feature reduces cache sizes and significantly increases data availability.

How It Works

In conventional distributed storage, content is divided into sectors that are distributed across the target drives. When content is requested, one copy of each sector is required. It follows that high levels of redundancy are required to ensure acceptable availability, especially in heterogeneous systems.

Using Random Linear Network Coding (RLNC), an optimal number of coded sectors is distributed across storage locations, depending on security, speed, and availability requirements.

Coded sectors are more versatile than original sectors, as any coded sector can replace any missing original sector. This enables storage systems to use less redundancy than conventional coding or duplication techniques, a feature that minimizes storage and energy resource consumption. To retrieve content, all that is needed is to retrieve as many distinct coded sectors as there were original sectors. This enables the download or streaming of the same content from different locations simultaneously, without coordination.
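The retrieval rule above can be illustrated with a toy encoder and decoder. This is a minimal sketch, not Code On's implementation: it works over GF(2), where a coded sector is simply the XOR of a random subset of the original sectors, and the function names `encode` and `decode` are hypothetical. Over GF(2) a random set of coded sectors is not always linearly independent, so the example keeps fetching until decoding succeeds; practical RLNC systems typically use a larger field such as GF(2^8), where almost any set of as many coded sectors as originals is sufficient.

```python
import random

def encode(sectors, rng):
    """Return (coeffs, payload): a random GF(2) combination of the sectors."""
    k = len(sectors)
    coeffs = [rng.randint(0, 1) for _ in range(k)]
    if not any(coeffs):                      # avoid the useless all-zero combination
        coeffs[rng.randrange(k)] = 1
    payload = bytes(len(sectors[0]))
    for c, s in zip(coeffs, sectors):
        if c:
            payload = bytes(a ^ b for a, b in zip(payload, s))
    return coeffs, payload

def decode(coded, k):
    """Gauss-Jordan elimination over GF(2).

    Returns the k original sectors, or None if the coded sectors
    do not yet span all k originals.
    """
    rows = [(list(c), bytearray(p)) for c, p in coded]   # work on copies
    for col in range(k):
        pivot = next((r for r in range(col, len(rows)) if rows[r][0][col]), None)
        if pivot is None:
            return None                      # not enough independent sectors yet
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(len(rows)):
            if r != col and rows[r][0][col]:
                rows[r] = ([a ^ b for a, b in zip(rows[r][0], rows[col][0])],
                           bytearray(a ^ b for a, b in zip(rows[r][1], rows[col][1])))
    return [bytes(rows[i][1]) for i in range(k)]

rng = random.Random(7)
original = [b"sector-A", b"sector-B", b"sector-C", b"sector-D"]
k = len(original)

received, recovered = [], None
while recovered is None:
    # Each fetch could come from any storage location; no coordination is
    # needed because every coded sector is generated the same way.
    received.append(encode(original, rng))
    if len(received) >= k:
        recovered = decode(received, k)

print(f"decoded after {len(received)} coded sectors")
```

The decoder never asks for a specific sector. It only needs enough independent combinations, which is why downloads from several locations can proceed in parallel without scheduling.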

Owing to its recoding feature, RLNC is the only coding technology that can create coded sectors from other coded (and uncoded) sectors without decoding first. This enables storage nodes and intermediate caches to generate additional redundancy on demand and in a decentralized fashion, leading to substantial reductions in required data transport, overall storage, energy consumption, and drive repair time (i.e., the time it takes to replace missing or failing nodes or sectors).
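The idea of building new coded sectors from existing ones, without decoding, can be sketched in a few lines. As a simplifying assumption the example again works over GF(2), and the helper names `xor_bytes` and `combine` are hypothetical. A storage node holding only three coded sectors of a four-sector file cannot decode it, yet it can still produce a fresh, valid coded sector by combining what it has.

```python
import random

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def combine(packets, rng):
    """Random GF(2) combination of (coeffs, payload) packets.

    The same routine serves for encoding (uncoded input) and for
    recoding (already coded input): coefficients and payloads are
    combined in exactly the same way.
    """
    chosen = [p for p in packets if rng.randint(0, 1)] or [rng.choice(packets)]
    coeffs, payload = list(chosen[0][0]), chosen[0][1]
    for c, p in chosen[1:]:
        coeffs = [a ^ b for a, b in zip(coeffs, c)]
        payload = xor_bytes(payload, p)
    return coeffs, payload

rng = random.Random(1)
original = [b"sector-A", b"sector-B", b"sector-C", b"sector-D"]
k = len(original)

# Uncoded sectors are packets with unit coefficient vectors.
uncoded = [([int(i == j) for j in range(k)], s) for i, s in enumerate(original)]

# A storage node holds only three coded sectors: too few to decode.
stored = [combine(uncoded, rng) for _ in range(3)]

# Recoding: the node creates a new coded sector from what it already holds.
new_coeffs, new_payload = combine(stored, rng)

# Check against the originals (for verification only; the node never sees them).
expected = bytes(len(original[0]))
for c, s in zip(new_coeffs, original):
    if c:
        expected = xor_bytes(expected, s)
print("recoded sector is consistent:", new_payload == expected)
```

Because the recoded sector carries its own coefficient vector, a downstream decoder can treat it exactly like a sector produced at the source.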

More importantly, RLNC removes the need for the complex bookkeeping and scheduling that conventional systems require to ensure the availability of original sectors. This ability to reduce the need for state information (also known as “stateless communications”) eliminates much of the management overhead traditionally associated with distributed storage.

For more information on RLNC storage products, please contact our partner Chocolate Cloud.

Performance Improvements

  • Significantly reduce download time for distributed delivery;

  • Considerably reduce data center energy consumption by reducing transaction times and required storage resources;

  • Greatly improve data availability in multi-resolution video schemes with high traffic loads;

  • Ensure cloud integrity for distributed storage with dynamic settings.