Storage: File systems are the foundation for protocol convergence

While we are busy focusing on the convergence of file and object storage with scale-out / distributed file systems, the industry is busy adding block level interfaces to the mix.

During one of our vendor calls the other day, someone asked us how IDC would track the convergence of file, object and block-based storage. The context for the question was a breed of solutions from several vendors that provide not only file and object interfaces but also block interfaces like iSCSI.

How would we track such solutions? It was an interesting question that got me thinking. Should the basis for this analysis be the underlying file system technology? 

For example, Nexenta offers iSCSI atop ZFS and Red Hat offers iSCSI atop Red Hat Storage 2.0 (a.k.a. Gluster). Vendors like BlueArc (now part of HDS), EMC and NetApp have been offering block interfaces atop their respective file systems. Is this a furthering (and a validation) of this approach? Perhaps.

Where things get interesting is that this approach is now being adopted in scale-out solutions. Traditionally it has been limited to dual-controller architectures, which are often active-passive, allowing the file system to be managed by only one controller at any given time.

This was replaced by clustered file systems and now scale-out distributed file systems. So are we moving to a situation where essentially all storage will be based on some kind of file system or object store that runs on commodity platforms/architecture?

Will file, object and block interfaces merely become interfaces that can concurrently provide access to the same data store? Some vendors will see this as a validation of their approach. Will that be the case?

So what this comes down to is distinguishing between data layout, as a means of organising data, and data access interfaces, as a means of accessing that data:

  • Data layouts can be based on any type of file system or object store. These could be scale-out, distributed, clustered or monolithic file systems (these terms are being used loosely here to show that the type of file system may be inconsequential). They could even be based on frameworks such as OpenStack. I would generally call data layout a hidden layer, since in most cases the layout is not exposed to the server.

    For example, a vSphere environment using iSCSI on a Nexenta vs. NFS on a NetApp may not result in any behavioral differences noticed by the hypervisor or guest OS. There is file and object convergence occurring here, but that convergence will remain largely transparent to the applications. So will it be a big deal that these platforms support block as well?

  • Data interfaces are where I think there will be a lot of movement to gain parity. Today file interfaces, object interfaces and block interfaces are largely siloed. With a data layout layer that supports any kind of interface, there is no reason why file, object and block interfaces need separate backend architectures.

    In other words they can all be supported by a common file-and-object based system. So NFS, CIFS, CDMI, REST, WebDAV, iSCSI and even Fibre Channel could all be served from a single data platform. Wait...isn't that being done already? 
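To make the "single data platform, multiple interfaces" idea concrete, here is a minimal sketch of how a ZFS-based system (the kind Nexenta builds on) can serve both file and block interfaces from one pool. The pool, dataset, device and LUN names are hypothetical, and the exact iSCSI tooling varies by platform:

```shell
# Illustrative sketch only: pool, dataset and device names are hypothetical.
# One ZFS pool backs both a file (NFS) and a block (iSCSI) interface.

# Create a pool from commodity disks
zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd

# File access: a dataset exported over NFS
zfs create tank/shared
zfs set sharenfs=on tank/shared

# Block access: a zvol (fixed-size virtual block device) carved from
# the same pool, sharing its capacity and data services
zfs create -V 100G tank/lun0

# Export the zvol as an iSCSI LUN; on illumos/Nexenta-derived stacks this
# is done via COMSTAR, e.g.:
stmfadm create-lu /dev/zvol/rdsk/tank/lun0
```

The point of the sketch is that the NFS export and the iSCSI LUN are just two views onto the same underlying data layout; snapshots, compression and replication apply to both.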

It remains to be seen if the industry as a whole acknowledges and adopts this approach. Some vendors have resisted it on the grounds that it is too simplistic and may not provide the layered approach enterprises need. Thus, today, with the exception of one or two incumbent vendors, most of this action is with smaller storage startups, who justify the approach as a way to offer a lower-cost solution. There are still many questions to be answered, such as:

  • How will performance and scalability hold up?

  • How will data management services be offered?

  • How will quality of service be maintained in a mixed-workload environment, if everything resides on a single file system?

  • How will newer technologies like Hadoop be handled?

  • How will they integrate with OpenStack APIs and mechanisms?

IDC has already started the debate on behalf of its clients. In a recently published paper, "How distributed file systems are rewriting the future of the storage ecosystem" (IDC #236517), these very questions are explored.

Posted by Ashish Nadkarni, IDC
