Strata and Hadoop World 2016
Date posted
6 April 2016
Reading time
9 Minutes
Strata and Hadoop World is the world's biggest and best conference on all aspects of the data economy. I had the pleasure of attending this year's event in San Jose, and below you can find my thoughts on the major conference themes.
The primary component underpinning most approaches to real-time stream processing is Apache Kafka.
Kafka is best described as a pub-sub message queue combined with reliable storage and the ability to scale out massively. You can run Kafka on a single machine, or you can run a cluster at web scale.
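To make the "message queue plus storage" idea concrete, here is a minimal in-memory sketch in Python. It is not Kafka's API: the `TopicLog` class and its `publish`, `poll` and `seek` methods are hypothetical names invented for illustration, and everything lives in a single process with no persistence or partitioning. It only aims to show the general pattern of an append-only log where each consumer group tracks its own read position, so several readers can consume the same messages independently and replay them later.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log for one topic (hypothetical, not Kafka's API)."""

    def __init__(self):
        self._records = []                # retained messages, in arrival order
        self._offsets = defaultdict(int)  # next offset to read, per consumer group

    def publish(self, message):
        """Append a message and return the offset it was stored at."""
        self._records.append(message)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return the next unread messages for a group and advance its offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Move a group's read position so it can replay retained messages."""
        self._offsets[group] = offset


log = TopicLog()
for event in ["signup", "login", "purchase"]:
    log.publish(event)

# Two independent consumer groups read the same stored messages.
analytics = log.poll("analytics")
billing = log.poll("billing", max_records=2)

# Because messages are retained after being read, a group can rewind and replay.
log.seek("analytics", 0)
replayed = log.poll("analytics")
```

Unlike a traditional queue, reading a message here does not remove it, which is what allows the replay at the end. A real deployment would add durable storage, partitioning and replication on top of this basic shape.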
My favourite talk of the conference was delivered by Alex Silva from Pluralsight, describing the architecture for 'Project Hydra'.
Pluralsight have developed a microservice architecture to ingest data from numerous parts of their business and store the information as Kafka data streams. This replaces the traditional message queue (e.g. RabbitMQ) as the "dumb pipe" in the microservice equation, with support for fast, advanced analytics baked in.
Hydra should provide a fine case study for anyone wishing to develop a streaming data platform, and the project itself is soon to be open sourced.
Hadoop continues to mature
With each passing year, it becomes more difficult to pin down exactly what anyone means when they say 'Hadoop'. Like the term "big data" with which it is synonymous, Hadoop has come to stand for the explosion in data storage and processing technologies that have moved the conversation beyond the relational database as the default option.
Hadoop's evolution over the past year can really be summed up in one word: Spark. This proved to be a major theme of the conference. Every major analytics tool vendor either currently supports Apache Spark as their underlying processing framework or has it on their roadmap.
Enterprise adoption
The other major element of this story was Hadoop adoption in the enterprise, and the projects that are emerging to support it. Projects such as Cloudera's RecordService add finer-grained access control and data masking to the existing Hadoop security landscape. Perhaps unsurprisingly, the project has its roots in financial services, having been developed and open sourced in collaboration with Capital One. As Hadoop is increasingly viewed as a place to store core information assets such as customer transactional information, controls that have long been standard in the established relational database world are now seen as required features.
Kafkaesque
The other major talking point of the conference was real-time processing and analytics on data streams.