At the beginning of this month I attended my third Strata Data conference, my second in London. My first was in New York in 2014 after which I shared my ten takeaways. In this post, I’ll summarise my experience from Strata Data 2019, with a comparative glance at previous years.
When I first attended Strata, I was taken aback by the sheer scale of the event. Vendors and attendees swarmed the Javits Center and the buzz in the air rang true to the 2014 industry hype. Arriving at the ExCeL on the opening day this year, the event felt small. The swarms of attendees were less dense and the exhibitor stalls felt a little swallowed up by the exhibition hall. Fewer vendors and fewer attendees, probably by half – but do I have any data to back up my observation?
I asked O’Reilly to share these metrics with me but I was politely rejected on the basis of company confidentiality. Perusing the O’Reilly Conference Archive, I discerned that this year’s event had around 27 exhibitors, down from 54 in 2015 and 59 in 2016. While I was unable to obtain official attendee figures for prior years, I found a reference to 1450 attendees as far back as 2012. Registered attendees for this year numbered around 700. So, fewer by half seems reasonable. Other analysis of the US-only events confirms this trend.
In previous years, wading through the swarms to get a decent seat at the keynote sessions was usually worth the effort. The tone was typically set immediately – always a combination of excitement and opportunity – which pumped you up for a day of networking and learning. With less wading required, the tone for this year was again set immediately, but I found it to be a conciliatory, almost apologetic one.
Strata is hosted by O’Reilly and Cloudera, and both have delivered cracking keynotes at previous events. This year, Cloudera’s opening keynote was delivered by their Chief Marketing Officer. It began with a quasi-apology for last-minute preparation, majored on an interesting use case for a customer who sadly wasn’t available to speak on stage, and finished with an uninspiring commitment to ‘making data great again’. I wasn’t really pumped by it.
I spoke to many engineers working for clients, vendors and SIs, and all corroborated my experience: on-premises spending is being slowed, and in some cases stopped completely, in favour of cloud-first or cloud-only investment.
I spoke to one vendor who had much success in the recent past with on-site deployments. They have now pivoted to re-engineering their product to be cloud-native, at the expense of almost their entire European workforce. These are challenging times for those playing cloud catch-up. If you’re not cloud-native, cloud-ready or cloud-able, you’re going to run out of road.
Cloud migrations were the theme of a number of conversations, particularly for customers approaching licensing renewals for long-standing, big-ticket software such as SAS and Oracle. Multi-cloud was also discussed as an ambition for some, although thankfully not the puritanical ‘treat every cloud as any cloud’ approach. The focus, quite rightly, was on matching specific workloads to specific cloud vendors, with an acceptable level of lock-in.
Curtailing cloud costs was a common challenge for clients. The ability to accurately predict and optimise the cost of cloud service consumption is something that almost every cloud-consuming organisation will benefit from. I see a gap in the market for intelligent software to help build and run cloud-based systems which can project and then self-limit their costs.
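To make the self-limiting idea concrete, here is a minimal, hypothetical sketch of such a guard: it linearly extrapolates month-to-date spend to a month-end projection and refuses new workloads once the projection breaches a budget. The class name, method names and the linear projection are my own illustrative assumptions, not any vendor’s product.

```python
from dataclasses import dataclass

@dataclass
class BudgetGuard:
    """Hypothetical self-limiting cost control: project month-end spend
    from the month-to-date figure and refuse new work if the projection
    would breach the monthly budget."""
    monthly_budget: float  # e.g. in USD

    def projected_spend(self, spend_to_date: float, day_of_month: int,
                        days_in_month: int = 30) -> float:
        # Naive linear extrapolation of spend to the end of the month.
        return spend_to_date / day_of_month * days_in_month

    def allow_new_workload(self, spend_to_date: float, day_of_month: int,
                           days_in_month: int = 30) -> bool:
        # Admit new work only while the projection stays under budget.
        projection = self.projected_spend(spend_to_date, day_of_month, days_in_month)
        return projection < self.monthly_budget
```

A real implementation would feed `spend_to_date` from the cloud provider’s billing API and use something smarter than a straight line, but even this level of projection would catch a runaway workload mid-month.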
As the strategy of prioritised cloud investment spreads, the immediate beneficiaries are obvious – AWS, Microsoft Azure, Google Cloud Platform and, to a lesser extent, IBM and Oracle. The hyper-scale cloud providers have interesting (symbiotic/fractious) relationships with vendors with whom they have competing services, and I don’t envy any vendor trying to compete with these Goliaths. Although conspicuous in their absence from this year’s conference, Confluent have shown that the Davids can still compete.
Looking at the expo hall, I saw fewer vendors than ever before but, at the same time, I saw AWS exhibit for the first time, and they were busy. It looked like an exercise in exhibition stall predation. Was this representative of what’s happening in the marketplace? In the main, I think so.
Ignoring those exhibitors promising good odds of winning a drone or multiple free beers, the busier stalls were staffed by AWS, Immuta and Dremio. Immuta offers some novel engineering features for non-invasive governance policies over a range of persistent data stores. Dremio burrows further down the self-service stack to enable self-service data preparation for end users. I feel there is room in the market for more vendors like Immuta and Dremio to compete healthily.
The venerable Ted Dunning, with an enlightening talk on streaming and microservices, and Holden Karau, clad in an Apache Spark dress and sharing her Spark on Kubernetes work, attracted large crowds. With the rise and rise of Kubernetes and the efficiencies it brings to sharing infrastructure and supporting containerised pipelines, I expect it is only a matter of time before we see it alongside YARN and Mesos as a production-grade deployment option for Spark.
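For context, Spark has shipped a native Kubernetes scheduler since 2.3 (still marked experimental at the time of writing), and submitting a job to a cluster follows the shape below, per the official Spark documentation. The API server host, port and container image are placeholders you would substitute for your own environment.

```shell
# Submit a Spark job directly to a Kubernetes cluster (Spark 2.3+).
# Kubernetes acts as the cluster manager in place of YARN or Mesos;
# the driver and executors run as pods built from the named image.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=<spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
```

The `local://` scheme tells Spark the application jar is already baked into the container image, which is the common pattern for containerised pipelines.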
Although there were only a few talks showcasing FaaS as an effective data processing technology, I remain convinced that this is the direction of travel for many data engineering use cases. In fact, during our own talk I admitted regret at not using more FaaS in one of our existing solutions.
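As an illustration of the pattern, here is a minimal, hypothetical FaaS-style handler sketch: an object landing in a bucket triggers the function, which extracts the bucket/key pairs a real pipeline would then fetch and transform. The function name is my own; the event shape follows the standard S3 notification format, and the actual read/transform/write steps are deliberately left as comments.

```python
def handle_s3_event(event: dict) -> list:
    """Sketch of a FaaS (e.g. AWS Lambda) data processing entry point.

    Walks the standard S3 notification payload and returns the
    (bucket, key) pairs that triggered the invocation.
    """
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        objects.append((s3["bucket"]["name"], s3["object"]["key"]))
    # A real pipeline would read each object, transform it, and write the
    # result onward (another bucket, a queue, a warehouse, ...).
    return objects
```

The appeal for data engineering is that each arriving file is processed independently, so scaling, retries and pay-per-invocation billing come from the platform rather than a long-running cluster.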
All of the AWS sessions were busy; some were standing room only. What was evident was the increased breadth and depth of their data-specific services. Over one third of all AWS services today are data engineering or analytics focussed.
They talked up their ever-effective dog-fooding with a profile of their own Data Lake for Amazon.com. Notably now Oracle-free, their Redshift Spectrum–EMR–S3 data lake runs 600k analytics jobs per day, storing 50PB of data. But we know that scaling workloads cost-effectively is a problem AWS have solved, and solved well. Where they are yet to provide a seamless experience is unified authentication and fine-grained authorisation across their data service portfolio (another point we made during our own talk).
It feels like the hype has died down, the number of vendors has inevitably shrunk from the earlier maelstrom, and a smaller number of players are leading the field.
The tyre-kickers are gone; the engineers are busy building data processing platforms and delivering production pipelines. Data engineering is no longer the main event. It’s now merely the precursor to machine learning – the so-called secret sauce from which competitive advantage will be born.
With the rise of cloud-first solutions, Data Engineers will need to become Cloud Data Engineers to stay relevant. The inherent distributed architecture of data processing systems should make this a small evolutionary step.
Next year, I want to see Microsoft and Google exhibiting and sharing some of their fantastic use cases at Strata – that is, if Strata remains a data-only conference. Time will tell.