Getting Started with Big Data & Analytics (2/4)
This post picks up where my previous Focus post left off.
The Build phase will see you create the right capability and construct the minimum viable platform and pipeline to deliver insight quickly.
In order to succeed, your foray into analytics will require new skills and expertise. You will need a mix of technologists and business experts who have an appetite for learning, sharing, mentoring and inspiring. You will need individuals who can explore your data sets for new insight and those who can confirm hypotheses. You will benefit greatly from agile delivery skills across your team – focusing on customer collaboration and rapidly responding to change will increase your chances of success.
New roles have emerged ranging from the Analytics Architect to the Data Scientist which, when teamed up, create a full-spectrum skill-set for analytics delivery. Filling these roles will be challenging as demand continues to outstrip supply. It will require investment in your own people and engagement with analytics partners to guide your approach. Augmenting your team with external experts will accelerate delivery, reduce risk and expedite learning.
Only after identifying the initial use cases, profiling the required data sources and sourcing the specific skills can you proceed to design and implement your analytics platform and processing pipeline. Although your platform will comprise a range of technologies from multiple vendors for storage, processing, querying, analytics, visualisation and workflow, your initial focus must be on building an MVP: a minimum viable platform.
I’ve seen many customers adopt on-demand platforms to deliver their first use cases and seal those all-important first victories. Public cloud Hadoop-as-a-Service offerings like Amazon’s Elastic MapReduce can be provisioned and seeded with sample-sized datasets in less than 30 minutes with no capital investment. If additional security controls are required, cloud providers such as Skyscape provide pan-government accredited services up to IL3. Commercial distributions of Hadoop, such as Cloudera CDH, can then be installed and made available to customers using secure IaaS or PaaS delivery models.
Your initial data processing pipeline need not provide an end-to-end data engineering service. Implement only the minimum amount of data integration and modelling needed to deliver the insight required by your first use cases. Pipelines can be crafted using low-level data processing frameworks like Spark or MapReduce, but tools such as Trifacta greatly expedite these activities through GUI-driven pipeline creation for data preparation.
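To make the idea of a minimal pipeline concrete, here is a rough sketch in plain Python rather than Spark or MapReduce. It assumes a hypothetical first use case (total spend per region) and a made-up sample extract; the column names, the data and the function names are illustrative only and do not come from any real project. The same three steps of ingest, prepare and summarise would map onto whichever framework or tool you eventually choose.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample-sized extract; in practice this would come from a file
# or object store rather than an inline string.
SAMPLE_CSV = """customer_id,region,amount
c1, North ,120.50
c2,south,80.00
c3,North,
c4,SOUTH,40.25
"""


def ingest(text):
    """Parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def prepare(rows):
    """Apply only the cleaning this use case needs.

    Region names are trimmed and lower-cased, and rows with no amount
    are dropped. No other modelling is attempted.
    """
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records instead of modelling them
        cleaned.append(
            {"region": row["region"].strip().lower(), "amount": float(amount)}
        )
    return cleaned


def summarise(rows):
    """Aggregate the prepared rows into total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def run_pipeline(text):
    """Chain the three steps: ingest, prepare, summarise."""
    return summarise(prepare(ingest(text)))


if __name__ == "__main__":
    print(run_pipeline(SAMPLE_CSV))
```

The point of the sketch is its scope, not its tooling: it cleans only the two fields the use case needs and ignores everything else, which is the kind of restraint a first pipeline benefits from.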
With initial success proven, you may look to build your production-level platform within your existing data centre, sustaining your on-demand environments for development or testing purposes. Co-locating your production platform with existing data stores will grant you full control over optimisation and fine-tuning and will reduce the inter-network overhead of data ingestion en masse.
Next up, we Enable broader access to what we have built.