Roaring Elephant

Episode 12 – Hadoop Summit Dublin 2016 – Day 1

13/04/2016 Duration: 29min

Welcome to our special edition podcast, brought to you from day 1 of the Hadoop Summit. Breaking our normal fortnightly flow, we're delivering a fresh podcast at the end of each day of the Hadoop Summit. In this episode we cover our impressions of the keynotes and some of the sessions we enjoyed during day 1.

00:00 Recent events: Introduction to the Hadoop Summit episode for day 1.
01:40 Main Topic: Some comments from attendees about what they're looking forward to at the event; conversation about the keynotes and the sessions we enjoyed.
29:38 End
Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 11 – Interview with Community Award Winner Venkatesh Sellappa

05/04/2016 Duration: 37min

Venkatesh is a new contributor to Apache NiFi, and during his talk at the Hadoop Summit next week he takes a light-hearted look at his journey to becoming a contributor to an Apache project. Venkatesh is one of the Community Choice winners, so congratulations are in order, and we are certain you will like this interview! Enjoy, and we look forward to seeing you at the Hadoop Summit in Dublin next week!

00:00 Recent events: Easter break; Big Data Analytics, Big Telco workshops, meetings and sessions; domain knowledge is important.
05:40 Main Topic: Interview with Venkatesh Sellappa.
33:50 Questions from our Listeners: No questions this time, but information on our activities during the upcoming Hadoop Summit.
37:18 End
Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 10 – Preparing for the 2016 Hadoop Summit in Dublin

22/03/2016 Duration: 01h03min

Next month, the European Hadoop Summit will take place in Dublin. Now that the agenda for the event has been nearly finalised, we take it upon ourselves to provide a virtual guide to the event. There are a lot of good things happening, so we share which sessions we think we'll be attending and why. Enjoy, and we look forward to seeing you there! This is another long episode, running over an hour for the first time. We are really curious to know whether you like these longer episodes, or whether you would prefer that we kept them under the original 30 to 35 minutes.

00:00 Recent events: Hands-on upgrading, express vs. rolling upgrade; workshop at a telecom company in Russia; NiFi workshops; securing a Hadoop cluster.
08:00 Main Topic: Dave has assembled some statistics on the types of sessions available. Which sessions we would attend and why (http://hadoopsummit.org/dublin/agenda/), with general advice for visitors mixed in.
54:30 Questions from our Listeners: What else is going o

Episode 9 – SQL in Hadoop

08/03/2016 Duration: 53min

SQL was one of the first data access methods added to vanilla Hadoop. Considering that many of the people working with Hadoop in the early days came from a database background, this is not surprising. Since then, the SQL ecosystem in Hadoop has grown considerably, and in this episode we give a general overview of many of the available choices. This episode runs a bit longer than normal, but we hope you'll find it worthwhile!

00:00 Recent events: Spark masterclasses; NiFi on trains; MiFID II and the active archive; Mobile World Congress.
08:30 Main Topic: SQL solutions
Apache Hive https://hive.apache.org/
Apache Spark SQL http://spark.apache.org/sql/
Apache Phoenix https://phoenix.apache.org/
Apache Impala (incubating) https://www.cloudera.com/products/apache-hadoop/impala.html
Apache HAWQ (incubating) http://hawq.incubator.apache.org/
Apache Drill https://drill.apache.org/
Presto https://prestodb.io/
Oracle Big Data SQL http://www.oracle.com/us/

Episode 8 – NiFi Deeper Dive

23/02/2016 Duration: 47min

In this episode we go into more depth on NiFi, complete with our second interview with Joe Witt, Senior Director of Engineering at Hortonworks, who dives into how NiFi works under the covers and some things to consider when using it for real.

00:00 Recent events: New logo for the podcast; Hadoop use in telecom; Spark masterclass details; Apache NiFi "hype train" concerns.
09:14 Main Topic: Second interview with Joe Witt, a deeper dive into Apache NiFi.
35:30 Questions from our Listeners: I have already implemented some of my ingest in Flume/Kafka/Storm; do I need to replace that with NiFi? Is it true there is no chance of data loss with NiFi? Can I aggregate or combine data as part of the flow process? Do I need a Hadoop cluster to use NiFi?
47:18 End
Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 7 – An introduction to Data Ingest

09/02/2016 Duration: 37min

In this episode we cover some of the most common options for ingesting data into Hadoop, including technologies like Flume, Sqoop, Kafka, NiFi and more.

00:00 Recent events: Upcoming masterclasses on NiFi and Spark; NiFi deployment on trains; publicizing the podcast; Global Systems Integrator training day.
06:40 Main Topic: Apache Sqoop; Apache Flume; Apache Kafka; Apache NiFi; other low-level ingest methods.
28:00 Questions from our Listeners: I want to transform the data to its final form before it lands in the Hadoop cluster. Which ingest tool should I use? What about vendor XYZ's "Hadoop loader/ingest" tool? Do all these tools run on my Hadoop nodes? How does the lambda architecture fit with data ingest?
37:15 End
Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 6 – An introduction to NiFi

26/01/2016 Duration: 30min

In this episode we give an introduction to NiFi, complete with an interview with Joe Witt, Senior Director of Engineering at Hortonworks, who explains exactly where NiFi came from and how it fits into your Big Data plans.

00:00 Recent events: The usual "start of the year" meetings and events; using Apache NiFi as a self-documenting deployment system; we are now available on iTunes.
04:50 Main Topic: Interview with Joe Witt, one of the creators of Apache NiFi and currently Director of Engineering for HDF at Hortonworks.
22:40 Questions from our Listeners: Is NiFi really as easy to use as it looks? Is NiFi a part of Hadoop now? How do I get started with NiFi? Is NiFi an ETL tool?
30:45 End
Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 5 – An introduction to Spark

12/01/2016 Duration: 37min

In this episode we cover the basics of Apache Spark, including typical deployment situations, architecture and usage.

00:00 Recent events: Season's greetings! Jhon shamelessly plugs his mini cluster build; Apache Mesos; Amazon IoT solution.
05:28 Main Topic: Who would use Apache Spark, why you would use it and where you would use it; Apache Spark architecture; Apache Spark components; Apache Spark MLlib; Apache Spark gotchas; typical use cases for Apache Spark.
28:20 Questions from our Listeners: What happens if all my data does not fit in memory? What is security like for Spark? Why Spark on Hadoop instead of standalone Python, Scala, Java or something else for Spark? Can I access data on HDFS or local disk from my Spark script?
37:50 End
Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 4 – Hadoop: Year in review

29/12/2015 Duration: 38min

A bit of Hadoop history: what we have seen happening over the last 12 months, some trends and some interesting technologies. Some ups, some downs and possibly even some round and rounds, capped off with some Bold Predictions for 2016.

00:00 Recent events: A number of engagements; Apache NiFi; why some Hadoop users decide to go for separate clusters per use case or (internal) client.
06:00 Main Topic: A broad acceptance of Hadoop in Europe; a shift from batch workloads to a multi-tenant, secure platform including IoT and real-time, in-memory analytics; Apache Ambari making our life easier all the time; Data Governance Initiative; Open Data Platform initiative (http://odpi.org); public clouds offering Big Data-specific environments; tech advances in Hive (CBO/ORC/Zlib) and Transparent Encryption in HDFS; Apache NiFi; the year of Apache "open community" open source; Bold Predictions!
31:00 Questions from our Listeners: What new (incubating) projects should I invest time in today, knowing that they may ne

Episode 3 – High level Hadoop architectures

15/12/2015 Duration: 37min

What are the hardware and implementation options we see? A discussion ranging from direct attached storage versus network attached storage/storage area networks, to on-premise hardware versus cloud options.

00:00 Recent events: Organisations starting their Big Data journey; a lessons-learned workshop for a customer after their successful pilot; planning masterclasses for 2016; migration customer workshop; Big Data and the Connected Car webinar (registration required).
07:30 Main Topic: Direct attached storage (DAS) or “traditional” Hadoop; network attached storage (NAS) / storage area networks (SAN); cloud (Azure, AWS, Google Cloud, OpenStack, etc.); SaaS/PaaS/HaaS/HDInsight; Ceph and Gluster; object stores (S3) and other cloud storage.
25:30 Questions from our Listeners: Doesn’t having a SAN/NAS system break data locality? Can I mix drive sizes and types within a cluster, or even within the same node? Hybrid cluster environments: how do I mix cloud and on-premise deployment?

Episode 2 – How to avoid disaster

01/12/2015 Duration: 43min

When you are getting started on your journey with Hadoop, how do you avoid disaster? We have seen many people go through this journey, and both of us have seen things people do that make a project successful, and things people do that make projects more difficult than they should be.

00:00 Recent events: Customer pilot completion; SQL on Hadoop masterclasses; multi-tenant Spark notebook issues; Spark recommendation engine webinar.
11:00 Main Topic: Starting too small; baseline and benchmark; config management; backup and/or disaster recovery; leaving security too late.
36:00 Questions from our Listeners: Where do I find data scientists? Storage options? Install everything?
43:37 End
Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 1 – A new beginning: Getting started in Hadoop

18/11/2015 Duration: 36min

With all the buzz around big data generally, and Hadoop specifically, there's never been a better time to get started in Hadoop. This episode covers how your two hosts got involved in Hadoop, and also discusses some of the other popular paths into the world of Big Data/Hadoop.

00:00 Recent events: How did your hosts get into Hadoop?
04:30 Main Topic: Driven by individuals vs. organisations; online education options; formal training.
19:20 Questions from our Listeners: Isn’t it really difficult? Do you need to know Java? Do you need to know SQL? Will I need to throw everything else in my datacentre out? Can I replace my EDW (Enterprise Data Warehouse)? Do I have to rewrite all my ETL (Extract-Transform-Load)?
36:05 End
Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.