Synopsis
Bite-Sized Big Data
Episodes
-
Episode 52 – Big data in travel
12/09/2017 Duration: 01h16min
Over the summer, while your hosts enjoyed a well-earned vacation (well, we like to think we earned it), we could not stop being Big Data nerds, and in this episode we talk about the Hadoop opportunities we spotted. You will hear us discuss how Big Data does, could or should improve many aspects of vacationing: review sites, preventive maintenance on rental cars, IoT tracking of beer levels, social media privacy issues and much, much more. We really tried to make this a "new-style" short episode, but clearly, we still need some training...
Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 51 – Roaring News
05/09/2017 Duration: 38min
In this news episode (our very first one), Dave is all-out on Artificial Intelligence and its use in naming "stuff"; for some subjects it apparently works very well, for others not so much... Jhon brings a blog on deploying new Kerberos functionality and a tutorial on Kafka Connect for those who have not really looked at it. The ensuing discussion on NiFi vs Kafka is purely coincidental.
Dave
AI naming
Paint (May 2017)
http://lewisandquark.tumblr.com/post/160776374467/new-paint-colors-invented-by-neural-network
https://arstechnica.co.uk/information-technology/2017/05/ai-paint-colour-names/
Guinea Pigs (June 2017)
http://gizmodo.com/this-is-what-happens-when-you-teach-an-ai-to-name-guine-1796172891
Improved Paint (July 2017)
https://arstechnica.co.uk/information-technology/2017/07/ai-paint-colours-reprogrammed/
British sounding place names (July 2017)
http://www.telegraph.co.uk/technology/2017/07/20/ai-trained-generate-incredibly-british-place-names/
Bee
-
Episode 50 – Alan Gates Wrap Up (Part 4)
29/08/2017 Duration: 34min
This is the final part of our long interview with Alan Gates. In this part, Alan talks more about ODPi, Cloud First, Apache Flink and Apache Pig, and we finish off with a little bit of philosophy. A big thank you to Alan for sharing his pearls of wisdom with us!
[Image from Linux.com]
00:00 Recent events
Our vacation is almost over, but this episode too was pre-recorded ahead of time. Because of this, we do not have any recent events to talk about.
02:10 Alan Gates Wrap Up (Part 4)
34:37 End
-
Episode 49 – Thomas Henson on IoT architectures
15/08/2017 Duration: 52min
In this episode we have an interview with Thomas Henson for you. Thomas is an Isilon Data Lake Evangelist at Dell/EMC, but in this episode he talks about IoT architectures, related to his talk at the DataWorks Summit San Jose 2017.
00:00 Recent events
Since both Dave and Jhon are still on vacation, this episode was pre-recorded ahead of time. Because of this, we do not have any recent events to talk about.
02:14 Thomas Henson on IoT architectures
You can find Thomas Henson's blog on Big Data at https://www.thomashenson.com/
52:45 End
-
Episode 48 – Alan Gates on the DataWorks Summit (Part 3)
01/08/2017 Duration: 35min
In this third part of our interview with Alan Gates, PMC member for various Apache projects including Apache Hive and co-founder of Hortonworks, we talk about his sessions at the DataWorks Summits and about the Summits in general.
[Image taken from Linux.com]
00:00 Recent events
Since both Dave and Jhon are still on vacation, this episode was pre-recorded ahead of time. Because of this, we do not have any recent events to talk about.
02:38 Alan Gates on the DataWorks Summit (Part 3)
Since this part of the interview goes public after the San Jose Summit, it is too late to submit abstracts for that particular summit. However, the Australian version is in a couple of months, so please go to the DataWorks website for more information about that one.
35:35 End
-
Episode 47 – Deep dive into Kudu
18/07/2017 Duration: 01h11min
We've been interested in Kudu for a while, but it's something that neither of your hosts has been exposed to very much. Apache Kudu went from incubation to top level project in record time, and now seemed like the right time to dig into this piece of antelope. Mike Percy, PMC member and committer on the Apache Kudu project and software engineer at Cloudera, was only too glad to come on the podcast and answer all our questions!
00:00 Recent events
Since both Dave and Jhon are currently on vacation, this episode was pre-recorded ahead of time. Because of this, we do not have any recent events to talk about.
01:40 Deep dive into Kudu
Special guest today is Mike Percy, PMC member and committer on the Apache Kudu project.
01:11:54 End
-
Episode 46 – San Jose DataWorks Summit 2017 in Review
04/07/2017 Duration: 01h54min
Dave joined our free ticket raffle winner Pitt at the DataWorks Summit in sunny San Jose last month, and they came back with almost two hours' worth of exciting stories! Thanks again to Hortonworks for providing the free ticket for our raffle that Pitt won.
San Jose DataWorks Summit 2017 in Review
00:01:20 Keynotes
00:31:20 Day 1 sessions
01:10:00 Day 2&3 sessions
01:54:55 End
-
Episode 45 – Modern Day Airships
20/06/2017 Duration: 01h09min
Breaking up our series of insights from Alan Gates, we switch gears to another really interesting topic (and guest!): the new visualisation features coming in Apache Zeppelin. We get it straight from the brains behind the new code, Bernhard Walter.
Recent events
03:03 Jhon: Churn Prediction with Apache Spark Machine Learning by Carol McDonald (@caroljmcdonald) @mapr
https://mapr.com/blog/churn-prediction-sparkml/
12:12 Dave: HDFS Maintenance State by Manoj Govindassamy @cloudera
https://blog.cloudera.com/blog/2017/05/hdfs-maintenance-state/
https://issues.apache.org/jira/browse/HDFS-7877
https://issues.apache.org/jira/browse/HDFS-6729
https://issues.apache.org/jira/browse/HDFS-7541
30:50 Modern Day Airships
Bernhard Walter talks about the new visualisation options in Zeppelin with some of the what, why and how.
01:09:00 End
-
Episode 44 – Suicidal Spark
06/06/2017 Duration: 01h11min
In this episode we're joined by Youen Chéné and Aurélien Vandel from Saagie, who talk to us about their experiences deploying Spark Streaming workloads in production (based on their Dataworks Summit talk): what worked well, what didn't, and what they'd recommend you do if you follow in their footsteps. Enjoy!
00:00 Recent events
Dave
Big Data Videos
http://www.kdnuggets.com/2017/05/top-recent-big-data-videos-youtube.html
https://www.youtube.com/watch?v=RQ9czRAdmMs
https://www.youtube.com/watch?v=hsoKlE67rTw
Jhon
InsightOut: The role of Apache Atlas in the open metadata ecosystem
http://www.ibmbigdatahub.com/blog/insightout-role-apache-atlas-open-metadata-ecosystem
https://www.youtube.com/watch?v=yQvmoDtGgbo
Apache Atlas API Version 2
https://atlas.incubator.apache.org/api/v2/index.html
Cloud giants 'ran out' of fast GPUs for AI boffins
https://www.theregister.co.uk/2017/05/22/cloud_providers_ai_researchers/
Benchmark: Sub-Second Analytics with
-
Episode 43 – Alan Gates talks Hive (Part 2)
23/05/2017 Duration: 54min
In this episode we discuss the maturity of the Hadoop ecosystem and how hard it still is to get value out of data. In the main section, we have the second part of the interview with Alan Gates, this time talking about the place Hive has in the ecosystem. We still have more from Alan, so stay tuned for more Hive goodness in future episodes!
00:00 Recent events
Dave
PredictionIO 0.11 release
https://github.com/apache/incubator-predictionio/blob/v0.11.0-incubating/RELEASE.md
http://predictionio.incubator.apache.org/
http://predictionio.incubator.apache.org/start/
http://predictionio.incubator.apache.org/system/
http://predictionio.incubator.apache.org/gallery/template-gallery/
https://techcrunch.com/2016/02/19/salesforce-acquires-predictionio-to-build-up-its-machine-learning-muscle/
Jhon
Ultra-fast OLAP Analytics with Apache Hive and Druid – Part 1 of 3
https://hortonworks.com/blog/apache-hive-druid-part-1-3/
Why Big Data Hasn’t Yet Made a Dent on Farms
-
Episode 42 – Alan Gates talks Hive (Part 1)
09/05/2017 Duration: 01h04min
Welcome to the "life, the universe and everything" episode of the Roaring Elephant Podcast. We talk some news, and this episode got a little bit ranty... Apologies for that; to balance it out, we have a chat with Alan Gates about Hive for you. There was so much Alan Gates goodness that we've split it over a few sessions, and here's part one...
07:00 Recent events
Dave
Metron graduates to Apache TLP status
https://blogs.apache.org/foundation/entry/apache-software-foundation-announces-apache
https://hortonworks.com/blog/congratulations-apache-metron-tlp/
2017 Big Data Landscape
https://www.linkedin.com/pulse/firing-all-cylinders-2017-big-data-landscape-matt-turck
You’re doing Hadoop and Spark wrong and they will probably fail
https://www.theregister.co.uk/2017/02/21/hadoop_and_spark_risks_and_opportunities/
Jhon
Apache Impala Leads Traditional Analytic Database
http://blog.cloudera.com/blog/2017/04/apache-impala-leads-traditional-analytic-database/
Cloudera Data Science
-
Episode 41 – News, news and some more news
25/04/2017 Duration: 33min
Having blown our recording space budget on the Dataworks Summit day-by-day episodes (39 and 40; if you've not listened yet, go and do so!), we're bringing you just a short episode this time with news: all the news that's new and approved by the Roaring Elephants!
05:10 Recent events
Superset: benefits and limitations of the open source data visualization tool by Airbnb
https://indatalabs.com/blog/data-strategy/open-source-data-visualization-tool-superset
http://airbnb.io/superset/index.html
Even artificial intelligence can acquire biases against race and gender
http://www.sciencemag.org/news/2017/04/even-artificial-intelligence-can-acquire-biases-against-race-and-gender
Building a cognitive data lake with ODPi-compliant Hadoop
http://www.ibmbigdatahub.com/blog/building-cognitive-data-lake-odpi-compliant-hadoop
Top 5 Performance Boosters with Apache Hive LLAP
https://hortonworks.com/blog/top-5-performance-boosters-with-apache-hive-llap/
Integrate SparkR and
-
Episode 40 – Dataworks Summit Europe – Day 2
06/04/2017 Duration: 01h07min
In this episode of the Roaring Elephant podcast, Dave and I continue to share our Dataworks Summit experience, meet yet more listeners, sit in on a few more sessions and give our overall view of the day and the summit as a whole! It will make you wish you were here.
00:00:00 Intro
Roaring Elephant Roadshow Day 2 - The night after the party!
00:04:14 Session Discussions
Our review of the sessions, what we liked, what we learned, what we'd recommend you go and check out afterwards:
Keynote
Meet HBase 2.0
Bridle your Flying Islands and Castles in the Sky
HBase in Practice
Solving Cyber at Scale
Achieving Realtime Ingestion and Analysis of Security Events through Kafka and Metron
Row/Column-Level Security in SQL for Apache Spark
Apache Kafka Best Practices
Mool - Automated Log Analysis using Data Science and ML
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and Spark
Backup and Disaster Recovery in Hadoop
01:02:15 Wrap up
Some final overall observations and lo
-
Episode 39 – Dataworks Summit Europe – Day 1
05/04/2017 Duration: 01h34min
In this episode of the Roaring Elephant podcast, Dave and I attend the Dataworks Summit, meet listeners, sit in on sessions and give our overall view of the day! It's the next best thing to being here. If you ARE here, then look out for us: we'll exchange limited edition Roaring Elephant stickers for audio clips.
00:00 Intro
Roaring Elephant Roadshow Day 1 - Direct from Munich!
03:25 Session Discussions
Our review of the sessions, what we liked, what we learned, what we'd recommend you go and check out afterwards:
Keynote
An Apache Hive Based Data Warehouse
Interactive Analytics at Scale in Apache Hive using Druid
Hadoop 3.0 in a Nutshell
Running Services on YARN
Streamline - Stream Analytics for Everyone (AKA SAM: Streaming Analytics Manager)
Apache Atlas: Governance for your Data
File Format Benchmark - Avro, JSON, ORC and Parquet
An Approach for Multi-Tenancy through Apache Knox
01:27:00 Wrap up
Some final overall observations and looking forward to day 2!
01:34:31 End
-
Episode 38 – Dataworks Summit 2017 – Preview
28/03/2017 Duration: 01h42min
This week, your hosts go over our pick of the sessions that will be presented during the Dataworks Summit (formerly Hadoop Summit) in Munich next week. The Roaring Elephant will be in attendance: look out for the two guys in distinctive yellow fleeces with the Roaring Elephant logo on the back. We hope to see you there!
00:00 Recent events
Dave
DS Model Lifecycle
https://www.svds.com/models-lab-factory/
Stitchfix Algorithm Tour
http://algorithms-tour.stitchfix.com/
Cloudera Data Science Workbench
http://vision.cloudera.com/cloudera-data-science-workbench-self-service-data-science-for-the-enterprise/
http://www.dbms2.com/2017/03/19/cloudera-data-science-workbench/
Jhon
Yarn 3
Data Lake 3.0: The EZ button to deploy in minutes and cut TCO by half
https://hortonworks.com/blog/data-lake-3-0-deploy-minutes-cut-tco-half/
Data Lake 3.0 Part 2 – A multi colored YARN
https://hortonworks.com/blog/data-lake-3-0-part-2-multi-colored-yarn/
Data Lake 3.0 Pa
-
Episode 37 – Big Data Roles: The starter
14/03/2017 Duration: 01h22min
In this episode, we start a new series on the different roles in Big Data. Purely by coincidence, it turns out that the winner of our raffle started a new job as a Data Engineer at the beginning of this month, so naturally we decided to invite Marcel-Jan on the show to talk about the how and why of his career move.
00:00 Recent events
Dave
It’s morphing time: Apache Ranger graduates to a Top Level Project
https://hortonworks.com/blog/morphing-time-apache-ranger-graduates-top-level-project-part-1/
https://hortonworks.com/blog/morphing-time-apache-ranger-graduates-top-level-project-part-2/
Data-Driven User Engagement
https://www.svds.com/data-driven-user-engagement/
Driving Product Engagement with User Behaviour Analytics
https://www.svds.com/driving-product-engagement-user-behavior-analytics/
Jhon
Using Apache Spark for large-scale language model training
https://code.facebook.com/posts/678403995666478/using-apache-spark-for-large-scale-language-model-training/
Big d
-
Episode 36 – Use-case: Single View
28/02/2017 Duration: 01h02min
No guests today, just Dave and Jhon talking, so brace yourselves! This time we're actually going to explain what we mean by "single view of customer", walk through an example use-case and discuss how you might implement such a thing. Enjoy.
00:00 Recent events
Dave
Faster spark!
http://www.zdnet.com/article/spark-gets-faster-for-streaming-analytics/
If you’re interested in reading/watching more, check out the site for Spark Summit East; the session slides and videos all appear to be live now
https://spark-summit.org/east-2016/schedule/
Getting Started with Deep Learning/Speech Recognition
http://www.svds.com/getting-started-deep-learning/
http://svds.com/open-source-toolkits-speech-recognition/
Data Driven Depression
http://rcharlie.com/2017-02-16-fitteR-happieR/
http://blog.revolutionanalytics.com/2017/02/finding-radioheads-most-depressing-song-with-r.html
Jhon
IoT Calamity: the Panda Monium
http://www.verizonenterprise.com/resources/repor
-
Episode 35 – What do people get wrong when deploying Hadoop? – Part 2
14/02/2017 Duration: 01h12min
Paul Codding and Sheetal Dolas, both from Hortonworks, join us in this second part of a two-part episode where they share their experience of what can go wrong when Hadoop is deployed. Listen to the tips and tricks these gentlemen share and double the throughput of your cluster.
00:00 Recent events
Dave
TensorKart: self-driving MarioKart with TensorFlow
http://kevinhughes.ca/blog/tensor-kart
What is Data Engineering?
https://www.dataquest.io/blog/what-is-a-data-engineer/
Jhon
Machine Learning is Fun (parts 1-6)
https://medium.com/@ageitgey/machine-learning-is-fun-part-6-how-to-do-speech-recognition-with-deep-learning-28293c162f7a#.vv1lh5755
Performance comparison of different file formats and storage engines in the Hadoop ecosystem
https://db-blog.web.cern.ch/blog/zbigniew-baranowski/2017-01-performance-comparison-different-file-formats-and-storage-engines
How to write code using the Spark Dataframe API: a focus on composability and testing
https://blog.godatadr
-
Episode 34 – What do people get wrong when deploying Hadoop? – Part 1
31/01/2017 Duration: 01h45s
Paul Codding and Sheetal Dolas, both from Hortonworks, join us in this first part of a two-part episode where they share their experience of what can go wrong when Hadoop is deployed. Listen to the tips and tricks these gentlemen share and double the throughput of your cluster.
00:00 Recent events
Dave
Apache Beam becomes a top level project!
https://beam.apache.org/
https://beam.apache.org/get-started/beam-overview/
https://github.com/eljefe6a/beamexample/blob/master/BeamTutorial/slides.pdf
https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
Four Types of Data Analytics
http://insights.principa.co.za/4-types-of-data-analytics-descriptive-diagnostic-predictive-prescriptive
MapR claims open source victory with patent
http://www.cbronline.com/news/verticals/cio-agenda/mapr-claims-open-source-big-data-victory-patent-award/
Jhon
Ransomware attacks on insecure Hadoop systems may be next, say security researchers
http://www.itworldc
-
Episode 33 – Roaring News
17/01/2017 Duration: 50min
This episode, we had an absolutely brilliant topic that we were going to cover after the news section... But the news section had us talking so much that it ran a bit long. Preferring not to give you a two-hour episode, we're rescheduling the intended topic to the next episode and presenting you with our first (and probably last) "News only" episode.
00:00 Recent events
Dave
A pair of “trends to watch in 2017”
http://www.techrepublic.com/article/6-big-data-trends-to-watch-in-2017/
http://www.datamation.com/applications/5-big-data-predictions-for-2017.html
Learning from a Year of Security Breaches
https://medium.com/starting-up-security/learning-from-a-year-of-security-breaches-ed036ea05d9b#.4r22rbfjh
Failing to monetise your apps, big data can help
http://www.techrepublic.com/article/failing-to-monetize-your-apps-big-data-can-help/
A Perfect Illustration of the Big Data Value Chain
http://www.techrepublic.com/article/a-perfect-illustration-of-how-the-big-data