Synopsis
Bite-Sized Big Data
Episodes
-
Episode 32 – The sense and non-sense of certifications
03/01/2017 Duration: 50min
In this episode, we talk about the use and abuse of certifications, both the certifications you can achieve by passing an exam and the industry ISV certifications that should help you make purchasing decisions. 00:00 Recent events Dave 5 enterprise uses of blockchain today http://www.pcworld.com/article/3149504/cloud-computing/5-enterprise-related-things-you-can-do-with-blockchain-technology-today.html Top 7 big data trends for 2017 https://datafloq.com/read/the-top-7-big-data-trends-for-2017/2493 How to discover the hidden value in your customer journey https://www.linkedin.com/pulse/how-discover-hidden-value-your-customer-journey-ronald-van-loon Jhon Achieving a 300% speedup in ETL with Apache Spark http://blog.cloudera.com/blog/2016/12/achieving-a-300-speedup-in-etl-with-spark/ The Rhythm of Food http://rhythm-of-food.net/ http://www.thefunctionalart.com/ Information is beautiful awards http://www.informationisbeautifulawards.com/news/188-2016-the-winne
-
Episode 31 – Bold Predictions, Past and Future
20/12/2016 Duration: 01h07min
In this episode, we go over the bold predictions for 2016 we made just before the start of the year. Find out how right we were, or indeed how bad we are at predicting the future of Big Data. Undeterred, we then happily put on our Nostradamus hats and proceed to make even more bold predictions for 2017. Have a listen and let us know whether you agree or disagree with our view on the world. 00:03 Bold predictions - reviewing past predictions for 2016 Apache Atlas Apache Nifi Apache Spark SQL BigInsights 28:50 Bold predictions - future predictions for 2017 Fragmentation Data breaches Chat bots Self service Big Data Snake-Oil Alert Cyber security In-Memory & GPU Apache Atlas BigInsights 01:07:07 End Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 30 – Apache Software Foundation
06/12/2016 Duration: 01h02min
So many of the tools and projects we talk about and use every day are prefaced by 6 letters, A P A C H E... What does it mean to be an Apache project? What does the Apache Software Foundation (ASF) do for software? Are there other options? Let us tell you about the ASF! 00:00 Recent events Dave: How we caught the circle line rogue train with data https://blog.data.gov.sg/how-we-caught-the-circle-line-rogue-train-with-data-79405c86ab6a#.mhqs1mikx Black Friday 2016: Mobile vs Desktop User Behaviour http://appinstitute.com/black-friday-2016-mobile-vs-desktop-sales/ AI Machine Attempts to Understand Comic Books ... and Fails https://www.technologyreview.com/s/602973/ai-machine-attempts-to-understand-comic-books-and-fails/ https://arxiv.org/abs/1611.05118 https://arxiv.org/pdf/1611.05118v1.pdf Jhon: PayPal From Big Data to Fast Data in Four Weeks or How Reactive Programming is Changing the World Part 1 and Reactive programming manifesto http://www.reactivemanifesto.org
-
Episode 29 – 1 Year anniversary
22/11/2016 Duration: 01h04min
One year of elephants roaring has come and gone, so we reminisce a little about what happened over the last year. And since we could not have done this podcast nearly as well without them, we asked the special guests we have had on the podcast over the previous year to join the Skype call and talk about what they have been up to. 00:00 One year of podcasting... Dave and Jhon reminiscing about how the Podcast got started. 06:55 Fireside chats with guests over the year 07:56 Joe Witt, Senior Director of Engineering at Hortonworks, 22:40 Michele Lamarca, Team Lead Big Data at Bright Computing 43:00 John Mertic, Director of Program Management for ODPi 01:04:23 End Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 28 – Talking Datameer with Erik Stalpers
08/11/2016 Duration: 59min
In this episode, Dave is stuck in a hotel basement in the middle of internet nowhere and Erik Stalpers from Datameer joins us to talk about the Datameer exploration and visualization tool. 00:00 Recent events Dave Machine learning vs AI http://www.wired.co.uk/article/machine-learning-ai-explained Machine Learning Data Cleansing https://gcn.com/articles/2016/10/19/activeclean-big-data.aspx https://activeclean.github.io/ Battle of the Data Science Venn Diagrams http://www.kdnuggets.com/2016/10/battle-data-science-venn-diagrams.html http://www.prooffreader.com/2016/09/battle-of-data-science-venn-diagrams.html (original doc 21 September 2016) Jhon How Vector Space Mathematics Helps Machines Spot Sarcasm https://www.technologyreview.com/s/602639/how-vector-space-mathematics-helps-machines-spot-sarcasm/ Straight talk about big data http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/straight-talk-about-big-data 25:10 Talking Datameer with
-
Episode 27 – Security 3: Encryption at rest and in motion
25/10/2016 Duration: 57min
Rounding out our series on security in Hadoop, we finish with Encryption at rest and in motion. We go over the different approaches and the do's and don'ts, and mention some higher-level applications in this space. 00:00 News for the week! Dave: Executives Still Relying on Gut, Not Gigabytes in Planning for Future http://www.datadigestonline.com/2016/10/executives-still-relying-on-gut.html Rewriting SAS Programs for Financial Data Manipulation in R http://blog.revolutionanalytics.com/2016/09/rewriting-sas-in-r-for-finance.html Chris Surdak - Why so many Big Data projects fail http://surdak.com/innovation-vs-improvement/ Jhon: Apache Spark 2.0 Performance Improvements Investigated With Flame Graphs (14-Sep-2016) http://db-blog.web.cern.ch/blog/luca-canali/2016-09-spark-20-performance-improvements-investigated-flame-graphs SQL on Hadoop benchmarks get serious (14-Oct-2016) http://www.zdnet.com/article/sql-on-hadoop-benchmarks-get-serious/ WHERE IS APACHE HIVE GOING? TO IN
-
Episode 26 – Security 2: Authorisation and audit
11/10/2016 Duration: 01h10min
In this episode, we continue our coverage of Hadoop security. Where episode 24 dealt with the subject of authentication, we now delve deeper into the why and how of authorization and audit, and cover the major players in the arena. 00:00 Recent events Dave Beyond Privacy and Security in a Connected World http://www.svds.com/beyond-privacy-security-connected-world/ The broken promise of open-source Big Data software – and what might fix it http://siliconangle.com/blog/2016/09/27/the-broken-promise-of-open-source-big-data-software-and-what-might-fix-it-2/ Meet Apache Spot, a new open source project for cybersecurity http://www.csoonline.com/article/3124497/big-data/meet-apache-spot-a-new-open-source-project-for-cybersecurity.html SMEs advised to capitalise on ‘big data’ http://www.farminglife.com/news/farming-news/smes-advised-to-capitalise-on-big-data-1-7606523 Jhon What is hardcore data science—in practice? https://www.oreilly.com/ideas/what-is-hardcore-data-scie
-
Episode 25 – The pro’s and con’s of crafting your own distribution
27/09/2016 Duration: 01h34min
When we talk about Big Data and Hadoop in particular, we generally have one of the existing distributions from Cloudera, Hortonworks or other Big Data companies in mind. But sometimes, a pre-built distro just does not meet the needs. In this episode, we have a guest on the show who explains why they made the choice to forgo the available distributions in favour of building their own. http://lod-cloud.net/ 00:00 Recent events Dave: Which tool should I use? http://brohrer.github.io/which_tool_should_i_use.html YaRrr! - The Pirate’s guide to R Blog: http://nathanieldphillips.com/thepiratesguidetor/ YaRrr! - Download the book: https://drive.google.com/file/d/0B4udF24Yxab0S1hnZlBBTmgzM3M/view Video tutorials to go with the above: https://www.youtube.com/playlist?list=PL9tt3I41HFS9gmeZFEuNrnu_7V_NFngfJ Listener Question from Sampath from Baltimore: When moving into a career in Big Data, is it better to pick a technology like Spark and try to build expertise on it versus having a broad
-
Episode 24 – Hadoop Summit Melbourne 2016 Preview
13/09/2016 Duration: 01h07min
With Hadoop Summit Melbourne 2016 starting the day after we are recording this episode, we go over the published agenda and discuss the current state of the Big Data Technology ecosystem while we pick our favorite sessions. Wish we were there! 00:00 Recent events Dave Cloud Security Alliance releases cloud and big data security guidelines http://siliconangle.com/blog/2016/08/28/the-cloud-security-alliance-publishes-its-best-practices-for-big-data-security/ https://cloudsecurityalliance.org/download/big-data-security-and-privacy-handbook/ Common Big Data Backup and Recovery myths http://www.networkworld.com/article/3113036/big-data-business-intelligence/debunking-the-most-common-big-data-backup-and-recovery-myths.html Big Data, Google, and the end of free will http://www.ft.com/cms/s/2/50bb4830-6a4c-11e6-ae5b-a7cc5dd5a28c.html Jhon SuperComputing now going to Hadoop-style systems https://techcrunch.com/2016/05/24/crays-latest-supercomputer-runs-openstack-and-open-s
-
Episode 23 – Security in Hadoop – Authentication
30/08/2016 Duration: 01h07min
In this episode, we discuss this fortnight's interesting big data news that caught our eye and then go on to discuss the basics of authentication in Hadoop, in what is the first in a series of episodes that we'll be doing over the next few months on the broad topic of security. 00:00 Recent events Dave: The new science behind customer loyalty http://insights.principa.co.za/the-new-science-behind-customer-loyalty http://insights.principa.co.za/infographic-creating-a-data-driven-customer-loyalty-strategy 5 great charts in 5 lines of R code http://blog.revolutionanalytics.com/2016/08/five-great-charts-in-5-lines-of-r-code-each.html Using big data to create value for customers, not just target them https://hbr.org/2016/08/use-big-data-to-create-value-for-customers-not-just-target-them Jhon: Linux turns 25 (25 August 1991) https://www.linux.com/news/linus-torvalds-reflects-25-years-linux http://web.archive.org/web/20100104211620/http://www.linux.org/people/linus_pos
-
Episode 22 – Big Data in Small Business
16/08/2016 Duration: 01h32min
The main subject of this episode is an answer to a listener question we received a couple of months ago: How can big data help small businesses? In what ways can small businesses use big data? At the moment all the talk is about big data helping enterprise firms. And we are introducing a new section which we hope you will enjoy! 00:00 Recent events Working with a new team in sunny Cork, getting them up to speed Workshop with a global SI and a European telco about the upcoming phases of their big data journey Workshop with a customer who has been using Hadoop for a very long time, since Hadoop 0.2! Finally looking to migrate into the future Multi-vendor workshop fraud analytics Object recognition and detection in images. 11:30 Our very own "New and Noteworthy" Dave http://blogs.teradata.com/international/streaming-analytics-story-many-tales/ http://www.datasciencecentral.com/m/blogpost?id=6448529%3ABlogPost%3A453888 http://research.ibm.com/cognitive-computing/ostp/rfi-response.s
-
Episode 21 – The Open Data Platform Initiative
02/08/2016 Duration: 59min
In this episode, we have an interview with John Mertic about ODPi. There has been plenty of mystery and even some controversy about ODPi, which we attempt to resolve for you. Big thanks to John for giving us some of his time for this interview! Sadly, this time the Skype Gods were not with us and we experienced some drops and hitches. We tried to smooth things over as much as possible, but we were not able to achieve our usual level of quality this time. 00:00 Recent events Vacation for Dave Study for Jhon 10:40 Interview with John Mertic @ ODPi https://www.odpi.org/ John Mertic, Director of Program Management for ODPi and Open Mainframe Project Find John on Twitter: @jmertic If you're not familiar with the ODPi, here are a few good links to get you started and interested in the area: Links to the ODPi Specifications: https://www.odpi.org/specifications Watch an interview with Alan Gates who discusses what the ODPi is trying to do to simplify the big data world: https://www.youtube.co
-
Episode 20 – Dave’s Hadoop Summit San Jose 2016 Retrospective – Part 2
19/07/2016 Duration: 01h06min
In this second part, we discuss the sessions that Dave attended at the San Jose Hadoop Summit and we go in depth on some related topics. Since we ran over an hour with the main topic, and we did not want to make this a three-parter, we decided to forgo the questions from the audience just this one time... 00:00 Recent events Vacation time! Edx.Org Big Data Courses 04:00 Dave's Hadoop Summit San Jose 2016 Retrospective - Part 2 Session 1: End-to-End Processing of 3.7 Million Telemetry Events per Second Using Lambda Architecture, by Saurabh Mishra @ Hortonworks and Raghavendra Nandagopal @ Symantec Talking point: Hero-culture or why nobody wants to talk about failure anymore Session 2: Top Three - Big Data Governance Issues and How Apache ATLAS resolves it for the Enterprise, by Andrew Ahn @ Hortonworks Talking point: Guaranteed Governance, who certifies the certificate? Session 3: IoT, Streaming Analytics and Machine Learning: Delivering Real-Time Intelligence With Apache NiFi,
-
Episode 19 – Dave’s Hadoop Summit San Jose 2016 Retrospective
05/07/2016 Duration: 48min
Dave went to the Hadoop Summit 2016 in San Jose last week and came back with a riveting tale to tell. In this first part of the Summit coverage, join me as I ask Dave all about the keynotes and the general event. Join us next episode, when Dave will talk about some of the sessions he attended! 00:00 Recent events Lift and shift to IaaS Hybrid Disaster Recovery Spark & ML goodness MOOCs San Jose Hadoop Summit 09:25 Dave went to the Hadoop Summit in San Jose! Record attendance, maybe a venue change in future Sponsor exhibition area including "interesting" story The Community Corner The keynotes Hadoop is 10 years old Microsoft on Machine Learning Hadoop Assemblies Hadoop fragmentation Cyber security Car insurance premiums "to measure" Ethics session 40:55 Questions from our Listeners Beefy feedback from Kris A listener wants to know if it is worth the trip to go to the US Summit or to just go to the "local" Summit, wherever that is. Nishant would like an
-
Episode 18 – MLeap interview: Productionising Data Science – Part 2
21/06/2016 Duration: 43min
In this episode, we have the second part of the interview with Hollin Wilkins and Mikhail Semeniuk, the driving forces behind the MLeap project, where they go into more technical detail and give tips on deploying MLeap in your environment. If you are working with Spark, are deep into machine learning and are struggling to put those beautifully trained models into production, you definitely do not want to miss this episode! 00:00 Recent events Yet more telco security, again. RFI for European energy company followed by "the RFI rant" Metronnnnnnnnnnn Big Data Hackathon for an airline company predicting delays Preparing an IoT hackathon on predictive maintenance Spreading the word on MLeap at a couple of customers! 11:22 Interview on MLeap with Hollin Wilkins and Mikhail Semeniuk Part 2 http://combust.ml/ http://combust.ml/blog/2016/03/30/flexible-akka-clients-and-servers-part-1.html https://github.com/TrueCar/mleap https://github.com/TrueCar/mleap-demo 35:25 Questions from o
-
Episode 17 – MLeap interview: Productionising Data Science
07/06/2016 Duration: 54min
In this episode, we have an interview with Hollin Wilkins and Mikhail Semeniuk, the driving forces behind the MLeap project. If you are working with Spark, are deep into machine learning and are struggling to put those beautifully trained models into production, you definitely do not want to miss this episode! 00:00 Recent events Machine Learning Hackathon on Azure Strata Europe Fighting with Kafka 09:30 Interview on MLeap with Hollin Wilkins and Mikhail Semeniuk Meet Hollin and Mikhail today (7-Jun-2016) at Spark Summit 2016 in San Francisco! https://spark-summit.org/2016/events/mleap-productionize-data-science-workflows-using-spark/ http://combust.ml/ http://combust.ml/blog/2016/03/30/flexible-akka-clients-and-servers-part-1.html https://github.com/TrueCar/mleap https://github.com/TrueCar/mleap-demo 40:50 Questions from our Listeners The Episode 12 mystery unraveled Nifi works well for prototyping, but what's your view on using Nifi in production in a normal D
-
Episode 16 – Interview part two with Sumeet Singh – Senior Director, Cloud and Big Data Platforms @ Yahoo!
24/05/2016 Duration: 46min
Hopefully you enjoyed the first part of our interview with Sumeet; here is part two, where we go into more detail about Yahoo's use of Hadoop, with lots of interesting topics coming up, including the splintering of the ecosystem, governance and much, much more. 00:00 Recent events Customer and partner adventures with Apache Nifi Jhon is settling in at Microsoft but is unfortunately quite jet-lagged. 08:15 Part two of our interview with Sumeet Singh - Senior Director, Cloud and Big Data Platforms @ Yahoo! 39:05 Questions from our Listeners Is Apache Atlas ready for production today? 46:35 End Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 15 – Interview with Sumeet Singh – Senior Director, Cloud and Big Data Platforms @ Yahoo!
10/05/2016 Duration: 01h56s
Having met Sumeet at the Hadoop Summit, we thought he'd make a great guest for the podcast, so here he is for your listening pleasure! 00:00 Recent events Louder! iTunes and the missing episode 12 Jhon's new role at Microsoft Hadoop as a Service A fortnight of SAS + Hadoop Metron teething troubles https://issues.apache.org/jira/browse/METRON-136 17:50 Interview with Sumeet Singh - Senior Director, Cloud and Big Data Platforms @ Yahoo! 42:50 Questions from our Listeners One data-lake for all workloads? Or separate clusters for each set of workloads? How large a team do I need to manage a Hadoop cluster? 1:00:56 End Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 14 – Hadoop Summit – Retrospective
26/04/2016 Duration: 51min
After the last two special edition episodes where we quickly covered each Summit day in a "same-day" episode, we go over the full event in this episode, highlighting the sessions we enjoyed the most and sharing our general feelings about the 2016 Hadoop Summit in Dublin. 00:00 Recent events Summit! Sessions on YouTube Meetings and planning, Apache Metron https://cwiki.apache.org/confluence/display/METRON/Metron+Wiki https://community.hortonworks.com/articles/26047/apche-metron-tp1-blog-series.html Setting up a new podcast recording "studio" 09:00 Hadoop Summit - Retrospective Summit Schedule App Hortonworks emphasising Streaming ingest using Nifi, but the other talks did not so much Summit video sessions are starting to appear online https://www.youtube.com/channel/UCAPa-K_rhylDZAUHVxqqsRA/videos Next year: Munich Day one sessions: It's not the size of your cluster, It's how you use it Big Fish - David Darden & Don Smith Unified stream and batch processing w
-
Episode 13 – Hadoop Summit Dublin 2016 – Day 2
14/04/2016 Duration: 37min
Welcome to our second special edition podcast, brought to you from day 2 of the Hadoop Summit. Breaking our normal fortnightly flow, we're delivering a fresh new podcast at the end of each day of the Hadoop Summit. In this episode we cover our impressions of the second day of keynotes and yet more sessions that we enjoyed. 00:00 Recent events Introduction to the Hadoop Summit Dublin 2016 from day 2 01:45 Hadoop Summit 2016 Dublin Day 2 Review Keynote/Session - Yahoo! - Sumeet Singh Keynote - Information is Beautiful - David McCandless http://www.informationisbeautiful.net/ MLeap - Mikhail Semeniuk (Shift Technologies) Hollin Wilkins (TrueCar) Admiral - Adam Morton (Admiral) and Simon Ball (Hortonworks) Hive - Alan Gates (Hortonworks) 37:47 End Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.