Synopsis
Bite-Sized Big Data
Episodes
-
Episode 93 – Apache Kylin: Extreme OLAP Engine for Big Data
19/06/2018 Duration: 46min
In this episode, Apache Kylin PMC member Dong Li joins us to explain how Apache Kylin can deploy analytical OLAP cubes in your Big Data environment. http://kylin.apache.org/
Dong Li, Technical Partner & Senior Architect of Kyligence (linkedin), PMC Member of Apache Kylin http://en.kyligence.io/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 92 – Roaring News
12/06/2018 Duration: 46min
Another week, another edition of Roaring Big Data News. This time, Dave talks about teen drivers and Jhon takes a detailed look at an Eventbrite data pipeline article.
Breaking News
Dave: Driver monitoring isn't just for teens; adults can benefit, too https://arstechnica.com/cars/2018/05/buicks-smart-driver-explains-why-my-gas-mileage-sucks-and-my-editors-doesnt/
Jhon: Looking under the hood of the Eventbrite data pipeline! https://www.eventbrite.com/engineering/looking-under-the-hood-of-the-eventbrite-data-pipeline/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 91 – ODPi is back and better than ever!
05/06/2018 Duration: 01h08min
In this episode, we welcome back John Mertic, Director of Program Management for ODPi, R Consortium, and the Open Mainframe Project. It's been almost two years since we checked in with John and the ODPi initiative and, as John mentions in the interview, a lot has changed in Hadoop...
John Mertic, Director of Program Management for ODPi, R Consortium, and Open Mainframe Project https://www.linkedin.com/in/jmertic/
ODPi website links:
https://www.odpi.org/
https://www.odpi.org/blog/2018/04/04/the-state-of-open-source-and-big-data-three-years-later
https://www.odpi.org/projects/data-governance-pmc
https://www.odpi.org/events
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 90 – Roaring News
29/05/2018 Duration: 38min
In this week's Roaring News episode, Dave brings up the resilience of Apache community open source projects and plays some Doom. Jhon has some practical Apache NiFi guides and covers the emergence of multi-modal NoSQL databases.
Breaking News
DataWorks Summit Berlin video recordings are up: https://www.youtube.com/user/HadoopSummit/playlists
Find Dave on his Australian road-trip: http://bit.ly/aus-nz-ibm-hwx-tour
Dave: DataTorrent, Stream Processing Startup, Folds (Apache Apex) https://www.datanami.com/2018/05/08/datatorrent-stream-processing-startup-folds/
DOOM! https://arxiv.org/abs/1804.09154 https://www.technologyreview.com/s/611072/ai-generates-new-doom-levels-for-humans-to-play/ https://www.youtube.com/watch?v=K32FZ-tjQP4
Bonus doom news: https://www.rockpapershotgun.com/2018/03/28/dodge-fireballs-forever-in-a-neural-nets-doom-nightmare/ https://worldmodels.github.io/
Jhon: Accessing Feeds from EtherDelta on Trades, Funds, Buys and Sells (Apache NiFi) https
-
Episode 89 – DataWorks Summit San Jose Agenda Review
22/05/2018 Duration: 01h12min
With the San Jose edition of the DataWorks Summit only a month away, we go over the sessions that are available in the agenda today and offer our top picks. If you're going, or if you will be watching the replays online, we hope to guide you in your selection of sessions.
DataWorks Summit San Jose 2018
And here is the dashboard we created with statistics on the San Jose sessions, for your enjoyment: https://aka.ms/DWS2018SJ The agenda is still in flux, so we will be updating the dashboard regularly.
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 88 – Roaring News
15/05/2018 Duration: 35min
Returning to our more regular schedule, we have a Roaring News episode today. Dave has articles on multi-cloud readiness, Big Data being a pariah, and Google Duplex, while Jhon came up with synthetic data, data engineers versus data scientists, and a neural network sharing cake recipes.
Breaking News
Dave: Less than 10% ready for multi cloud http://www.cloudpro.co.uk/cloud-essentials/hybrid-cloud/7451/idc-less-than-10-of-organisations-are-ready-for-multi-cloud
Tech companies distancing themselves from Big Data https://qz.com/1262102/tech-companies-are-distancing-themselves-from-big-data/
Google Duplex https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html
Jhon: The Rise of Synthetic Data to Help Developers Create and Train AI Algorithms Quickly and Affordably https://insidebigdata.com/2018/05/08/rise-synthetic-data-help-developers-create-train-ai-algorithms-quickly-affordably/
Data engineers vs. data scientists https://www.oreilly.com/ideas/data-enginee
-
Episode 87 – Druid: a high-performance, column-oriented, distributed data store – part 2
08/05/2018 Duration: 31min
This is the second part of an interview with Fangjin Yang, co-founder and CEO at Imply and committer/PMC member for the Druid project. Druid is a high-performance, column-oriented, distributed data store that has entered the Hadoop environment through its recent integration with Apache. Druid has been around for a while, and we are grateful to FJ for spending some time with our listeners.
Fangjin Yang, Cofounder and CEO at Imply (linkedin)
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 86 – Druid: a high-performance, column-oriented, distributed data store – part 1
01/05/2018 Duration: 31min
This is the first part of an interview with Fangjin Yang, co-founder and CEO at Imply and committer/PMC member for the Druid project. Druid is a high-performance, column-oriented, distributed data store that has entered the Hadoop environment through its recent integration with Apache. Druid has been around for a while, and we are grateful to FJ for spending some time with our listeners.
Fangjin Yang, Cofounder and CEO at Imply (linkedin)
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 85 – DataWorks Summit Community Showcase Exhibitor Soundbites
24/04/2018 Duration: 30min
This is the final part of our coverage of the DataWorks Summit Berlin 2018. Normally we would not have had an episode this week, since we were in Berlin last week, but we had lightning interviews with the vendors in the Community Expo Area and used that coverage to make this episode. So less of "Dave & Jhon" and more "ecosystem tech" snippets this time. Even though this does stray a bit from our usual content, we still hope it is useful. This was recorded in a hotel room and on the expo floor, so the audio quality is not up to our usual standards; we hope you’ll forgive us!
Here is a timestamped list of the lightning interviews:
02:41 Hortonworks https://hortonworks.com/
06:28 Alation https://alation.com/
08:45 Arcadia Data https://www.arcadiadata.com/
11:12 Attunity https://www.attunity.com/
13:10 BlueMetrix https://www.bluemetrix.com/
15:27 BMW https://www.bmw.com
18:04 IBM https://www.ibm.com
19:54 Microsoft https://www.microsoft.com
22:15 Nutanix https://www.nutanix.com/
23:26
-
Episode 84 – DataWorks Summit Berlin – Day 2 Recap
19/04/2018 Duration: 01h30min
And with the end of day two of the 2018 DataWorks Summit in Berlin comes the end of this year's European Summit. But never fear, we have an extra 90 minutes of DataWorks goodness for you to consume on your way home. There is no real editing on this one, and it was recorded in a hotel room, so the audio quality may not be up to our usual standards; we hope you'll forgive us! Enjoy!
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 83 – DataWorks Summit Berlin – Day 1 Recap
18/04/2018 Duration: 01h23min
Another year, another European DataWorks Summit, and yes, another daily recap show from Jhon and Dave. We walk through the keynotes and sessions we attended and give our thoughts and views. This should be useful for anyone who wasn't able to attend or those seeking to peek into sessions they couldn't make. There is no real editing on this one, and it was recorded in a hotel room, so the audio quality may not be up to our usual standards; we hope you'll forgive us! Enjoy!
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 82 – DataWorks Summit Berlin 2018 Preview
10/04/2018 Duration: 47min
Next week is DataWorks Summit Berlin week! Your two hosts will be in attendance, and in this episode we go over the agenda and plan which sessions we want to attend and why. Peppered throughout, we add further insights and experiences from previous years. Unfortunately, Dave's network was a little unstable and there are a couple of audio glitches in this episode.
For some session statistics, or if you can use some help deciding what sessions you want to attend, you can use the dashboard we created: click the screenshot above or go to http://aka.ms/DWS2018 to access the dashboard. It is a dynamic report: clicking on graph elements (bars or pie slices) will apply filters on all the visualizations and the session list. Use control-click to combine filters. At some point the dashboard will disappear because it is no longer relevant. For future reference, here is a large version of the screenshot.
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future ep
-
Episode 81 – Roaring News
03/04/2018 Duration: 26min
In this installment of Big Data News, we talk about the recent Facebook leak, how everybody is still doing it wrong (according to some at least), and installing Hadoop "the old-fashioned way". Also briefly covered is Elastic's X-Pack, now even more "open" than before, but still rather closed, it would seem.
Breaking News
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 80 – Big Data Tracking
27/03/2018 Duration: 51min
Last June, Wolfie Christl published a 93-page report, "Corporate Surveillance in Everyday Life", about big data tracking. Apart from the massive PDF that can be downloaded on the net, an extensive summary can be found on the Cracked Labs website. In this episode we go over the content and give our views on the subject. If you want to follow along with us while we are discussing the different points in the online article, here is the link: http://crackedlabs.org/en/corporate-surveillance
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 79 – Roaring News
20/03/2018 Duration: 37min
Another Big Data news episode! This time we consider the big or small nodes conundrum, based on an article that after close scrutiny doesn't really seem to test the real issue. Other things that get covered are LinkedIn's Dynamometer, Cloudera's full production architecture advice for a recommendation service, and a really interesting visualization technique based on blobs.
Breaking News
Big Data, Small Nodes https://insidebigdata.com/2018/02/22/make-sense-big-data-small-nodes/
Dynamometer Release https://github.com/linkedin/dynamometer https://venturebeat.com/2018/02/08/linkedin-open-sources-dynamometer-for-hadoop-performance-testing-at-scale/
Cisco IoT predictions. Aka someone somewhere trots out the old “data is the new oil” trope for one more circuit, please please please stop? https://www.networkworld.com/article/3257769/internet-of-things/7-transportation-iot-predictions-from-cisco.html
Production Recommendation Systems with Cloudera http://blog.cloudera.com/blog/2018/02
-
Episode 78 – Apache Trafodion transactional SQL for Hadoop (Part 2)
13/03/2018 Duration: 01h04min
In this episode, a group of people from Esgyn join us to talk about Apache Trafodion, the transactional SQL database engine for Hadoop. In this second part, Rohit, Ken and Rao talk about the internal workings and best practices of Apache Trafodion.
Rohit Jain, Chief Technology Officer (linkedin) https://esgyn.com
Ken Holt, Chief Operating Officer and Co-Founder (linkedin) https://esgyn.com
Rao Kakarlamudi, VP of Pre-sales & Principal Architect (linkedin) https://esgyn.com
In Search of Database Nirvana (oreilly) By Rohit Jain
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 77 – Roaring News
06/03/2018 Duration: 47min
Another Roaring News episode where we cover recent Big Data news items we found interesting. This time we talk about open source turning 20 years old, the annoyances that come with smart homes, and a big data divide in Germany. Additionally, we talk about some introductory guides to AI.
Breaking News
20 years of open source + who contributes http://www.zdnet.com/article/open-source-turns-20/ https://www.infoworld.com/article/3253948/open-source-tools/who-really-contributes-to-open-source.html
Smart home living is annoying as hell https://gizmodo.com/the-house-that-spied-on-me-1822429852
Big Data Divide https://www.politico.eu/article/to-protect-or-collect-germanys-big-data-divide/
The Art of Learning Data Science https://medium.com/@aparnack/the-art-of-learning-data-science-65b9f703f932
The Long Road To Become a Big Data Scientist - Infographic https://medium.com/@aparnack/sequel-to-the-art-of-learning-data-science-cb2e1f078e5a
An executive’s guide to AI http
-
Episode 76 – Apache Trafodion transactional SQL for Hadoop (Part 1)
27/02/2018 Duration: 45min
In this episode, a group of people from Esgyn join us to talk about Apache Trafodion, the transactional SQL database engine for Hadoop. In this first part, Rohit, Ken and Rao talk about the history and goals behind Apache Trafodion.
Rohit Jain, Chief Technology Officer (linkedin) https://esgyn.com
Ken Holt, Chief Operating Officer and Co-Founder (linkedin) https://esgyn.com
Rao Kakarlamudi, VP of Pre-sales & Principal Architect (linkedin) https://esgyn.com
In Search of Database Nirvana (oreilly) By Rohit Jain
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 75 – Roaring News
20/02/2018 Duration: 32min
In this Big Data News episode, we discuss the five-year anniversary of Hadoop Weekly (now Data Engineering Weekly), the Strava "data leak", and Twitter Wars. May the data be with you!
Breaking News
Five Years of Hadoop Weekly (Joe Crobak @joecrobak @Medium) https://medium.com/@joecrobak/five-years-of-hadoop-weekly-7aa8994f140b https://dataengweekly.com/ https://www.hadoopweekly.com/
How Strava's "anonymized" fitness tracking data spilled government secrets ([Nathan Ruser @Nrg8000] @zackwhittaker @ZDNet) http://www.zdnet.com/article/strava-anonymized-fitness-tracking-data-government-opsec/ http://www.abc.net.au/news/science/2018-01-29/strava-heat-map-shows-military-bases-and-supply-routes/9369490
Tweet Wars - The last data point (@basecamp_ai) http://www.knoyd.com/blog/the-last-data-point
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 74 – Hadoop sizing part 3: Compute sizing
13/02/2018 Duration: 49min
As promised, in this final part of our Hadoop Sizing series, we round off the subject with sizing your compute and network resources. Undoubtedly we'll be revisiting this subject in the future, but the three parts of this series should give ample information on the subject for now.
Hadoop Node Sizing
Hadoop Data Node Density Tradeoff on HCC: https://community.hortonworks.com/content/kbentry/48878/hadoop-data-node-density-tradeoff.html
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.