Roaring Elephant

Author: Various
Narrator: Various
Publisher: Podcast
Duration: 300:03:29

More information

Synopsis

Bite-Sized Big Data

Episodes

Episode 113 – H2OAIWorld London 2018 Roaring Report

06/11/2018 Duration: 01h02min

Here is our H2O.ai World conference London Roaring Report. We had a blast and we hope that this episode can give you a good taste of what was going on. The sessions are now available online: https://www.youtube.com/playlist?list=PLNtMya54qvOHh9LaA08hkusynWVStNEhm Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 112 – Roaring News

30/10/2018 Duration: 26min

In this last Big Data news episode for the month of November, we look forward to the H2O World event next week in London and we have articles on BI Maturity and the upcoming Apache Ozone project that will supplant HDFS in future Hadoop clusters soon(TM). BI Maturity: You can’t get there from here! http://makingdatameaningful.com/bi-maturity/ Introducing Apache Hadoop Ozone: An Object Store for Apache Hadoop https://hortonworks.com/blog/introducing-apache-hadoop-ozone-object-store-apache-hadoop/ Katacoda example down on this page https://hadoop.apache.org/ozone Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 111 – How Public Cloud changed Big Data

23/10/2018 Duration: 51min

No interview this time, just Dave and Jhon talking about how public cloud changed Big Data. Current news has brought this topic back to the foreground and we thought it was a good idea to give our views on this subject. Along the way, we go over the different deployment strategies for Hadoop across on-premises, private and public cloud and, of course, hybrid environments. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 110 – Roaring News

16/10/2018 Duration: 38min

Another week, another Big Data News episode. After going over all the event ticket giveaways that are currently going on, we have an article that goes over the basics of ETL vs ELT and have some fun with R graphs by the XKCD web comic. We finish with an in-depth article on columnar data stores and a quick shout-out to Apache NiFi. Breaking News Our thanks to our guest from H2O.ai: John Spooner Director of Solution Engineering, h2o.ai Dave: XKCD Curve Fitting in R http://blog.revolutionanalytics.com/2018/09/curve-fitting.html Artificial intelligence, data will be the differentiator in the marketplace https://www.information-age.com/artificial-intelligence-data-123475102/ Jhon: Scaling ETL: How data pipelines evolve as your business grows https://bytes.grubhub.com/scaling-etl-how-data-pipelines-evolve-as-your-business-grows-72ff6c744e6e The design and implementation of modern column-oriented database systems https://blog.acolyer.org/2018/09/26/the-desig

Episode 109 – Open Metadata and Governance Masterclass with Mandy Chessell – Part 2

09/10/2018 Duration: 52min

In this GDPR world, Data Governance and Data Lineage are, or should be, very much top of mind for anybody in the Big Data world. We reached out to Mandy Chessell, who has been very active in this area, and were delighted when she agreed to do an interview with us. In this second part, we discuss the ins and outs of good data stewardship and how companies can adopt, implement and contribute. Mandy Chessell Distinguished Engineer, Master Inventor, Fellow of Royal Academy of Engineering https://www.linkedin.com/in/mandy-chessell-a4989722/ ODPi Blog post on Egeria: First Release of ODPi Egeria is Here ODPi GitHub projects: Egeria - Open Metadata and Governance https://github.com/odpi/egeria Data-governance companion project https://github.com/odpi/data-governance Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 108 – Roaring News

02/10/2018 Duration: 55min

Another episode of Big Data News and not just another episode, but an episode packed and packed with items. Before we do our regular article reviews, we are doing raffles for not one, not two but three different events! And as if that was not enough, our friends from Pulsar dropped in with their big Apache top-level project announcement. So not very bite sized this time, but smack full of delicious Big Data news! Breaking News Our thanks to our guests: Solix Empower Sai Gundavelli Founder/CEO, Solix Technologies Streamlio Sanjeev Kulkarni Co-Founder at Streamlio Sijie Guo Co-Founder at Streamlio Free Big Data Event ticket giveaways: DataWorks Summit Asia Pacific Singapore Oct 11, 2018 - Tokyo Oct 16, 2018 - Melbourne Feb 06, 2018 To enter the raffle, send email to dws18apac@roaringelephant.org Tell us what event you want to attend! (Singapore, Tokyo, Melbourne) Solix Empower New York 2018 New York November 01, 2018 To enter the raffle, send email to SolixE

Episode 107 – Open Metadata and Governance Masterclass with Mandy Chessell – Part 1

25/09/2018 Duration: 41min

In this GDPR world, Data Governance and Data Lineage are, or should be, very much top of mind for anybody in the Big Data world. We reached out to Mandy Chessell, who has been very active in this area, and were delighted when she agreed to do an interview with us. In this first part, the focus is more on Mandy herself and we lay the groundwork for the second part that will go live in episode 109. Mandy Chessell Distinguished Engineer, Master Inventor, Fellow of Royal Academy of Engineering https://www.linkedin.com/in/mandy-chessell-a4989722/ ODPi Blog post on Egeria: First Release of ODPi Egeria is Here ODPi GitHub projects: Egeria - Open Metadata and Governance https://github.com/odpi/egeria Data-governance companion project https://github.com/odpi/data-governance Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 106 – Roaring News

18/09/2018 Duration: 39min

In this edition of Big Data News, we take the pulse of Machine learning adoption and talk about Big Data Online Learning by IBM on Coursera and by Columbia University on Edx. We round the episode off with a look at MR3 and the evil that are benchmarks. Breaking News Data Science Professional Certificate https://cognitiveclass.ai/blog/data-science-professional-certificate/ Taking the pulse of machine learning adoption https://www.zdnet.com/article/taking-the-pulse-of-machine-learning-adoption/ Performance Comparison of HDP LLAP, Presto, SparkSQL, Hive on Tez, and Hive on MR3 using the TPC-DS Benchmark https://mr3.postech.ac.kr/blog/2018/08/15/comparison-llap-presto-spark-mr3/ Join Jhon on Artificial Intelligence (AI) & Robotics by ColumbiaX on Edx https://www.edx.org/micromasters/columbiax-artificial-intelligence https://www.edx.org/course/robotics-columbiax-csmm-103x-4 https://www.edx.org/course/artificial-intelligence-ai-columbiax-csmm-101x-4 Please use the Con

Episode 105 – Big Data at British Telecom with Phillip Radley

11/09/2018 Duration: 01h06min

In this episode we welcome Phil Radley, Chief Data Architect at BT, to talk about the Big Data deployment at BT. Phillip Radley (LinkedIn) Chief Data Architect @ BT https://home.bt.com/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 104 – Roaring News

04/09/2018 Duration: 36min

In this Big Data News episode, we discuss an article with guidelines on how you should arrange your data gathering projects with the customer in mind. Dave brings a matrix of visualization products. Breaking News The five Cs: Five framing guidelines to help you think about building data products. https://www.oreilly.com/ideas/the-five-cs?utm_medium=social&utm_source=twitter.com&utm_campaign=awareness&utm_content=radar+content The Chartmaker Directory http://chartmaker.visualisingdata.com/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 103 – Apache Pulsar version 2.0 with Matteo and Sijie from Streamlio

28/08/2018 Duration: 43min

Matteo and Sijie from Streamlio reached out to us and let us know they had an update on Apache Pulsar. It turned out they had a lot to talk about, so we cut the interview in two parts, the first of which was published in episode 101. Here is the second part with information on version 2.0 and the future of the Apache Pulsar project. The first subject taken on by Sijie is Pulsar Functions, followed by Matteo talking about the new schema registry and Topic Compaction. With a new major version being released, users will probably want to upgrade, so we asked the guys about the upgrade path. For the rest of the episode, Matteo and Sijie share what they can regarding the future Pulsar Roadmap. Matteo Merli (https://www.linkedin.com/in/matteomerli/) Co-Founder - Software Engineer Sijie Guo (https://www.linkedin.com/in/samuelguo/) Co-Founder Apache Pulsar (incubating) https://pulsar.apache.org/ Please use the Contact Form on this blog or our twitter feed to send us yo

Episode 102 – Roaring News

21/08/2018 Duration: 22min

Big Data News at the end of the summer is not easy to find, but we did end up with three topics to discuss: from isolating GPUs in Hadoop 3.x to replicating big data (to the cloud) and quick tips from Adam's blog. Breaking News First Class GPUs support in Apache Hadoop 3.1, YARN & HDP 3.0 https://hortonworks.com/blog/gpus-support-in-apache-hadoop-3-1-yarn-hdp-3/ Replicating big datasets in the cloud https://medium.com/hotels-com-technology/replicating-big-datasets-in-the-cloud-c0db388f6ba2 https://dataworkssummit.com/berlin-2018/session/tools-and-approaches-for-migrating-big-datasets-to-the-cloud/ https://www.slideshare.net/Hadoop_Summit/tools-and-approaches-for-migrating-big-datasets-to-the-cloud Quick Tip: The easiest way to grab data out of a web page in Python https://medium.com/@ageitgey/quick-tip-the-easiest-way-to-grab-data-out-of-a-web-page-in-python-7153cecfca58 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future e

Episode 101 – Apache Pulsar update with Matteo and Sijie from Streamlio

14/08/2018 Duration: 01h05min

Matteo and Sijie from Streamlio reached out to us and let us know they had an update on Apache Pulsar. It turned out they had a lot to talk about, so we cut the interview in two parts. Here is the first part, where they introduce Apache Pulsar, go in depth on the correct deployment scaling of a stable Pulsar cluster and clarify Pulsar's "at least once vs exactly once" strategy. Part two will go in more depth on what's new. Stay tuned! Matteo Merli (https://www.linkedin.com/in/matteomerli/) Co-Founder - Software Engineer Sijie Guo (https://www.linkedin.com/in/samuelguo/) Co-Founder Apache Pulsar (incubating) https://pulsar.apache.org/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 100 – Celebrating our Centennial with the history of Hadoop

07/08/2018 Duration: 01h07min

100 Big Data episodes! We made it, in no small part thanks to our audience: you are what keeps us going! In this episode we celebrate our centennial by going over the history of Hadoop releases, highlighting the most noteworthy events along the way. Join us down the twisty paths of our memory lanes! The blockchain-related LinkedIn post Jhon liked The sources for this episode: http://hadoop.apache.org/releases.html https://en.wikipedia.org/wiki/Apache_Hadoop Debate over which company had contributed more to Hadoop: http://hortonworks.com/blog/reality-check-contributions-to-apache-hadoop/ Thank you for being part of the ride and now on to episode 200! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 99 – The State of Big Data at Codemotion Amsterdam

31/07/2018 Duration: 45min

The Roaring Elephant podcast was a guest at the Codemotion conference in Amsterdam a little while ago. This episode contains the audio of the talk we did on the State of Big Data. Our talk was definitely light on slideware, but if you want to see the video of our presentation, you can find it on the Codemotion YouTube channel: Codemotion Amsterdam 2018: The State of Big Data by Roaring Elephant podcast Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 98 – Roaring news

24/07/2018 Duration: 22min

In this episode of Big Data Roaring News, Dave laments another announcement of Hadoop's demise and exposes A.I. imposters. Jhon has articles comparing Ranger with Sentry and Apache Nifi reaching the ripe age of 1.7 with a Minifi charged practical demo to prove the point. Breaking News Hadoop’s star dims in the era of cloud object data storage and stream computing https://siliconangle.com/blog/2018/07/09/hadoops-star-dims-era-cloud-object-data-storage-stream-computing/ The rise of “pseudo-ai” how tech firms quietly use humans to do bots work https://www.theguardian.com/technology/2018/jul/06/artificial-intelligence-ai-humans-bots-tech-companies Apache Ranger Vs Sentry https://www.linkedin.com/pulse/apache-ranger-vs-sentry-mythily-rajavelu/ How to build an IIoT system using Apache NiFi, MiNiFi, C2 Server, MQTT and Raspberry Pi https://medium.freecodecamp.org/building-an-iiot-system-using-apache-nifi-mqtt-and-raspberry-pi-ce1d6ed565bc Apache Nifi Version 1.7.0 released: http

Episode 97 – ODPi: A new world for data governance

17/07/2018 Duration: 01h07min

In this episode, we welcome back John Mertic one more time. It was quite obvious that John had lots more to talk about at the end of our last interview with him. ODPi has recently reinvented itself, moving away from a strict distribution standards body towards data governance and reference specifications. ODPi logo John Mertic Director of Program Management for ODPi, R Consortium, and Open Mainframe Project https://www.linkedin.com/in/jmertic/ ODPi website links: https://www.odpi.org/ https://www.odpi.org/blog/2018/04/04/the-state-of-open-source-and-big-data-three-years-later https://www.odpi.org/projects/data-governance-pmc https://www.odpi.org/events Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 96 – Roaring news

10/07/2018 Duration: 46min

In this edition of Roaring news, Ward Bekker returns to discuss what is happening in the world of Big Data. Ward brings news on GPUs in supercomputers and how Big Data could be wrong about you. Dave and Jhon found articles on Big data growth visualizations and GDPR. Breaking News 10 Charts that will change your perspective of Big Data’s Growth https://www.forbes.com/sites/louiscolumbus/2018/05/23/10-charts-that-will-change-your-perspective-of-big-datas-growth/#1ea595702926 New GPU-Accelerated Supercomputers Change the Balance of Power on the TOP500 https://www.top500.org/news/new-gpu-accelerated-supercomputers-change-the-balance-of-power-on-the-top500/ GDPR: A Call to Remove Technical Debt from Data Science https://medium.com/@kjarmul/gdpr-a-call-to-remove-technical-debt-from-data-science-c103a01c3102 Everything big data claims to know about you could be wrong http://news.berkeley.edu/2018/06/18/big-data-flaws/ Our thanks to Ward for adding some variety to this News episode.

Episode 95 – DataWorks Summit in San Jose with Ward Bekker

03/07/2018 Duration: 01h52min

Since neither Dave nor Jhon was able to attend the DataWorks Summit in San Jose a couple of weeks ago, we have a guest, Ward Bekker, who was happy to join and educate us on the subject. DataWorks Summit San Jose 2018 In this episode we discuss the daily keynotes and Ward's selection of sessions at the Summit, ranging from the new things in Yarn 3.0 to Materialized views in Hive and much more. Ward Bekker (LinkedIn) Pre-Sales Solutions Engineer II @ Hortonworks Some of the sessions and topics discussed are: Apache Hadoop State of the union https://dataworkssummit.com/san-jose-2018/session/apache-hadoop-yarn-state-of-the-union-2/ What is new in Apache Hive https://dataworkssummit.com/san-jose-2018/session/what-is-new-in-apache-hive/ Running distributed TensorFlow in production https://dataworkssummit.com/san-jose-2018/session/running-distributed-tensorflow-in-production-challenges-and-solutions-on-yarn-3-0-2/ Just the sketch: advanced streaming analytics in Apache Metron

Episode 94 – Roaring news

26/06/2018 Duration: 37min

In this week's edition of Roaring Big Data News, Dave talks about modernizing Hadoop and a billion Java errors. Jhon has an article on improving your learning data sets. We finish with a discussion about the newly released HDP 2.6.5 with an emphasis on the deprecation notices and Yarn Containers. Breaking News Dave Modernizing Hadoop: Reaching the plateau of productivity https://www.zdnet.com/article/modernizing-hadoop-reaching-the-plateau-of-productivity/ 1 billion Java errors, here’s what causes 97% of them https://blog.takipi.com/we-crunched-1-billion-java-logged-errors-heres-what-causes-97-of-them/ https://blog.takipi.com/the-top-10-exceptions-types-in-production-java-applications-based-on-1b-events/ Jhon Why you need to improve your training data, and how to do it https://petewarden.com/2018/05/28/why-you-need-to-improve-your-training-data-and-how-to-do-it/amp/ Announcing the General Availability of Hortonworks Data Platform (HDP) 2.6.5, Apache Ambari 2.6.2 and SmartS
