Roaring Elephant

  • Author: Various
  • Narrator: Various
  • Publisher: Podcast
  • Duration: 300:03:29

Information:

Synopsis

Bite-Sized Big Data

Episodes

  • Episode 113 – H2OAIWorld London 2018 Roaring Report

    06/11/2018 Duration: 01h02min

    Here is our H2O.ai World conference London Roaring Report. We had a blast and we hope that this episode can give you a good taste of what was going on. The sessions are now available online: https://www.youtube.com/playlist?list=PLNtMya54qvOHh9LaA08hkusynWVStNEhm Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 112 – Roaring News

    30/10/2018 Duration: 26min

    In this last Big Data news episode for the month of October, we look forward to the H2O World event next week in London and we have articles on BI Maturity and the upcoming Apache Ozone project that will supplant HDFS in future Hadoop clusters soon(TM). BI Maturity: You can’t get there from here! http://makingdatameaningful.com/bi-maturity/ Introducing Apache Hadoop Ozone: An Object Store for Apache Hadoop https://hortonworks.com/blog/introducing-apache-hadoop-ozone-object-store-apache-hadoop/ Katacoda example down on this page https://hadoop.apache.org/ozone Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 111 – How Public Cloud changed Big Data

    23/10/2018 Duration: 51min

    No interview this time, just Dave and Jhon talking about how public cloud changed Big Data. Current news has brought this topic back to the foreground and we thought it was a good idea to give our views on the subject. Along the way, we go over the different deployment strategies for Hadoop across on-premises, private and public cloud and, of course, hybrid environments. Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 110 – Roaring News

    16/10/2018 Duration: 38min

    Another week, another Big Data News episode. After going over all the event ticket giveaways that are currently going on, we have an article that goes over the basics of ETL vs ELT and have some fun with R graphs by the XKCD web comic. We finish with an in-depth article on columnar data stores and a quick shout-out to Apache NiFi. Breaking News Our thanks to our guest from H2O.ai: John Spooner Director of Solution Engineering, h2o.ai Dave: XKCD Curve Fitting in R http://blog.revolutionanalytics.com/2018/09/curve-fitting.html Artificial intelligence, data will be the differentiator in the marketplace https://www.information-age.com/artificial-intelligence-data-123475102/ Jhon: Scaling ETL: How data pipelines evolve as your business grows https://bytes.grubhub.com/scaling-etl-how-data-pipelines-evolve-as-your-business-grows-72ff6c744e6e The design and implementation of modern column-oriented database systems https://blog.acolyer.org/2018/09/26/the-desig

  • Episode 109 – Open Metadata and Governance Masterclass with Mandy Chessell – Part 2

    09/10/2018 Duration: 52min

    In this GDPR world, Data Governance and Data Lineage are, or should be, very much top of mind for anybody in the Big Data world. We reached out to Mandy Chessell, who has been very active in this area, and we were delighted when she agreed to do an interview with us. In this second part, we discuss the ins and outs of good data stewardship and how companies can adopt, implement and contribute. Mandy Chessell Distinguished Engineer, Master Inventor, Fellow of Royal Academy of Engineering https://www.linkedin.com/in/mandy-chessell-a4989722/ ODPi Blog post on Egeria: First Release of ODPi Egeria is Here ODPi GitHub projects: Egeria - Open Metadata and Governance https://github.com/odpi/egeria Data-governance companion project https://github.com/odpi/data-governance Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 108 – Roaring News

    02/10/2018 Duration: 55min

    Another episode of Big Data News and not just another episode, but an episode packed and packed with items. Before we do our regular article reviews, we are doing raffles for not one, not two but three different events! And as if that was not enough, our friends from Pulsar dropped in with their big Apache top-level project announcement. So not very bite-sized this time, but smack full of delicious Big Data news! Breaking News Our thanks to our guests: Solix Empower: Sai Gundavelli Founder/CEO, Solix Technologies Streamlio: Sanjeev Kulkarni Co-Founder at Streamlio, Sijie Guo Co-Founder at Streamlio Free Big Data Event ticket giveaways: DataWorks Summit Asia Pacific Singapore Oct 11, 2018 - Tokyo Oct 16, 2018 - Melbourne Feb 06, 2018 To enter the raffle, send email to dws18apac@roaringelephant.org Tell us what event you want to attend! (Singapore, Tokyo, Melbourne) Solix Empower New York 2018 New York November 01, 2018 To enter the raffle, send email to SolixE

  • Episode 107 – Open Metadata and Governance Masterclass with Mandy Chessell – Part 1

    25/09/2018 Duration: 41min

    In this GDPR world, Data Governance and Data Lineage are, or should be, very much top of mind for anybody in the Big Data world. We reached out to Mandy Chessell, who has been very active in this area, and we were delighted when she agreed to do an interview with us. In this first part, the focus is more on Mandy herself and we lay the groundwork for the second part that will go live in episode 109. Mandy Chessell Distinguished Engineer, Master Inventor, Fellow of Royal Academy of Engineering https://www.linkedin.com/in/mandy-chessell-a4989722/ ODPi Blog post on Egeria: First Release of ODPi Egeria is Here ODPi GitHub projects: Egeria - Open Metadata and Governance https://github.com/odpi/egeria Data-governance companion project https://github.com/odpi/data-governance Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 106 – Roaring News

    18/09/2018 Duration: 39min

    In this edition of Big Data News, we take the pulse of machine learning adoption and talk about Big Data online learning by IBM on Coursera and by Columbia University on edX. We round the episode off with a look at MR3 and the evil that are benchmarks. Breaking News Data Science Professional Certificate https://cognitiveclass.ai/blog/data-science-professional-certificate/ Taking the pulse of machine learning adoption https://www.zdnet.com/article/taking-the-pulse-of-machine-learning-adoption/ Performance Comparison of HDP LLAP, Presto, SparkSQL, Hive on Tez, and Hive on MR3 using the TPC-DS Benchmark https://mr3.postech.ac.kr/blog/2018/08/15/comparison-llap-presto-spark-mr3/ Join Jhon on Artificial Intelligence (AI) & Robotics by ColumbiaX on edX https://www.edx.org/micromasters/columbiax-artificial-intelligence https://www.edx.org/course/robotics-columbiax-csmm-103x-4 https://www.edx.org/course/artificial-intelligence-ai-columbiax-csmm-101x-4 Please use the Con

  • Episode 105 – Big Data at British Telecom with Phillip Radley

    11/09/2018 Duration: 01h06min

    In this episode we welcome Phil Radley, Chief Data Architect at BT, to talk about the Big Data deployment at BT. Phillip Radley (LinkedIn) Chief Data Architect @ BT https://home.bt.com/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 104 – Roaring News

    04/09/2018 Duration: 36min

    In this Big Data News episode, we discuss an article with guidelines on how you should arrange your data gathering projects with the customer in mind. Dave brings a matrix of visualization products. Breaking News The five Cs: Five framing guidelines to help you think about building data products. https://www.oreilly.com/ideas/the-five-cs?utm_medium=social&utm_source=twitter.com&utm_campaign=awareness&utm_content=radar+content The Chartmaker Directory http://chartmaker.visualisingdata.com/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 103 – Apache Pulsar version 2.0 with Matteo and Sijie from Streamlio

    28/08/2018 Duration: 43min

    Matteo and Sijie from Streamlio reached out to us and let us know they had an update on Apache Pulsar. It turned out they had a lot to talk about, so we cut the interview in two parts, the first of which was published in episode 101. Here is the second part with information on version 2.0 and the future of the Apache Pulsar project. The first subject taken on by Sijie is Pulsar Functions, followed by Matteo talking about the new schema registry and Topic Compaction. With a new major version being released, users will probably want to upgrade, so we asked the guys about the upgrade path. For the rest of the episode, Matteo and Sijie share what they can regarding the future Pulsar Roadmap. Matteo Merli (https://www.linkedin.com/in/matteomerli/) Co-Founder - Software Engineer Sijie Guo (https://www.linkedin.com/in/samuelguo/) Co-Founder Apache Pulsar (incubating) https://pulsar.apache.org/ Please use the Contact Form on this blog or our twitter feed to send us yo

  • Episode 102 – Roaring News

    21/08/2018 Duration: 22min

    Big Data News at the end of the summer is not easy to find, but we did end up with three topics to discuss: from isolating GPUs in Hadoop 3.x to replicating big data (to the cloud) and quick tips from Adam's blog. Breaking News First Class GPUs support in Apache Hadoop 3.1, YARN & HDP 3.0 https://hortonworks.com/blog/gpus-support-in-apache-hadoop-3-1-yarn-hdp-3/ Replicating big datasets in the cloud https://medium.com/hotels-com-technology/replicating-big-datasets-in-the-cloud-c0db388f6ba2 https://dataworkssummit.com/berlin-2018/session/tools-and-approaches-for-migrating-big-datasets-to-the-cloud/ https://www.slideshare.net/Hadoop_Summit/tools-and-approaches-for-migrating-big-datasets-to-the-cloud Quick Tip: The easiest way to grab data out of a web page in Python https://medium.com/@ageitgey/quick-tip-the-easiest-way-to-grab-data-out-of-a-web-page-in-python-7153cecfca58 Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future e

  • Episode 101 – Apache Pulsar update with Matteo and Sijie from Streamlio

    14/08/2018 Duration: 01h05min

    Matteo and Sijie from Streamlio reached out to us and let us know they had an update on Apache Pulsar. It turned out they had a lot to talk about, so we cut the interview in two parts and here is the first part, where they introduce Apache Pulsar, go in depth on the correct deployment scaling of a stable Pulsar cluster and clarify Pulsar's "at least once vs exactly once" strategy. Part two will go in more depth on what's new. Stay tuned! Matteo Merli (https://www.linkedin.com/in/matteomerli/) Co-Founder - Software Engineer Sijie Guo (https://www.linkedin.com/in/samuelguo/) Co-Founder Apache Pulsar (incubating) https://pulsar.apache.org/ Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 100 – Celebrating our Centennial with the history of Hadoop

    07/08/2018 Duration: 01h07min

    100 Big Data episodes! We made it, in no small part thanks to our audience: you are the ones who keep us going! In this episode we celebrate our centennial by going over the history of Hadoop releases, highlighting the most noteworthy events along the way. Join us down the twisty paths of our memory lanes! The blockchain-related LinkedIn post Jhon liked The sources for this episode: http://hadoop.apache.org/releases.html https://en.wikipedia.org/wiki/Apache_Hadoop Debate over which company had contributed more to Hadoop: http://hortonworks.com/blog/reality-check-contributions-to-apache-hadoop/ Thank you for being part of the ride and now on to episode 200! Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 99 – The State of Big Data at Codemotion Amsterdam

    31/07/2018 Duration: 45min

    The Roaring Elephant podcast was a guest at the Codemotion conference in Amsterdam a little while ago. This episode contains the audio of the talk we did on the State of Big Data. Our talk was definitely light on slideware, but if you want to see the video of our presentation, you can find it on the Codemotion YouTube channel: Codemotion Amsterdam 2018: The State of Big Data by Roaring Elephant podcast Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 98 – Roaring news

    24/07/2018 Duration: 22min

    In this episode of Big Data Roaring News, Dave laments another announcement of Hadoop's demise and exposes A.I. imposters. Jhon has articles comparing Ranger with Sentry and Apache NiFi reaching the ripe age of 1.7 with a MiNiFi-charged practical demo to prove the point. Breaking News Hadoop’s star dims in the era of cloud object data storage and stream computing https://siliconangle.com/blog/2018/07/09/hadoops-star-dims-era-cloud-object-data-storage-stream-computing/ The rise of “pseudo-ai” how tech firms quietly use humans to do bots work https://www.theguardian.com/technology/2018/jul/06/artificial-intelligence-ai-humans-bots-tech-companies Apache Ranger Vs Sentry https://www.linkedin.com/pulse/apache-ranger-vs-sentry-mythily-rajavelu/ How to build an IIoT system using Apache NiFi, MiNiFi, C2 Server, MQTT and Raspberry Pi https://medium.freecodecamp.org/building-an-iiot-system-using-apache-nifi-mqtt-and-raspberry-pi-ce1d6ed565bc Apache NiFi Version 1.7.0 released: http

  • Episode 97 – ODPi: A new world for data governance

    17/07/2018 Duration: 01h07min

    In this episode, we welcome back John Mertic one more time. It was quite obvious that John had lots more to talk about at the end of our last interview with him. ODPi has recently reinvented itself, moving away from a strict distribution standards body towards data governance and reference specifications. John Mertic Director of Program Management for ODPi, R Consortium, and Open Mainframe Project https://www.linkedin.com/in/jmertic/ ODPi website links: https://www.odpi.org/ https://www.odpi.org/blog/2018/04/04/the-state-of-open-source-and-big-data-three-years-later https://www.odpi.org/projects/data-governance-pmc https://www.odpi.org/events Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

  • Episode 96 – Roaring news

    10/07/2018 Duration: 46min

    In this edition of Roaring news, Ward Bekker returns to discuss what is happening in the world of Big Data. Ward brings news on GPUs in supercomputers and how Big Data could be wrong about you. Dave and Jhon found articles on Big Data growth visualizations and GDPR. Breaking News 10 Charts that will change your perspective of Big Data’s Growth https://www.forbes.com/sites/louiscolumbus/2018/05/23/10-charts-that-will-change-your-perspective-of-big-datas-growth/#1ea595702926 New GPU-Accelerated Supercomputers Change the Balance of Power on the TOP500 https://www.top500.org/news/new-gpu-accelerated-supercomputers-change-the-balance-of-power-on-the-top500/ GDPR: A Call to Remove Technical Debt from Data Science https://medium.com/@kjarmul/gdpr-a-call-to-remove-technical-debt-from-data-science-c103a01c3102 Everything big data claims to know about you could be wrong http://news.berkeley.edu/2018/06/18/big-data-flaws/ Our thanks to Ward for adding some variety to this News episode.

  • Episode 95 – DataWorks Summit in San Jose with Ward Bekker

    03/07/2018 Duration: 01h52min

    Since neither Dave nor Jhon was able to attend the DataWorks Summit in San Jose a couple of weeks ago, we have a guest, Ward Bekker, who was happy to join and educate us on the subject. DataWorks Summit San Jose 2018 In this episode we discuss the daily keynotes and Ward's selection of sessions at the Summit, ranging from what is new in YARN 3.0 to materialized views in Hive and much more. Ward Bekker (LinkedIn) Pre-Sales Solutions Engineer II @ Hortonworks Some of the sessions and topics discussed are: Apache Hadoop State of the union https://dataworkssummit.com/san-jose-2018/session/apache-hadoop-yarn-state-of-the-union-2/ What is new in Apache Hive https://dataworkssummit.com/san-jose-2018/session/what-is-new-in-apache-hive/ Running distributed TensorFlow in production https://dataworkssummit.com/san-jose-2018/session/running-distributed-tensorflow-in-production-challenges-and-solutions-on-yarn-3-0-2/ Just the sketch: advanced streaming analytics in Apache Metron

  • Episode 94 – Roaring news

    26/06/2018 Duration: 37min

    In this week's edition of Roaring Big Data News, Dave talks about modernizing Hadoop and a billion Java errors. Jhon has an article on improving your learning data sets. We finish with a discussion about the newly released HDP 2.6.5 with an emphasis on the deprecation notices and YARN containers. Breaking News Dave Modernizing Hadoop: Reaching the plateau of productivity https://www.zdnet.com/article/modernizing-hadoop-reaching-the-plateau-of-productivity/ 1 billion Java errors, here’s what causes 97% of them https://blog.takipi.com/we-crunched-1-billion-java-logged-errors-heres-what-causes-97-of-them/ https://blog.takipi.com/the-top-10-exceptions-types-in-production-java-applications-based-on-1b-events/ Jhon Why you need to improve your training data, and how to do it https://petewarden.com/2018/05/28/why-you-need-to-improve-your-training-data-and-how-to-do-it/amp/ Announcing the General Availability of Hortonworks Data Platform (HDP) 2.6.5, Apache Ambari 2.6.2 and SmartS
