Synopsis
Bite-Sized Big Data
Episodes
-
Episode 153 – How Secure is the Future of Open Source?
13/08/2019 Duration: 01h04min
The way open source software is being consumed has changed drastically: originally found on the fringes, open source technology has now become a core part of many organizations of all sizes. We take a look at the confusion and sometimes vocal irritation that has accompanied this adoption by "Big Business" and ask whether the future of Open Source is in danger. We have been playing with the idea of giving our view on this subject for a while now, but we wanted to make sure not to add to a very flammable situation. Rather, we try to share useful information and stay as close to an unbiased narrative as possible. We end the conversation on a positive note, being hopeful that the inherent openness and transparency that imbue Open Source will prevail and a new equilibrium will be found. We are not basing this discussion on any specific article, but here is a list of articles that we reference during the discussion. The CBInsights article is what we consider the most FUD-less of them all containing a lo
-
Episode 152 – Roaring News
06/08/2019 Duration: 42min
Another fortnight, another roaring news episode, this time covering: de-anonymizing anonymized data is reportedly easy, Kubernetes is easier than Big Data, Big Data is hard, and hard-to-understand Kafka can be made easier using Factorio visualization.
Just because you're paranoid doesn't mean they're not out to get you!
Not entirely surprising to your co-hosts, this article discusses how easy it is to recombine previously anonymized data to regain the ability to identify a person, based on the data sets. Now this does involve combining multiple datasets and this is something legislators have warned against in the past. GDPR specifically has a clause that addresses this and data owners need to exercise care to avoid this from happening. That being the case, though, there are bound to be entities that are not bound by privacy legislation, or that simply choose not to follow it. So long story short (too late!), take care about what information you share where! Going on a little tangent, we discuss how bad this loss of
-
Episode 151 – Do you only need 6 principles?
30/07/2019 Duration: 53min
A little while ago, Dave came across an article by Francesca Lazzeri titled "The Data Science Mindset: Six Principles to Build Healthy Data-Driven Organizations" and in this episode we give our view and expand on those principles. Is it really possible to define a successful data science organization following 6 concrete principles? Are these principles a step-by-step, one-after-the-other plan you can follow on the road to success, or are they something you need to keep in mind from the start until the end of days?
1. Understand the Business and Decision-Making Process
We pretty much agree with this one and, expanding on it, we talk about the benefits this exercise has for streamlining the organization and for security. However, to achieve the C-level support which we agree is needed, some free-form experimentation needs to take place to get to a position where you actually have something that can be shown in a clear and concise way to said C-level. However, when the step to pro
-
Episode 150 – Roaring News
23/07/2019 Duration: 33min
In this news episode, we use a nice little article on how you can help keep open source sustainable as a structure for a broader discussion on this subject. The second subject this time goes another round on the "data engineers are not data scientists" (and the reverse) subject.
Ask not what Open Source can do for you, ask what YOU can do for Open Source!
Many organizations, commercial and not, are using open source software so heavily that they are becoming dependent on open source for their own survival. So when you look at how you can support open source, it is not an entirely altruistic project, but just makes good business sense. Using this article for structure and inspiration, we go over the different ways everybody, including YOU, can help keep the open source movement sustainable. Donating some hard cash, employing open source committers and just being an open source advocate are just a few possibilities. On a related note: do you want to keep this little open source focused podcast sustainable? Please
-
Episode 149 – The State of Developer Ecosystem 2019
16/07/2019 Duration: 01h03min
When friend of the show Ward Bekker sent us a link to the recent survey write-up on the State of Developer Ecosystem in 2019 by JetBrains, we immediately set up a recording date with him to go over all the facts and figures...
DevOps appears to be quite rare
The first thing we picked up on is how many organizations are still surviving without any kind of DevOps. Even though everybody is talking about DevOps and config management, it would appear, at least according to this survey, that these tools are still far from prevalent in development environments. After discussing the different facts and figures contained in the webpages on the JetBrains website, we were left wondering how representative the target group was. Since this survey was conducted by JetBrains, it would definitely make sense that the respondent population was taken from their customer base and this could skew the results towards smaller, "indie" development environments.
The sense and nonsense of Multi Cloud deployment
We then take a bit of a de
-
Episode 148 – Roaring News
09/07/2019 Duration: 32min
With summer starting and news drying up a little in the heat, we managed to find some interesting things happening at the Apache Software Foundation and we try to find correlations with the Cloud Native Computing Foundation. After that, we discover that robots actually won't be taking all our jobs... Who would have thought...
The more things change, the more they stay the same..?
While the ranks have closed and the messaging is "everything continuing as usual, nothing to see here", things are apparently happening at the ASF with some top level people moving on. Since only the future can tell how (and even if) this will have any noticeable effect, we have a little discussion about software foundations in general. Aside from the ASF, we talk about the Linux Foundation and the CNCF, which also have their roles to play.
One is still glad to be of service!
Over the years, there has been more than a little bit of fearmongering going on about how robots and technology in general will destroy a lot of jobs. I
-
Episode 147 – Alex Zeltov on MLOps with mlflow, kubeflow and other tools (part 2)
02/07/2019 Duration: 44min
In this episode, Global Black Belt and Technical Architect in the Big Data and Advanced Analytics Team at Microsoft, Alex Zeltov, is our guest and he explains the ins and outs of MLOps through various tools like mlflow and kubeflow. In this second part, we go into more depth on the practical consequences of implementing MLOps and the various tools that are available. We also go on a bit of a tangent discussing why traditional enterprises are still having a hard time looking at machine learning models as something that requires and benefits from things like model management, version control and periodic updating of models. For more from Alex on MLOps and mlflow, check out his presentation at the Washington DC DataWorks Summit a couple of weeks ago. The slides are now available on SlideShare and the video is available on YouTube: https://www.youtube.com/watch?v=Ns82mJjJgto
MLOps
Just like DataOps follows on to DevOps, one may say that MLOps continues after DataOps. While there is a Wikipedia page on the su
-
Episode 146 – Roaring News
25/06/2019 Duration: 36min
A new function is being called into being by Forrester called the "Data Hunter", which sounded interesting enough to us to spend some time on. Then we cover a nice guest blog on the Cloudera site and we finish off with some rambling on the changes in the HPC world. Enjoy!
Loincloths and spears to the ready: the Data Hunter is born!
Dave found a small article on the Forrester site that points to a paid webinar about Data Hunting. Now we did not pony up the $300 they charge for the webinar, but we found the concept quite compelling and looked at the three "audience questions" that were included in the article.
The "Small File Problem" and a little "You're Doing it Wrong"...?
This guest blog on the Cloudera web site actually has some practical information that can be useful when you need to consolidate your incremental upload files to reduce the number of files your Hive queries need to traverse. The additional complexity here was that this had to happen on a live production environment without service inter
-
Episode 145 – Alex Zeltov on MLOps with mlflow, kubeflow and other tools (part 1)
18/06/2019 Duration: 45min
In this episode, Global Black Belt and Technical Architect in the Big Data and Advanced Analytics Team at Microsoft, Alex Zeltov, is our guest and he explains the ins and outs of MLOps through various tools like mlflow and kubeflow. In this first episode, Alex talks on a more theoretical level about MLOps and the benefits it can deliver. For more from Alex on MLOps and mlflow, check out his presentation at the Washington DC DataWorks Summit a couple of weeks ago. The slides are now available on SlideShare and the video is available on YouTube: https://www.youtube.com/watch?v=Ns82mJjJgto
MLOps
Just like DataOps follows on to DevOps, one may say that MLOps continues after DataOps. While there is a Wikipedia page on the subject, there is not that much "prior art" available just yet. The main advantages that MLOps can deliver, according to Alex, are a much improved move to production of trained algorithms, even allowing for CI/CD, and a more structured approach to training models where multiple data scienti
-
Episode 144 – Roaring News
11/06/2019 Duration: 37min
In the past week, trouble at Cloudera really stood out and, in the context of similar problems at MapR and (somewhat less related to Big Data) Pivotal, we are devoting the entire episode to this. (Image taken from https://media.thinknum.com/articles/is-hadoop-hype-wearing-off-the-answer-may-lie-in-startups-data/) As this is a Roaring News Episode, we will discuss this story based on a number of articles we found.
Cloudera has a "bad" day...
The combination of some bad quarterly results and both CEO Tom Reilly and chief strategy officer and co-founder Mike Olson leaving the company has had a dramatic effect on the stock price. Now this could be an isolated incident, quickly forgotten, but in the light of similar issues at MapR (which is not a public company) and Pivotal, there does seem to be something more fundamental happening in these Open Source, venture-capital-fueled companies.
Looking at job listings over the years
The second article we discuss (from which we also took the image above becau
-
Episode 143 – Spark in Action with author Jean-Georges Perrin (Part 2)
04/06/2019 Duration: 58min
And now for something completely different: a book review! Not something we have done before, but when Jean-Georges Perrin contacted us with the suggestion of taking a deeper look at the "Spark in Action" book he is currently writing, we certainly did not say no! However, in all honesty, we talked about much, much more...
Free eBook raffle
Manning Publications has been kind enough to give us a couple of download codes for a free eBook version of "Spark in Action". As always, our Patreons get a first chance to get their hands on one of the codes. If you are a Roaring V.I.P. (or higher), you can head over to our Patreon Page now where you will find a post containing all the information required. If you become a Patreon now, you immediately get access to that post! ;) After one week, if there are any codes left, there will be a tweet about what you can do to get a free code, even if you are not a Patreon.
A book review on Spark in Action, second edition with author Jean-Georges Perrin
In the second part w
-
Episode 142 – Roaring News – KubeCon 2019 Report
28/05/2019 Duration: 47min
A little over a week ago, KubeCon and CloudNativeCon happened and our independent Roaring Roving Reporter Rubik Dave came back from Barcelona with a comprehensive report.
Kubernetes
As the kubernetes.io webpage tells us: "Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications." As we discuss in the episode, Kubernetes forms a kind of middleware layer that performs orchestration of lightweight Docker containers. To be sure, you can use other container technologies, but Docker (and its companion project Moby) is what is most often used with Kubernetes. The biggest advantage of Kubernetes, I believe, is how it has standardized the way a microservices framework based on Docker container instances can be deployed and managed. There have been a myriad of other approaches that tried to solve that problem (and Dave gives a rather exhaustive list in the episode), but Kubernetes has emerged as the best supported by the community.
KubeCon
And that
-
Episode 141 – Spark in Action with author Jean-Georges Perrin (Part 1)
21/05/2019 Duration: 48min
And now for something completely different: a book review! Not something we have done before, but when Jean-Georges Perrin contacted us with the suggestion of taking a deeper look at the "Spark in Action" book he is currently writing, we certainly did not say no! However, in all honesty, we talked about much, much more...
Free eBook raffle
Manning Publications has been kind enough to give us a couple of download codes for a free eBook version of "Spark in Action". As always, our Patreons get a first chance to get their hands on one of the codes. If you are a Roaring V.I.P. (or higher), you can head over to our Patreon Page now where you will find a post containing all the information required. If you become a Patreon now, you immediately get access to that post! ;) After one week, if there are any codes left, there will be a tweet about what you can do to get a free code, even if you are not a Patreon.
A book review on Spark in Action, second edition with author Jean-Georges Perrin
In this first part o
-
Episode 140 – Roaring News
14/05/2019 Duration: 36min
Another week, another feed of roaring news articles, starting with apparent changes at MapR and the release of Red Hat Enterprise Linux 8. We go in depth on the open sourcing of the Databricks-developed Delta Lake and finish with some SQL generated fractals. Big thanks to our Roaring Patreons making this podcast possible!
DataWorks Summit free ticket raffle.
Final week for our DataWorks Summit Washington DC free ticket giveaway! Get your free ticket now!
The Roaring Elephant on YouTube.
The Roaring Elephant YouTube channel has launched! Will you help us reach 100 subscribers (modest goals are a good start!) so we can claim our personalized URL on YouTube? Every time a new episode is published, you will find a video uploaded to the channel as well. There won't be any real video yet though, only a still image as you can see in the thumbnails. But as soon as we reach the related goal on our Patreon, this is where our video content will appear. In case you are wondering, when we start recording ac
-
Episode 139 – Interview on DataOps with Chris Bergh of DataKitchen.io (Part 2)
07/05/2019 Duration: 33min
DataKitchen.io's Chris Bergh takes us down the path towards successful DataOps implementation. If you have not heard of the DataOps concept yet and data is a big part of your environment (and really, it should be), we're sure you will find more than a couple of takeaways here!
Christopher Bergh (@ChrisBergh), CEO & Head Chef, DataKitchen
The DataOps Cookbook
DataOps is NOT Just DevOps for Data
Please use the Contact Form on this blog or our Twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
-
Episode 138 – Roaring News
30/04/2019 Duration: 27min
The biggest news is of course the launch of our Patreon! Hop over to https://www.patreon.com/roaringelephant and see if you want to help us thrive and grow! On the technical front, we have a blog post on Machine Learning Model Management, Apache turning 20 and Google breeding aggressive A.I.! And we also have a side-conversation on NGINX...
Apache Software Foundation Continues to Grow Open Source Software
https://www.eweek.com/development/the-apache-software-foundation-continues-to-grow-open-source-software
Frameworks for Machine Learning Model Management
https://www.inovex.de/blog/machine-learning-model-management/
Google's AI Has Learned to Become "Highly Aggressive" in Stressful Situations
https://www.sciencealert.com/google-deep-mind-has-learned-to-become-highly-aggressive-in-stressful-situations
-
Episode 137 – Interview on DataOps with Chris Bergh of DataKitchen.io (Part 1)
23/04/2019 Duration: 45min
DataKitchen.io's Chris Bergh takes us down the path towards successful DataOps implementation. If you have not heard of the DataOps concept yet and data is a big part of your environment (and really, it should be), we're sure you will find more than a couple of takeaways here!
Christopher Bergh (@ChrisBergh), CEO & Head Chef, DataKitchen
The DataOps Cookbook
DataOps is NOT Just DevOps for Data
-
Episode 136 – Temet Nosce
16/04/2019 Duration: 31min
Breaking with tradition, this News Episode does not have any Big Data-related articles. Instead, this episode is all about our plans for the future of this podcast...
-
Episode 135 – Big Data in Cybersecurity with Saad Ayad, featuring Apache Metron (Part 2)
09/04/2019 Duration: 30min
Data leaks and the resulting attack on our privacy have been a major news item in recent months. Big data tools like Apache Metron, built on top of Hadoop, can be instrumental in detecting and preventing intrusions. In this episode, we are joined by Saad Ayad, who was General Manager Security Operations at Telstra and is currently a Director at Digital Fortress Services in Melbourne, Australia. Saad has been active in the cybersecurity world for a long time and we are grateful he was willing to spend some time with us and share his knowledge and experience.
Saad Ayad (@saadayad_), Cyber Security, Big Data Analytics & Operations
http://www.digitalfortress.services
@DigFortServ
-
Episode 134 – Roaring News: DataWorks Summit Lightning Interviews
02/04/2019 Duration: 37min
A special edition of Big Data News featuring a number of quick interviews at the booths in the community expo hall. A big thank you to the brave people there who were willing to face the Roving Roaring Mike at the Barcelona DataWorks Summit a couple of weeks ago.
03:04 Attunity https://www.attunity.com/
07:41 Cloudera Fast Forward Labs https://www.cloudera.com/products/fast-forward-labs-research.html
11:09 DataVard https://www.datavard.com
17:19 Cazena https://www.cazena.com/
22:39 Syncsort https://www.syncsort.com
26:22 Accenture https://www.accenture.com
30:44 Unravel Data https://unraveldata.com