Roaring Elephant

Author: Various
Narrator: Various
Publisher: Podcast
Duration: 300:03:29

More information

Synopsis

Bite-Sized Big Data

Episodes

Episode 73 – Roaring News

06/02/2018 Duration: 34min

In this edition of the Roaring News series, we talk about delivering business value and how to build an analytics team. For the machine learning aficionados, we cover the top ML algorithms, and we round off with an article on sizing an Apache Flink cluster, which fits nicely with the previous and next episodes!

Breaking News
Delivering Business Value with Big Data Projects https://www.techrepublic.com/article/4-tips-for-delivering-more-business-value-with-short-term-big-data-projects/
Sizing Flink (and other streaming?) https://data-artisans.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines
Building The Analytics Team At Wish
Part 1: Rebuilding The Foundation
Part 2: Scaling Data Engineering
Part 3: Scaling Data Analysis
Part 4: Recruiting
A Tour of The Top 10 Algorithms for Machine Learning Newbies https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11
Please use the Contact Form on this blog or o

Episode 72 – Hadoop sizing part 2: Storage sizing

30/01/2018 Duration: 32min

In this continuation of the Hadoop Sizing series we started last September, we move on from sizing your cluster to sizing the individual server chassis or virtual machines in your cluster. We did not finish the entire story just yet, concentrating mainly on the storage component. The final part 3, where we round off the subject with sizing your compute and network resources, is planned for the next topic episode.

Hadoop Node Sizing
Hadoop Data Node Density Tradeoff post on HCC: https://community.hortonworks.com/content/kbentry/48878/hadoop-data-node-density-tradeoff.html
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 71 – Roaring News

23/01/2018 Duration: 51min

This time Dave has prepared some articles for us to discuss. First we talk about something new on our radar: Apache Trafodion, a transactional SQL-on-Hadoop project. Next we spend some time on artificial ignorance, and we round off with some IoT predictions by IBM.

Breaking News
Apache Trafodion - http://trafodion.apache.org/ goes TLP after 2.5 years…
http://incubator.apache.org/projects/trafodion.html
https://www.slideshare.net/mKrishnaKumar1/trafodion-an-enterprise-class-sql-based-on-hadoop
Artificial ignorance: The 10 biggest AI failures of 2017 https://www.techrepublic.com/article/the-10-biggest-ai-failures-of-2017/
The Internet Of Things (IOT) Will Be Massive In 2018: Here Are The 4 Predictions From IBM https://www.forbes.com/sites/bernardmarr/2018/01/04/the-internet-of-things-iot-will-be-massive-in-2018-here-are-the-4-predictions-from-ibm
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics

Episode 70 – 10 Facts about Hadoop, five years later

16/01/2018 Duration: 47min

In this trip down memory lane, we go over an article from five years ago and discuss how Hadoop and Big Data have changed since then, or have they...?

Time Machine Data tunnel

Hadoop is 10 years old. Let's look back at public opinion just five years ago. (https://www.developer.com/db/10-facts-about-hadoop.html)
Import/Export Data to and from HDFS
Data Compression in HDFS
Transformation in Hadoop
Achieve Common Task
Combining Large Volume Data
Ways to Analyze High Volume Data
Debugging in Hadoop World
Easy to Control Hadoop System
Scalable Persistence
Data Read and Write in Hadoop
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 69 – Roaring News

09/01/2018 Duration: 34min

The first news episode of 2018 has landed. We discuss the new Big Data architecture at CERN, a curious case of a broken benchmark, and the future plans of the Apache Hadoop project.

Breaking News
The Architecture of the Next CERN Accelerator Logging Service https://databricks.com/blog/2017/12/14/the-architecture-of-the-next-cern-accelerator-logging-service.html
The Curious Case of the Broken Benchmark: Revisiting Apache Flink® vs. Databricks Runtime https://data-artisans.com/blog/curious-case-broken-benchmark-revisiting-apache-flink-vs-databricks-runtime
Hadoop 3.0 Ships, But What Does the Roadmap Reveal? https://www.datanami.com/2017/12/15/hadoop-3-0-ships-roadmap-reveal/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 68 – Future Predictions

02/01/2018 Duration: 48min

Welcome to 2018! And welcome to our 110% fact-based prediction show for 2018. As you may expect from your two hosts, everything in this episode is 110% sure to become reality in the next twelve months. And since 110% is not actually possible, our predictions might also be just a little bit off? But we have 365 days to bask in the glory of our predictions before we, as usual, are shot back down to earth.

Nancy comic

Dave
The year of cloud first and hybrid cloud
Many organisations will move from solely on-prem to cloud or hybrid, with new workloads seeking alternatives to their traditional on-prem.
Edge computing for IoT
With edge devices becoming more powerful and IoT workloads increasing (and bandwidth not getting cheaper at the same rate), we'll see more of the intelligence pushed further to the edge.
GDPR will fundamentally change the face of data governance, collection, anonymisation and retention in big data
GDPR regulations start to arrive in the form of concrete plans being in place and

Episode 67 – Roaring News

26/12/2017 Duration: 43min

It's here: the final news episode for 2017! We finish off the year talking about Apache Pulsar, Hadoop Delegation tokens (aka Kerberos), the Hadoop on Container hype (or is it?), the Apache Hadoop 3.0 release and all you need to know about Data Prepping (or at least all we can tell you in about 10 minutes, that is).

Breaking News
Jhon
Comparing Pulsar and Kafka: unified queuing and streaming https://streaml.io/blog/pulsar-streaming-queuing/
Hadoop Delegation Tokens Explained http://blog.cloudera.com/blog/2017/12/hadoop-delegation-tokens-explained/
Hadoop and Containers
Big Data and Container Orchestration with Kubernetes (K8s) https://www.bluedata.com/blog/2017/12/big-data-container-orchestration-kubernetes-k8s/
Spark on Kubernetes series
https://banzaicloud.com/blog/spark-k8s/
https://banzaicloud.com/blog/scaling-spark-k8s/
https://banzaicloud.com/blog/zeppelin-spark-k8/
Data Prepping in the clouds
Google Cloud Dataprep: Spreadsheet-Style Data Wrangling Powered

Episode 66 – Past Predictions

19/12/2017 Duration: 37min

It's that time of the year again where you can call us out on being totally rubbish at predicting much of anything, or can we..? Listen to the episode and find out! In any case, we will unabashedly be recording a new "future predictions" show in a couple of weeks, so if you have any predictions you want us to consider, send them to us by tweet or email!

Bart Simpson - Being Right Sucks

Predictions:
Fragmentation of ecosystem
Scale of data-breaches gets larger and more IOT focused
Chat-bots everywhere
More options for self service big data platforms for SMB
Commerce will muddy the waters - snake oil sales: call it big data and it will sell
Cyber-security with big data becomes commonplace
In-Memory and GPU will rule; commodity hardware will evolve into "big iron".
Atlas repeat: it's here!
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 65 – Roaring News

12/12/2017 Duration: 36min

It's another Roaring News episode. Today Jhon talks about machine learning projects for beginners, data visualization and the new neural network hotness that is transfer learning. Dave covers the Dataworks Summit call for papers and Apache Impala reaching Top Level Project status.

Breaking News
Jhon
8 Fun Machine Learning Projects for Beginners https://elitedatascience.com/machine-learning-projects-for-beginners
Data is Beautiful https://www.reddit.com/r/dataisbeautiful/ https://twitter.com/hashtag/dataisbeautiful
Transfer Learning - Machine Learning's Next Frontier http://ruder.io/transfer-learning/index.html#whatistransferlearning
Dave
Apache Impala gains TLP https://blogs.apache.org/foundation/entry/the-apache-software-foundation-announces24
Dataworks Summit call for papers https://dataworkssummit.com/blog/dataworks-summit-berlin-call-for-papers-is-now-open/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or

Episode 64 – Talking Apache Pulsar with Matteo and Sijie from Streamlio

05/12/2017 Duration: 01h22min

A while ago, the all-knowing oracle that is twitter pointed out that we really did not do justice to the Apache Pulsar project when we covered it in our Roaring News episode. The good people at Streamlio reached out to us, and here is the 80+ minute discussion we had with Matteo Merli and Sijie Guo, going in depth on the merits and technical details, setting the Roaring Pulsar record straight!

Apache Pulsar logo

Matteo Merli (https://www.linkedin.com/in/matteomerli/) Co-Founder - Software Engineer
Sijie Guo (https://www.linkedin.com/in/samuelguo/) Co-Founder
Apache Pulsar (incubating) https://pulsar.apache.org/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 63 – Roaring News

28/11/2017 Duration: 31min

It's another news episode, folks. This time Dave and Jhon talk about extracting telemetry from a PS3 steering wheel and pedal set, IBM sun-setting BigInsights and 6 things a budding Data Scientist should be aware of.

Breaking News
Dave
Taking KSQL for a Spin Using Real-time Device Data https://www.rittmanmead.com/blog/2017/11/taking-ksql-for-a-spin-using-real-time-device-data/
Jhon
IBM leads BigInsights for Hadoop out behind barn. Shots heard https://www.theregister.co.uk/2017/11/08/ibm_retires_biginsights_for_hadoop/
If you want to be a data scientist, you need to know about these 6 trends https://www.siliconrepublic.com/advice/data-scientist-trends
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 62 – Second Year Anniversary

21/11/2017 Duration: 01h28min

Are there really two years' worth of Roaring Elephant podcasts out there? Well, since this is our second anniversary party, there must be! Join some of the guests we had on the podcast this year to reminisce about the months gone by. Due to the drop-in drop-out nature, this episode is a little rough, but we hope you can enjoy being part of our little party! Discussion topics ranged from what our guests have been up to, Apache Kafka and Dremio to the effects of GDPR on the industry and how our guests see the future of Big Data.

Our returning guests today are:
Eduardo Barbaro, Sr. Data Scientist at Mobiquity, Inc – Europe https://www.linkedin.com/in/edbarbaro/
Marcel-Jan Krijgsman, Data Engineer at Open Circle Solutions B.V. https://www.linkedin.com/in/marcel-jankrijgsman/
Youen Chéné, CTO @Saagie https://www.linkedin.com/in/youenchene/
Pitt Fagan, Senior Data Analyst at Zendesk https://www.linkedin.com/in/pittfagan/
Big Data Madison Meetups: https://www.meetup.com/BigDataMadison/
Please use the Contact F

Episode 61 – Roaring News

14/11/2017 Duration: 31min

In this episode of Roaring News, we talk about the seemingly inevitable blockchain, fraud detection in banking and a celebration of the DevOps engineer.

Dave:
The continued journey to understand enterprise usage of blockchain
http://fortune.com/2017/10/17/blockchain-berners-lee/
https://www.hyperledger.org/blog/2017/10/17/qa-does-blockchain-alleviate-security-concerns-or-create-new-challenges
Jhon:
StreamING Machine Learning Models: How ING Adds Fraud Detection Models at Runtime with Apache Flink® https://data-artisans.com/blog/real-time-fraud-detection-ing-bank-apache-flink
DevOps might be the key to your Big Data project success https://datahub.packtpub.com/big-data/devops-for-big-data-success/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 60 – Big Data Roles: Recruiting and hiring

07/11/2017 Duration: 47min

In this entry in our "Roles in Big Data" series, we talk to Chuck Waygood, Global Director of Talent Acquisition at Hortonworks. Chuck has been in this space since 2013, and in this episode he talks about his experiences, what recruiters are looking for, how you can attract that perfect candidate and what you can do to improve your chances of landing that great career in Big Data.

Chuck Waygood, Director, Global Talent Acquisition at Hortonworks https://www.linkedin.com/in/chuckwaygood/
Please use the Contact Form on this blog or our twitter feed to send us your questio

Episode 59 – Roaring News

31/10/2017 Duration: 35min

It's another installment of Roaring News! This time, we talk about the ensemble recommendation system allegedly used by Spotify, the not-so-new kid-on-the-block-after-all Apache Pulsar and the ever so popular "Hadoop is dead", and we end with a quick shout-out to the Tokyo Data Platform Conference.

Dave
Apache Pulsar
https://pulsar.apache.org/
https://www.slideshare.net/ydn/october-2016-hug-pulsar-a-highly-scalable-low-latency-pubsub-messaging-system
https://streaml.io/blog/apache-pulsar-geo-replication/
https://streaml.io/blog/geo-replication-patterns-practices/
https://news.ycombinator.com/item?id=12453080
Data Platform Conference Tokyo http://dataplatform.jp/
Jhon
Spotify’s Discover Weekly: How machine learning finds your new music https://hackernoon.com/spotifys-discover-weekly-how-machine-learning-finds-your-new-music-19a41ab76efe
Hadoop Was Hard to Find at Strata This Week https://www.datanami.com/2017/09/29/hadoop-hard-find-strata-week/
Please use the Contact Form

Episode 58 – Big Data Roles: The data scientist

24/10/2017 Duration: 01h09min

In this entry in our long-running "Roles in Big Data" series, we talk to Eduardo Barbaro, a Sr. Data Scientist at Mobiquity. To say that the data scientist is a pivotal person in any big data or advanced analytics project is not an exaggeration, and we are really grateful to Eduardo for spending some time on the podcast to give us his views and recount his experiences.

Eduardo Barbaro, Sr. Data Scientist at Mobiquity, Inc - Europe https://www.linkedin.com/in/edbarbaro/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Episode 57 – Dataworks Summit Sydney recap by Dave – Part 2

17/10/2017 Duration: 57min

In this second part of Dave's tale of the Sydney Dataworks Summit, the subjects range from Apache Metron and a talk by Telstra, Australia's leading mobile provider, to YARN 3.0 and Apache Zeppelin.

Solving Cyber at Scale - Simon Ball https://www.slideshare.net/Hadoop_Summit/solving-cyber-at-scale-80187657
Implementing greenfield Apache Metron SOC – Telstra - Saad Ayad Slides not available :(
Yarn past present future - Rohith Sharma KS - Sunil G https://www.slideshare.net/Hadoop_Summit/yarn-past-present-future
Model as a service - Casey Stella https://www.slideshare.net/Hadoop_Summit/maas-model-as-a-service-modern-streaming-data-science-with-apache-metron-incubating
Protecting your Critical Hadoop Clusters against Disasters - Jeff Sposetti / Sankar Hariappan https://www.slideshare.net/Hadoop_Summit/protecting-your-critical-hadoop-clusters-against-disasters
Running Zeppelin in the Enterprise https://www.slideshare.net/Hadoop_Summit/running-zeppelin-in-enterprise-80

Episode 56 – Dataworks Summit Sydney recap by Dave – Part 1

10/10/2017 Duration: 01h02min

Dave attended the Dataworks Summit in Sydney, and we go over the different sessions he attended there. In this first of two episodes, the focus lies on the new goodness that Hadoop 3.0 will bring us soon.

Hadoop 3.0 – Sanjay Radia https://www.slideshare.net/Hadoop_Summit/apache-hadoop-30-community-update-79999467
JDK 8+
Port number changes
Class-path isolation
HDFS – 3 node Namenode, intra data node balancer for balanced storage within a node, erasure coding
10TB node recovering in a few hours on a large cluster (3000 nodes)
Erasure coding 2012, 2013, 2014
Erasure coding methods, blocks or stripes
Surprisingly little performance difference for EC; what’s not shown is the network bandwidth cost, which is significantly higher
Yarn 3.0
Scheduler, priorities within a queue
Q – Inter queue priorities
Long running services, dynamic container configuration, cpu and io easy, hard to do memory
Service discovery in YARN via zookeeper, dns
Elastic resource model, graceful decommissi

Episode 55 – Roaring News

03/10/2017 Duration: 46min

In this edition of Roaring News, Dave covers the release of the Apache Metron based HCP 1.3 and an HBase vs Cassandra benchmark battle. Jhon talks about Spark tuning and the inner workings of the Spark scheduler, and finishes with a tale of a compliance kettle...

Dave
HCP 1.3 release
https://hortonworks.com/blog/hortonworks-cybersecurity-platform-big-data-cybersecurity-solution/
https://docs.hortonworks.com/HDPDocuments/HCP1/HCP-1.3.0/bk_release-notes/content/ch01.html
Battle of the Apache NoSQL heavyweights https://hortonworks.com/blog/hbase-cassandra-benchmark/
Jhon
Spark Performance Tuning: A Checklist https://medium.com/zero-gravity-labs/spark-performance-tuning-a-checklist-abb3c80efb44
How the Spark Scheduler Work http://www.russellspitzer.com/2017/09/01/Spark-Locality/
A tale of a compliance kettle… https://cupfighter.net/2017/09/a-tale-of-a-compliance-kettle
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future

Episode 54 – Hadoop sizing part 1: One big cluster, or many small ones

26/09/2017 Duration: 52min

In this episode, we take an online article by Chris Riccomini and give our take on the debate between having a single big cluster and having many smaller ones. If you are architecting a Hadoop cluster and are faced with this choice, this episode should give you a lot of information on the subject.

One big cluster, or many small ones? by Chris Riccomini https://medium.com/@criccomini/one-big-cluster-or-many-small-ones-5f3126ed7045
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.
