Roaring Elephant

Episode 56 – Dataworks Summit Sydney recap by Dave – Part 1

Information:

Synopsis

Dave attended the Dataworks Summit in Sydney, and we go over the sessions he attended there. In this first of two episodes, the focus is on the new goodness that Hadoop 3.0 will bring us soon.

Hadoop 3.0 – Sanjay Radia
https://www.slideshare.net/Hadoop_Summit/apache-hadoop-30-community-update-79999467

JDK 8+
Port number changes
Class-path isolation
HDFS: 3-node NameNode, intra-DataNode balancer for balanced storage within a node, erasure coding
10 TB node recovering in a few hours on a large cluster (3000 nodes)
Erasure coding 2012, 2013, 2014
Erasure coding methods: blocks or stripes
Surprisingly little performance difference for EC; what’s not shown is the network bandwidth cost, which is significantly higher

YARN 3.0
Scheduler: priorities within a queue
Q – Inter-queue priorities
Long-running services, dynamic container configuration: CPU and IO are easy, memory is hard to do
Service discovery in YARN via ZooKeeper, DNS
Elastic resource model, graceful decommissi