May 2-4, 2018 - Copenhagen, Denmark
Tuesday, May 1 • 17:40 - 17:45
Lightning Talk: Scaling Distributed Deep Learning with Service Discovery: How CoreDNS Helps Distributed TensorFlow Tasks - Yong Tang, Infoblox Inc. (Intermediate Skill Level) (Slides Attached)

Training models with modern deep learning architectures is often computationally intensive and requires an efficient distributed system at scale. In the distributed machine learning community, such systems often have special requirements and may involve additional effort.

This talk discusses the use of CoreDNS for service discovery on distributed TensorFlow clusters that solve deep learning problems.

While CoreDNS has been widely used for service discovery in Kubernetes, its unique plugin-based design allows CoreDNS to be easily extended and deployed in non-traditional distributed systems as well.

CoreDNS has made our distributed TensorFlow clusters, deployed in the cloud (AWS), more robust against partial node failures. It has also simplified deployment, so that non-DevOps users (e.g., machine learning researchers) can launch and execute deep learning tasks with ease.
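
As a rough illustration of DNS-based service discovery for a TensorFlow cluster (a minimal sketch, not the setup presented in the talk), the Python below resolves one DNS name per job and turns the returned addresses into a job-to-"host:port" mapping, the shape that `tf.train.ClusterSpec` accepts. The DNS names (`worker.tf.example`, `ps.tf.example`), the port 2222, and both function names are hypothetical; it assumes the DNS server returns one A record per live node.

```python
import socket
from typing import Callable, Dict, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the IPv4 addresses DNS reports for hostname (empty on failure)."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        # Name does not resolve (for example, no node is registered yet).
        return []
    return [info[4][0] for info in infos]


def build_cluster_spec(
    job_hosts: Dict[str, str],
    port: int,
    resolver: Callable[[str], List[str]] = resolve_ipv4,
) -> Dict[str, List[str]]:
    """Map each job name to a sorted list of "ip:port" strings.

    job_hosts maps a job name (e.g. "worker") to the hypothetical DNS name
    under which that job's nodes are registered. Jobs with no resolvable
    nodes are left out of the result.
    """
    spec: Dict[str, List[str]] = {}
    for job, hostname in job_hosts.items():
        # De-duplicate and sort so the task order is stable across lookups.
        addresses = sorted(set(resolver(hostname)))
        if addresses:
            spec[job] = [f"{address}:{port}" for address in addresses]
    return spec


if __name__ == "__main__":
    # Hypothetical names; replace with records served by your DNS server.
    hosts = {"worker": "worker.tf.example", "ps": "ps.tf.example"}
    print(build_cluster_spec(hosts, port=2222))
```

Because a failed node simply drops out of the DNS answer, rebuilding the mapping before launching a task picks up only the nodes that are currently registered.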


Yong Tang

Director of Engineering, MobileIron
Yong Tang is the Director of Engineering at MobileIron, working on cloud infrastructure. He contributes to various container and machine learning projects in the open source community. He is a maintainer of the CoreDNS and Docker/Moby projects, and has given multiple talks at KubeCon before...

Tuesday May 1, 2018 17:40 - 17:45 CEST
Auditorium 10-12
  Lightning Talk, Intermediate