TensorFlow: Google Open Sources Their Machine Learning Tool
TensorFlow is a machine learning library created by the Brain Team researchers at Google and now open sourced under the Apache License 2.0. TensorFlow is detailed in the whitepaper TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. The source code can be found on GitHub.
TensorFlow is a tool for writing and executing machine learning algorithms. Computations are done in a data flow graph where the nodes are mathematical operations and the edges are tensors (multidimensional data arrays) exchanged between nodes. A user constructs the graph and writes the algorithms that execute on each node. TensorFlow takes care of executing the code asynchronously on different devices, cores, and threads.
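As a rough illustration, the minimal sketch below (using the graph-and-session style API from the initial releases) builds a two-node graph that multiplies two constant tensors and asks TensorFlow to execute it:

# Minimal sketch, assuming the TensorFlow 1.x-style graph/session API.
import tensorflow as tf

a = tf.constant([[1.0, 2.0]])        # node producing a 1x2 tensor
b = tf.constant([[3.0], [4.0]])      # node producing a 2x1 tensor
product = tf.matmul(a, b)            # node consuming both tensors via its incoming edges

with tf.Session() as sess:
    print(sess.run(product))         # TensorFlow schedules and runs the graph: [[11.]]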
TensorFlow runs on CPUs and GPUs, on desktop, server, or mobile devices. It can be containerized with Docker to be deployed in the cloud. The version that has been open sourced runs on single machines, not on clusters.
TensorFlow has a complete Python API and a C++ interface for building and executing graphs, as well as a C-based client API. Google invites the community to write interfaces in other languages, the most likely candidates being Lua, R, Java, Go, and JavaScript.
Google does not consider the library final and will continue to improve it. They intend to make public some of the implementations they have built internally.
TensorFlow is used by Google in Gmail (Smart Reply), Search (RankBrain), Google Photos (the Inception image classification model), Google Translate (character recognition), and other products.
TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.
What is a Data Flow Graph?
Data flow graphs describe mathematical computation with a directed graph of nodes and edges. Nodes typically implement mathematical operations, but can also represent endpoints to feed in data, push out results, or read/write persistent variables. Edges describe the input/output relationships between nodes. These data edges carry dynamically-sized multidimensional data arrays, or tensors. The flow of tensors through the graph is where TensorFlow gets its name. Nodes are assigned to computational devices and execute asynchronously and in parallel once all the tensors on their incoming edges become available.
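A minimal sketch of those three node roles, again assuming the 1.x-style session API (the variable name counter is only illustrative): a placeholder feeds data in, sess.run pushes results out, and a variable persists state across runs.

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[2])        # endpoint that feeds data in
counter = tf.Variable(0.0)                       # persistent state, read/written across runs
step = tf.assign_add(counter, tf.reduce_sum(x))  # update the variable with the sum of the input

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print(sess.run(step, feed_dict={x: [1.0, 2.0]}))   # pushes the result out: 3.0
    print(sess.run(step, feed_dict={x: [3.0, 4.0]}))   # 10.0 (the variable persisted)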
Deep Flexibility

TensorFlow isn't a rigid neural networks library. If you can express your computation as a data flow graph, you can use TensorFlow. You construct the graph, and you write the inner loop that drives computation. We provide helpful tools to assemble subgraphs common in neural networks, but users can write their own higher-level libraries on top of TensorFlow. Defining handy new compositions of operators is as easy as writing a Python function and costs you nothing in performance. And if you don't see the low-level data operator you need, write a bit of C++ to add a new one.
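For instance, a composite operator can be an ordinary Python function that simply adds nodes to the graph. A hedged sketch follows; the helper name dense_relu is illustrative, not part of TensorFlow:

import tensorflow as tf

# Illustrative composite operator: a fully connected layer followed by a ReLU,
# expressed as a plain Python function that just extends the graph.
def dense_relu(x, weights, bias):
    return tf.nn.relu(tf.matmul(x, weights) + bias)

x = tf.placeholder(tf.float32, shape=[None, 4])
w = tf.Variable(tf.random_normal([4, 3]))
b = tf.Variable(tf.zeros([3]))
y = dense_relu(x, w, b)    # calling the function adds nodes; no run-time overhead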
True Portability

TensorFlow runs on CPUs or GPUs, and on desktop, server, or mobile computing platforms. Want to play around with a machine learning idea on your laptop without needing any special hardware? TensorFlow has you covered. Ready to scale up and train that model faster on GPUs with no code changes? TensorFlow has you covered. Want to deploy that trained model on mobile as part of your product? TensorFlow has you covered. Changed your mind and want to run the model as a service in the cloud? Containerize with Docker and TensorFlow just works.
Connect Research and Production

Gone are the days when moving a machine learning idea from research to product required a major rewrite. At Google, research scientists experiment with new algorithms in TensorFlow, and product teams use TensorFlow to train and serve models live to real customers. Using TensorFlow allows industrial researchers to push ideas to products faster, and allows academic researchers to share code more directly and with greater scientific reproducibility.
Auto-Differentiation

Gradient-based machine learning algorithms will benefit from TensorFlow's automatic differentiation capabilities. As a TensorFlow user, you define the computational architecture of your predictive model, combine that with your objective function, and just add data -- TensorFlow handles computing the derivatives for you. Computing the derivative of some values w.r.t. other values in the model just extends your graph, so you can always see exactly what's going on.
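A small sketch of what that looks like with the 1.x-style API: asking for a derivative with tf.gradients just adds more nodes to the same graph.

import tensorflow as tf

x = tf.Variable(3.0)
y = x * x + 2.0 * x                # the model/objective part of the graph

grad = tf.gradients(y, [x])[0]     # dy/dx = 2x + 2, added as extra graph nodes

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print(sess.run(grad))          # 8.0 at x = 3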
Language Options

TensorFlow comes with an easy-to-use Python interface and a no-nonsense C++ interface to build and execute your computational graphs. Write stand-alone TensorFlow Python or C++ programs, or try things out in an interactive TensorFlow IPython notebook where you can keep notes, code, and visualizations logically grouped. This is just the start though -- we're hoping to entice you to contribute SWIG interfaces to your favorite language -- be it Go, Java, Lua, JavaScript, or R.
Maximize Performance

Want to use every ounce of muscle in that workstation with 32 CPU cores and 4 GPU cards? With first-class support for threads, queues, and asynchronous computation, TensorFlow allows you to make the most of your available hardware. Freely assign compute elements of your TensorFlow graph to different devices, and let TensorFlow handle the copies.
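For example, here is a hedged sketch of pinning parts of a graph to specific devices with tf.device (assuming a machine that actually has a GPU); TensorFlow takes care of the copies between devices:

import tensorflow as tf

with tf.device("/cpu:0"):
    a = tf.random_normal([1000, 1000])   # this op is pinned to the CPU

with tf.device("/gpu:0"):                # assumes a GPU is present
    b = tf.matmul(a, a)                  # the matmul runs on the GPU; copies are automatic

# allow_soft_placement falls back to another device if the requested one is unavailable
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(b)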
Samsung VELES

Neural networks done right. With a human face.
The veles.znicz NN engine focuses on performance and flexibility. It has few hard-coded entities and enables training of all the widely recognized topologies, such as fully connected nets, convolutional nets, recurrent nets, etc.
In-depth introspection and rich debugging facilities help to develop nets quickly with predictable results.
All backends have the same interface and yield the same calculation results or accuracy, with single or double precision.
Veles and Hadoop are friends.
Veles is bundled with Mastodon, a subproject which integrates it with any Java application. Mastodon includes a simple load balancer between Java nodes and Veles slaves.
Extract features in Hadoop and classify objects in Veles. Analyze data in Veles and pass the results further down the pipeline.
Machine learning model execution goes in a workflow, which consists of units and child workflows. Units represent logically independent and integral algorithm parts. E.g., a single network layer is a unit; the gradient descent step is another unit.
Veles uses a typical data-driven approach, where control flow is governed by data processing through control flow links between units.
Such an architecture allows easily manageable data-parallel and model-parallel execution of models.
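To make the unit/workflow idea more concrete, here is a purely hypothetical Python sketch of the structure described above; the class and method names are illustrative and are not the actual Veles API.

class Unit:
    """A logically independent part of the algorithm, e.g. one layer or one gradient step."""
    def run(self, data):
        raise NotImplementedError

class Workflow(Unit):
    """A unit composed of child units (or child workflows)."""
    def __init__(self, units):
        self.units = units

    def run(self, data):
        # Data-driven control flow: each unit runs once its input data is ready.
        for unit in self.units:
            data = unit.run(data)
        return data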
Veles is cluster and Docker friendly. Deploy the exported tarball, initialize the pyenv environment and you are ready to go - with just 2 simple commands.
The REST API allows using the trained model in a production environment right away.
The manhole feature allows executing an interactive IPython session in the context of the running process at any time.
The live reload feature allows changing the Python code without restarting the process.