Time Series Analysis #1: Introduction to Window Functions | Pivotal P.O.V.: "Time series data is an ordered sequence of observations of a particular variable, usually at evenly spaced time intervals. It is found in many real world applications, including click stream processing, financial analysis, and sensor data. Modeling time series data within a database presents a challenge, in that the fundamental ordered nature of the data will cause many of the interesting calculations to be outside of the traditional relational calculus. Fortunately, there are tools in the analyst’s toolbox that can aid in solving many common time series related problems.
The first of those tools, and the subject of this article, is the Window Function. First introduced in the SQL 2003 Standards specification, Window Functions enable queries to perform certain types of ordered capabilities. This is primarily accomplished through the “OVER” clause, which enables the specification of a “window” over which the calculation should be performed."
'via Blog this'
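As a rough illustration of the OVER clause described in the excerpt, here is a minimal sketch that computes a three-row moving average over an ordered series. The table name, column names, and sample readings are made up for the example, and it assumes the SQLite bundled with Python is version 3.25 or newer (the first release with window function support).

```python
import sqlite3

# Hypothetical sensor readings as (timestamp, value); the names are illustrative only.
READINGS = [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0), (5, 50.0)]


def moving_average(rows, window=3):
    """Return (ts, value, average of the current row and the previous window-1 rows)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (ts INTEGER PRIMARY KEY, value REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
        # The OVER clause defines the window: rows ordered by ts, limited to
        # the current row plus the (window - 1) rows that precede it.
        query = f"""
            SELECT ts, value,
                   AVG(value) OVER (
                       ORDER BY ts
                       ROWS BETWEEN {int(window) - 1} PRECEDING AND CURRENT ROW
                   ) AS moving_avg
            FROM readings
            ORDER BY ts
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in moving_average(READINGS):
        print(row)
```

The same calculation without a window function would typically need a self-join or a correlated subquery, which is the kind of ordered computation the article says sits awkwardly in plain relational queries.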
Be warned that this is mostly just a collection of links to articles and demos by smarter people than I. Areas of interest include Java, C++, Scala, Go, Rust, Python, Networking, Cloud, Containers, Machine Learning, the Web, Visualization, Linux, System Performance, Software Architecture, Microservices, Functional Programming....
Saturday, 30 April 2016
Friday, 29 April 2016
10 Common Mistakes Java Developers Make when Writing SQL | Java, SQL and jOOQ.
10 Common Mistakes Java Developers Make when Writing SQL | Java, SQL and jOOQ.: "Java developers mix object-oriented thinking with imperative thinking, depending on their levels of:
Skill (anyone can code imperatively)
Dogma (some use the “Pattern-Pattern”, i.e. the pattern of applying patterns everywhere and giving them names)
Mood (true OO is more clumsy to write than imperative code. At first)
But when Java developers write SQL, everything changes. SQL is a declarative language that has nothing to do with either object-oriented or imperative thinking. It is very easy to express a query in SQL. It is not so easy to express it optimally or correctly. Not only do developers need to re-think their programming paradigm, they also need to think in terms of set theory.
Here are common mistakes that a Java developer makes when writing SQL through JDBC or jOOQ (in no particular order)."
'via Blog this'
Thursday, 28 April 2016
Rainer Grimm: Multithreading in modern C++
http://www.modernescpp.com/index.php/multithreading-in-modern-c
With the new C++11 standard, C++ faces the challenges of multicore architectures for the first time. Published in 2011, the standard defines how a C++ program has to behave in the presence of multiple threads. The C++11 multithreading capabilities are composed of two components: on the one hand, the defined memory model, and on the other hand, the standardized threading interface.
Wednesday, 27 April 2016
My Top 100 Programming, Computer and Science Books: Part One - good coders code, great coders reuse
My Top 100 Programming, Computer and Science Books: Part One - good coders code, great coders reuse: "I was recently interviewed by Fog Creek and one of the questions was about my favorite programming, coding and development books. I got very excited by this question as I'm a huge book nerd. And by a huge book nerd I mean I'm crazy about science, computer and programming books. Every few months I spend a day or two researching the latest literature and buying the most interesting titles. I can probably go on forever about my favorite books. I've so many.
I was so excited about this question that I decided to start a new article series here on catonmat about my top 100 programming, software development, science, physics, mathematics and computer books. I'll do five books at a time as breaking huge tasks in tiny sub tasks is the easiest way to get things done."
'via Blog this'
OpenMP - Open Multi-Processing
OpenMP - Wikipedia, the free encyclopedia: "OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran,[3] on most platforms, processor architectures and operating systems, including Solaris, AIX, HP-UX, Linux, OS X, and Windows. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.[2][4][5]
OpenMP is managed by the nonprofit technology consortium OpenMP Architecture Review Board (or OpenMP ARB), jointly defined by a group of major computer hardware and software vendors, including AMD, IBM, Intel, Cray, HP, Fujitsu, Nvidia, NEC, Red Hat, Texas Instruments, Oracle Corporation, and more.[1]
OpenMP uses a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications for platforms ranging from the standard desktop computer to the supercomputer.
An application built with the hybrid model of parallel programming can run on a computer cluster using both OpenMP and Message Passing Interface (MPI), or more transparently through the use of OpenMP "
'via Blog this'
Tuesday, 26 April 2016
Event Sourced Systems is an Anti-Pattern
Event Sourced Systems is an Anti-Pattern: "The single biggest bad thing that Young has seen during the last ten years is the common anti-pattern of building a whole system based on Event sourcing. That is a really big failure, effectively creating an event sourced monolith. CQRS and Event sourcing are not top-level architectures and normally they should be applied selectively in just a few places.
The last problem Young mentions is the lack of Process managers. Building a system with many services each directly subscribing to events from other services can make it very hard to understand what the system actually does. Finding the overall process can be quite difficult without going through the code in each service. Young notes that Process managers are difficult but believes that the best explanation is found in the Enterprise Integration Patterns book.
Looking into the future Young believes we will see more functional programming along with event sourcing. He notes that even though event sourcing often is tied into object orientation it’s a purely functional model making it fit very well with functional code and he thinks we will see an increasing amount of functional code on top of event sourced systems."
'via Blog this'
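The remark that event sourcing is "a purely functional model" can be sketched as a left fold: current state is the result of applying each event, in order, to an initial state. The account domain, event names, and function names below are hypothetical and chosen only for illustration; they do not come from Young's talk.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Account:
    """Immutable state; every event produces a new value instead of mutating."""
    balance: int = 0


def apply_event(state, event):
    """Pure function: (state, event) -> new state."""
    kind, amount = event
    if kind == "deposited":
        return Account(state.balance + amount)
    if kind == "withdrawn":
        return Account(state.balance - amount)
    return state  # unknown event types leave the state unchanged


def replay(events, initial=Account()):
    """Rebuild the current state by folding the event log over the initial state."""
    return reduce(apply_event, events, initial)


# Hypothetical event log, oldest first.
EVENTS = [("deposited", 100), ("withdrawn", 30), ("deposited", 5)]

if __name__ == "__main__":
    print(replay(EVENTS))
```

Replaying a prefix of the log, for example `replay(EVENTS[:2])`, gives the state as of that point in time, which is one reason the model pairs naturally with functional code.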
Segfaults are our friends and teachers
Segfaults are our friends and teachers: "If you’ve written some C, you’ve almost certainly seen a segmentation fault at some point. I spent a long time thinking of it as a ‘thing that happens when you use pointers wrong in C’. That’s mostly true, but a segmentation fault actually has a very specific meaning. You get one when a process tries to access memory in a way that it’s not allowed to, or tries to access invalid memory."
'via Blog this'
InfoQ: Top 10 Performance Mistakes
Top 10 Performance Mistakes: "Martin Thompson, co-founder of LMAX, keynoted on performance at QCon São Paulo 2016. Initially entitled “Top Performance Myths and Folklore”, Thompson renamed the presentation “Top 10 Performance Mistakes” because “we all make mistakes and it’s very easy to do that.”
This is a digest of the top 10 performance related mistakes he has seen in production, including advice on avoiding them."
8. Data Dependent Loads. Thompson presented the results of a benchmark measuring the time needed to perform one operation when attempting to sum up all longs in a 1GB array located in memory (RAM). The time depends on how the memory is accessed, and is presented in the following table:
The results of this benchmark show that not all memory operations are equal, and one needs to be careful how one deals with them. Thompson considers it important to know some basics regarding the performance of various data structures, noting that Java’s HashMap is over ten times slower than .NET Dictionary for structures larger than 2GB. He added that there are cases when .NET is much slower than Java.
7. Too Much Allocation. While allocation is almost free in many cases, reclaiming that memory is not, because the garbage collector needs significant time when working on large sets of data. When lots of data is allocated, the cache fills up and older data is evicted, making operations on that data take 90ns/op instead of 7ns/op, which is more than an order of magnitude slower.
6. Going Parallel. While going parallel is very attractive for certain algorithms, there are some limitations and overhead associated with it. Thompson cited the paper Scalability! But at what COST? in which the authors compare parallel systems with single threaded ones by introducing COST (Configuration that Outperforms a Single Thread), defined as:
The COST of a given platform for a given problem is the hardware configuration required before the platform outperforms a competent single-threaded implementation. COST weighs a system’s scalability against the overheads introduced by the system, and indicates the actual performance gains of the system, without rewarding systems that bring substantial but parallelizable overheads.
The authors have analyzed the measurements of various data-parallel systems and concluded that “many systems have either a surprisingly large COST, often hundreds of cores, or simply underperform one thread for all of their reported configurations.”
In this context Thompson remarked that there is a certain communication and synchronization overhead associated with parallel tasks, and that some of the activity is intrinsically serial and not parallelizable. According to Amdahl’s Law, if 5% of a system’s activity needs to be serial, then the system’s speedup is at most 20 times, no matter how many processors are used.
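The 20x figure follows from Amdahl's Law: with serial fraction s and N processors, speedup = 1 / (s + (1 - s) / N), which approaches 1 / s as N grows. A small sketch of that arithmetic follows; the function names are chosen here for illustration.

```python
def amdahl_speedup(serial_fraction, processors):
    """Speedup predicted by Amdahl's Law for a given serial fraction and processor count."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def amdahl_limit(serial_fraction):
    """Upper bound on speedup as the processor count goes to infinity."""
    if serial_fraction == 0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    for n in (1, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.05, n), 2))
    print("limit", amdahl_limit(0.05))
```

With 5% serial work, even 1024 processors stay below the 1 / 0.05 = 20 ceiling, and the formula does not account for the communication and synchronization overhead mentioned above.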
5. Not Understanding TCP. On this one Thompson remarked that many are considering a microservices architecture without a solid understanding of TCP. In certain cases it is possible to experience delayed ACKs limiting the number of packets sent over the wire to 2-5 per second. This is due to a deadlock created by two algorithms introduced in TCP: Nagle’s algorithm and TCP delayed acknowledgement. There is a 200-500 ms timeout that interrupts the deadlock, but the communication between microservices is severely affected by it. The recommended solution is to use TCP_NODELAY, which disables Nagle’s algorithm so that multiple smaller packets can be sent one after the other. The difference is between 5 and 500 req/sec, according to Thompson.
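For reference, this is roughly how TCP_NODELAY is set on a client socket using Python's standard socket module. The helper names are made up for this sketch, and whether disabling Nagle's algorithm actually helps depends on the traffic pattern of the service in question.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled via TCP_NODELAY."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value disables Nagle's algorithm, so small writes are sent
    # immediately instead of being coalesced while waiting for an ACK.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock):
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    s = make_nodelay_socket()
    print(nodelay_enabled(s))
    s.close()
```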
4. Synchronous Communications. Synchronous communication between a client and a server incurs a time penalty which becomes problematic in a system where machines need fast communication. The solution is not buying more expensive and faster hardware but using asynchronous communication, said Thompson. In this case, a client can send multiple requests to the server without having to wait on the response between them. This approach requires a change in how the client deals with responses, but it is worthwhile.
3. Text Encoding. Many times developers choose to send data over the wire using a text encoding format such as JSON, XML or Base64 because “it is human readable.” But Thompson noted that no human reads it when two systems talk to each other. It may be easier to debug with a simple text editor, but there is a high CPU penalty related to converting binary data to text and back. The solution is using better tools that understand binary, with Thompson mentioning Wireshark.
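To make the text-versus-binary trade-off concrete, here is a small sketch that encodes the same record as JSON and as a fixed binary layout with the standard struct module. The message fields and layout are hypothetical, and the sketch only compares payload sizes and round-trips; it does not measure the CPU cost Thompson refers to.

```python
import json
import struct

# Hypothetical message layout: sequence (uint32), price (float64), quantity (uint32).
# "<" selects little-endian with no padding, so the record is 4 + 8 + 4 = 16 bytes.
RECORD = struct.Struct("<IdI")


def encode_text(seq, price, qty):
    """Encode the record as UTF-8 JSON text."""
    return json.dumps({"seq": seq, "price": price, "qty": qty}).encode("utf-8")


def decode_text(payload):
    """Parse the JSON text back into a (seq, price, qty) tuple."""
    obj = json.loads(payload.decode("utf-8"))
    return obj["seq"], obj["price"], obj["qty"]


def encode_binary(seq, price, qty):
    """Pack the record into the fixed 16-byte binary layout."""
    return RECORD.pack(seq, price, qty)


def decode_binary(payload):
    """Unpack the fixed binary layout into a (seq, price, qty) tuple."""
    return RECORD.unpack(payload)


if __name__ == "__main__":
    message = (42, 101.25, 7)
    print(len(encode_text(*message)), "bytes as JSON")
    print(len(encode_binary(*message)), "bytes as binary")
```

The binary form is not readable in a text editor, which is why the advice pairs it with tooling such as Wireshark that can decode it.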
1. Logging. For the #1 performance culprit Thompson listed the time spent on logging. He showed a graph depicting the average time spent on a logging operation as the number of threads increases:
'via Blog this'
Monday, 25 April 2016
Dan Luu: Recommended Programming blogs
Programming blogs: "Programming blogs
This is one of those “N technical things every programmer must read” lists, except that “programmer” is way too broad a term and the styles of writing people find helpful for them are too different for any such list to contain a non-zero number of items (if you want the entire list to be helpful to everyone). So here’s a list of some things you might want to read, and why you might (or might not) want to read them."
'via Blog this'
Introduction to Apache Kafka | Voxxed
Introduction to Apache Kafka | Voxxed: "The following diagram shows a typical Kafka cluster architecture:
Kafka uses ZooKeeper behind the scenes in order to keep its nodes in sync. The Kafka binaries provide it, so if the hosting machines don’t have ZooKeeper on board you can use the one that comes bundled with Kafka.
The communication between clients and servers happens using a high-performance, language-agnostic TCP protocol.
Although Kafka has been implemented in Scala, don’t worry if you are not familiar with this programming language. APIs to build producers and consumers are available in Java and other languages.
There are several use cases for Kafka. Here I am going to show a short list, but there are many more scenarios that you could add:
Messaging
Stream processing
Log Aggregation
Metrics
Web activities tracking
Event Sourcing"
'via Blog this'
Saturday, 23 April 2016
asyncio 3.4.3 : Python Package Index
asyncio 3.4.3 : Python Package Index: "The asyncio module provides infrastructure for writing single-threaded concurrent code using coroutines, multiplexing I/O access over sockets and other resources, running network clients and servers, and other related primitives. Here is a more detailed list of the package contents:
a pluggable event loop with various system-specific implementations;
transport and protocol abstractions (similar to those in Twisted);
concrete support for TCP, UDP, SSL, subprocess pipes, delayed calls, and others (some may be system-dependent);
a Future class that mimics the one in the concurrent.futures module, but adapted for use with the event loop;
coroutines and tasks based on yield from (PEP 380), to help write concurrent code in a sequential fashion;
cancellation support for Futures and coroutines;
synchronization primitives for use between coroutines in a single thread, mimicking those in the threading module;
an interface for passing work off to a threadpool, for times when you absolutely, positively have to use a library that makes blocking I/O calls."
'via Blog this'
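Two of the items listed above, coroutines that read sequentially and handing blocking work to a thread pool, can be sketched as follows. Note that the 3.4.3 package described in the excerpt used `yield from`; this sketch assumes a current Python 3 and uses the later `async`/`await` syntax and `asyncio.run`. The function names and delays are invented for the example.

```python
import asyncio
import time


def blocking_lookup(key):
    """Stand-in for a library call that blocks the calling thread (hypothetical)."""
    time.sleep(0.05)
    return key.upper()


async def fetch(key, delay):
    """Simulated non-blocking I/O: yields to the event loop while waiting."""
    await asyncio.sleep(delay)
    return f"{key}:done"


async def main():
    loop = asyncio.get_running_loop()
    # The three coroutines wait concurrently on a single thread;
    # gather returns results in argument order, not completion order.
    results = await asyncio.gather(fetch("a", 0.03), fetch("b", 0.02), fetch("c", 0.01))
    # Hand the blocking call to the default thread pool so the loop stays responsive.
    looked_up = await loop.run_in_executor(None, blocking_lookup, "kafka")
    return results, looked_up


def run():
    """Run the event loop to completion and return both results."""
    return asyncio.run(main())


if __name__ == "__main__":
    print(run())
```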
Friday, 22 April 2016
MapReduce Patterns, Algorithms, and Use Cases – Highly Scalable Blog
MapReduce Patterns, Algorithms, and Use Cases – Highly Scalable Blog: "MAPREDUCE PATTERNS, ALGORITHMS, AND USE CASES
In this article I digested a number of MapReduce patterns and algorithms to give a systematic view of the different techniques that can be found on the web or in scientific articles. Several practical case studies are also provided. All descriptions and code snippets use the standard Hadoop MapReduce model with Mappers, Reducers, Combiners, Partitioners, and sorting. This framework is depicted in the figure below.
MapReduce Framework"
https://highlyscalable.files.wordpress.com/2012/02/map-reduce.png
'via Blog this'
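The Mapper, Combiner, sort and Reducer stages named in the excerpt can be imitated in a single process to show how data flows between them. This is a plain-Python word count sketch, not Hadoop code: the function names are illustrative, the partitioner is left out because there is only one reducer, and nothing here runs in parallel.

```python
from collections import defaultdict
from itertools import chain


def mapper(document):
    """Map stage: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def combiner(pairs):
    """Combine stage: pre-aggregate one mapper's output before the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle(pairs):
    """Shuffle and sort stage: group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(key, values):
    """Reduce stage: sum all partial counts for one key."""
    return key, sum(values)


def word_count(documents):
    """Run the stages in order over a list of documents."""
    combined = chain.from_iterable(combiner(mapper(doc)) for doc in documents)
    return dict(reducer(key, values) for key, values in shuffle(combined).items())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

The combiner is optional for correctness here, since summing is associative; its purpose in a real cluster is to cut the volume of data sent across the network during the shuffle.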
git: fetch and merge, don’t pull | Mark's Blog
git: fetch and merge, don’t pull | Mark's Blog: "one of the git tips that I find myself frequently passing on to people is:
Don’t use git pull, use git fetch and then git merge.
The problem with git pull is that it has all kinds of helpful magic that means you don’t really have to learn about the different types of branch in git. Mostly things Just Work, but when they don’t it’s often difficult to work out why. What seem like obvious bits of syntax for git pull may have rather surprising results, as even a cursory look through the manual page should convince you.
The other problem is that by both fetching and merging in one command, your working directory is updated without giving you a chance to examine the changes you’ve just brought into your repository. Of course, unless you turn off all the safety checks, the effects of a git pull on your working directory are never going to be catastrophic, but you might prefer to do things more slowly so you don’t have to backtrack."
'via Blog this'
Wednesday, 20 April 2016
Immutables.org
Immutables.org: "Java annotation processors to generate simple, safe and consistent value objects. Do not repeat yourself, try Immutables, the most comprehensive tool in this field!"
'via Blog this'
Combining Angular 2 with React Native
Combining Angular 2 with React Native: "Angular 2’s architecture makes it possible to render an application with various renderers including React Native.
One of the fundamental architectural decisions made for Angular 2 was the separation of the framework into two layers: the core, dealing with components, directives, filters, services, router, change detection, DI, I18n, and the renderer, dealing with DOM, CSS, animation, templates, web components, custom events, etc. The core can be executed in a separate process, decoupling it from the interface and making the latter more responsive when the core has lots of processing to do. More about this decision can be found in the Angular 2 Rendering Architecture document.
Traditionally, rendering an AngularJS application was done via the DOM inside the browser, but now it is possible to draw the application through other renderers, including native ones on the desktop or mobile devices or even on the server. Rendering in Angular2 explains in more detail how Angular 2 can use different renderers to do its job.
This separation of rendering from the main app has multiple benefits. An Angular 2 application can run on Node.js, being very fast, according to Brad Green, Engineering Director at Google. “You can run Photoshop in this environment, why not?” And Node.js provides the needed access to the file system, processes and hardware. Also, Angular 2 can run on the desktop via Angular Electron or on Microsoft’s UWP."
'via Blog this'
Tuesday, 19 April 2016
The Ars guide to building a Linux router from scratch | Ars Technica
The Ars guide to building a Linux router from scratch | Ars Technica: "After finally reaching the tipping point with off-the-shelf solutions that can't match increasing speeds available, we recently took the plunge. Building a homebrew router turned out to be a better proposition than we could've ever imagined. With nearly any speed metric we analyzed, our little DIY kit outpaced routers whether they were of the $90- or $250-variety.
Naturally, many readers asked the obvious follow-up—"How exactly can we put that together?" Today it's time to finally pull back the curtain and offer that walkthrough. By taking a closer look at the actual build itself (hardware and software), the testing processes we used, and why we used them, hopefully any Ars readers of average technical abilities will be able to put together their own DIY speed machine. And the good news? Everything is as open source as it gets—the equipment, the processes, and the setup. If you want the DIY router we used, you can absolutely have it. This will be the guide to lead you, step-by-step."
'via Blog this'
How HTTP/2 Is Changing Web Performance Best Practices | Voxxed
How HTTP/2 Is Changing Web Performance Best Practices | Voxxed: "The Hypertext Transfer Protocol (HTTP) underpins the World Wide Web and cyberspace. If that sounds dated, consider that the version of the protocol most commonly in use, HTTP 1.1, is nearly 20 years old. When it was ratified back in 1997, floppy drives and modems were must-have digital accessories and Java was a new, up-and-coming programming language. Ratified in May 2015, HTTP/2 was created to address some significant performance problems with HTTP 1.1 in the modern Web era. Adoption of HTTP/2 has increased in the past year as browsers, Web servers, commercial proxies, and major content delivery networks have committed to or released support.
Unfortunately for people who write code for the Web, transitioning to HTTP/2 isn’t always straightforward and a speed boost isn’t automatically guaranteed. The new protocol challenges some common wisdom when building performant Web applications and many existing tools—such as debugging proxies—don’t support it yet. This post is an introduction to HTTP/2 and how it changes Web performance best practices."
'via Blog this'
Monday, 18 April 2016
All Machine Learning Models Have Flaws – Data Science Central
All Machine Learning Models Have Flaws – Data Science Central: "Attempts to abstract and study machine learning are within some given framework or mathematical model. It turns out that all of these models are significantly flawed for the purpose of studying machine learning. I've created a table (below) outlining the major flaws in some common models of machine learning. "
'via Blog this'
Friday, 15 April 2016
Learning How to Build a Web Application — Medium
Learning How to Build a Web Application — Medium: "Back in my undergraduate days, I did a LOT of mathematical proofs (e.g. Linear Algebra, Real Analysis, and my all time favorite/nightmare Measure Theory). In addition to learning how to think, I also learned to recognize many, and I mean many Greek and Hebrew letters.
However, as I took on more empirical work in graduate school, I realized that data visualization was often far more effective in communication than LaTeX alone. From crafting what information to convey to my readers, I learned that the art of presentation is far more important than I originally thought.
Luckily, I have always been a rather visual learner, so when it comes to beautiful data visualizations, they have always grabbed my attention and propelled me to learn more. I began to wonder how people published their beautiful works on the web; I became a frequent visitor of Nathan Yau’s FlowingData blog; and I continue to be in awe when discovering visualizations like this, this, and this.
After many moments of envy, I couldn’t resist but to learn how to create them myself. This post is about the journey that I took to piece the puzzles together. By presenting my lessons learned here, I hope it can inspire all those who are interested in learning web development and data visualization to get started!"
'via Blog this'
How Synchronous REST Turns Microservices Back into Monoliths - The New Stack
How Synchronous REST Turns Microservices Back into Monoliths - The New Stack: "If you are breaking down a monolithic legacy application into a set of microservices, and if those microservices are communicating via REST (Representational State Transfer), then you still have, in effect, a monolithic application, asserted Lightbend tech lead, James Roper.
Roper laid down this heavy wisdom at the monthly New York Java Special Interest Group meeting on Wednesday, at the Viacom headquarters in Manhattan.
In the talk, Roper was a big advocate of asynchronous communication. Not surprisingly, Lightbend (formerly Typesafe) offers a Scala-based microservices platform, called Lagom, based on asynchronous communications.
Keep in mind, “asynchronous communications” is different from the commonly used term “asynchronous I/O.” Asynchronous I/O is all about not halting an operation to wait for a process thread to complete, while asynchronous communication, an artifact of microservices, is about designing a system such that one service doesn’t need to wait on another to complete its task.
“Using async communication, if a user makes a request to a service, and that service needs to make a request to another service, that communication is not going to block [the first] service from returning a response to the user,” Roper said."
'via Blog this'
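To make the distinction in the quote concrete for myself, here is a minimal sketch in Python. It is only an in-process stand-in: real asynchronous communication between services would normally go through a message broker or similar, and every name here (slow_downstream, handle_sync_style, handle_async_style) is hypothetical and not taken from the article or from Lagom.

```python
import asyncio
import time


async def slow_downstream(log):
    # Hypothetical stand-in for a second service that takes a while.
    await asyncio.sleep(0.2)
    log.append("downstream done")


async def handle_sync_style(log):
    # Synchronous style: the reply to the user waits on the other service.
    await slow_downstream(log)
    return "response"


async def handle_async_style(log, background):
    # Asynchronous style: hand the work off and reply straight away.
    background.append(asyncio.create_task(slow_downstream(log)))
    return "response"


async def timed(coro):
    # Measure how long the caller waits for a reply.
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start


async def main():
    log, background = [], []
    _, blocking = await timed(handle_sync_style(log))
    _, non_blocking = await timed(handle_async_style(log, background))
    await asyncio.gather(*background)  # let the handed-off work finish
    return blocking, non_blocking, log


blocking, non_blocking, log = asyncio.run(main())
print(f"blocking reply: {blocking:.2f}s, non-blocking reply: {non_blocking:.2f}s")
```

The first handler cannot answer until the downstream call completes; the second answers immediately and the downstream work still finishes afterwards.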
Thursday, 14 April 2016
Monitoring Docker Containers – docker stats, cAdvisor & Universal Control Plane | Voxxed
Monitoring Docker Containers – docker stats, cAdvisor & Universal Control Plane | Voxxed: "There are multiple ways to monitor Docker containers. This blog will explain a few simple and easy to use options:
docker stats command
Docker Remote API
cAdvisor
Prometheus
InfluxDB
Docker Universal Control Plane
Let’s take a look at each one of them.
We’ll use a Couchbase server to gather the monitoring data."
'via Blog this'
Rust + nix = easier unix systems programming
Rust + nix = easier unix systems programming : "Lately I’m writing lots of Rust, and I’m particularly interested in systems programming on unix. I’ve been using and contributing to a library called nix, whose mission is to provide ‘Rust friendly bindings to *nix APIs’.
In this blog post, I hope to convince you that you might want to reach for Rust and nix the next time you need to do some unix systems programming, especially if you aren’t fluent in C. It’s no harder to write, you won’t have to write more code, and it makes it much easier to avoid a few classes of mistakes."
'via Blog this'
Tuesday, 12 April 2016
Calculus Learning Guide | BetterExplained
Calculus Learning Guide | BetterExplained: "This is a realistic learning plan for Calculus based on the ADEPT method:
Explore high-level themes (analogies, diagrams) before the low-level details
See learning as a journey, not an all-or-nothing destination.
Acknowledge our limited motivation. What Aha! moments can we have in minutes, not weeks?"
'via Blog this'
Dan Luu: Notes on Google's Site Reliability Engineering book
Notes on Google's Site Reliability Engineering book: "Notes on Google's Site Reliability Engineering book"
'via Blog this'
Monday, 11 April 2016
Brendan Gregg: SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREs:
http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html
Why “Don’t Use Shared Libraries in Microservices” is Bad Advice - Evolvable Me
Why “Don’t Use Shared Libraries in Microservices” is Bad Advice - Evolvable Me: "If you’ve read a bit about microservices, you’ll probably have come across the mantra, “Don’t use shared libraries in microservices.” This is bad advice.
While the sentiment is borne from real issues and there’s real wisdom to be gained in this area, this little statement is too pithy, lacking the relevant context to make it useful. Consequently, it’s open to misinterpretation, in particular to being applied too liberally, and I’ve seen it misunderstood a number of times recently."
'via Blog this'
CPU pinning Java threads with jstack and taskset
CPU pinning Java threads with jstack and taskset: "Recently I've been tuning our market data feed processors. This is a single threaded "straight line speed" application that reads data from the network, parses it, converts to our internal data format, and writes out again.
I've already reduced the time taken for a benchmark workload (parse a large data feed file) by 68% using only code changes (going to blog this soon) but wanted to see if OS-level optimisations could reduce this further.
Here is an experiment into pinning the main application thread to a CPU core to see if that prevents loss of performance when the OS scheduler moves the thread between CPU cores or sockets (thus losing cache contents).
Results were measured using
perf stat -o perf.log java MyWorkloadClass
On Linux you can use the taskset command to pin a process to a CPU core.
First you need to discover the native ID for the Java thread you want to pin."
'via Blog this'
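The excerpt stops just before the thread-ID step, so here is a rough sketch of how I would expect it to go. It assumes the thread header lines in jstack output carry a hexadecimal nid= field (the native thread ID), and that taskset wants that ID in decimal. The sample line, the helper names and the CPU number are made up for illustration and are not from the article.

```python
import re

# A jstack thread header line starts with the quoted thread name and
# includes the native thread ID as a hex "nid=0x..." field.
NID_PATTERN = re.compile(r'^"(?P<name>[^"]+)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)')


def native_thread_ids(jstack_output):
    """Map Java thread names to decimal native thread IDs."""
    ids = {}
    for line in jstack_output.splitlines():
        match = NID_PATTERN.match(line)
        if match:
            ids[match.group("name")] = int(match.group("nid"), 16)
    return ids


def taskset_command(tid, cpu):
    """Build (but do not run) a taskset call pinning thread tid to one CPU."""
    return ["taskset", "-cp", str(cpu), str(tid)]


# Illustrative sample only; real jstack output has one such line per thread.
SAMPLE = (
    '"main" #1 prio=5 os_prio=0 tid=0x00007f3c4800a000 nid=0x1f42 runnable\n'
    "   java.lang.Thread.State: RUNNABLE\n"
)

ids = native_thread_ids(SAMPLE)
print(ids, taskset_command(ids["main"], 3))
```

The resulting argument list could be passed to subprocess.run on a Linux box; I have left that out so the sketch runs anywhere.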
Determining whether an application has poor cache performance – Red Hat Developer Blog
Determining whether an application has poor cache performance – Red Hat Developer Blog: "Modern computer systems include cache memory to hide the higher latency and lower bandwidth of RAM memory from the processor. The cache has access latencies ranging from a few processor cycles to ten or twenty cycles rather than the hundreds of cycles needed to access RAM. If the processor must frequently obtain data from the RAM rather than the cache, performance will suffer. With Red Hat Enterprise Linux 6 and newer distributions, the system use of cache can be measured with the perf utility available from the perf RPM.
perf uses the Performance Monitoring Unit (PMU) hardware in modern processors to collect data on hardware events such as cache accesses and cache misses without undue overhead on the system. The PMU hardware is processor implementation specific and the specific underlying events may differ between processors. For example, one processor implementation may measure the first-level cache events of the cache closest to the processor, and another processor implementation may measure lower-level cache events for a cache farther from the processor and closer to main memory. The configuration of the cache may also differ between processor models; one processor in the processor family may have 2MB of last level cache and another member in the same processor family may have 8MB of last level cache. These differences make direct comparison of event counts between processors difficult."
'via Blog this'
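Since raw event counts are hard to compare, a ratio is often easier to reason about. Below is a small sketch that pulls the cache-references and cache-misses counters out of perf stat style text and computes a miss ratio. The sample text is illustrative, not a real measurement, and the layout (a count with thousands separators followed by the event name) is an assumption about perf's default output that may differ between perf versions and processors.

```python
import re

# One counter per line: a count with thousands separators, then the event name.
COUNTER_LINE = re.compile(
    r"^\s*(?P<count>[\d,]+)\s+(?P<event>[\w-]+)", re.MULTILINE
)


def parse_counters(perf_output):
    """Return {event name: count} for every counter line found."""
    return {
        m.group("event"): int(m.group("count").replace(",", ""))
        for m in COUNTER_LINE.finditer(perf_output)
    }


def cache_miss_ratio(counters):
    """Fraction of cache references that missed."""
    references = counters["cache-references"]
    if references == 0:
        raise ValueError("no cache references recorded")
    return counters["cache-misses"] / references


# Illustrative sample only, not measured data.
SAMPLE = """
 Performance counter stats for 'java MyWorkloadClass':

         1,000,000      cache-references
            50,000      cache-misses
"""

counters = parse_counters(SAMPLE)
print(f"miss ratio: {cache_miss_ratio(counters):.1%}")
```

Even a ratio only says how one run on one machine behaved; as the excerpt notes, the underlying events can mean different things on different processors.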
Saturday, 9 April 2016
WebPagetest - Website Performance and Optimization Test
WebPagetest - Website Performance and Optimization Test: "Run a free website speed test from multiple locations around the globe using real browsers (IE and Chrome) and at real consumer connection speeds. You can run simple tests or perform advanced testing including multi-step transactions, video capture, content blocking and much more. Your results will provide rich diagnostic information including resource loading waterfall charts, Page Speed optimization checks and suggestions for improvements."
'via Blog this'
A quick tutorial on implementing and debugging malloc, free, calloc, and realloc
A quick tutorial on implementing and debugging malloc, free, calloc, and realloc: "A quick tutorial on implementing and debugging malloc, free, calloc, and realloc
Let’s write a malloc and see how it works with existing programs!"
'via Blog this'
Friday, 8 April 2016
Simple Sketches for Diagramming Your Software Architecture | Voxxed
Simple Sketches for Diagramming Your Software Architecture | Voxxed: "Informal boxes and lines sketches can work very well, but there are many pitfalls associated with communicating software designs in this way. My approach is to use a small collection of simple diagrams that each show a different part of the same overall story. In order to do this though, you need to agree on a simple way to think about the software system that you’re building.
Assuming an object oriented programming language, the way that I like to think about a software system is as follows: a software system is made up of a number of containers, which themselves are made up of a number of components, which in turn are implemented by one or more classes. It’s a simple hierarchy of logical technical building blocks that can be used to illustrate the static structure of most of the software systems I’ve ever encountered. With this set of abstractions in mind, you can then draw diagrams at each level in turn. I call this my C4 model: context, containers, components and classes. Some diagrams will help to explain this further."
'via Blog this'
Angular 2: High-Level Overview | Voxxed
Angular 2: High-Level Overview | Voxxed: "By Yakov Fain
This article was excerpted from the book “Angular Development With TypeScript” (see http://bit.ly/1QYeqL0).
The Angular 2 framework is a re-write of popular framework AngularJS. In short, the newer version has the following advantages over AngularJS.
The code is simpler to write and read
It performs better than AngularJS
It’s easier to learn
The application architecture is simplified as it’s component-based
This article contains a high-level overview of Angular, highlighting improvements compared to AngularJS. For a more detailed architecture overview of Angular, visit the product documentation here."
'via Blog this'
Thursday, 7 April 2016
The revenge of the listening sockets
The revenge of the listening sockets: "Back in November we wrote a blog post about one latency spike. Today I'd like to share a continuation of that story. As it turns out, the misconfigured rmem setting wasn't the only source of added latency."
'via Blog this'
The story of one latency spike
The story of one latency spike: "A customer reported an unusual problem with our CloudFlare CDN: our servers were responding to some HTTP requests slowly. Extremely slowly. 30 seconds slowly. This happened very rarely and wasn't easily reproducible. To make things worse all our usual monitoring hadn't caught the problem. At the application layer everything was fine: our NGINX servers were not reporting any long running requests."
'via Blog this'
Tuesday, 5 April 2016
TypeScript 2.0 Preview
TypeScript 2.0 Preview: "Anders Hejlsberg returned to Microsoft's Build conference in 2016 to talk about the current state of TypeScript and show off some amazing features coming in the next few months. Hejlsberg divided his talk into three main parts, allocating the first 15 minutes to retelling of the high-level story of TypeScript. "TypeScript: JavaScript that scales" is how he described the language and its goal of closing the "JavaScript feature gap". The demos involved basic type checking, statement completion, and how the compiler output compares to the source. After the brief introduction, he showed off what's changed since Build 2015. The team has a 3 to 4 month cadence that has resulted in 4 main releases in the past year. In an Angular 2 demo, Hejlsberg showed how to embed the TypeScript compiler in the browser, eliminating the separate step of recompiling code after a file change. He took the same demo application and repeated it using React, showing off TypeScript's ability to understand JSX, the embedded markup technology favored by React developers. To drive the point home, he refactored the name of a component and showed how TypeScript updated all of the component references throughout the project, including inside the embedded JSX code. Included in the demo was the integration with webpack and the community driven TypeScript loader."
'via Blog this'
Monday, 4 April 2016
confluentinc/bottledwater-pg: Change data capture from PostgreSQL into Kafka
confluentinc/bottledwater-pg: Change data capture from PostgreSQL into Kafka: "Bottled Water uses the logical decoding feature (introduced in PostgreSQL 9.4) to extract a consistent snapshot and a continuous stream of change events from a database. The data is extracted at a row level, and encoded using Avro. A client program connects to your database, extracts this data, and relays it to Kafka (you could also integrate it with other systems if you wish, but Kafka is pretty awesome). Key features of Bottled Water are: Works with any PostgreSQL database (version 9.4 or later). There are no restrictions on your database schema. No schema changes are required, no triggers or additional tables. (However, you do need to be able to install a PostgreSQL extension on the database server. More on this below.) Negligible impact on database performance. Transactionally consistent output. That means: writes appear only when they are committed to the database (writes by aborted transactions are discarded), writes appear in the same order as they were committed (no race conditions). Fault-tolerant: does not lose data, even if processes crash, machines die, the network is interrupted, etc."
'via Blog this'
Sunday, 3 April 2016
One API, Many Facades?
One API, Many Facades?: "An interesting trend is emerging in the world of Web APIs, with various engineers and companies advocating for dedicated APIs for each consumer with particular needs. Imagine a world where your system needs to expose not only one API for iOS, one API for Android, one for the website, and one for the AngularJS app front end, but also APIs for various set-top boxes and exotic mobile platforms or for third-party companies that call your API. Beyond any ideal design of your API, reality strikes back with the concrete and differing concerns of varied API consumers. You might need to optimize your API accordingly."
'via Blog this'
Kyle Kingsbury: Class materials for a distributed systems lecture series
aphyr/distsys-class: Class materials for a distributed systems lecture series:
"An introduction to distributed systems Copyright 2014, 2016 Kyle Kingsbury
This outline accompanies a 12-16 hour overview class on distributed systems fundamentals. The course aims to introduce software engineers to the practical basics of distributed systems, through lecture and discussion. Participants will gain an intuitive understanding of key distributed systems terms, an overview of the algorithmic landscape, and explore production concerns."
'via Blog this'
Saturday, 2 April 2016
bcc: Dynamic Tracing Tools for Linux (IOvisor github)
bcc: "bcc is more than just tools. The BPF enhancements that bcc uses were originally intended for software defined networking (SDN). In bcc, there are examples of this with distributed bridges, HTTP filters, fast packet droppers, and tunnel monitors. BPF was enhanced to support more than just networking, and has general tracing support in the Linux 4.x series. bcc is really a compiler for BPF, that comes with many sample tools. So far bcc has both Python and Lua front ends. bcc/BPF, or just BPF, should become a standard resource for performance monitoring and analysis tools, to provide detailed metrics beyond /proc. Latency heat maps, flame graphs, and more should become commonplace in performance GUIs, powered by BPF."
'via Blog this'
Friday, 1 April 2016
Hotpatching a C Function on x86 « null program
Hotpatching a C Function on x86 « null program: "In this post I’m going to do a silly, but interesting, exercise that should never be done in any program that actually matters. I’m going to write a program that changes one of its function definitions while it’s actively running and using that function. Unlike last time, this won’t involve shared libraries, but it will require x86_64 and GCC. Most of the time it will work with Clang, too, but it’s missing an important compiler option that makes it stable."
'via Blog this'