Tuesday, 30 December 2014

radimrehurek.com: Practical Data Science in Python


End-to-end example: automated spam filtering

http://www.johndcook.com : Object oriented vs. functional programming

OO makes code understandable by encapsulating moving parts.
FP makes code understandable by minimizing moving parts.
This explains some of the tension between object oriented programming and functional programming. The former tries to control state behind object interfaces. The latter tries to minimize state by using pure functions as much as possible.
It’s understandable that programmers accustomed to object oriented programming would like to add functional programming on top of OO, but I believe you have to make more of an exclusive commitment to functional programming to get the most benefit. For example, pure functions are easier to debug and to execute in parallel due to their lack of side effects. But if your code is only semi-functional, you can’t have the same confidence in testing your code or in spreading it across processors.
James Hague argues that 100% functional purity is impractical and that one should aim for 85% purity. But the 15% impurity needs to be partitioned, not randomly scattered across your code base. A simple strategy for doing this is to use functional programming in the small and OO in the large. Clojure also has some very interesting ideas for isolating the stateful parts of a program.
http://www.johndcook.com/blog/2010/11/03/object-oriented-vs-functional-programming/
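
To make the "functional in the small, OO in the large" idea concrete, here is a minimal Python sketch (the Order/net_price names are my own illustration, not from Cook's post): the object owns the one piece of mutable state and the public interface, while the actual computation is delegated to a pure function that is easy to test in isolation.

```python
# A minimal sketch of "functional in the small, OO in the large".
# The Order/net_price example is illustrative only, not from the post.

def net_price(gross, discount):
    """Pure function: same inputs always yield the same output, no state touched."""
    return round(gross * (1 - discount), 2)

class Order:
    """Object boundary: encapsulates the mutable list of items."""
    def __init__(self, discount=0.0):
        self._items = []
        self._discount = discount

    def add_item(self, gross_price):
        self._items.append(gross_price)

    def total(self):
        # delegate the arithmetic to the pure function
        return sum(net_price(p, self._discount) for p in self._items)

order = Order(discount=0.10)
order.add_item(100.0)
order.add_item(50.0)
print(order.total())   # 135.0
```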

vladmihalcea.com: A BEGINNER’S GUIDE TO TRANSACTION ISOLATION LEVELS IN ENTERPRISE JAVA

http://vladmihalcea.com/2014/12/23/a-beginners-guide-to-transaction-isolation-levels-in-enterprise-java/

A relational database's strong consistency model is based on ACID transaction properties. In this post we are going to unravel the reasons behind using different transaction isolation levels and the various configuration patterns for both resource-local and JTA transactions.
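
The article itself works through the Java APIs (resource-local and JTA transactions). As a language-neutral illustration of the same concept, here is a minimal Python sketch using psycopg2 against PostgreSQL; the connection details and table are hypothetical. The point is simply that the isolation level is chosen per session or transaction, trading consistency guarantees against concurrency.

```python
# Hypothetical sketch: choosing a transaction isolation level from Python
# with psycopg2 (the article uses Java; host/db/table names here are made up).
# The isolation level controls which read anomalies (dirty, non-repeatable,
# phantom reads) a transaction may observe.
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="shop", user="app", password="secret")
conn.set_session(isolation_level="REPEATABLE READ")   # or "READ COMMITTED", "SERIALIZABLE"

with conn:                          # commits on success, rolls back on exception
    with conn.cursor() as cur:
        cur.execute("SELECT balance FROM account WHERE id = %s", (42,))
        (balance,) = cur.fetchone()
        cur.execute("UPDATE account SET balance = %s WHERE id = %s",
                    (balance - 10, 42))

conn.close()
```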

Monday, 29 December 2014

Apache Camel <-> Enterprise Integration Patterns

In order to understand what Apache Camel is, you need to understand what Enterprise Integration Patterns are.

Paris Review: http://www.theparisreview.org/blog/2014/09/05/the-beauty-of-code/

as software programs grow bigger and more complex, the code they comprise tends to become unreadable and incomprehensible to human beings. Programmers like to point out that if each line of code, or even each logical statement (which may spread to more than one physical line), is understood to be a component, software systems are the most complicated things that humans have ever built: the Lucent 5ESS switch, used in telephone exchanges, derives its functionality from a hundred million lines of code; the 2008 Fedora 9 distribution of Linux comprises over two hundred million lines of code. No temple, no cathedral has ever contained as many moving parts. So if you’ve ever written code, you understand in your bones the truth of Donald Knuth’s assertion, “Software is hard. It’s harder than anything else I’ve ever had to do.” If you’ve ever written code, the fact that so much software works so much of the time can seem profoundly miraculous.

The Literate Programmer: The microservice declaration of independence

In order for "microservice architecture" to be a useful way of characterising systems, it needs an accepted and well-understood definition. Mark Baker laments the lack of a precise description, because he feels its absence compromises the value of the term:
Really unfortunate that the Microservices style isn't defined in terms of architectural constraints #ambiguity #herewegoagain
In their article, Martin and James have this to say about the style's definition:
We cannot say there is a formal definition of the microservices architectural style, but we can attempt to describe what we see as common characteristics for architectures that fit the label.
I'm still optimistic that a clearer picture is yet to emerge, though it might not be as unambiguous as the REST architectural style, which is defined in terms of six constraints.

My own opinion is that microservice architectures can be understood through a single abstract architectural constraint which can be interpreted along many different degrees of freedom.
X can be varied independently of the rest of the system.
What might X be? There are many qualities that might be locally varied in an architecture to beneficial effect. Here are some ways we could introduce novelty into a service without wishing to disturb the greater system:
  • Select implementation and storage technology
  • Test
  • Deploy to production
  • Recover from failure
  • Monitor
  • Horizontally scale
  • Replace
There are other aspects of systems that are harder to formalise, but which the people who work on our software systems might wish to handle without being troubled by the fearsome complexity of the whole architecture.
I don't wish to imply that this is an exhaustive list. To me, the commonality lies in the concept of independent variation, not the kinds of variation themselves.
The microservice declaration of independence

Thursday, 25 September 2014

Wednesday, 20 August 2014

InfoQ: Comparing Virtual Machines and Linux Containers Performance

The paper authors ran CPU, memory, network and I/O benchmarks against native, container and virtualized execution, using KVM and Docker as virtualization and container technologies respectively. Benchmarks also include sample Redis and MySQL workloads; Redis exercises the networking stack, with small packets and a large number of clients, while MySQL stresses memory, network and the filesystem.

The results show that Docker equals or exceeds KVM performance in every case tested. For CPU and memory performance KVM and Docker introduce a measurable but negligible overhead, although for I/O intensive applications both require tuning.

Docker's performance degrades when files are stored in AUFS rather than in volumes, which perform better. A volume is a specially designated directory within one or more containers that bypasses the union file system, so it does not carry the overhead that the storage backends may introduce. The default AUFS backend causes significant I/O overhead, especially when using many layers and deeply nested directory hierarchies.


Friday, 11 July 2014

Why Twitter picked Scala...

Engineer-to-Engineer Talk: How and Why Twitter Uses Scala

Interesting quote:
We knew we needed another language. How did we pick a language that was really fun for us? We considered Java, C/C++ of course. And we looked at Haskell and OCaml for functional programming, though neither has gotten much commercial use. Erlang developers are doing stuff with a lot of network I/O but not with a lot of disk I/O; the knowledge-base around the language wasn’t great though, and the community seemed inaccessible. Java is easy to use, but it’s not very fun, especially if you’ve been using Ruby for a while. Java’s productive, but it’s just not sexy anymore. C++ was barely considered as an option. Some guys said, if I have to work in C++ again, I’m going to stab my eyes out with a shrimp fork. JavaScript on the server-side via Rhino had performance problems, and it wasn’t quite there yet when we were evaluating it.
So what were our criteria for choosing Scala? Well first we asked, was it fast, and fun, and good for long-running process? Does it have advanced features? Can you be productive quickly? Developers of the language itself had to be accessible to us as we’d been burned by Ruby in that respect. Ruby’s developers had been clear about focusing it on fun, even sometimes at the expense of performance. They understood our concerns about enterprise-class support and sometimes had other priorities.
We wanted to be able to talk to the guys building the language, not to steer the language, but at least to have a conversation with them.

Friday, 13 June 2014

Adam Tornhill: Code as a Crime Scene

Code as a Crime Scene
To understand large-scale software systems we need to look at their evolution. The history of our system provides us with data we cannot derive from a single snapshot of the source code. Instead, Version Control System (VCS) data blends technical, social and organizational information along a temporal axis that lets us map out our interaction patterns in the code. Analyzing these patterns gives us early warnings on potential design issues and development bottlenecks, as well as suggesting new modularities based on actual interactions with the code. Addressing these issues saves costs, simplifies maintenance and lets us evolve our systems in the direction of how we actually work with the code.


The road ahead points to a wider application of the techniques. While this article focused on analyzing the design aspect of software, reading code is a harder problem to solve. Integrating analysis of VCS data in the daily workflow of the programmer would allow such a system to provide reading recommendations. For example, “programmers that read the code for the Communication module also checked-out the UserStatistics module” is a likely future recommendation to be seen in your favorite IDE.


Integrating VCS data into our daily workflow would allow future analysis methods to be more fine-grained. There's much improvement to be made if we could consider the time-scale within a single commit. As such, VCS data serves as both feedback and a helpful guide.
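
As a rough illustration of the kind of analysis the article describes, here is a small Python sketch that mines a git repository for "change coupling": pairs of files that tend to be modified in the same commit. It is my own minimal example, not Tornhill's tooling.

```python
# Rough sketch of mining "change coupling" from git history: count how often
# two files are modified in the same commit. Point it at any local git repo.
import subprocess
from collections import Counter
from itertools import combinations

def commits_with_files(repo_path):
    """Yield the set of file paths touched by each commit."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", "--pretty=format:__commit__"],
        capture_output=True, text=True, check=True).stdout
    for chunk in log.split("__commit__"):
        files = {line for line in chunk.splitlines() if line.strip()}
        if files:
            yield files

def change_coupling(repo_path, top=10):
    """Return the `top` most frequently co-changed file pairs."""
    pairs = Counter()
    for files in commits_with_files(repo_path):
        pairs.update(combinations(sorted(files), 2))
    return pairs.most_common(top)

if __name__ == "__main__":
    for (a, b), count in change_coupling("."):
        print(f"{count:4d}  {a} <-> {b}")
```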

Saturday, 7 June 2014

Twitter-Scale Computing with OpenJDK

Interesting video on how Twitter tweaks its own branch of OpenJDK to optimise latency (e.g. garbage collection) and the JVM's handling of Scala.

Tuesday, 27 May 2014

'Knowing and Doing' blog

Excellent Computer Science blog: http://www.cs.uni.edu/~wallingf/blog

Interesting articles include:

Thinking in Types, and Good Design

Students who learn to program in languages like Python and Ruby often become accustomed to using untyped lists, arrays, hashes, and tuples as their go-to collections. They are oh so handy, often the quickest route to a program that works on the small examples at hand. But those very handy data structures promote sloppy design, or at least enable it; they make it easy not to see very basic objects living in the code.
Who needs a Game class when a Python list or Ruby array works out of the box? I'll tell you: you do, as soon as you try to do almost anything else in your program. Otherwise, you begin working around the generality of the list or array, writing code to handle special cases that really aren't special cases at all. They are simply unbundled objects running wild in the program.
Good design is good design. Most of the features of a good design transcend any particular programming style or language.
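
A tiny Python sketch of the point (the post names a Game class but shows no code, so the rules below are hypothetical): wrapping the bare list in a small class lets the rest of the program speak the domain's language instead of poking at an unbundled collection.

```python
# Hypothetical Game class: the raw list stays hidden behind
# intention-revealing methods instead of being special-cased everywhere.

class Game:
    def __init__(self, rounds_to_play=10):
        self._scores = []
        self._rounds_to_play = rounds_to_play

    def record_round(self, score):
        if score < 0:
            raise ValueError("score must be non-negative")
        self._scores.append(score)

    @property
    def total(self):
        return sum(self._scores)

    def is_over(self):
        return len(self._scores) >= self._rounds_to_play

game = Game()
game.record_round(7)
game.record_round(3)
print(game.total, game.is_over())   # 10 False
```
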
Code Duplication as a Hint to Think Differently

When defining a program to process an inductively-defined data type, the structure of the program should follow the structure of the data.
This guideline helps many programmers begin to write recursive programs in a functional style, rather than an imperative style.
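
A short Python example of the guideline, using an inductively defined type of my own choosing (not from the post): an arithmetic expression is either a number or an (op, left, right) tuple, and the evaluator mirrors that definition with one branch per case.

```python
# My own small example: the evaluator's shape follows the data's shape.
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def evaluate(expr):
    if isinstance(expr, (int, float)):        # base case: a plain number
        return expr
    op, left, right = expr                    # inductive case: an operation node
    return OPS[op](evaluate(left), evaluate(right))

print(evaluate(("+", 1, ("*", 2, 3))))        # prints 7
```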

Take Small Steps

If a CS major learns only one habit of professional practice in four years, it should be:
Take small steps.
A corollary:
If things aren't working, take smaller steps.
I once heard Kent Beck say something similar, in the context of TDD and XP. When my colleague Mark Jacobson works with students who are struggling, he uses a similar mantra: Solve a simpler problem. As Dr. Nick notes, students and professionals alike should scale the step size according to their level of knowledge or their confidence about the problem.

Software Design is a Bet on a Particular Future

This truth is expressed nicely by Reginald Braithwaite:
Software design is the act of making bets about the future. A well-designed program is a bet on what will change in the future, and what will not change. And a well-designed program communicates the nature of that bet by being relatively flexible about things that the designers think are most likely to change, and being relatively inflexible about the things the designers think are least likely to change.
That's what refactoring is all about, of course. Sometimes, a particular guess turns out to be wrong. We have the wrong factors, the wrong components, for adding a new feature. So we change the shape of the code -- we factor it into different components -- to reflect our new best understanding of the future. Then we move on.
Sometimes, though, there are forces that make more desirable a relatively monolithic piece of code (or, as Braithwaite points out, a system decomposed into relatively less flexible components). In these cases, we need to defactor, to use Braithwaite's term: we recombine some or all of the parts to create a new design.
Predicting the future is hard, even for experienced programmers. One of the goals of agile design is to not think too far ahead, because that means committing to a future too far removed from what we already know to be true about our program.




The Computer Language Benchmarks Game


Interesting, but always a controversial topic.

http://benchmarksgame.alioth.debian.org/

Monday, 26 May 2014

Apple and the little i

Article on how Apple got the little i....

This was a key moment for Apple, when its love of Simplicity won the day and set it on a course it follows to this very day. Steve was unrelenting in his desire to give this great product a great name. He appreciated the power of words. In this case, he appreciated the power of a single letter. And that little letter “i” became one of the most important parts of the Apple brand.

http://www.fastcodesign.com/1669924/steve-jobs-almost-named-the-imac-the-macman-until-this-guy-stopped-him