Thursday, 24 December 2015

Choose Boring Technology - Discourage "Cleverness"

Dan McKinley: Choose Boring Technology (Link to Slides)

Embrace Boredom.

Let's say every company gets about three innovation tokens. You can spend these however you want, but the supply is fixed for a long while. You might get a few more after you achieve a certain level of stability and maturity, but the general tendency is to overestimate the contents of your wallet. Clearly this model is approximate, but I think it helps.
If you choose to write your website in NodeJS, you just spent one of your innovation tokens. If you choose to use MongoDB, you just spent one of your innovation tokens. If you choose to use service discovery tech that's existed for a year or less, you just spent one of your innovation tokens. If you choose to write your own database, oh god, you're in trouble.
Any of those choices might be sensible if you're a JavaScript consultancy, or a database company. But you're probably not. You're probably working for a company that is at least ostensibly rethinking global commerce or reinventing payments on the web or pursuing some other suitably epic mission. In that context, devoting any of your limited attention to innovating on ssh is an excellent way to fail. Or at best, delay success [1].
What counts as boring? That's a little tricky. "Boring" should not be conflated with "bad." There is technology out there that is both boring and bad [2]. You should not use any of that. But there are many choices of technology that are boring and good, or at least good enough. MySQL is boring. Postgres is boring. PHP is boring. Python is boring. Memcached is boring. Squid is boring. Cron is boring.
The nice thing about boringness (so constrained) is that the capabilities of these things are well understood. But more importantly, their failure modes are well understood. Anyone who knows me well will understand that it's only with an overwhelming sense of malaise that I now invoke the spectre of Don Rumsfeld, but I must.


Why discouraging cleverness is important in programming languages and codebases.


One of the other benefits of Go was widening the pool of backgrounds we can hire from. We can take someone from any language background and have them ramped up on Go in weeks. On the Scala side there's the JVM learning curve, the Java world of containers, the black magic of JVM tuning, etc.
New developers we've hired are ramped up in weeks vs months. We now have the majority of our services written in Go, and one of the last holdouts to move to Go just wrote his first project and said to me afterward: "Wow, I read through that library once and I knew exactly what it was doing. I've read the Scala version of that library four times and I still have no idea what it does. I can see why you guys like it so much." That was one of our senior engineers, who's previously worked for one of the largest web properties in the world. This process was a complete bottom-up initiative from our development team, who pushed for the move to Go.
We now process hundreds of thousands of messages per second and terabytes of data per day with our Go services.
I'm not here to bash Scala or Scalaz (I love ValidationNel!) but to give some real-world context on Scala in a production environment over 7 years at two companies. I still use Scala and love hacking in Scalding. Some of our more ambitious upcoming projects will most likely be Scala-based, but I'm just not as sold on it anymore as the core language when trying to scale a growing engineering team. Go has since become my personal language of choice.

Tuesday, 22 December 2015

db-engines.com Database Popularity Rankings (updated)

Database Rankings

Rank  Prev.  Prev.  DBMS                  Database model     Score     +/- mo.  +/- yr.
      month  year
  1.    1.     1.   Oracle                Relational DBMS    1497.55   +16.61   +37.76
  2.    2.     2.   MySQL                 Relational DBMS    1298.54   +11.70   +29.96
  3.    3.     3.   Microsoft SQL Server  Relational DBMS    1123.16    +0.83   -76.89
  4.    4.     5.   MongoDB               Document store      301.39    -3.22   +54.87
  5.    5.     4.   PostgreSQL            Relational DBMS     280.09    -5.60   +26.09
  6.    6.     6.   DB2                   Relational DBMS     196.13    -6.40   -14.13
  7.    7.     7.   Microsoft Access      Relational DBMS     140.21    -0.75    +0.31
  8.    8.     9.   Cassandra             Wide column store   130.84    -2.08   +36.78
  9.    9.     8.   SQLite                Relational DBMS     100.85    -2.60    +6.15
 10.   10.    10.   Redis                 Key-value store     100.54    -1.87   +12.66
 11.   11.    11.   SAP Adaptive Server   Relational DBMS      81.47    -2.24    -4.52
 12.   12.    12.   Solr                  Search engine        79.15    -0.63    +0.73
 13.   14.    16.   Elasticsearch         Search engine        76.57    +1.79   +30.67
 14.   13.    13.   Teradata              Relational DBMS      75.72    -1.37    +8.32
 15.   16.    17.   Hive                  Relational DBMS      55.27    +0.36   +18.90
 16.   15.    15.   HBase                 Wide column store    54.25    -2.21    +3.17
 17.   17.    14.   FileMaker             Relational DBMS      50.12    -1.61    -2.10
 18.   18.    20.   Splunk                Search engine        43.86    -0.76   +12.39
 19.   19.    21.   SAP HANA              Relational DBMS      38.86    -0.76   +11.05
 20.   20.    18.   Informix              Relational DBMS      36.40    -2.05    +1.28
 21.   21.    23.   Neo4j                 Graph DBMS           33.18    -0.86    +8.02



Monday, 16 November 2015

Mark Little: Is Java EE Still Relevant?

https://developer.jboss.org/blogs/mark.little/2015/11/15/is-java-ee-still-relevant

I had a great time at JavaOne this year and Red Hat had a fantastic showing, with many sessions at the event, many more booth sessions (which were packed out as usual) and of course I made a brief appearance during the kickoff keynote. There had been a lot of interest in this event due to some recent events and rumours about what Oracle would or wouldn't say, specifically about the future of Java EE. This was even a topic of conversation during the JCP EC face-to-face meeting preceding JavaOne. But nothing much really happened and life, it seemed, would go on as usual. However, there was still a lot of discussion during several sessions and through the usual chatter on the floors or parties about the relevancy of Java EE these days.

But let's face it, this isn't a question that is only being asked in 2015; I've been involved in enterprise Java since before it was called J2EE, and within a few years of it being created people were asking the same question. As I've mentioned many times over the past few years, people have asked the question when Web Services came along, then REST, PaaS/Cloud, mobile and now IoT. Each time, the concept of middleware has evolved, and Java EE and its associated implementations with it. People get hung up on the concept of Java EE and application servers as bloated, monolithic entities without actually looking at the reality of today's implementations (well, at least one). However, I'm not here to repeat what I've said time and time again (remember this from 2011?), most recently at HPTS 2015. Approaches such as WildFly Swarm, and the massive interest it has seen, show that there's still a lot of interest. And I've already suggested how Java EE can fit into the next generation platforms.

Rather than revisit this question, which I think I've answered sufficiently, I wanted to link to Ian's entry on the JavaOne session he did, which I attended. It's always good to hear from someone else on the topic, though I'm sure some will argue he's not objective about this and neither am I. If you're still not convinced, think of the core capabilities within any Java EE application server, such as transactions, messaging, security, caching etc.; services/capabilities that are needed way beyond Java and Java EE, pre-dating Java by decades and used together or independently in many applications. You can consider Java EE as a way of packaging these things together into a convenient bundle, where they are guaranteed to work well together or independently, and in most cases you probably don't even know they're there. That packaging changed over the years before Java was created, and it will keep changing long after Java is just a mention in the history books; perhaps the services won't be co-located, perhaps they won't all be implemented in the same language, etc. And your future applications will probably not be able to tell the difference or know they're there ... again.

So whilst I think the original question is an important one to ask, I think a much better question is "Where should I use the core capabilities within Java EE?" And if you've got a suitably flexible and agile application server implementation, your application won't need to care that Java EE is, or is not, under the covers providing the desired dependability and reliability.




Of course microservices are the future. Ok, maybe there was a hint of sarcasm in that last sentence! Microservices have a role to play, just as SOA does (yes, I still believe the two are closely tied). There is some truth in there though: more streamlined, agile and dedicated services will be the basis of future application development, whether using (immutable) containers such as Docker or just the standard JVM, perhaps with fatjars. However, anyone who believes that the future of software (middleware) will appear instantaneously has obviously not looked back at other transitions, such as bespoke-to-CORBA or CORBA-to-J2EE. These things take time, and evolution rather than revolution is the natural approach. Even if you've not been involved in middleware there are similar examples elsewhere in our industry: COBOL really is still in deployment today! Look at the interest we have around Blacktie!

Therefore, the future will evolve. Yes people will want to develop new applications (so called greenfield sites) using the latest and greatest framework or stack. But they'll also want to integrate with existing business logic and services written in a variety of present day technologies. So there'll still be Java EE application servers (e.g., EAP) with business logic within them, some of it legacy, some written from scratch today and into the future, despite what some may believe.

I believe that because Java EE has been the dominant non-Microsoft development and deployment platform for well over a decade, there are many developers out there who are comfortable with it. Yes, some may complain about the apparent bloat of implementations, but the reality is that it's still very easy to develop against and use from a variety of different programming languages. That's why the evolution towards any new paradigm is going to be heavily influenced by it, if not driven directly by those developers. So yes, I believe that microservices and Java EE go hand in hand for a large percentage of developers. Approaches such as WildFly Swarm offer precisely what I'm describing: a comfortable entry point for developers and even existing applications, yet the power to move to a more flexible DevOps-driven paradigm. WildFly, when used correctly, offers a mature and easy-to-use platform that has a minimal footprint and faster boot times than the most popular web servers around! And don't forget that Swarm builds on WildFly, so we immediately get the maturity of its implementation(s).
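To make that concrete, here is a minimal sketch of what such an entry point can look like (the resource class and jar name are hypothetical; the Swarm Maven plugin configuration lives in the project's pom.xml and is not shown):

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;

    // Plain Java EE code; the WildFly Swarm Maven plugin detects the JAX-RS
    // fraction and builds a runnable fatjar, started with something like:
    //   java -jar orders-swarm.jar
    @Path("/orders")
    @Produces(MediaType.APPLICATION_JSON)
    public class OrdersResource {

        @GET
        public String list() {
            return "[]"; // business logic, legacy or newly written, goes here
        }
    }

The point being that the code itself is standard Java EE; only the packaging and deployment model change.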

However, this mad rush towards microservices, trimming of application servers, creation of applications from fatjars etc. needs to be approached with caution. Our industry is renowned for offering panaceas to problems that require throwing away all we've done before and relearning all of those hard-earned lessons! We've got to break that cycle, and at Red Hat we've got the pieces to do so: with so many open source projects out there purporting to be right for enterprise applications, and new ones springing into life almost daily, it's easy to understand why people believe every new project is good enough for their requirements. Although I believe open source is a superior development methodology, it takes time and effort to build enterprise-ready components such as transaction servers, messaging brokers, etc. They don't just spring into life fully formed and fully capable. "Good enough" is rarely sufficient for enterprises. It's the edge cases like management, reliability, recoverability, scalability and bullet-proof security that are hard to get right, yet it's precisely those edge cases that matter time and again. Through core development or acquisition, we've built up a stack that is mature and capable. Whether deployed as a stack or as individual pieces, it's what we should be building the next generation of middleware solutions upon. A strong base exists today and we need to reuse as much of it as possible rather than rewrite from scratch in some new popular programming language.

Now as I hinted above, maybe we don't package our future stack or platform in quite the same way as we do today. Microservices offers an approach that is in line with the kinds of trimming we've been seeing anyway. Bundling individual components from the application server as easily deployed (container based) services that can then be exposed to other programming languages, frameworks, solutions etc. is definitely part of the overall solution space. Those core services, such as transactions or storage, could be deployed as individual services or, as is more typical with something like Swarm, deployed with the business logic that uses them. I keep coming back to the JBossEverywhere initiative we had a few years ago - ahead of its time!

Ok, so we've looked at microservices and how Java EE fits in. But that can't be the entire answer - and it isn't. As I've mentioned before, at least for a very important set of applications and use cases the future is reactive and event-driven. That could mean Node.js, but in the Java world it just as likely means Vert.x. Note that, given many of the Java EE APIs aren't reactive or asynchronous in nature, we'll need to evolve them if we wish to tie them in to Vert.x, and I do think we need to do that. Whilst some people will want to develop their applications and microservices in Vert.x from scratch, others will want to tie in legacy systems or have access to some of the core services I mentioned earlier. I see Vert.x as the ideal backbone or glue that brings all of these things together. The mature core services that we've got are precisely the sorts of things that enterprise developers will need for their applications as they grow in complexity - and let's face it, eventually many applications are going to need security, transactions, high-performance messaging etc.
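For readers who haven't seen it, a minimal Vert.x 3 verticle looks something like this (a hypothetical hello-world); the request handler is a callback invoked on the event loop, i.e. reactive and non-blocking end to end, in contrast to the mostly synchronous Java EE APIs:

    import io.vertx.core.AbstractVerticle;
    import io.vertx.core.Vertx;

    // A minimal Vert.x verticle: a non-blocking HTTP server whose handler
    // runs on the event loop whenever a request event arrives.
    public class HelloVerticle extends AbstractVerticle {

        @Override
        public void start() {
            vertx.createHttpServer()
                 .requestHandler(req -> req.response().end("hello"))
                 .listen(8080);
        }

        public static void main(String[] args) {
            Vertx.vertx().deployVerticle(new HelloVerticle());
        }
    }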

In the Java world the unit of containerisation is essentially the JVM. However, most Java developers realise that unless you ship a fatjar, which contains everything you need to run your application/service, it's common to find that changes in third-party jars downloaded at deployment time result in the application or service failing to run first time. This is where operating system containers, such as Docker, really come into their own. The ability to create a deployment unit which runs first time and every time is crucial! Container orchestration technologies, such as Kubernetes, are likewise important if you want to deploy services (via containers) which are highly available, load balanced, etc. Therefore, hopefully it's not too difficult to see where containers will fit into the future architecture - not mandatory by any means, but definitely a piece of the puzzle which should be considered from the outset.

The combination of OpenShift for Container deployment and management, with Fabric8 for developer experience with CI and CD, provides a compelling hybrid cloud environment, especially once you consider all of the JBoss/Fuse middleware integrated, i.e., xPaaS. As I've mentioned before though, xPaaS isn't about simply adding the middleware products to OpenShift; we're also going to make them much more cloud-aware/cloud-enabled. This has a number of implications, but the one I want to mention specifically is that the core capabilities will be made available to developers in a more cloud-natural manner, e.g., users who want reliable messaging won't need to understand the various intricacies of JMS to use A-MQ and in fact won't even have to know A-MQ is working under the covers. And yes, for those of you still paying attention, those core capabilities I mentioned are precisely the same core services we covered earlier on. See the connection?

Up to this point we've really been playing in the traditional enterprise deployment arena: clients, middle tiers and servers. The cloud comes into play here at the back tier (servers), but what about mobile, IoT and ubiquitous computing? I've discussed this a few times before so won't repeat much here except that I think everything we've discussed so far has immediate applicability for ubiquitous computing and that mobile, as well as IoT, are just limited aspects of it. In fact as I showed separately, if the cloud is to scale, mobile/IoT needs to take on a more fat-client approach - anyone remember what I wrote about Shannon's Limit over 4 years ago? Mobile, which really means developing applications for phones that tend to rely upon backend services, is a specific implementation of IoT, which really means developing applications for a range of devices that tend to rely upon backend services (ok, with some gateway technologies in there for good measure.) See what I mean?

If you follow that assertion that everything we need to do going forward is some aspect of ubiquitous computing, then it follows from what we discussed earlier that the new stack approach of core services, Containers, management etc. all come into play and across a variety of different languages and frameworks. Whether you're developing enterprise applications for mobile devices, clouds, involving sensors, or traditional mainframes, you need a stack that is mature, rich, scalable, reliable, trustworthy and open. The Red Hat stack, which has evolved over the last decade and is continuing to evolve, is the only one that matches all of the requirements!

Below is a hand-drawn outline of where I see these things going. Apologies that it's not a nice block diagram, and for my handwriting.

[Image: hand-drawn sketch of the architecture described above]

Saturday, 14 November 2015

JVM Concurrency links

Jessica Kerr: Concurrency Options on the JVM


Jessica Kerr covers some of the concurrency tools existing in JVM languages including ExecutorService, Futures, Akka actors, and core.async coroutines, providing advice on writing deadlock-free code.
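As a taste of the first two of those tools, here is a small hypothetical Java 8 sketch combining an ExecutorService thread pool with composable CompletableFutures; there are no explicit locks, so none of the deadlock risk that comes with them:

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // A fixed pool of worker threads plus futures composed without blocking.
    public class PoolDemo {
        public static void main(String[] args) {
            ExecutorService pool = Executors.newFixedThreadPool(4);

            CompletableFuture<Integer> price = CompletableFuture.supplyAsync(() -> 42, pool);
            CompletableFuture<Integer> tax   = CompletableFuture.supplyAsync(() -> 8, pool);

            // Combine the results when both complete; no thread waits on a lock.
            price.thenCombine(tax, Integer::sum)
                 .thenAccept(total -> System.out.println("total = " + total))
                 .join(); // block only here, at the edge of the program

            pool.shutdown();
        }
    }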




Gentle Introduction to Lockless Concurrency

I've recently visited #jcrete, a Java un-conference. After talking with people, it struck me that many things that are available and extremely popular in other communities (like Clojure and Haskell) are still so little used in the Java world.
Ever since Java got lambdas, streams and default method implementations in interfaces, it has become a much more enjoyable language to use. It should also have changed the way people write their code, although this process has been much slower than I would have expected.
Lambdas allow us to pass anonymous functions and so avoid creating a class for every single clause where variables escape the context. They also allow us to think of operations in terms of small, atomic pieces that can be combined, passed around, reused and referenced.
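For instance (a trivial, hypothetical Java 8 snippet):

    import java.util.Arrays;
    import java.util.function.Predicate;

    public class LambdaDemo {
        public static void main(String[] args) {
            // The old style: a whole anonymous class for one tiny clause.
            Predicate<String> emptyOld = new Predicate<String>() {
                @Override public boolean test(String s) { return s.isEmpty(); }
            };

            // With lambdas: small, atomic pieces that compose and can be reused.
            Predicate<String> empty = String::isEmpty;
            Predicate<String> blank = empty.or(s -> s.trim().isEmpty());

            long n = Arrays.asList("a", " ", "").stream().filter(blank).count();
            System.out.println(n); // prints 2
        }
    }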
In this post I’ll try to examine the question of lockless concurrency in Java - an extremely useful, yet shamefully overlooked topic. Implementing lockless algorithms and data structures requires some intuition on basic principles, so let's start with something simple.
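Something simple might look like the following sketch: a lock-free counter built on a compare-and-swap (CAS) retry loop, the basic principle behind most lockless algorithms. (AtomicLong.incrementAndGet already does this internally; the loop is spelled out here for illustration.)

    import java.util.concurrent.atomic.AtomicLong;

    // A lock-free counter: no synchronized blocks, just CAS.
    // Threads that lose the race simply retry; none of them ever blocks.
    public final class LockFreeCounter {
        private final AtomicLong value = new AtomicLong();

        public long increment() {
            long current, next;
            do {
                current = value.get();
                next = current + 1;
            } while (!value.compareAndSet(current, next)); // retry on contention
            return next;
        }
    }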

Tuesday, 10 November 2015

New ML frameworks: Google TensorFlow and Samsung VELES



TensorFlow: Google Open Sources Their Machine Learning Tool

TensorFlow is a machine learning library created by the Brain Team researchers at Google, now open sourced under the Apache License 2.0. TensorFlow is detailed in the whitepaper TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. The source code can be found on Google Git.
TensorFlow is a tool for writing and executing machine learning algorithms. Computations are done in a data flow graph where the nodes are mathematical operations and the edges are tensors (multidimensional data arrays) exchanged between nodes. A user constructs the graph and writes the algorithms that are executed on each node. TensorFlow takes care of executing the code asynchronously on different devices, cores, and threads.
TensorFlow runs on CPUs and GPUs, on desktop, server or mobile devices. It can be containerized with Docker to be deployed in the cloud. The version that has been open sourced runs on single machines, not on clusters.
TensorFlow has a complete Python API and a C++ interface for building and executing graphs. It also has a C-based client API. Google invites the community to write interfaces in other languages, the most probable being Lua, R, Java, Go and JavaScript.
Google does not consider the library final and will continue to improve it. They will make public some of the actual implementations they have created.
TensorFlow is used by Google for Gmail (SmartReply), Search (RankBrain), Pictures (Inception Image Classification Model), Translator (Character Recognition), and other products.


TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.





What is a Data Flow Graph?

Data flow graphs describe mathematical computation with a directed graph of nodes and edges. Nodes typically implement mathematical operations, but can also represent endpoints to feed in data, push out results, or read/write persistent variables. Edges describe the input/output relationships between nodes. These data edges carry dynamically-sized multidimensional data arrays, or tensors. The flow of tensors through the graph is where TensorFlow gets its name. Nodes are assigned to computational devices and execute asynchronously and in parallel once all the tensors on their incoming edges become available.
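To make the firing rule concrete, here is a toy Java sketch (nothing to do with TensorFlow's actual API) in which each "node" runs as soon as the values on its incoming edges become available:

    import java.util.Arrays;
    import java.util.concurrent.CompletableFuture;

    // Toy data-flow graph: each node is an operation, each edge a value that
    // flows between nodes. A node fires once all its inputs are available,
    // which is what allows independent nodes to execute in parallel.
    public class TinyDataFlow {
        public static void main(String[] args) {
            CompletableFuture<double[]> a = CompletableFuture.supplyAsync(() -> new double[]{1, 2});
            CompletableFuture<double[]> b = CompletableFuture.supplyAsync(() -> new double[]{3, 4});

            // The "add" node fires only when both input edges have delivered.
            CompletableFuture<double[]> sum = a.thenCombine(b, (x, y) ->
                    new double[]{x[0] + y[0], x[1] + y[1]});

            System.out.println(Arrays.toString(sum.join())); // [4.0, 6.0]
        }
    }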



Tuesday, 3 November 2015

Software Test links (Including Agile Testing)

Amazon: Agile Testing: A Practical Guide for Testers and Agile Teams

Brian Marick’s testing quadrant





Documenting Architecture

Amazon: Applied Software Architecture




[Soni/Nord/Hofmeister] is the first work that propagated the idea of specifying an architecture language with UML, describing a system at four levels:

- Conceptual level: conceptual architecture (components, connectors)
- Module interconnection architecture: modules and their connections
- Execution architecture: the runtime architecture
- Code architecture level: division of the system into files