Here at Intent HQ we use Wikipedia and Wikidata as sources of data. They are very important to us because they both encode an enormous amount of information in several languages that we use to build our Topic Graph.
Although the current process we have to process these dumps works well enough, we are always interested in finding new and better ways of doing our work. It’s because of that that we were very excited when we saw the Reactive Streams initiative 1. We thought it could be used to process the largest encyclopædia in the world 2.
Be warned that this is mostly just a collection of links to articles and demos by smarter people than I. Areas of interest include Java, C++, Scala, Go, Rust, Python, Networking, Cloud, Containers, Machine Learning, the Web, Visualization, Linux, System Performance, Software Architecture, Microservices, Functional Programming....
Tuesday, 23 June 2015
Reactive: Akka Streams, RxJava, Ratpack
Wednesday, 10 June 2015
James Ward: Java EE: Comparing Application Deployment: 2005 vs. 2015
James Ward: Java EE: Comparing Application Deployment: 2005 vs. 2015
Java Doesn’t Suck – You’re Just Using it Wrong
Refactoring to Microservices
2005 = Multi-App Containers / App Servers / Monolithic Apps
2015 = Microservices / Docker Containers / Containerless AppsBack in 2005 many of us worked on projects that resulted in a WAR file – a zip file containing a Java web application and its library dependencies. That web application would be deployed alongside other web applications into a single app server sometimes called a “container” because it contained and ran one or more applications. The app server provided a bunch of common services to the web apps like an HTTP server, a service directory, and shared libraries. Unfortunately deploying multiple apps in a single container created high friction for scaling, deployment, and resource usage. App servers were supposed to isolate an app from its underlying system dependencies in order to avoid “it works on my machine” problems but things often didn’t work that smoothly due to differing system dependencies and configuration that lived outside of the app server / container.In 2015 apps are being deployed as self-contained units, meaning the app includes everything it needs to run on top of a standard set of system dependencies. The granularity of the self-contained unit differs depending on the deployment paradigm. In the Java / JVM world a “containerless” app is a zip file that includes everything the app needs on top of the JVM. Most modern JVM frameworks have switched to this containerless approach including Play Framework, Dropwizard, and Spring Boot. A few years ago I wrote in more detail about how app servers are fading away in the move from monolithic middleware to microservices and cloud services.For a more complete and portable self-contained unit, system-level container technologies like Docker and LXC bundle the app with its system dependencies. Instead of deploying a bunch of apps into a single container, a single app is added to a Docker image and deployed on one or more servers. On Heroku a “Slug” file is similar to a Docker image.Microservices play a role in this new landscape because deployment across microservices is independent, whereas with traditional app servers individual app deployment often involved restarting the whole server. This was one reason for the snail’s pace of deployment in enterprises – deployments were incredibly risky and had to be coordinated months in advance across numerous teams. Hot deployment was a promise that was never realized for production apps. Microservices enable individual teams to deploy at will and as often as they want. Microservices require the ability to quickly provision, deploy, and scale services which may have only a single responsibility. These requirements fit well with the infrastructure provided by containerless apps running on Docker(ish) Containers.2005 = Manual Deployment
2015 = Continuous Delivery / Continuous DeploymentThe app servers of 2005 that ran multiple monolithic apps combined with manual load balancer configurations made application upgrades risky and painful so deployments were usually done sparingly in designated maintenance windows. Back then it was pretty much unheard of to have a deployment pipeline that fully automated delivery from an SCM to production.Today Continuous Delivery and Continuous Deployment enable developers to get code to staging and production sometimes as often as tens or even hundreds of times a day. Scalable deployment pipelines range from the simple “git push heroku master” to a more risk averse pipeline that includes pull requests, Continuous Integration, staging auto-deployment, manual promotion to production, and possibly Canary Releases & Feature Flags. These pipelines enable organizations to move fast and distribute risk across many small releases.In order for Continuous Delivery to work well there are a few ancillary requirements:
- Release rollbacks must be instant and easy because sometimes things are going to break and getting back to a working state quickly must be painless and fast.
- Patch releases must be able to make it from SCM to production (through a continuous delivery pipeline) in minutes.
- Load balancers must be able to handle automatic switching between releases.
- Database schema changes should be decoupled from app releases otherwise releases and rollbacks can be blocked.
- App-tier servers should be stateless with state living in external data stores otherwise state will be frequently lost and/or inconsistent.
2005 = Persistent Servers / “Pray it never goes down”
2015 = Immutable Infrastructure / Ephemeral ServersWhen a server crashed in 2005 stuff usually broke. Some used session replication and server affinity but sessions were lost and bringing up new instances usually took quite a bit of manual work. Often changes were made to production systems via SSH making it difficult to accurately reproduce a production environment. Logging was usually done to local disk making it hard to see what was going on across servers and load balancers.Servers in 2015 are disposable, immutable, and ephemeral forcing us to plan for them to go down. Tools like Netflix’s Chaos Monkey randomly shut down servers to make sure we are preparing for crashes. Load balancers and management backplanes work together to start and stop new instances in an instant enabling rapid scaling both up and down. By being immutable we can no longer fix production issues by SSHing into a server but now environments are easily reproducible. Logging services route STDOUT to an external service enabling us to see the log stream in real time, across the whole system.2005 = Ops Team
2015 = DevOpsIn 2005 there was a team that would take your WAR file (or other deployable artifact) and be responsible for deploying it, managing it, and monitoring it. This was nice because developers didn’t have to wear pagers but ultimately the Ops team often couldn’t do much if there was a production issue at 3am. The biggest downside of this was that Ops became all about risk mitigation causing a tremendous slowdown in software delivery.Modern technical organizations of all sizes are ditching the Ops velocity killer and making developers responsible for the stuff they put into production. Services like New Relic, VictorOps, and Slack help developers stay on top of their new operational responsibilities. The DevOps culture also directly incentivizes devs not to deploy things that will end up waking them or a team member up at 3am. A core indicator of a DevOps culture is whether a new team member can get code to production on their first day. Doing that one thing right means doing so many other things right, like:
- 3 Step Dev Setup: Provision the system, Checkout the code, and Run the App
- SCM / Team Review (e.g. GitHub Flow)
- Continuous Integration & Continuous Deployment / Delivery
- Monitoring and Notifications
DevOps can sound very scary to traditional enterprise developers like myself. But from experience I can attest that wearing a pager (metaphorically) and assuming the direct risk of my deployments has made me a much better developer. The quality of my code and my feelings of fulfillment have increased with my new level of ownership over what is in production.Learn MoreI’ve just touched the surface of many of the deployment changes over the past 10 years but hopefully you now have a better understanding of some of the terminology you might be hearing at conferences and on blogs. For more details on these and related topics, check out The Twelve-Factor App and my blog Java Doesn’t Suck – You’re Just Using it Wrong. Let me know what you think!
Java Doesn’t Suck – You’re Just Using it Wrong
Refactoring to Microservices
Monday, 8 June 2015
Machine Learning links
A Visual Introduction to Machine Learning
10 more lessons learned from building Machine Learning systems
A Tour of Machine Learning Algorithms
10 more lessons learned from building Machine Learning systems
A Tour of Machine Learning Algorithms
There are only a few main learning styles or learning models that an algorithm can have and we’ll go through them here with a few examples of algorithms and problem types that they suit. This taxonomy or way of organizing machine learning algorithms is useful because it forces you to think about the the roles of the input data and the model preparation process and select one that is the most appropriate for your problem in order to get the best result.
- Supervised Learning: Input data is called training data and has a known label or result such as spam/not-spam or a stock price at a time. A model is prepared through a training process where it is required to make predictions and is corrected when those predictions are wrong. The training process continues until the model achieves a desired level of accuracy on the training data. Example problems are classification and regression. Example algorithms are Logistic Regression and the Back Propagation Neural Network.
- Unsupervised Learning: Input data is not labelled and does not have a known result. A model is prepared by deducing structures present in the input data. Example problems are association rule learning and clustering. Example algorithms are the Apriori algorithm and k-means.
- Semi-Supervised Learning: Input data is a mixture of labelled and unlabelled examples. There is a desired prediction problem but the model must learn the structures to organize the data as well as make predictions. Example problems are classification and regression. Example algorithms are extensions to other flexible methods that make assumptions about how to model the unlabelled data.
When crunching data to model business decisions, you are most typically using supervised and unsupervised learning methods. A hot topic at the moment is semi-supervised learning methods in areas such as image classification where there are large datasets with very few labelled examples. Reinforcement learning is more likely to turn up in robotic control and other control systems development.
- Reinforcement Learning: Input data is provided as stimulus to a model from an environment to which the model must respond and react. Feedback is provided not from of a teaching process as in supervised learning, but as punishments and rewards in the environment. Example problems are systems and robot control. Example algorithms are Q-learning and Temporal difference learning.
Subscribe to:
Posts (Atom)