Friday, 29 July 2016

Making OpenStack production ready with Kubernetes and OpenStack-Salt - Part 1 - Superuser

Making OpenStack production ready with Kubernetes and OpenStack-Salt - Part 1 - Superuser: "This tutorial introduces and explains how to build a workflow for life cycle management and operation of an enterprise OpenStack private cloud coupled with OpenContrail SDN running in Docker containers and Kubernetes.

The following blog post is divided into five parts: the first explains the journey to a continuous DevOps workflow. The second offers steps on how to build and integrate containers with your build pipeline.



The third part details the orchestration of containers with a walkthrough of Kubernetes architecture, including plugins and prepping OpenStack for decomposition. In the fourth part, we introduce the tcp cloud theory of a “single source of truth” solution for central orchestration. In the fifth and final step, we bring it all together, demonstrating how to deploy and upgrade OpenStack with OpenContrail.

We decided to divide the process into two blog posts for better reading. This first post covers creating a continuous DevOps workflow and containers build."




Good to Great Python reads — jesse noller

Good to Great Python reads — jesse noller: "A collection of Python 'must reads':"




Wednesday, 27 July 2016

Why Uber Engineering Switched from Postgres to MySQL - Uber Engineering Blog

Why Uber Engineering Switched from Postgres to MySQL - Uber Engineering Blog: "The early architecture of Uber consisted of a monolithic backend application written in Python that used Postgres for data persistence. Since that time, the architecture of Uber has changed significantly, to a model of microservices and new data platforms. Specifically, in many of the cases where we previously used Postgres, we now use Schemaless, a novel database sharding layer built on top of MySQL. In this article, we’ll explore some of the drawbacks we found with Postgres and explain the decision to build Schemaless and other backend services on top of MySQL."




Cloud: OpenStack will soon be able to run in containers on top of Kubernetes | TechCrunch

OpenStack will soon be able to run in containers on top of Kubernetes | TechCrunch: "OpenStack, the open source project that allows enterprises to run an AWS-like cloud computing service in their own data centers, added support for containers over the course of its last few releases. Running OpenStack itself on top of containers is a different problem, though.



Even though CoreOS has done some work on running OpenStack in containers thanks to its oddly named Stackanetes project, that project happened outside of the OpenStack community and the core OpenStack deployment and management tools.



 Soon, however, thanks to the work of Mirantis, Google and Intel, the OpenStack Fuel deployment tool will be able to use Kubernetes as its orchestration engine, too. Ideally, this will make it easier to manage OpenStack deployments at scale.



 “With the emergence of Docker as the standard container image format and Kubernetes as the standard for container orchestration, we are finally seeing continuity in how people approach operations of distributed applications,” said Mirantis CMO Boris Renski. “Combining Kubernetes and Fuel will open OpenStack up to a new delivery model that allows faster consumption of updates, helping customers get to outcomes faster.”



 This also means that OpenStack will soon be able to run in containers on Google’s Cloud — or really any cloud service that supports Kubernetes."




Linux: bcc tutorial by Brendan Gregg · iovisor/bcc

bcc/tutorial.md at master · iovisor/bcc · GitHub: "bcc Tutorial

This tutorial covers how to use bcc tools to quickly solve performance, troubleshooting, and networking issues. If you want to develop new bcc tools, see tutorial_bcc_python_developer.md for that tutorial.

It is assumed for this tutorial that bcc is already installed, and you can run tools like execsnoop successfully. See INSTALL.md. This uses enhancements added to the Linux 4.x series."
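For a taste of what sits underneath these tools, here is a minimal sketch along the lines of bcc's hello-world examples (BPF and trace_print are bcc's documented Python interface; the probe body is illustrative, and running it needs root plus a 4.x kernel):

from bcc import BPF

# Attach a tiny BPF program to the sys_clone kprobe; every time any
# process calls clone(), the kernel emits a trace line that we print.
prog = """
int kprobe__sys_clone(void *ctx) {
    bpf_trace_printk("clone() called\\n");
    return 0;
}
"""
b = BPF(text=prog)
b.trace_print()  # blocks, printing one line per clone() until Ctrl-C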




Friday, 22 July 2016

Building beautiful RESTful APIs using Flask, Swagger UI and Flask-RESTPlus - Michał Karzyński

Building beautiful RESTful APIs using Flask, Swagger UI and Flask-RESTPlus - Michał Karzyński: "This article outlines steps needed to create a RESTful API using Flask and Flask-RESTPlus. These tools combine into a framework, which automates common tasks:


  • API input validation
  • formatting output (as JSON)
  • generating interactive documentation (with Swagger UI)
  • turning Python exceptions into machine-readable HTTP responses




Flask 

 Flask is a web micro-framework written in Python. Since it’s a micro-framework, Flask does very little by itself. In contrast to a framework like Django, which takes the “batteries included” approach, Flask does not come with an ORM, serializers, user management or built-in internationalization. All these features and many others are available as Flask extensions, which make up a rich, but loosely coupled ecosystem.

The challenge, then, for an aspiring Flask developer lies in picking the right extensions and combining them together to get just the right set of functions. In this article we will describe how to use the Flask-RESTPlus extension to create a Flask-based RESTful JSON API.



 Flask-RESTPlus 

Flask-RESTPlus aims to make building REST APIs quick and easy. It provides just enough syntactic sugar to make your code readable and easy to maintain. The killer feature of RESTPlus is its ability to automatically generate interactive documentation for your API using Swagger UI.




Swagger UI 

Swagger UI is part of a suite of technologies for documenting RESTful web services. Swagger has evolved into the OpenAPI specification, currently curated by the Linux Foundation. Once you have an OpenAPI description of your web service, you can use a variety of tools to generate documentation or even boilerplate code in a variety of languages. Take a look at swagger.io for more information."
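To make the four automated tasks listed above concrete, here is a minimal sketch of a Flask-RESTPlus service (the Todo model and routes are invented for illustration; Api, Resource, fields, expect and marshal_with are the extension's documented building blocks):

from flask import Flask
from flask_restplus import Api, Resource, fields

app = Flask(__name__)
api = Api(app, title='Todo API', doc='/docs')  # Swagger UI served at /docs

todo = api.model('Todo', {
    'id': fields.Integer(readonly=True),
    'task': fields.String(required=True, description='What needs doing'),
})

@api.route('/todos')
class TodoList(Resource):
    @api.marshal_list_with(todo)       # formats output as JSON
    def get(self):
        return [{'id': 1, 'task': 'write the API'}]

    @api.expect(todo, validate=True)   # validates input against the model
    @api.marshal_with(todo, code=201)
    def post(self):
        return api.payload, 201        # bad input becomes a machine-readable 400

if __name__ == '__main__':
    app.run()

Run it and browse to /docs to see the interactive documentation generated from nothing more than the decorators above.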




REST etc: Don’t Version Your Web API - InfoQ

Don’t Version Your Web API: "For Lambla, none of these versioning techniques work on the open web. What we really want is something that can evolve, using contracts that can evolve smoothly with needed changes. For Lambla this is a well-understood problem, and he refers to the web, which, with its many millions of microservices, has been running for 25 years without too many problems and without versioning, with the exception of HTTP.

The first pillar for evolvable contracts is backward compatibility. Lambla uses HTML as an example where there are a lot of elements that we are discouraged from using, but they are still supported by clients since we can’t update all the web sites in the world. The same principles should apply to an API; as it evolves it must still support old formats.

The second pillar is forward compatibility. To achieve this, you have to ignore the unknown or what is not understood. Lambla refers to CSS as an example where new attributes can be handled without problems. To achieve this, fall-back rules are used to handle unknown attributes, an important way to get extensibility points.

XML is still commonly used, and often with XML schemas. To support evolvability here we must be flexible in content and Lambla therefore strongly recommends against validating schemas in servers as well as in clients. Instead we should just find the elements and attributes we need and ignore the rest.

To avoid versioning, we need to continue supporting all features, but we can’t keep all changes in an API forever. Old features that are rarely used should be removed. To know when they can be removed, we need to use metrics and measure usage. We can then decide that when usage falls below, for example, 1%, the feature can be removed."
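The "ignore the unknown" pillar is essentially the tolerant-reader pattern. A minimal sketch of what that looks like in practice (the order payload and field names are invented for illustration):

import json

def parse_order(payload):
    # Tolerant reader: pick out only the fields we need and ignore
    # anything unrecognised, so new server-side fields never break us.
    doc = json.loads(payload)
    return {
        'id': doc['id'],                      # required: fail loudly if absent
        'status': doc.get('status', 'open'),  # fall-back rule for older servers
    }

# A newer server may add fields; this client keeps working unchanged:
print(parse_order('{"id": 42, "status": "paid", "loyalty_tier": "gold"}'))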




ACID: How to Screw it Up! - DZone Database

ACID: How to Screw it Up! - DZone Database: "In my previous post, I described four applications (three implemented, one an example) that require, or at least strongly benefit from, strong ACID transactions. With that out of the way, we can now get to the fun bits. 

Today's claim: Most databases that claim to be 100% ACID compliant actually fudge definitions, goof up, or deliberately mislead. This applies to 30-year-old giants as well as strapping upstarts; it's just a fact of life in databases.

I could make a giant chart with all of the products that claim to be ACID — and list where they fail. But that chart would be partial, soon-dated, and probably contain errors. It would certainly open up a discussion about specifics that might miss the forest for the trees.

So in lieu of that, here's general guidance about the ways many databases fail to meet that 100% ACID standard. This is how you can evaluate a new database's ACID promises. The content below will help you look past the hype and/or marketing to get a sense of how mature the development is, what engineering tradeoffs have been made, and how serious the database developers are about backing up their promises."




Sunday, 17 July 2016

networking: One way to make containers network: BGP - Julia Evans

One way to make containers network: BGP - Julia Evans: "Okay, so, again, I have 5 containers, all running things on port 4000. The key observation here is -- it's okay to run a bunch of things on the same port as long as they're on different IPs. Normally on one computer you only use one IP address. But that doesn't have to be true! You can have lots!

So. I have 5 containers, and I've assigned them IPs 10.0.1.101, 10.0.1.102, 10.0.1.103, 10.0.1.104, 10.0.1.105. Inside my computer, this is fine -- it's easy to imagine that I can just know which one is which.

But what do I do if I have another computer on the same network? How does that container know that 10.0.1.104 belongs to a container on my computer?

The Linux kernel knows about the BGP protocol. Calico knows about the Linux kernel. So Calico says "hey linux! Tell these other computers on the network to find these IP addresses here, okay?" And then all the traffic for those IPs comes to our computer, and everything is great.

To me, this seems pretty nice. It means that you can easily interpret the packets coming in and out of your machine (and, because we love tcpdump, we want to be able to understand our network traffic). I think there are other advantages but I'm not sure what they are.



 I find reading this networking stuff pretty difficult; more difficult than usual. For example, Docker also has a networking product they released recently. The webpage says they're doing "overlay networking". I don't know what that is, but it seems like you need etcd or consul or zookeeper. So the networking thing involves a distributed key-value store? Why do I need to have a distributed key-value store to do networking? There is probably a talk about this that I can watch but I don't understand it yet."
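The "same port, different IPs" observation is easy to try at home; here is a minimal sketch (on Linux the whole 127.0.0.0/8 range is routed to loopback, so both addresses work without any setup):

import socket

# Two independent listeners on the same port, distinguished only by IP.
a = socket.socket()
a.bind(('127.0.0.1', 4000))
a.listen(5)

b = socket.socket()
b.bind(('127.0.0.2', 4000))
b.listen(5)

print('two servers, one port, two IPs -- no conflict')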




Tuesday, 12 July 2016

Architecture: The Log: What every software engineer should know about real-time data's unifying abstraction | LinkedIn Engineering

The Log: What every software engineer should know about real-time data's unifying abstraction | LinkedIn Engineering: "I joined LinkedIn about six years ago at a particularly interesting time. We were just beginning to run up against the limits of our monolithic, centralized database and needed to start the transition to a portfolio of specialized distributed systems. This has been an interesting experience: we built, deployed, and run to this day a distributed graph database, a distributed search backend, a Hadoop installation, and a first and second generation key-value store.

One of the most useful things I learned in all this was that many of the things we were building had a very simple concept at their heart: the log. Sometimes called write-ahead logs or commit logs or transaction logs, logs have been around almost as long as computers and are at the heart of many distributed data systems and real-time application architectures.

You can't fully understand databases, NoSQL stores, key-value stores, replication, Paxos, Hadoop, version control, or almost any software system without understanding logs; and yet, most software engineers are not familiar with them. I'd like to change that. In this post, I'll walk you through everything you need to know about logs, including what a log is and how to use logs for data integration, real-time processing, and system building."
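The core abstraction is small enough to sketch: an append-only sequence of records that consumers replay, in order, to rebuild state. This toy in-memory version is mine, not LinkedIn's:

class Log:
    # A minimal append-only log: writers add records at the end,
    # readers replay from any offset in the original order.
    def __init__(self):
        self.entries = []

    def append(self, record):
        self.entries.append(record)
        return len(self.entries) - 1   # offset of the new record

    def replay(self, from_offset=0):
        return self.entries[from_offset:]

log = Log()
log.append({'key': 'a', 'value': 1})
log.append({'key': 'a', 'value': 2})

# Any consumer replaying the log in order arrives at identical state:
state = {}
for entry in log.replay():
    state[entry['key']] = entry['value']
print(state)   # {'a': 2}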




Sunday, 10 July 2016

Linux debugging tools I love - Julia Evans

Linux debugging tools I love - Julia Evans: "

strace
A tool that traces system calls. My favorite thing. I have a bunch of posts with examples of how to use it on this blog. If you want to know what I think of it, you should read the fanzine referenced here.

strace is pretty broadly useful, but keep in mind it can really slow down your programs.

I would be remiss if I did not mention the even-more-awesome dtrace. Colin Jones has a nice introduction.



 dstat
A really simple tool that prints out how much data got sent over the network / written to disk every second. This is great when you suspect something is going on with network/disk usage and want to see what's happening in real time.

There's also iostat and netstat and atop and a bunch of other tools, but dstat is my favorite.



 tcpdump + wireshark
For spying on network traffic. I wrote an introduction explaining how to use them in "tcpdump is amazing".

When using these, it really helps to have a basic understanding of how networking works. Luckily the basics ("what's the difference between IP and TCP and HTTP? what's a network packet?") are totally possible to pick up :D.



 perf 

Have a C program and want to know which functions it's spending the most time in? perf is a sampling profiler for Linux that can tell you that.

perf top gives you a live view of which functions are running right now, just like top. I like to use perf top no matter what language my programs are written in, just to see if I can understand anything from it. Sometimes it works!

node.js has built-in support for using perf to show you which Node function is running right now. You can also get this for JVM programs with perf-map-agent.

Brendan Gregg's website has the best introduction to perf I know.

You can use perf to generate amazing flame graphs.



 opensnoop 

Opensnoop is a new script that you can get as of Ubuntu 16.04. It's a delightfully simple tool -- it just shows you which files are being opened right now on your computer. And it's fast, unlike strace!

opensnoop also exists on OS X and does basically the same thing.

Go to the iovisor/bcc repo on github for installation instructions. It works using eBPF, which is a new thing that I will not explain yet here but Brendan Gregg has been writing about enthusiastically for some time. You don't need to know how it works to use it, though :)."




A checklist for Docker in the Enterprise | zwischenzugs

A checklist for Docker in the Enterprise | zwischenzugs: "Docker is extremely popular with developers, having gone as a product from zero to pretty much everywhere in a few years.

I started tinkering with Docker three years ago, got it going in a relatively small corp (700 employees) in a relatively unregulated environment. This was great fun: we set up our own registry, installed Docker on our development servers, installed Jenkins plugins to use Docker containers in our CI pipeline, even wrote our own build tool to get over the limitations of Dockerfiles.

I now work for an organisation working in arguably the most heavily regulated industry, with over 100K employees. The IT security department itself is bigger than the entire company I used to work for.



 I want to outline the areas that may be important to an enterprise when considering developing a Docker infrastructure."




Qcon: Much Faster Networking

David Riddoch talks about technologies that make very high-performance networking possible on commodity servers and networks, with a special focus on kernel-bypass technologies, including sockets acceleration and NFV. These techniques give user-space applications direct access to the network adapter hardware, making possible sub-microsecond latencies and millions of messages per second per thread.

Thursday, 7 July 2016

Cloudflare: Why we use the Linux kernel's TCP stack - networking

Why we use the Linux kernel's TCP stack: "There are two general themes: first, there is no stable open-source partial kernel bypass technology yet. We hope Netmap will occupy this niche, and we are actively supporting it with our patches. Second, the Linux TCP stack has many critical features and very good debugging capabilities. It will take years to compete with this rich ecosystem.



For these reasons it's unlikely userspace networking will become mainstream. In practice I can think only of a few reasonable applications of kernel bypass techniques:



  • Software switches or routers. Here you want to hand over network cards to the application, deal with raw packets and skip the kernel altogether. 
  • Dedicated loadbalancers. Similarly, if the machine is only doing packet shuffling, skipping the kernel makes sense.
  • Partial bypass for selected high throughput / low latency applications. This is the setup we use for our DDoS mitigations. Unfortunately I'm not aware of a stable open source TCP stack that fits this category.




 For the general user the Linux network stack is the right choice. Although it's less exciting than rewriting TCP stacks, we should focus on understanding the Linux stack performance and fixing its problems. There are some serious initiatives underway to improve the performance of the good old Linux TCP stack."




Wednesday, 6 July 2016

softwareyoga: How to be an invaluable employee?


How to be an invaluable employee?

Or, in other words, how to bring more value to the company as a result of your work?
You have to show them how your work is making a difference to their world – a world where dollars, savings and time to market take precedence over exceptions, design patterns and interfaces.
Listed below are some aspects that will definitely get you recognized as a programmer who goes above and beyond their official duties.
  1. Understand what the end users want in the software. Talk to them. Figure out the things they don’t like in the existing system, things they would like to improve, etc. You have to solve their problems using your technical skills.
  2. Identify which modules or components have the most defects in your system. Come up with a high-level technical approach to solve the issues. Present the ideas to management, explain what benefits they would bring to the company and how they can save time and cost. Use statistical figures to explain how your ideas will bring vast improvements.
  3. Identify things that are being done manually or with complex workarounds. Eliminate them or handle them in a better way.
  4. Speak to management about what they think are the biggest problem areas. A lot of the time, getting input from someone who is away from the technical stuff helps immensely.
  5. Evaluate new technologies for your product and present to the team and management what benefits they may bring.
  6. Identify and improve workplace processes. You will surely have a few issues with the way some things are done in the company. Figure out a way to alleviate them.
  7. Many a time, project managers might over-promise or set extremely tight deadlines – not because they like sucking the last drop of blood out of you, but often because they don’t understand the technical side well enough. Have a discussion with them about their expectations. With your technical knowledge, provide them input on what’s realistically possible and how the scope can be altered so that everyone is a winner.
  8. Draw comparisons between your work and other industries and bring in the best practices. Often, you can draw inspiration from the most unexpected places.
  9. Mentor junior employees. Train other employees on various topics and practices. In most companies, there will be someone you aspire to be like some day. Meet them and ask if they could be your mentor or if you could shadow them in their work.
  10. Automate stuff outside of your product – find people within your company who are not as tech-savvy as you are. There will be someone out there who needs a helping hand. Solve people’s problems.
In conclusion, I believe that a programmer should get involved in activities in a company beyond just coding. It serves two purposes: it provides you insights from a non-technical viewpoint, and an understanding that programming is not the center of the universe.

Tuesday, 5 July 2016

Why is Python slow? « kmod's blog

Why is Python slow « kmod's blog: "Python spends almost all of its time in the C runtime



 This means that it doesn't really matter how quickly you execute the "Python" part of Python.  Another way of saying this is that Python opcodes are very complex, and the cost of executing them dwarfs the cost of dispatching them.  Another analogy I give is that executing Python is more similar to rendering HTML than it is to executing JS -- it's more of a description of what the runtime should do rather than an explicit step-by-step account of how to do it.



Update: why is the Python C runtime slow?

Here's the example I gave in my talk illustrating the slowness of the C runtime.  This is a for loop written in Python, but that doesn't execute any Python bytecodes:

import itertools
sum(itertools.repeat(1.0, 100000000))

The amazing thing about this is that if you write the equivalent loop in native JS, V8 can run it 6x faster than CPython.  In the talk I mistakenly attributed this to boxing overhead, but Raymond Hettinger kindly pointed out that CPython's sum() has an optimization to avoid boxing when the summands are all floats (or ints).  So it's not boxing overhead, and it's not dispatching on tp_as_number->tp_add to figure out how to add the arguments together.

My current best explanation is that it's not so much that the C runtime is slow at any given thing it does, but it just has to do a lot.  In this itertools example, about 50% of the time is dedicated to catching floating point exceptions.  The other 50% is spent figuring out how to iterate the itertools.repeat object, and checking whether the return value is a float or not.  All of these checks are fast and well optimized, but they are done every loop iteration so they add up.  A back-of-the-envelope calculation says that CPython takes about 30 CPU cycles per iteration of the loop, which is not very many, but is proportionally much more than V8's 5."
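That 30-cycle figure is easy to reproduce with a back-of-the-envelope measurement of your own (the 3 GHz clock below is an assumption -- substitute your CPU's actual frequency):

import itertools
import time

N = 100000000
start = time.time()
sum(itertools.repeat(1.0, N))
elapsed = time.time() - start

CLOCK_HZ = 3e9   # assumed 3 GHz CPU
print('%.2fs elapsed, ~%.0f cycles per iteration'
      % (elapsed, elapsed * CLOCK_HZ / N))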




Friday, 1 July 2016

Why do we use the Linux kernel's TCP stack? - Julia Evans

Why do we use the Linux kernel's TCP stack? - Julia Evans: "I'm at PolyConf in Poland today, and I watched this super interesting talk by Leandro Pereira about Lwan, a web server of roughly 8,000 lines of code. He talked about a bunch of the optimizations they'd done (improve CPU cache performance! be really careful about locking!). You can read more about the performance on the website & the links there.

It's a super cool project because it started out as a hobby project, and now he says it's getting to a state where it kinda actually really works and people are using it for real things. This web server is extremely fast -- it can do, in some benchmarks, 2 million requests per second.

Before I start talking about this -- of course practically nobody needs to do 2 million requests per second. I sure don't. But thinking about high performance computing is a really awesome way to understand the limits of computers better!



 I tracked him down to ask him questions later, and he mentioned that most of the time is spent talking to the Linux kernel and copying things back and forth.



 Then he said something really surprising: that in the Seastar HTTP framework, they wrote their own TCP stack, and it made everything several times faster. What?!"


