Wednesday, 27 May 2015

UNIX/LINUX command links


Below, we've listed a few problems and their awesome command line solutions. If you know of a more efficient solution, please share in the comments. Otherwise, we'd love to hear how you or your team solve problems with UNIX and the command line.

General Data Science links



NY Times: For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights

Yet far too much handcrafted work — what data scientists call “data wrangling,” “data munging” and “data janitor work” — is still required. Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.
“Data wrangling is a huge — and surprisingly so — part of the job,” said Monica Rogati, vice president for data science at Jawbone, whose sensor-filled wristband and software track activity, sleep and food consumption, and suggest dietary and health tips based on the numbers. “It’s something that is not appreciated by data civilians. At times, it feels like everything we do.”
Several start-ups are trying to break through these big data bottlenecks by developing software to automate the gathering, cleaning and organizing of disparate data, which is plentiful but messy. The modern Wild West of data needs to be tamed somewhat so it can be recognized and exploited by a computer program.
“It’s an absolute myth that you can send an algorithm over raw data and have insights pop up,” said Jeffrey Heer, a professor of computer science at the University of Washington and a co-founder of Trifacta, a start-up based in San Francisco.


Wednesday, 20 May 2015

State and Strategy patterns

http://en.wikipedia.org/wiki/State_pattern

The state pattern, which closely resembles the strategy pattern, is a behavioral software design pattern, also known as the objects for states pattern. This pattern is used in computer programming to encapsulate varying behavior for the same object based on its internal state. This can be a cleaner way for an object to change its behavior at runtime without resorting to large monolithic conditional statements, and thus improve maintainability.
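As a rough illustration of the idea, here is a minimal Python sketch. The `Player`, `Stopped` and `Playing` names are hypothetical and not taken from the Wikipedia article; the point is only that each state object carries its own behavior, so the context object needs no if/else chain on its current state.

```python
from abc import ABC, abstractmethod


class State(ABC):
    """Interface shared by all states (hypothetical example)."""

    @abstractmethod
    def press(self, player):
        """Handle a button press and pick the player's next state."""


class Stopped(State):
    def press(self, player):
        # Behavior for the stopped state: switch to playing.
        player.state = Playing()
        return "start playing"


class Playing(State):
    def press(self, player):
        # Behavior for the playing state: switch back to stopped.
        player.state = Stopped()
        return "stop playing"


class Player:
    """Context object: delegates to whatever state it currently holds."""

    def __init__(self):
        self.state = Stopped()

    def press(self):
        # No conditionals here; the current state object decides.
        return self.state.press(self)
```

Calling `press()` twice on a new `Player` first starts and then stops playback. Adding a third state (say, a paused one) would mean adding a class rather than editing a growing conditional.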


Monday, 18 May 2015

Fuzzy Logic



PDF:  Fuzzy Logic in Machine Learning

Wikipedia: Fuzzy Logic

Classical logic only permits propositions that are either true or false. The claim that 1+1=2 is an absolute, immutable mathematical truth. However, some propositions have variable answers, such as when various people are asked to identify a colour. The notion of truth is not abandoned; rather, fuzzy logic offers a means of representing and reasoning over partial knowledge, by aggregating all possible outcomes into a dimensional spectrum.
Both degrees of truth and probabilities range between 0 and 1 and hence may seem similar at first. For example, let a 100 ml glass contain 30 ml of water. Then we may consider two concepts: empty and full. The meaning of each of them can be represented by a certain fuzzy set. Then one might define the glass as being 0.7 empty and 0.3 full. Note that the concept of emptiness would be subjective and thus would depend on the observer or designer. Another designer might, equally well, design a set membership function where the glass would be considered full for all values down to 50 ml. It is essential to realize that fuzzy logic uses truth degrees as a mathematical model of the vagueness phenomenon while probability is a mathematical model of ignorance.
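The glass example can be sketched in a few lines of Python. The function names and the linear shape of the membership functions are assumptions made for illustration; the quoted text only fixes the values 0.3 full and 0.7 empty for 30 ml, and says a second designer might treat everything from 50 ml up as full.

```python
def full_linear(volume_ml, capacity_ml=100.0):
    """Degree of 'full' rising linearly from 0 ml to capacity (assumed shape)."""
    return max(0.0, min(1.0, volume_ml / capacity_ml))


def empty_linear(volume_ml, capacity_ml=100.0):
    """Degree of 'empty', taken here as the complement of 'full'."""
    return 1.0 - full_linear(volume_ml, capacity_ml)


def full_threshold(volume_ml, threshold_ml=50.0):
    """Second designer: fully 'full' at or above the threshold.

    Below the threshold the degree rises linearly from 0; that ramp is
    an assumption, the source only specifies the region above 50 ml.
    """
    return max(0.0, min(1.0, volume_ml / threshold_ml))


if __name__ == "__main__":
    for volume in (30, 50, 80):
        print(
            volume,
            round(full_linear(volume), 2),
            round(empty_linear(volume), 2),
            round(full_threshold(volume), 2),
        )
```

Both designers look at the same 30 ml of water and assign different degrees of "full". That is a modelling choice about a vague concept, not a statement about how likely the glass is to be full.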


Continuous Deployment, Integration links


The level of testing performed in CI can vary widely, but the key fundamental is that multiple integrations from different developers are done throughout the day. The biggest advantage of this approach is that any errors are identified early in the cycle, typically soon after the commit. Finding bugs closer to the commit makes them much easier to fix. This is explained well by Martin Fowler:
Continuous Integration doesn’t get rid of bugs, but it does make them dramatically easier to find and remove.