May 17, 2012

I Would Like Sphinx PDF Generation to Support the LaTeX Memoir Package

I would really like to use the same toolset for publishing POD (Print On Demand) books that I use for ebooks. Yet the pdf generation abilities of both docutils and Sphinx seem more focused on pdf “ebooks” (pdfs that you would read from a computer), not pdfs that are appropriate to create physical books from. I think the solution is to add support for the LaTeX “memoir” package to Sphinx. I’ll eventually get around to it, but it is not on the top of my stack. To that end I’ll pay up to $200 (paypal or Amazon gift) to get that support.

For those not familiar with memoir, here’s a snippet of the README:

Memoir is a flexible class for typesetting poetry, fiction, non-fiction and mathematical works as books, reports, articles or manuscripts. Documents can use 9pt, 10pt, 11pt, 12pt, 14pt, or 17pt as the normal fontsize and, if you have scalable fonts, 20pt, 25pt, 30pt, 36pt, 48pt, or 60pt sizes, or even larger. Many methods are provided to let you create your particular design. The class incorporates over 30 of the more popular packages.

Here are the details. The first person who notifies me of two github repos that meet the following points will get $100:

  • A github repo containing a Sphinx project that renders out to pdf using the memoir package. The document must have an example of:
    • Half Title Page
    • Title Page
    • Copyright Page (on verso page)
    • Dedication
    • Table of Contents
    • Introduction
    • Chapters (should start on right page)
    • An index
    • blank pages at the end
    • Customizing page size
    • The chapters should have an example of:
      • Text spanning multiple pages (lorem ipsum is fine)
      • Normal text
      • Italicized text
      • bold text
      • monospace text
      • Sections
      • footnotes
      • a code block
      • an admonition
      • a long admonition that is longer than a page
      • an admonition with a code block inside of it
      • a figure
      • a table
  • A github repo containing a fork of Sphinx that has memoir support. When using this fork with the above project, a suitable pdf should fall out. Numbering of chapters is correct, page numbering is correct (odd on right side), index is correct, etc.

Ideally most of the book (Table of Contents through Index) would be pure rst (or sphinx rst++). I’m ok with raw directives on the front matter.

If you are familiar with LaTeX and Sphinx, this should be pretty straightforward. It would also be a great project for someone interested in learning those technologies.

The person who can deliver the above will make me quite happy. But, as I said, I would pay up to $200. The remaining $100 will be appropriated to the developer (or someone else) who sees that this gets pushed upstream into Sphinx proper (ie when I see a bitbucket commit with the changes on the Sphinx project). So the person who files the bug, patch and follows through with any needed tweaks to get this into mainline will get that money. I reserve the right to split up this portion among people as well.

(I apologize in advance for the github requirement, I prefer that system, so the first deliverable needs to be there. Given that Sphinx prefers bitbucket, you need to play well with them for the second deliverable.)

Coding Is a Life Skill That Should Be Taught in Elementary School

As a seventh grader I had the choice between learning French and Spanish. I chose Spanish because it was practical (or at least my parents convinced me of that), while it seemed that all the cool kids took French. Excuse my French but I think my parents were right: taking French expecting to actually use it (or have a reason to use it) would be a waste of time. (Sorry to the 2% of my blog visitors to whom that hits home.) In fact the same could be said of the two years I took Spanish. I remembered only the colors and my Spanish name, “Paco”.

But taking Spanish for two years twisted my brain enough that I actually took a year of German after (again useless if I were expecting to use it to purchase say a Mercedes). And it was easier this time. When I did become fluent in Spanish three years later, I was grateful that I had learned how to conjugate verbs previously. My uptake was quicker because I had practice. In two months I was dropped off in the middle of Colombia and held my ground ok. (Later my Spanish professor would inform me that speaking in Spanish is insulting in Gringolandia, and I don’t pull it out too much now. So it was useless.)

Likewise, the Pascal class I took in High School wasn’t super practical. I didn’t even have a machine at home to code Pascal on. I haven’t coded in Pascal since. Yet the logical manner it taught me to think has been a wonderful life skill. It has even helped me to debug plumbing issues. When the faucet is clogged, I could just disconnect the pipes connecting the faucet to the wall and start snaking around in the wall. But (given that I have girls around with long hair) it makes sense to start right at the faucet itself and snake that without even disconnecting any pipes. Less mess, and more time for other less grimy things.

(Alas learning more about plumbing has helped in other areas as well, such as knowing how to replace a toilet, or stop a leak. It’s probably a decent life skill to have if you are a homeowner or live in a place with running water.)

The section I took on Lisp in college was not super practical either. Even then no one programmed in Lisp. Why did I waste that time? Because I have experienced functional constructs, I now appreciate them and see their utility (and even the allure of Clojure). I appreciate list comprehensions, first class functions and the laziness of generators. (If you have read this far in the post you’ll notice that I also like parentheticals.)

I had the chance to teach a ~80 year old man how to program. He wanted to learn just for the sake of understanding what the little creatures inside of his computer were doing. Waste of time?

I’ve taught 3rd graders how to create ebooks. That was really an excuse to teach them xml, html, css, typography, fibonacci sequences, image compression, debugging and actually learning how to use a computer for something other than clicking the blue “e” and typing in starfall.com. I neglected to inform them that they could have just as easily written their books in Word and exported to HTML. I don’t feel guilty about that. (Some even wrote thank you cards in both HTML and the rendered output!)

I have also been able to teach 3rd graders programming. They were able to create simple text games with conditionals, functions and looping constructs. Did they create the next 3D shooter? Did they even create a game that they’d like to play more than once? Mostly not. But they learned a little bit about how computers work. They also learned how to think logically, and how to be specific. (My wife compiler still complains that my English isn’t specific enough—too many “this” and “that”s at line 0. To which her compiler responds “I have no idea what you are talking about!”)

Today, the world still runs on Excel. A poor man’s database or programming environment. Just today a co-worker asked me to help move some data from Excel to a program so more advanced calculations could be performed on the data. There was also an urgency to get rid of the manual labor of tweaking rows and columns. The end user wanted automation and power. (And to be lazy).

If you have any inclination of working with programmers you should take CS101. You need to have some semblance of understanding of what they need and what you need to help them. Do you feel like a doofus when a car repair person is telling you what you need done to your car? Do you want to get ripped off by the plumber telling you the twenty things he needs to fix?

How often am I burdened by my (loving) family and neighbors, who knowing that I “work with computers” think that that means I can clean the spyware off their XP installation! I imagine that I would not play that beast of burden if others had some understanding of how computers and the internet actually worked.

Surely a little understanding of what powers the digital aspects of our lives can’t be too harmful.

May 16, 2012

Python Profiling - Part 1

I gave a talk on profiling python code at the 2012 Utah Open Source Conference. Here are the slides and the accompanying code.

There are three parts to this profiling talk:

  • Standard Lib Tools - cProfile, Pstats
  • Third Party Tools - line_profiler, mem_profiler
  • Commercial Tools - New Relic

This is Part 1 of that talk. It covers:

  • cProfile module - usage
  • Pstats module - usage
  • RunSnakeRun - GUI viewer

Why Profiling:

  • Identify the bottle-necks.
  • Optimize intelligently. 

In God we trust, everyone else bring data

cProfile:

cProfile is a profiling module that is included in Python's standard library. It instruments the code and reports the time to run each function and the number of times each function is called.

Basic Usage:

The sample code I'm profiling finds the least common multiple of two numbers (lcm.py).

# lcm.py - ver1 
    def lcm(arg1, arg2):
        i = max(arg1, arg2)
        while i < (arg1 * arg2):
            if i % min(arg1,arg2) == 0:
                return i
            i += max(arg1,arg2)
        return(arg1 * arg2)

    lcm(21498497, 3890120)

Let's run the profiler.

$ python -m cProfile lcm.py 
     7780242 function calls in 4.474 seconds
    
    Ordered by: standard name
   
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1    0.000    0.000    4.474    4.474 lcm.py:3(<module>)
         1    2.713    2.713    4.474    4.474 lcm.py:3(lcm)
   3890120    0.881    0.000    0.881    0.000 {max}
         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   3890119    0.880    0.000    0.880    0.000 {min}

Output Columns:

  • ncalls - number of calls to a function.
  • tottime - total time spent in the function without counting calls to sub-functions.
  • percall - tottime/ncalls
  • cumtime - cumulative time spent in a function and its sub-functions.
  • percall - cumtime/ncalls

It's clear from the output that the built-in functions max() and min() are each called nearly 3.9 million times. That overhead can be removed by saving their results in variables instead of calling them on every iteration.
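
A minimal sketch of that optimization, assuming the same algorithm as lcm.py above. The name lcm_cached is illustrative and not part of the original talk code. It calls max() and min() once each and reuses the results inside the loop.

```python
# lcm.py - ver2 (sketch): cache max()/min() results instead of recomputing them
def lcm_cached(arg1, arg2):
    big = max(arg1, arg2)      # computed once, reused on every iteration
    small = min(arg1, arg2)
    product = arg1 * arg2
    i = big
    while i < product:
        if i % small == 0:
            return i
        i += big
    return product


if __name__ == "__main__":
    print(lcm_cached(4, 6))
```

Profiling this version the same way (python -m cProfile) should show max and min with a single call each, which is an easy way to confirm the change did what was intended.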

Pstats:

Pstats is also included in the standard library. It is used to analyze profiles saved by the cProfile module.

Usage:

For scripts that are bigger it's not feasible to analyze the output of the cProfile module on the command-line. The solution is to save the profile to a file and use Pstats to analyze it like a database. Example:  Let's analyze shorten.py.

$ python -m cProfile -o shorten.prof shorten.py   # saves the output to shorten.prof

$ ls
shorten.py shorten.prof

Let's analyze the profiler output to list the top 5 frequently called functions.

$ python 
>>> import pstats
>>> p  = pstats.Stats('shorten.prof')  # Load the profiler output
>>> p.sort_stats('calls')              # Sort the results by the ncalls column
>>> p.print_stats(5)                   # Print top 5 items

    95665 function calls (93215 primitive calls) in 2.371 seconds
    
   Ordered by: call count
   List reduced from 1919 to 5 due to restriction <5>
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10819/10539    0.002    0.000    0.002    0.000 {len}
           9432    0.002    0.000    0.002    0.000 {method 'append' of 'list' objects}
           6061    0.003    0.000    0.003    0.000 {isinstance}
           3092    0.004    0.000    0.005    0.000 /lib/python2.7/sre_parse.py:182(__next)
           2617    0.001    0.000    0.001    0.000 {method 'endswith' of 'str' objects}
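
The same steps can also be scripted instead of typed at the prompt. The following is a rough sketch written for Python 3 (the session above is Python 2.7). The functions busy_work, profile_to_file and top_calls are made-up names for illustration, and the workload is a stand-in for shorten.py, which is not reproduced here.

```python
import cProfile
import io
import os
import pstats
import tempfile


def busy_work(n):
    """Hypothetical workload: builds a list so that append() gets called n times."""
    items = []
    for i in range(n):
        items.append(i * i)
    return len(items)


def profile_to_file(func, *args):
    """Run func under cProfile and save the raw stats to a temporary file."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    handle, prof_path = tempfile.mkstemp(suffix=".prof")
    os.close(handle)
    profiler.dump_stats(prof_path)
    return prof_path


def top_calls(prof_path, limit=5):
    """Load a saved profile and return the report, sorted by call count, as text."""
    buffer = io.StringIO()
    stats = pstats.Stats(prof_path, stream=buffer)
    stats.sort_stats("calls").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    path = profile_to_file(busy_work, 10000)
    print(top_calls(path))
    os.remove(path)
```

Capturing the report in a string rather than printing it directly makes it easy to log or compare reports between runs.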

Reading reports this way gets tedious quickly. Let's introduce a GUI so we can easily drill down.

RunSnakeRun:

This cleverly named GUI written in wxPython makes life a lot easier.

Install it from PyPI using (requires wxPython)

$ pip install SquareMap RunSnakeRun
$ runsnake shorten.prof     #load the profile using GUI

The output is displayed using squaremaps that clearly highlight the bigger pieces of the pie that are worth optimizing.

Runsnake

It also lets you sort by clicking the columns or drill down by double clicking on a piece of the SquareMap.

Conclusion:

That concludes Part 1 of the profiling series. All the tools except RunSnakeRun are available as part of the standard library. It is essential to introspect the code before we start shooting in the dark in the hopes of optimizing the code.

We'll look at line_profiler and mem_profiler in Part 2. Stay tuned.

You are welcome to follow me on twitter (@amjithr).


May 15, 2012

Standard Information Sharing Labels
Standard Label for Facebook

Some years ago, based on an idea that came up on a train ride to the airport from OSCON, Kaliya Hamlin, Aldo Castaneda and I put together a paper on identity rights agreements that was accepted for presentation at the W3C Workshop on Transparency and Usability of Web Authentication. The idea is that you ought to be able to mark up data you share to let people know how it can be used. Think Creative Commons for personal data. Recently a number of people, including myself, Drummond Reed, and Marc Davis, discussed a similar idea at a WEF Tiger day.

Joe Andrieu has a proposal that is slightly less ambitious and serves as the launching pad for more complete solutions. Joe's idea is simple and easy to understand. Just like we have a standard label for drugs so that people can more easily understand how to take a drug and what it does, we should have a standard label for sites that want you to share your personal information so it's easy to understand what's going to happen if you say yes. Contrast this with the current EULA model where people are faced with 70 pages of information in a non-standard format that they need to understand if they're to truly be smart about what they share.

Joe has a Kickstarter project for Standard Label to get money to design the label. If you care about understanding what you're sharing and think people should be smarter about what they share, then I encourage you to support this project. Even if you can only give $1, give something to show you're behind the work. Here's the video from the Kickstarter page.

Now, go and back the project!

Tags: identity sharing privacy identity+rights standard+label pdrl

Rich Sharing and Personal Channels

The Social Web has shown us the power of connecting. Facebook has friends, LinkedIn has connections, and Twitter has followers. These channels allow their owners to communicate with others, although their capabilities vary greatly. But the resulting relationship graphs are stilted because their proprietary nature makes interoperation and extension difficult—in spite of all of the money and time invested in creating APIs to access them.

I look forward to a relationship network that is based on open standards just as the email network and indeed the Internet itself are. The power of the Internet to serve an untold variety of purposes in a flexible way is a direct result of the open standards upon which it is based. Relationship networks based on open standards will provide unprecedented value and opportunities for people because of the new applications they will engender.

This paper will describe something called a personal channel, based on open standards and protocols, that can form such a relationship network. Personal channels link personal clouds, the subject of an earlier white paper. This paper assumes a knowledge of personal clouds, their features and their capabilities. We will show that channels have the properties necessary to support rich sharing, a hallmark of flexibility without which they would not be able to accomplish all that is needed.

Personal Channels

Long ago, personal computers were interesting in their own right. That changed in the 90's with the emergence of widespread network connectivity. Nowadays, a PC that's not connected to the Internet is not only boring, it's non-functional for many of the tasks that people perform every day. If you don't believe me, just turn off the network on your computer for a day. And of course, the modern personal computer, the smartphone, makes connectivity the very foundation of the platform.

Like personal computers, personal clouds are only interesting when they are connected. Personal channels link personal clouds. The collection of channels connecting myriad personal clouds form a relationship network. On an open standard relationship network, the attributes, permissions, and capabilities of a relationship are standardized and extensible. Every relationship is a link. A link may be a simple one-way (asymmetric) subscriber relationship that does not require involvement of the second party, or it may be a stronger two-way (symmetric) relationship in which both parties act as publisher and subscriber.

In either case, when data and messages can flow in one or both directions across a link, it is called a channel. The control each party has over the channel--the terms and conditions to which they agree over how it will work--is called a link contract. Control over the channel still resides in the link contract(s) with the connected parties. The following figure shows two personal clouds connected via a channel controlled with a link contract.

Personal clouds linked by personal channels

Channels exhibit the following properties:

  • Personal channels provide separately revocable, separately trackable authority to share between personal clouds.
  • Any given personal cloud can have any number of inbound and outbound channels. Any two personal clouds may share multiple channels for different purposes.
  • Channels use a combination of the Event eXchange Protocol (EXP) and XRI Data Interchange (XDI) protocol that give them metaprotocol capabilities. Channels are ways of doing something instead of a place for doing something.
  • Link contracts are a flexible means of declaring fine-grained access control to data and services. Link contracts specify the nature and behavior of a channel.
  • Channels are the conduits over which messages pass between personal clouds. These messages include event notifications, data queries, and data transfers.
  • A channel need not be restricted to just two parties. It may connect the members of a group (e.g., email distribution lists), or access may be fully public (e.g., blogs or Twitter feeds).
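
To make the first few properties concrete, here is a small sketch of channels that are separately revocable and separately trackable. It is only an illustration under my own assumptions: the Channel and LinkContract classes and their fields are hypothetical and do not reflect the EXP or XDI protocols.

```python
from dataclasses import dataclass, field
from itertools import count

_channel_ids = count(1)  # illustrative source of unique channel identifiers


@dataclass
class LinkContract:
    """Hypothetical link contract: which message types the channel may carry."""
    allowed_message_types: frozenset


@dataclass
class Channel:
    """Hypothetical channel between two personal clouds, governed by a link contract."""
    publisher: str
    subscriber: str
    contract: LinkContract
    channel_id: int = field(default_factory=lambda: next(_channel_ids))
    revoked: bool = False
    log: list = field(default_factory=list)  # per-channel record, so use is trackable

    def send(self, message_type, payload):
        """Deliver a message only if the channel is live and the contract allows it."""
        allowed = (not self.revoked
                   and message_type in self.contract.allowed_message_types)
        self.log.append((message_type, allowed))
        return allowed

    def revoke(self):
        """Revoke this channel without affecting any other channel."""
        self.revoked = True


if __name__ == "__main__":
    events = Channel("alice", "bob", LinkContract(frozenset({"event"})))
    data = Channel("alice", "bob", LinkContract(frozenset({"event", "query"})))
    events.revoke()
    print(events.send("event", {"name": "ping"}))   # sent on the revoked channel
    print(data.send("query", {"ask": "address"}))   # sent on the separate channel
```

Two clouds share two channels here for different purposes, and revoking one leaves the other untouched.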

Like email, channels form a point-to-point network between personal clouds all speaking the same protocol. Unlike an email server, whose sole function is usually email processing, a personal cloud is more like a general-purpose computer in the cloud; it has an operating system that runs applications, processes events, and manages data under direct control of its owner.

This is why channels on the relationship web can be dramatically more useful to individuals and businesses than ordinary email or Web connections.

Rich Sharing

Marc Stiegler of HP Labs has written (PDF) and spoken about rich sharing. Alan Karp has written about PubShare, a system Marc built that demonstrates rich sharing. Alan relates two stories that contrast our expectations about sharing in the physical and online worlds. The first takes place in the physical world:

In an emergency, Marc asked me to park his car in my garage. I couldn't do it, so I asked my neighbor to do it for me and said to get the garage key from my son.

The second involves an online file sharing scenario:

In an emergency, Marc asked me to copy a file from his computer to mine. I couldn't do it, so I asked my neighbor to do it for me and said to get access to my computer account from my son.

The second story is ludicrous to us because we can't see a reasonable way for it to work even though it closely resembles the scenario from the physical world.

Rich sharing characterizes what makes human communication in the physical world work. Using this model, we can determine how to create better online communication systems. Communication systems, like email, that embody rich sharing feel natural to users and thus succeed. Systems that don't embody it feel stilted or unwieldy and thus don't scale the way their designers intended.

Sharing is easy and technically uninteresting in situations where the shared item is public and there's no need to authorize access to it. Similarly, workgroup-style sharing is relatively straightforward, and the tools for protecting resources in workgroups, such as role-based access control (RBAC) and access control lists (ACLs), are well understood. For purposes of contrast, let's call unprotected and workgroup-style sharing simple sharing.

Sharing becomes much more nuanced when access to the shared item must be restricted and the players in the sharing scenario operate in independent security domains. Many real-world scenarios require rich sharing. Stiegler and Karp demonstrate why workgroup-style sharing can't accommodate rich sharing scenarios.

Rich sharing is characterized by six key features:

  • Dynamic—Sharing can be done without reconfiguring the system or having other work done by the sharer's IT department.
  • Attenuated—Sharing happens with the right permissions on the right items.
  • Chained—A shared item can be reshared in appropriate ways. Authority can be re-delegated. Building attenuated chains of delegated authority is difficult in simple sharing architectures.
  • Cross domain—Sharing can occur across security domains without the user linking the domains in an ad hoc manner or the IT department having to set up special purpose federated identity systems.
  • Recomposable—The shared item or service can be used in conjunction with other resources and services even if those documents and services exist in a separate security domain.
  • Accountable—Even though sharing can be re-delegated along a chain, the original owner must maintain the ability to audit and track the use of the shared item and hold the appropriate parties accountable for misuse.

Stiegler and Karp make a case that email succeeds because email demonstrates these six attributes. In contrast, it's easy to find examples in other sharing architectures that fail to incorporate one or more of these and thus become difficult to use as the sharing scenarios get more complicated. Today's popular social networks all fail to meet one or more of the above attributes.

Personal Channels Support Rich Sharing

Personal channels exhibit rich sharing. We mentioned in an earlier section of this paper that channels provide a metaprotocol for interaction. Thus they represent a way of doing things rather than a place. Rich sharing is more easily supported by ways—protocols—rather than by places. In fact, I argue that properties of rich sharing such as being cross domain and recomposable are nearly impossible to achieve using a place such as a Web site.

Let's examine the attributes of rich sharing and see how channels stack up:

  • Dynamic—A personal cloud can use a personal channel to send a message to any other personal cloud that subscribes to it at any time. Subscriptions can be formed between two personal clouds or between a cloud and another network service at will.
  • Attenuated—Link contracts provide a means of fine grained access control that enables attenuation.
  • Chained—Upon receiving a message on a channel, a personal cloud can delegate that message to other personal clouds. This delegation may be algorithmic, but is always under the ultimate control of the personal cloud's owner.
  • Cross domain—Each personal cloud functions as its own domain in the same sense that an email inbox represents an independent domain controlled by its owner. Thus a channel carries messages from one domain to another.
  • Recomposable—Messages sent along a channel, be they events, queries, or data are composed with other information from other sources (e.g. APIs, other channels, etc.) as part of the processing done by a personal cloud.
  • Accountable—Channels are uniquely identified and individually revocable. The unique identity combined with the ability to declare authoritatively the nature and behavior of the channel via link contracts provides flexible accountability that can be tuned to a given purpose.
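
The chained, attenuated, and accountable attributes can be sketched together in a few lines. This is a toy model of my own, not how personal clouds implement delegation: the Grant class and its methods are hypothetical, and it borrows the names from the garage story above only for flavor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    """Hypothetical delegated authority over one shared item."""
    holder: str
    permissions: frozenset
    parent: Optional["Grant"] = None

    def delegate(self, new_holder, permissions):
        """Re-share with another party, never with more authority than we hold."""
        requested = frozenset(permissions)
        if not requested <= self.permissions:
            raise PermissionError("cannot delegate permissions not held")
        return Grant(new_holder, requested, parent=self)

    def chain(self):
        """Return the holders from the original owner down to this grant."""
        holders = []
        grant = self
        while grant is not None:
            holders.append(grant.holder)
            grant = grant.parent
        return list(reversed(holders))


if __name__ == "__main__":
    owner = Grant("marc", frozenset({"read", "write"}))
    friend = owner.delegate("alan", {"read", "write"})
    neighbor = friend.delegate("neighbor", {"read"})   # attenuated re-delegation
    print(neighbor.chain())
```

Each grant remembers where it came from, so the original owner can walk the chain to see who passed authority to whom.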

Conclusion

Rich sharing requires that the sharing be dynamic, accountable, recomposable, and cross-domain, while enabling the chaining (repeated redelegation) of attenuated access (including separable revocability). We have shown that personal channels exhibit these properties and thus enable rich sharing.

Because channels support rich sharing, they are extremely flexible and can be used for many purposes. Personal channels provide a messaging system for personal clouds that provides access-controlled, filtered, trustworthy notifications, data exchange, and sharing. Future papers will expand on these benefits of personal channels.

Tags: krl personal+clouds personal+channels personal+events rich+sharing

May 14, 2012

New Theme is Here!!!

After far far too long (my bad), the Ubuntu Forums has a theme that matches the Ubuntu branding.

The new look of ubuntuforums.org

To use the new theme, select Ubuntu from the selection box on the lower left hand corner of any forums page. I hope to see it as the default theme soon.

We are still looking at upgrading the forums to the vB4 series, which has a mobile theme that will be a great benefit to our users, plus other new features.

Thanks, everyone, for your patience, and if you find a bug in the new theme be sure to file a report.

May 13, 2012

Video: Look At Your Data – John Rauser – Velocity 2011

Last year I saw some videos from Velocity 2011 that were really enlightening, and today I wanted to re-watch one of them. Of course, I hadn’t bookmarked them, or remembered which conference they were from. So after some searching and remembering, I found the video: “Look at Your Data” by John Rauser. It is an excellent video on understanding your performance data and how to get more from it. I’ve embedded it below, but I would encourage everyone to check out all the videos they posted from Velocity 2011; it’s a bunch of great stuff.


May 10, 2012

Exaile needs a team leader

Due to the recent news regarding Stallman’s heart attack, I ended up on the wikipedia page for The Cathedral and the Bazaar, and I was reading the “Guidelines for creating good open source software”. Number 3 says:

“When you lose interest in a program, your last duty to it is to hand it off to a competent successor.”

I have shirked my duty. I should have done this a long time ago. It’s not necessarily that I have lost interest, though that is part of it (I’ve gone the route of cloud music, such as Spotify and Google Music); I really don’t have the time anymore.

Had I done this a year or two ago, I can think of a few people who would have taken the reins. Now, though, there’s hardly anyone left who has any time anymore. Perhaps everyone is going the route of cloud music, I have no idea.

Anyway, I posted a “help wanted” listing of sorts, asking for someone to step up as the team leader. We’ll see where that leads us :)

Broke 100 Salt Contributors!

The Salt community keeps growing, and we are excited to have such wonderful talent helping to develop Salt. Just a few days ago Salt received a pull request from the 100th contributor!

https://www.ohloh.net/p/salt/factoids

 


May 09, 2012

Unlocking Data Exchange: The Long Tail of Data
I-20 Stack Interchange

Much has been made of data lately. And with good reason. Data and the ability to exchange and process it are at the heart of modern society's productivity and prosperity. Data and algorithms are the engines that drive the economy in the 21st century.

But data is often onerous to obtain, difficult to trust, and hard to understand. Fixing these problems—making trustworthy, understandable data flow more freely, consistently, and reliably—will provide a wellspring of new ideas and companies to prosecute them.

This post makes a case that there is a structural problem standing in the way of freely flowing data and describes a method for removing that structural barrier.

The Long Tail

In October 2004, Chris Anderson introduced the concept of the long tail in an article in Wired magazine. The idea, simply put, is that the infinite shelf space and near-zero distribution costs brought about by the Web have revolutionized many businesses by allowing them to compete for business that was formerly too expensive to service.

The concept is called the long tail because if you plot the power law distribution of the relevant data (e.g. revenue from sales of a given book title, song title, airline ticket to a particular destination, and so on) there's always a cut off point where it gets too expensive to service the business using traditional business models. Here's one of the charts from the Wired article:

Anatomy of the Long Tail

Notice that in the example shown there is a line on the curve and to the left of that line the words "Songs available at WalMart and Rhapsody". The area under the curve to the left of the cut line is the head of the curve. The area under the curve to the right of the cut—the yellow sections—is the tail and since it's long when you have infinite shelf space, it's the long tail. The area in the long tail is the revenue available to Rhapsody but not to WalMart.

The important point is that Amazon, Rhapsody, and Netflix, to use the examples in the graph, can sell all the same product as their competitors as well as product their competitors can't. A brick and mortar book store can't stock every book, but Amazon can. In many cases the area—and thus the available revenue—of the tail is larger than the area in the head.
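
A quick way to get a feel for the head-versus-tail comparison is to add up the areas under a toy power law. The sketch below assumes revenue for the title at rank r is proportional to r raised to a negative exponent. The function name, the catalog sizes, and the exponent are my own illustrative choices, not figures from the Wired article.

```python
def head_and_tail_revenue(n_titles, cutoff, exponent=1.0):
    """Split total revenue at `cutoff`, assuming revenue of rank r is r ** -exponent."""
    head = sum(rank ** -exponent for rank in range(1, cutoff + 1))
    tail = sum(rank ** -exponent for rank in range(cutoff + 1, n_titles + 1))
    return head, tail


if __name__ == "__main__":
    # Illustrative numbers only: a store stocking 100 titles
    # versus an online catalog of 100,000 titles.
    head, tail = head_and_tail_revenue(n_titles=100_000, cutoff=100)
    print("head: %.2f  tail: %.2f" % (head, tail))
```

Moving the cutoff or changing the exponent shows how sensitive the comparison is to where the traditional business has to stop stocking titles.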

The Long Tail of Consumer Credit

In credit markets, the kings of the long tail are Visa and Mastercard. You need credit to make a purchase. Before credit cards, you would have made a deal with the local merchant to extend credit, or in the case of a large purchase, taken out a consumer loan at the bank (my parents used to do this). Now, we just put it on the card.

The credit card, largely developed in the 1950s and 1960s, represents a huge leap forward in thinking about how credit is extended. Some companies, like Diners Club and American Express, developed a credit system that was based on each merchant and consumer having a direct relationship with the credit card company. Many banks did the same thing. In contrast, Visa and Mastercard established credit networks. The following diagram depicts the relationships in the credit network.

Visa model

In a credit network, both the customer and merchant have a relationship with their respective bank and their banks have relationships with the Visa network.

Table 1: Comparing Credit

                  Without Credit Network   With Credit Network
Relationship      one-to-one               any-to-any
Credit Terms      per-loan                 on demand
Penetration       select merchants         ubiquitous
Processing cost   expensive                cheap

Table 1 shows a few of the differences between credit before and after credit cards:

  • Relationship—before credit networks, credit was extended on a one-to-one basis. You made a credit arrangement with each lender. With credit cards, the arrangement is any-to-any, you can walk into almost any merchant on earth and use credit with no need for a prior relationship. Moreover, in the networked model, both customer and merchant have relationships with independent banks. Any bank will do, so long as they're a member of the network.
  • Credit terms—before credit networks, credit was done on a per-loan basis. When you needed credit, you filled out the forms for a particular credit transaction. The next week you might do it all again for another. With credit networks, you get credit on demand, in real time.
  • Penetration—before credit networks, you had to select merchants based on what cards they accepted. This was frustrating to merchants and customers alike. With a credit network, even though there are still many cards, they are interoperable with any merchant, making their penetration nearly ubiquitous.
  • Processing cost—without credit networks, each transaction has to be negotiated and approved individually and often manually. With a credit network, transaction costs are greatly reduced through standardized contracts and automatic approval and settlement.

These attributes are what give credit networks their long tail potential. Credit transactions of all sorts are available to a wider range of people for a wider range of goods and services from a wider range of merchants.

The Credit Network

We call Visa a "network" but that label may be confusing to people who think of networks in terms of routers and data connections. In fact Visa is two things (yes, I'm simplifying a great deal here):

  1. A collection of contracts
  2. A protocol

Notice there are no wires. The wires are provided by companies like First Data Corp. who actually do the processing according to the terms of Visa's contracts and protocol. Nevertheless, a network it is because it links countless people and merchants via their banks through the mechanisms of contracts and protocols.

The magic of Visa is the realization that each bank didn't need a contract with every merchant and every customer or even a contract with every other bank. That's why Visa is a "network." Visa has contracts with each bank, the banks have contracts with customers and merchants, and the chain of contracts from a customer, to her bank, to Visa, to another bank, and finally to the merchant is sufficient to convince the merchant that payment will arrive when the customer walks in to buy a new pair of shoes. Every time you use your credit card, you exercise a different path through those chains of trust. Visa is thus a trust framework.
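
To make the chain-of-contracts idea concrete, here is a minimal Python sketch. The party names and the numbers are hypothetical and chosen only for illustration; they are not taken from Visa's actual contracts. It shows two things: that a merchant can be reached from a customer by following contracts alone, and how few contracts a network needs compared with one contract per customer-merchant pair.

```python
from collections import deque

# Hypothetical parties; each pair below stands for one contract.
CONTRACTS = [
    ("customer", "customer_bank"),
    ("customer_bank", "visa"),
    ("visa", "merchant_bank"),
    ("merchant_bank", "merchant"),
]


def trust_path(contracts, start, goal):
    """Breadth-first search for a chain of contracts linking start to goal."""
    neighbors = {}
    for a, b in contracts:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of contracts connects the two parties


def contract_counts(customers, merchants, banks):
    """Return (one-to-one contracts, networked contracts).

    One-to-one: every customer signs with every merchant.
    Networked: each customer and merchant signs with one bank,
    and each bank signs once with the network.
    """
    return customers * merchants, customers + merchants + banks


if __name__ == "__main__":
    print(trust_path(CONTRACTS, "customer", "merchant"))
    print(contract_counts(customers=1000, merchants=50, banks=10))
```

The one-to-one count grows with the product of customers and merchants, while the networked count grows only with their sum, which is the structural reason the network can reach the long tail.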

By establishing a network that was

  1. any-to-any,
  2. on demand,
  3. ubiquitous, and
  4. cheap,

Visa was able to create a system that services the long tail of credit. Almost any transaction, almost anywhere can be handled by their network for pennies on the dollar.

Data Exchange Networks

The world of data exchange looks, in many ways, like the world of credit before Visa. Companies like Acxiom, D&B, Experian, and Lexis-Nexis sell data on a one-to-one basis, according to pre-executed contracts, in batch. And it's not cheap. These are companies who have built profitable businesses servicing the head of the curve. But they don't service the long tail. They can't, because they don't have a network.

Imagine you want to start a business that needs access to risk data (i.e. data about the trustworthiness of a business or person). First, you'll have to go through the sales process, where you'll be screened to ensure you can sign a contract that has a monthly minimum (say $5000/month). Then you'll have to go through legal to get contracts in place. Finally, you'll agree to the format for your batch of data and integrate your systems with those of the data company. Of course, you'll pay more if you need data more frequently than the norm.

If you only need a little data, or data on demand, or from different sources depending on the transaction, you don't fit in the head of the curve. How many startups don't get built because their business model needs, but can't afford, access to data? How many startups don't get built because they can't make data available cheaply? These are lost opportunities that need a new model if they're to be realized.

A data network solves this problem in exactly the same way that the Visa network solves the credit problem. By putting contracts in place up front and building a trust framework upon those contracts, a data network allows cheap, ubiquitous, on demand, any-to-any access to data.

Drummond Reed has built a company around this very idea, called Respect Network Corp (RNC). The idea is that like Visa or Mastercard, RNC will use standardized contracts to create relationships with data providers and data consumers. Protocols will describe how data transactions are initiated, negotiated, and consummated. Payment will be based on the value of the data but is likely made outside the data network on an existing payment network since they're optimized for that. As an aside, if you look at RNC's business model, you'll see a slightly different version of this based not on raw data transfers as I've described here, but more long-term relationships between merchants and their customers.

Kynetx is working closely with RNC in building the network. The model and legal framework are fairly well understood. What is less well defined at this point is the nature of the data exchange protocols. Our recent white paper, From Personal Computers to Personal Clouds, outlines what we think the nodes in the network will be like. The network itself must provide services to these nodes so that they can interact efficiently and safely. Specifically, the network must provide the following services:

  • Reputation—in any-to-any interactions, players will frequently do business with nodes in the network with whom they don't have a pre-existing relationship. In the credit network, this function is performed by the banks who issue merchant accounts and by fraud algorithms that try to detect bad actors. In a data network, anyone, even the customer, might be a data provider, so a reputation system can remove some of the risk in knowing who is providing reliable data.
  • Discovery—finding the data provider who has the data you want at a price you're willing to pay is a tough job without some help. The network will provide discovery services to aid in this task.
  • Semantic Mapping—the individual nodes in the network provide semantic data interchange, but for that to work, they need semantic maps (e.g. ontologies) that have been agreed to by participants in the network.
  • Brokerage—the network facilitates payment, probably through an existing credit network. The network also facilitates setting up subscriptions to data services by passing channel details from publisher to subscriber when the relationship is established.

Building this network is a tall order compared to building credit networks. Financial transactions, for example, have simple semantics compared to data transactions. A few well-established protocols suffice for authorizing and settling credit transactions. In contrast, data transactions may need multiple protocols depending on the exact exchange, even with semantic data interchange in place. Nevertheless, such a network would open up the long tail of data transactions for dozens, even hundreds of companies in the same way that the Web opened up the long tail to ecommerce companies.

The good news is that in the second decade of the 21st century, we're ready to take on this task. The Web provides a foundation for transport and recent advances in the understanding of APIs and data interchange have prepared countless developers and companies to work in this new world. The technologies and systems described in From Personal Computers to Personal Clouds including the Event eXchange Protocol (EXP), Kinetic Rule Language (KRL), and XRI Data Interchange (XDI) are the key components in building this network. The legal framework being put in place by Respect Network Corp provides the glue that binds them together.

Public and Private

There may be some reading this who have grave misgivings about what I've described because it envisions a private, rather than public, data network. I believe that this network has to be, at least partially, private for the same reasons that no one has ever created a public credit network to rival Visa and Mastercard. The primary reason is trust.

The protocols that underlie the network I've described are all public or open source and thus available to anyone. What can't be open source is the legal framework that engenders that trust. There will necessarily be an organization that is the foundation of those contracts. While there may be several of these data interchange networks over the next few years, I believe this will likely devolve to a duopoly, as most other quasi-public utilities seem to do.

Unlocking Data Interchange

The network I've described in this paper solves a structural problem in data interchange that limits current business models to one-to-one, heavyweight relationships. Building an open data interchange network underneath a trust umbrella enables new business models to thrive by reducing friction and expense through lightweight, any-to-any interactions.

Tags: kynetx respect+network data personal+clouds

May 06, 2012

Utah Open Source Conference 2012 - Presentation slides

Thursday and Friday of last week I attended the Utah Open Source Conference held at Utah Valley University. Though registration fees were waived since I presented at the conference, dollar-for-dollar compared to other, much more expensive conferences I've attended, I think the Utah Open Source Conference is an amazing value. Between great sessions and informative content, and rubbing shoulders with so many smart people, I had a great time.

I presented on Friday morning, about Open Source tools for automating web performance analysis. The audience was great, and I extend my thanks to everyone who attended. Hopefully my content was helpful. Slides are available via the link below.

Automated, Open-Source Web Performance Analysis (PDF)

May 02, 2012

PDS Interoperability
Dead Data

Last week in London, I attended a workshop Iain Henderson put on at Innovation Warehouse on Personal Data Store interoperability. He used the following illustrative use cases to talk about what interoperability means for a personal data service (PDS):

  • As an individual, I want to be able to pick up my data from one personal data service (locker etc.) and drop it into another with minimal fuss
  • As an individual, I want to be able to have applications run on my personal data, even when it exists in multiple different services
  • As an individual, I want the apps I had running on my personal data in one locker service, to work when I move my data to another one
  • As an application developer, I want the apps I build to run, with minimal overhead, across multiple personal data services
  • As an organisation willing to receive and respond to VRM style data feeds; I don't want to have to set up different mechanisms for each different VRM provider
  • As an organisation willing to provide/return data to customers as part of our proposition to them; I want to be able to make this data available in the minimum number of ways that meet the users' needs, not a different format for each personal data service provider
  • As an organisation willing to provide/ return data to customers as part of our proposition to them; I don't want to have a different button/ connection mechanism for each personal data service provider on the 'customers signed in' page of my web site

These are worth striving for. Just having a place to put personal data isn't much use to me unless I can use it, move it, etc. As more and more companies become personal data service providers, they will do well to think through how they're planning for these. Kynetx is working on a project over the next 6 months to specifically address the developer side of this. Specifically, I think developers need two things:

  1. location independent references—applications will be easier to write if the developer doesn't have to know where the user's data is stored and the particulars of the API, authorization scheme, and so on.
  2. semantic data interchange—The developer shouldn't have to parse the semantics of each data service's particular tags, field names, and so on. A simple example is "cell" vs. "mobile" in a contact data store.
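
The "cell" vs. "mobile" problem can be sketched in a few lines of Python. This is not XDI and not anything Kynetx has built; the provider names, field names, and the shared vocabulary below are all hypothetical. It only illustrates the kind of mapping an application developer should not have to write by hand for every personal data service.

```python
# Hypothetical per-provider field maps onto a shared vocabulary.
FIELD_MAPS = {
    "provider_a": {"cell": "mobile_phone", "name": "full_name"},
    "provider_b": {"mobile": "mobile_phone", "fullName": "full_name"},
}


def normalize(provider, record):
    """Rename provider-specific fields to the shared vocabulary.

    Fields with no mapping are passed through unchanged.
    """
    mapping = FIELD_MAPS[provider]
    return {mapping.get(key, key): value for key, value in record.items()}


if __name__ == "__main__":
    a = normalize("provider_a", {"name": "Pat Doe", "cell": "555-0100"})
    b = normalize("provider_b", {"fullName": "Pat Doe", "mobile": "555-0100"})
    print(a == b)
```

With semantic data interchange in place, a table like FIELD_MAPS would live in the service layer rather than in every application.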

The particular technology we're using to do this is XDI. We'll be experimenting with how an XDI server can be integrated with the Kinetic Rules Engine and how XRI references and XDI statements can be incorporated into KRL.

Tags: personal+data kynetx xdi

April 30, 2012

UTOSC 2012 Saturday – Bring your boardgames!

If it’s not clear, I love to play boardgames. Usually, I like strategy games like Settlers of Catan, Small World, Pandemic and much, much more. I’ve probably played around 100 different boardgames over the past 10 years. It’s a fun passion of mine and allows me to spend time annihilating friends without actually causing them physical harm.

Coming up this Saturday at UTOSC 2012, we will be setting up a room dedicated to boardgames. We’ll be in LA 122 starting at 10am and going until someone kicks us out of the building. It could be a fairly late night of boardgame fun!!

If you plan on attending any or all of the boardgames session at UTOSC, please bring your boardgames. Mark them in some fashion with your name, and bring them down! We need tons of games as many people will come to play!

Myself, I’ll be bringing at least the following games, possibly more:

  • Mag Blast
  • Ticket to Ride (USA and Europe)
  • Small World
  • Killer Bunnies
  • Dominion
  • Apples to Apples
  • Bang!

Bring your games or come and play what is there. Either way it should be a blast!

Cheers,

herlo

May Meeting: OpenSHIFT architecture (Mark Atwood)

Date: Wednesday, May 9th, 2012
Time: 7:30 PM
Location: C7 Data Centers (Lindon)

Mark Atwood will be presenting on the OpenSHIFT architecture. Mark has been a long-time contributor to open source. His technology interests include Cloud Computing and NoSQL. He is the patch queue manager for MySQL Drizzle. He was the Senior Technology Advisor for Network.com at Sun Microsystems. He makes his home in Seattle USA.

We will be meeting at the C7 facility at Canopy Building 5 in Lindon, behind Home Depot. You will need to bring a photo ID in order to si

Salt 0.9.9, the last pre 1.0

Salt 0.9.9 is now available! This is the last release before Salt 1.0 and marks a major feature freeze. 0.9.9 comes with many bugfixes and feature enhancements. These include scaling fixes which make Salt much more stable in large environments, extensive additions to the test suite, syntax additions to make states even easier, and much more.

To see the full release announcement look here:

http://salt.readthedocs.org/en/latest/topics/releases/0.9.9.html

Salt 0.9.9 can be downloaded from the usual places:

github: https://github.com/downloads/saltstack/salt/salt-0.9.9.tar.gz

Pypi: http://pypi.python.org/packages/source/s/salt/salt-0.9.9.tar.gz

Enjoy the latest Salt and help us find any last bugs before 1.0.0 is pushed out the door!


April 28, 2012

Calipers and Science

Just for kicks I dug up the original Jackson/Pollock paper for skinfold measurements for determining body fat percentage. Turns out there's also a 7-point equation that also takes circumference of waist and forearm into account.

Here's a snapshot of the equations for men from the paper ("Generalized equations for predicting body density of men" by A.S. Jackson and M.L. Pollock, 1978. I couldn't find the PDF for the women paper online).
Generalized body density equations

Important notes: skinfolds are in millimeters, circumferences are in meters, and log is the natural log (LN in a spreadsheet; most programming languages call it log). I plugged my values from two weeks back into a spreadsheet and got the following results:

JP Equation              Density   %BF
Sum of seven skinfolds
  S, S^2, age            1.0518    20.62%
  S, S^2, age, C         1.0476    22.51%
  log S, age             1.0506    21.15%
  log S, age, C          1.0482    22.25%
Sum of three skinfolds
  S, S^2, age (5)        1.0607    16.69%
  S, S^2, age, C (6)     1.0549    19.24%
  log S, age (7)         1.0578    17.95%
  log S, age, C (8)      1.0574    18.14%
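
The post doesn't say how density was turned into %BF. The numbers above appear consistent with the widely used Siri equation, %BF = (4.95 / density - 4.5) x 100, so the sketch below assumes that conversion. That the spreadsheet actually used it is an assumption. The density values are copied from the table; small differences from the reported percentages would come from the densities being rounded to four decimal places.

```python
def siri_percent_body_fat(density):
    """Convert body density (g/cm^3) to percent body fat with the Siri equation."""
    return (4.95 / density - 4.5) * 100.0


# (density, reported %BF) pairs copied from the table above.
TABLE = [
    (1.0518, 20.62), (1.0476, 22.51), (1.0506, 21.15), (1.0482, 22.25),
    (1.0607, 16.69), (1.0549, 19.24), (1.0578, 17.95), (1.0574, 18.14),
]

if __name__ == "__main__":
    for density, reported in TABLE:
        computed = siri_percent_body_fat(density)
        print(f"{density:.4f} -> {computed:.2f}% (reported {reported:.2f}%)")
```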

The most interesting thing here is that there's a large difference between 7 and 3 site measurements, and the 3 site range is significantly larger. Also very interesting to note is that the one-site (suprailiac) AccuMeasure chart is, for me, in line with the 7-site measurement (22.1%). Given other measurements I've taken and just general guesswork based on what I see in the mirror, I think that is a decent estimate.

It's also curious that there are two sets of equations given, one using logs and one using squares.

Moral of the story: more data is better, sometimes not-enough more data is worse than a simpler estimate, and interesting things can be learned when you go to the original source. (This is just a quick note, but the paper is very interesting and reading it will be an interesting exercise that sets proper expectations for, and understanding of, the JP7 skinfold method).

April 27, 2012

Starting a High Tech Business: Being Startup Compliant
Visa Credit Card

I'm the founder and CTO of Kynetx. This series of articles relates my discoveries and feelings about starting a high-tech business. This is the thirty-second installment. You may find my efforts instructive. Or you may know a better way---if so, please let me know!

I run into people all the time whose lives are not startup compliant. They express a desire to start a company and have ideas, but they've made choices that limit their ability to live the startup life. They say "I'd like to start a company, but I need a salary."

Starting a company requires extreme flexibility in personal finances. You will likely work for long stretches with no or greatly reduced salary. Debt is the number one impediment to being flexible. Whether it's a high mortgage, student loans, child-support payments (not exactly debt, but still a future payment obligation), car loans, or credit cards, many people are not able to start a company because they've given away their freedom by going into debt in one way or another.

I estimate that since starting Kynetx five years ago, there has only been an 18-month period where I was at full salary. How do we get by? We have a very low debt load; we have no debt besides a mortgage. When you have a low debt load, slashing expenses is as simple as not buying stuff. There's all kinds of room in a typical family budget to do this. But when you have a high debt load, much of your income goes to servicing debt. As a consequence, there's just not as much room to slash expenses.

Young people are naturally more flexible in their finances because they don't have as many kids, a mortgage, and so on. I tell students that the best time to start a company is right when they graduate. They're already used to being poor, so it's not a big change. This is a point that I think is largely overlooked in discussions about how to unleash more innovation: student debt reduces the number of people who can afford to start a company. (Note: student loans are especially pernicious for other reasons.)

If you want to be an entrepreneur, you have to stay out of debt. If you're in debt, use the snowball method to get out of debt and save an emergency fund. Once that's done, you'll be free to enjoy starting a company because you'll have less worry about what happens if things don't go well. Debt is a form of enslavement that you just can't afford.

Tags: startup kynetx debt

Zombie Processes - What They Are and How To Handle Them

First off, a zombie process isn’t really a process. At least it’s not executing anymore. A zombie process is more of a “state”, and that state is “defunct”. However, we typically refer to them as “zombie processes”, so I’ll stick with convention here. Second, a zombie process on a Unix system is a child process that has not been waited on by the parent. In a typical scenario, when a child process is finished executing its task, the chain of events will go something like this:

  1. Child process exits, and the kernel sends the signal SIGCHLD to the parent.
  2. Parent receives SIGCHLD, issues the “wait()” system call.
  3. Parent now receives the exit code of the child.
  4. Parent reaps the child from the process table.

So, when the child process has finished execution of its task, it will report the exit code to the parent. At this point, the child process will remain in the process table until it receives further instruction from the parent. This wait is the defunct, or zombie state. So, in reality, child processes are in this state all the time. It’s just that normally, the parent process acts on it immediately. When the parent does not respond, then we have the zombie state of that child process.
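
The sequence above can be watched from Python. This is a minimal sketch that assumes Linux, because it reads the process state from /proc. The helper names and the exit code are made up for the example. The parent forks a child that exits at once, deliberately delays its wait() call so the child sits in the zombie state, then reaps it.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state of pid from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state letter is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_demo(exit_code=7):
    """Create a short-lived zombie, then reap it.

    Returns (state seen before reaping, child's exit code, whether it is gone).
    """
    pid = os.fork()
    if pid == 0:
        # Child: finish immediately, skipping the parent's cleanup handlers.
        os._exit(exit_code)

    # Parent: hold off on wait() so the exited child lingers as a zombie.
    state_before = None
    for _ in range(100):
        state_before = process_state(pid)
        if state_before == "Z":
            break
        time.sleep(0.01)

    # waitpid() collects the exit status and frees the process-table entry.
    _, status = os.waitpid(pid, 0)
    reaped = not os.path.exists(f"/proc/{pid}")
    return state_before, os.WEXITSTATUS(status), reaped


if __name__ == "__main__":
    print(zombie_demo())
```

A real parent would normally call wait() from a SIGCHLD handler or a main loop rather than polling like this. The polling is here only to make the zombie state visible.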

You can check if there are any zombie processes on your system with the following command:

$ ps -eo pid,ppid,user,args,stat --sort stat

Any state of “Z” is a zombie state. So, the question becomes, how do you clean out the zombie, if it is causing issues with your system? Well, you have 3 options:

  1. Physically wait around. Sometimes, the parent is busy, and just hasn’t acknowledged the child. When the parent is free, it could clean it up.
  2. Send the “SIGCHLD” signal to the parent process. The parent's PID is shown in the “PPID” column of the output from the command above.
  3. Fully kill the parent process. Any child processes will be orphaned, and picked up by init. The init process does frequent reaping of child processes and will reap any zombie states.
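
For option 2, a script can do the lookup for you. The sketch below is Linux-only and reads /proc instead of parsing ps output. The function names are hypothetical. It lists every zombie with its parent's PID and can resend SIGCHLD to those parents. Whether that helps depends on the parent actually handling the signal, and signalling another user's process requires permission.

```python
import os
import signal


def find_zombies():
    """Return (pid, ppid) pairs for every process in state Z (Linux /proc only)."""
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                # Fields after the ")" closing the command name: state, ppid, ...
                fields = f.read().rsplit(")", 1)[1].split()
        except OSError:
            continue  # the process vanished while we were scanning
        if fields[0] == "Z":
            zombies.append((int(entry), int(fields[1])))
    return zombies


def nudge_parents(zombies):
    """Send SIGCHLD to each distinct parent and return the PIDs signalled."""
    parents = sorted({ppid for _, ppid in zombies})
    for ppid in parents:
        os.kill(ppid, signal.SIGCHLD)
    return parents


if __name__ == "__main__":
    for pid, ppid in find_zombies():
        print(f"zombie {pid} is waiting on parent {ppid}")
```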

Printable Poster

You may have seen this poster around town. Now you too can spread the news. We’ve printed these on bigger paper and then trimmed down to letter size, or you can simply let your printer crop the picture for you.

Armed Protesters

The Associated Press today reported the news, shocking to all who haven't ever bothered to think about it, that protesters at the Republican and Democratic national conventions might be armed. Then the article starts to raise my ire.

Various authorities in Tampa, site of the Republican convention, get my wrath first. For some reason they figure they need to disarm protesters. They've created a list of items protesters aren't allowed to carry, to include lumber and squirt guns, and claim it's state law that is making them look silly because state law prevents them from banning handguns. How about this for an alternative take on it: you look silly for trying to ban anything at all. Do protesters get the pleasure of a pre-protest TSA-style grope? Who's to say they're not packing explosive shoes or underwear? If protesters can bring more than three ounces of "liquids or gels" to the protest, they will be able to wreak havoc, right? According to a/the city attorney, "everything we are doing is based on something that happened at another convention or another national security event," and the article describes a few such events where protesters and police started mixing it up. Oddly enough, though the AP's list of convention protests gone wrong is certainly not meant to be comprehensive, neither does it include any conventions where individuals' right to bear arms made the problem worse. Nor, in fact, do the weapons described from previous violent protests include things on Tampa's list of banned items.

Apparently Tampa city leaders have asked Florida's governor Scott to "issue an executive order" disallowing protesters from carrying, saying, "we believe it is necessary and prudent to take this reasonable step to prevent a potential tragedy." Such an executive order would be no different from any despotic decree or imperial dictum. That Tampa's public officials would care so little about personal freedom as to suggest such an action, while hardly surprising, certainly should give pause to anyone who considers the Constitution something of importance. The article explains in less detail how city leaders in Tampa also plan to abridge freedoms of speech and peaceable assembly.

Just as a refresher, the Constitution is the thing that 1) is the supreme law of the land, and 2) says that the right of the people to keep and bear arms shall not be infringed. Both cities are quoted as saying they're unable to establish decrees of their own because of state law, but it's the Bill of Rights talking here. The fact that various federal and state statutes already violate the second amendment in no way justifies further infringing on a natural right. Nor would it justify pretending that state governors have the power to establish themselves as kings and potentates at their own whim.

The crux of the matter is that freedom is a messy thing. It's all about trusting your neighbor not to infringe on your rights, not about using the government to infringe on your neighbor's rights because you don't trust him to leave you alone, and don't trust yourself to be able to take care of things if he does step on your toes.

The Associated Press gets my goose next, in particular for failing to point out that the same protesters could also be armed in the grocery store, the movie theater, or the city park. I don't know enough about how Florida or North Carolina law has infringed on the right that "shall not be infringed" to say if it's the same there or not, but here in Utah they could be armed in the public school. With children. Please try to avoid wetting yourself at that revelation. I imagine it must be shocking that we don't have Columbine II every other day here. The AP further describes Tampa and Charlotte as feeling "hamstrung" by state law. Perhaps they do (though the article fails to show evidence to that effect, simply including quotes from authorities in each city saying they're subject to state law). Dear AP, realize, please, that these state laws are the ones that have been implemented by the people, through a democratic process. They're not supposed to be rescinded simply because people get their shorts in a knot when they realize freedom applies to their neighbors too.

Charlotte's leaders have been less vocal about gun control, but they took an early lead restricting inalienable rights when they kicked out an Occupy Something-or-other camp months ago and restricted backpacks and other stuff in various zones. Police will apparently get to stop protesters who have failed to obtain a permit for their Constitutionally-protected peaceable assembly and the city has passed an ordinance allowing leaders to declare particular events "extraordinary" and thereby salve their consciences as they further enslave their subjects.

Finally, none of those who call for gun control among protesters has indicated on what basis they conclude a person already willing to murder another for his political opinions would be deterred by an ordinance banning concealed carry. Nor can they explain why they're so concerned, except that it's a "politically charged" environment and bad things involving items they haven't banned have happened at past conventions. Dear city leaders, if you could stop assuming that your subjects (and yes, disarmed sheep over whom the governor and city leaders have the power these folks are pushing for are subjects, not citizens) are all out to get you, and if you can stop assuming that others are always responsible for your own safety and doing your thinking for you, you'll find you wet your pants over stuff like this far less often.