Wednesday, May 18, 2011

Socio-technical design realities

A hard problem is specified and, in order to make it tractable, it is broken down into software components. These components then get reified into human and managerial structures in the organisation responsible for building the system.

Over time these abstraction and managerial boundaries impede the development of new features and ideas. In a user-focused organisation, communication between the teams tends to lead the change: new human connections are formed, followed or accompanied by new technical connections, and these lead to exciting new user-facing innovations.

More hierarchical, less user-focused organisations tend to be incapable of supporting the new social dynamics, and so try to enforce their socio-technical boundaries, occasionally with security mechanisms. The only benefit they reap from this is that, rather than adapting incrementally, they are replaced wholesale by a different organisation that views the separation differently.

As such, the oscillation that is typical in large businesses between horizontal and vertical orientation is reflected directly in the oscillation of software structures as new ideas are negotiated around the abstraction boundaries (see e.g. the oscillation between objects and aspects, or mocking). The lack of such oscillation doesn't imply that the design was somehow 'right' to start with, only that the organisation was too inflexible to adapt to the needs of its users.

The oscillation cycle is both healthy and endemic across the technical aspects of IT systems. See for example:

  • The separation between processes in an operating system: first hard separation, then IPC, then DMA, then Virtual Machine Monitors
  • The separation between the 'compile' and 'execute' phases of a language. Then dynamic class loading and dynamic languages. Then NX bits.
  • The separation of browser from operating system. Then ActiveX. Then Windows Isolation Mode. Then Chrome. Then Native Client....

This isn't a failure, it's a design process. The question is, are we oscillating enough? How can we oscillate faster?

Saturday, April 21, 2007

Broken Social Protocols: Chip and Pin


The UK's move to Chip and (s)Pin has been greeted with criticism from elements of the security community. However, most of the usability criticisms have focused on two problems:

1. Memorability of four digit codes

2. Usability of terminals, which generally is very poor.

However, I argue there is a much more insidious usability problem with Chip and Pin, which comes from its use of a broken social protocol. To start with, let's consider what happened with Card and Signature, the old system.

Merchant -> Customer: Cost
Customer -> Merchant: (Card and Signature)
Merchant -> Customer: (Receipt and Copy) (includes Cost)

*Customer checks cost
Customer signs receipt and copy

Customer -> Merchant: Receipt and Copy with Signature

*Merchant verifies signature

Merchant -> Customer: Card + Receipt

Items marked with a '*' are optional steps. One of the main problems with this protocol is that security is really managed by the customer possessing the card ('what you have' authentication), rather than by the signature matching ('what you are' authentication). So just about any signature would do.

Chip and Pin promises to replace this with 'what you know' authentication, based on a four-digit number. However, I assert that the social protocol used to handle Chip and Pin is broken.

The first problem we have with describing the Chip and Pin protocol is that there isn't one. Consider the most basic operation:

(I'm using 'Point of Sale Terminal' to refer to the thing that the merchant uses, and 'Chip and Pin Terminal' to refer to the thing the customer enters a PIN into.)

--Phase 1--

Merchant -> Point of Sale Terminal: Cost
Point of Sale Terminal -> Chip and Pin Terminal: Cost

Customer verifies cost

--Phase 2--

Customer -> Chip and Pin Terminal: Card
Customer -> Chip and Pin Terminal: Pin

(Assuming correct, authorised etc.)

Delay

--Phase 3--

(Concurrent) Chip and Pin Terminal: 'Remove Card'
(Concurrent) Merchant -> Customer: Receipt
(Concurrent) Collect shopping

--End--

So this looks OK?

Well, there is a litany of problems:

Phase 1 is not at all standardised. In some systems the merchant enters the data into the console and pre-accepts the amount, so the user has no idea how much they're actually paying until the receipt is produced; in others the user gets to approve it by pressing a key; in others still they just type in the PIN. This is very poor usability, though considering how poor the rest of the protocol is, forcing the user to think may actually help security(!!)

Phase 2 is where most of the previous usability work has been done: problems with shoulder surfing, exclusion of blind people, people with dyspraxia, etc. But it's been covered elsewhere, so I'm going to ignore it. The delay at the end is important, though (see later).

Phase 3 is where things get really dodgy. Notice that the three items occur in parallel.

And there is a delay between entering the PIN and being able to remove the card.

The delay matters because the customer gets bored and, rather than just waiting for the prompt to remove the card, starts collecting their shopping. So the customer is faced with:
-> Collecting their shopping (completion of the task)
-> A person handing them something
-> A machine with some unreadably low-contrast text which has changed from 'please wait' to 'remove card'. The environment is noisy and visually distracting, and the shopper may well be stressed.
There are intuitive reasons why this is bad, but there is also at least one fairly sound psychological explanation.


Post-Completion Errors

Consider another task: withdrawing cash from an ATM in the UK. The machine won't give you your cash until you take your card. Why? To stop you getting your cash (completing the task that you went to the ATM for) and then leaving without collecting your card, because your mind has moved on to whatever you want to do next.
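A minimal sketch of this 'forcing function' idea, in Python (my own illustration, not any real ATM's code): the goal step is deliberately ordered after the clean-up step, so completing the goal cannot strand the card.

    # A toy model of the ATM forcing function: return the card, and block until
    # it is removed, *before* dispensing the cash that the customer came for.
    class ATM:
        def withdraw(self, amount):
            self.authenticate()           # card in, PIN accepted
            self.return_card()            # clean-up step comes first...
            self.wait_for_card_removal()  # ...and the machine blocks until done
            self.dispense_cash(amount)    # only now does the goal complete

        # Stubs so the sketch is runnable.
        def authenticate(self):           print("card inserted, PIN accepted")
        def return_card(self):            print("card ejected")
        def wait_for_card_removal(self):  print("waiting until the card is removed...")
        def dispense_cash(self, amount):  print(f"dispensing {amount}")

    ATM().withdraw(50)

The Chip and Pin protocol above has no such ordering: the goal (receipt, shopping) can complete while the clean-up step (remove the card) is still pending.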

Again, consider Card + Signature, the old scheme. Here, you don't get your card back until the merchant has looked at it. The process is sequential: you're not doing anything else, you've handed the merchant your card, and you expect it back in a second or so. Further, you've given it to a person, and most polite people will pay more attention to a human than they will to a machine. The merchant also has a tangible reminder to give you your card back. Not only that, but you gave them your card, which imbues an unspoken responsibility to take care of it, including giving it back to you. All the cues point the right way.

But in Chip and Pin none of this happens. As mentioned above, the user has at least three competing actions:

1. A person is handing you your receipt. (This takes precedence over machine interaction.)

2. You're collecting your shopping. This completes the activity you were there to do in the first place. You probably do this while you're waiting, as the delay between PIN entry and 'remove card' is long, several seconds.

3. The (unreadable) text on the Chip and Pin terminal changes.


----

And now the shocking news: people leave their cards behind (a post-completion error). Chip and Pin has been around for a short while, maybe about a year and a bit. In the past month, I've seen three people leave their cards behind. In the ten-odd years I've been watching people use signatures, I never saw this happen.

Observations:

If a security system can't survive Murphy (random chance), it will break horribly under Satan (malice). There are any number of things that a merchant who wanted to steal cards could do. E.g. they notice you're about to retrieve your card, so they hand you your shopping or receipt to interrupt the process. They then remove your card after you've left. They've already stolen your PIN by watching you type it in.

It's not surprising that this didn't emerge for a while; at first, people were going through the process consciously. It's a kind of damning indication that it took so long: twelve months to learn how to do a task that you perform several times a day is not great.

----

So what can we do about this?

The Post Office seems to be one of the very few organisations that has put some thought into Chip and Pin. Their terminals are big, tolerably easy to read (though they could be better), have an easy-to-cover number pad, and they beep at you when they want you to remove your card.

It's not a fix to what is fundamentally a protocol that doesn't encourage the correct process, but it's certainly a lot better. Kudos to the Post Office.

----

So Chip and Pin is not only very dubious from a technical security perspective, it's also fundamentally worse from a social interaction perspective. It's so bad, that failures happen by accident. I hate to think how many Cards and Pins a smartly dressed attractive shop assistant could steal.

Tuesday, March 13, 2007

Beware of RAW files


The cat is now well and truly out of the bag. On the 17th of December 2006, I reported over 15 suspected vulnerabilities in the handling of malformed camera RAW files to a range of software vendors.

In the last two weeks we've seen Microsoft patch iView Media Pro, whose version history (http://downloads.iview-multimedia.com/ivmp313vh.pdf) notes 'Fixed crash caused by importing corrupt DNG files.'

Their internal analysis indicated that it was a reliability issue that did not require a security patch. All good. :-)

And today, Apple have issued a security patch to their operating system(s) (http://docs.info.apple.com/article.html?artnum=61798)
CVE-ID: CVE-2007-0733

I would advise users of Mac OS X or Mac OS X Server v10.4 -> 10.4.8 to schedule this patch for testing and roll-out.

Several other vendors are still working on patches, so clearly their problems will not be discussed. However, by now the 'bad guys' have several sources of information indicating that there is an attack vector, so without discussing specifics that might harm the vendors still working on fixes, it's worth considering a few precautions we should all take when handling RAW files.

**Digital Camera RAW files should be treated as if they were programs**

Therefore, you should not download unsolicited camera RAW files (whether from the web, from peer-to-peer software, or from email attachments). Even placing such a file in a folder may be enough to cause hostile code to execute.

Many of the potentially vulnerable products have no automatic patching mechanism, so it is important that you check whether the maker of your software has released any updates. In many cases they have not yet, so your only protection is being very careful with RAW files.

If you are an organisation that routinely handles 3rd-party RAW files, extra security measures are definitely in order. If you contact me, I will be happy to discuss your requirements with you. Contact details are here.

A more detailed discussion on some aspects of the problem will follow later...

Monday, February 05, 2007

The Moral Hazard of Apple

Paraphrase:
"Seriously, I've got this virus, stay well back"
"Not really, I'm a Mac"

Apple's latest adverts are annoying me.

Personally, I dislike negative marketing. If Apple are as amazingly better as they claim, and as their ~5% market share shows, then please can they talk about what they're doing right, not what Microsoft are doing wrong. And that's leaving aside the very dubious factual accuracy of much of what they have to say...

But beyond this, there are a couple of more serious points.

1. Homogeneity is bad for security. Irish potato farmers and NATO (sorry, I can't find the reference) both understand this, as I suspect do Microsoft. Their monopoly hurts the security of their product by providing a high-value target and a rapid adoption platform for infectious diseases.

2. Apple are creating a moral hazard. By redistributing the perceived risk, they are encouraging unsafe behaviour. This is fine for them while they have a tiny market share, and so no real security threats (though a very large list of security problems). But if they ever become a more substantial force, having spent the past years 'educating' their users that they're 'safe' because they're using a white computer, then they will have a huge problem and will have done society in general a serious dis-service.

So Apple, be different; it's good for both of us. But don't claim that you're better because you're different.

Oracle. Unbreakable 50+ different ways last quarter. Need I say more?

Saturday, November 04, 2006

Is reliability harmful?

Over time I've become increasingly concerned about the 'acquisition' of non-critical technologies for life- and mission-critical purposes. I view this trend, and the lack of rational thought applied to it, as a serious problem in the use of technology within society.

The story is usually the same.

First comes the idea. It's usually a bright idea, often from a bunch of off-the-wall researchers. Implementations are concept demonstrators, developed for research flexibility and rapid implementation.

Second comes the commercial application. Often this looks disturbingly like the concept demonstrator; if we're lucky, it's a hardened implementation that has actually been tested.

Third comes the impact of network economics. The value of the product to its users grows roughly as n^2 for n users. This results in rapid, explosive growth. During this time the technology 'crosses the chasm' into a serious product. However, it has to maintain backwards compatibility, and the golden handcuffs click into place. The technology's fate is all but sealed for many years (e.g. the address space of IP).
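As a back-of-the-envelope illustration of that n^2 effect (a toy Python sketch, not data about any particular product), adoption that grows linearly produces value that grows explosively:

    # Toy illustration of network economics: value grows roughly as n^2 for n users.
    for n in (10, 100, 1000, 10000):
        print(f"{n:>6} users -> relative value ~ {n**2:>12,}")

Every new user makes the product more valuable to every existing user, which is why the growth, once it starts, is so hard to stop, and why the backwards-compatibility handcuffs bite so quickly.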

Fourth comes acceptance. The technology becomes integrated into the way the society that uses it operates. We start relying on the technology, expecting it to be there 24/7/365.

In phase 1, the system is so unreliable that it barely works outside the lab, and there is no perceived threat. It's so difficult to get the darn thing working that no-one really cares about large-scale reliability.

Phase 2, it's a nascent technology, it's a cool gadget, no-one would rely on it.

Phase 3, OK, so it's a commercial technology; it works almost every time you turn it on. If it doesn't, people put in a lot of effort to make it work.

Phase 4, it's too late. Huge effort is poured into engineering reliability into the system. If we're unlucky, people die because of our reliance on the technology, but society cannot excise the technology even if it wanted to. You can tell people not to rely on it all you like; it will do no good.

So, without any serious consideration being given, we've gone from a research toy to something that your life may depend on. If you're lucky, it might just be something that will inconvenience you or lose you money if it's not there.

Consider some examples:

Mild: Digital photography

Do you rely on your digital cameras? How about your computer? How would you feel if it all went away tomorrow? Are you sure you don't rely on digital photography? As far as I can tell, lots of people do.

More serious example:

When you dial 999 (UK) or 911 (US), do you expect anyone to answer the phone? Why? You're relying on a system with multiple single points of failure, but it's been around for a while, people understand that we need it, a serious amount of effort has gone into maintaining uptime, and there are various hacks which attempt to give emergency traffic priority.

Don't delude yourself, the availability of the emergency services is not guaranteed by the technology, you are relying on a high-reliability statistic, and that is all.

There are multiple ways for people to mount denial-of-service attacks on the emergency services; we can but thank our luck that none of them appear to have been seriously exploited, yet. This is based on evidence (which obviously cannot be discussed), not mere speculation.

And this is a mature technology!

Mobile telephony was not intended for emergency sensitive use. It was not supposed to carry life-critical traffic, and yet it now does. If anything I suspect that people rely on mobile phones more than landlines in many places now.

People in Downing College complain loudly when the Internet is unavailable for 1.5 days in a month. Comments like "I can't do without it", "I feel lost", "I can't get on with my work". Estimates of the impact of an actual failure of the Internet include the economic collapse, within a week, of even tangentially related companies such as insurers.

Not bad, for a technology described by a networking expert as "a toy that no-one was ever supposed to actually use".

Consider also that most of the Internet runs on PCs, running Windows and UNIX, the very things whose reliability you worry about when they store your work and photos...

The same applies all over the place. Even the reliability-paranoid military have struggled to resist the appeal of consumer end products.

--

The same attitude applies to software.

I have worked on multiple projects that were not certified as life-critical but, because they almost never failed, people grew to depend on them. In the bad cases you hear things like "It's OK, they'll fall back to paper if it fails".

Even in the more optimistic cases, where there are backup systems, those systems go unused for years.

Even if we assume the optimistic case, that failures are randomly distributed, can you remember how to do something that you haven't done for 5 years? Can you remember how to manage styles in MS Word 2.0 without looking?

Can you remember how OLE embedding worked in Windows 3.0?

And this is the ideal situation. Realistically, during failures you're likely to be working under stress, possibly in a hostile, noisy, distracting environment. Experiments show that humans get almost everything that isn't purely internalised wrong in such circumstances (e.g. Three Mile Island).

My point is this:

"This technology is not certified for life critical use" is an exercise in blame management, it is technically and sociologically vacuous.

And the implication for design?

Should we deliberately design our systems to fail periodically, to make sure they're not being relied on? I gather Tescos does this: every year they run an unscheduled, real disaster test by pulling certain power cables.
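As a toy sketch of what 'designed-in periodic failure' could look like (my own illustration in Python; I have no knowledge of how Tescos actually run their tests, and the probability below is made up):

    import random

    # Deliberately route a small fraction of work through the fallback path,
    # so the backup procedure keeps being exercised instead of being forgotten.
    FAILURE_DRILL_PROBABILITY = 0.01  # hypothetical: ~1% of operations are drills

    def process_order(order):
        if random.random() < FAILURE_DRILL_PROBABILITY:
            return process_order_on_paper(order)        # the 'fall back to paper' path
        return process_order_electronically(order)

    def process_order_electronically(order):
        return f"processed {order} electronically"

    def process_order_on_paper(order):
        return f"processed {order} on paper (drill)"

    print(process_order("order-42"))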

I think that we should at least think about it.....

Tuesday, September 05, 2006

I finally understand

It's 01:23 in the morning, and I'm sitting waiting for my compiler to finish the latest test case as I try to work out why my application randomly quits when I modify a CPU flag.

Until tonight, I didn't understand why people complained about the speed of C++ compilers.

I now get it.

Friday, August 18, 2006

Ten years of Cognitive Dimensions

The Special CDs issue of JVLC has been published. :-)

I have a paper co-authored in it with Thomas Green, Ann Blandford, Chris Roast and Steven Clarke.

Full text available here.

(www.sciencedirect.com/science/journal/1045926X)

In it, I argue for the separation of cognitive and information-structural concerns in the Cognitive Dimensions framework, and use the resulting 'refactored' dimension set to bring new clarity to some complex cognitive problems.

The work has directly influenced my current work on cognitive modelling of security, which I'm writing up at the moment. I believe that it offers a powerful view, especially for modelling security-critical user interfaces.

Thanks to my co-authors and Alan Blackwell.