Living without mice
Wednesday, March 31, 2010 10:34:34 AM
(This is effectively a write-up of the last section of a presentation I gave at CSUN recently. It is also a request for thoughts and comments, so that I can solidify it and push any good bits to proper standardisation.)
The Problem
Once upon a time, Web applications were pretty simple. You had forms, and they sent information. Or you had javascript, and it didn't do anything very important.
Then came interaction events, where you could make things happen when the mouse hovered over something, or a user clicked on it. And the DOM, which made it much easier to do interesting things when the user did their mouse magic. And later still came SVG, and XMLHttpRequest, and multi-touch screens and various other cooler and cooler technologies for making the Web into a really interesting platform.
But way back then, when we had a handful of mouse events, people realised there could be a problem. We got a few keyboard events like onkeypress and onkeydown. And this would solve the problems...
Of course, it was a bit trickier. Because while people knew something about common mouse interactions (there were differences - web patterns typically used hover and click, where others used click or double-click) keyboard interactions were pretty much a mess. There have been well-established but different ways of moving around since before there were graphic interfaces of any quality. But they relied on wildly different keyboard mappings. (One reason for this is that mice were pretty much all the same, but keyboards come in many many varieties).
So keyboard interaction was (and today still is) based on waiting for a particular key to be pressed. And hoping that the user has that key. And trying to work out whether it is the key that has a particular label on it, or a particular key in a (guessed-at) layout of the keyboard. And assuming that doing this won't interfere with the user interface, so that trying to open a file or quit an application does something unexpected instead.
In short, it's a mess. It doesn't work well, it didn't work well, and as devices get more and more diverse (what are the keys on a touch-screen, or a Wii remote control, or a 20-key qwerty keypad?), it seems likely to work less and less well. And yet people continue to provide the advice that making things keyboard accessible is a matter of writing increasingly complex javascript and pretending things work reasonably well.
So what's the alternative?
And there is the rub. I don't know. I don't think anyone else does, either. Here are two ideas that might work, although they need a lot more fleshing out. But having thought about this for a few years, I am hoping others might be able to help with the fleshing process.
1: Intent-based events
A decade ago I was involved in a long series of discussions which tried to replace the action-specific event triggers (hovering, clicking, pressing a key) with a model based on abstract interactions such as selecting, moving, activating, and so on. The idea was simple enough - if you activate something, the author doesn't care if you did it with a mouse or by a spoken command, they just know it was activated and react. Similarly you could select something ready to activate, whether by hovering over it with a mouse cursor, or moving to it with a 4-way joystick (then rare, now very very common on phones in particular). Some of these things got taken up - but as separate events, that were not implemented in browsers and therefore not useful in practice.
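To make the idea concrete, here is a minimal sketch of mapping device-specific input onto abstract intents. Everything in it is an assumption for illustration: the names Intent, RawInput and toIntent are hypothetical, the choice of Enter and Tab is arbitrary, and none of this corresponds to an existing DOM or browser API.

```typescript
// Hypothetical types: nothing here is part of any DOM or browser API.
type Intent = "select" | "activate";

type RawInput =
  | { device: "mouse"; action: "hover" | "click" }
  | { device: "keyboard"; key: string }
  | { device: "joystick"; action: "move" | "press" }
  | { device: "voice"; command: string };

// Translate a low-level input into the abstract intent it expresses,
// or null when the input carries no intent the page needs to know about.
function toIntent(input: RawInput): Intent | null {
  switch (input.device) {
    case "mouse":
      return input.action === "hover" ? "select" : "activate";
    case "keyboard":
      // The key choices are assumptions; a real user agent would decide them.
      if (input.key === "Enter") return "activate";
      if (input.key === "Tab") return "select";
      return null;
    case "joystick":
      return input.action === "press" ? "activate" : "select";
    case "voice":
      return input.command === "activate" ? "activate" : null;
  }
}
```

The point of the sketch is that the author's code would only ever see "select" or "activate", and the translation from the device would live in the user agent rather than in each page's script.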
A development of this idea was "namespaced events" - essentially the ability to define new events. The trick here was to work out what would trigger them (and the simplest thinking was to fall straight back into trapping low-level actions as if the author could somehow predict what set of actions were available to the user). Opera even implemented this proposal - and it wasn't all that hard, at least to do what we did.
At the same time, it became clear that the click event (the most common event used) had to respond to other activation. It's still not clear exactly how this works (and what is the interaction of a complex set of things when you use a key to trigger the click), but at least it is possible to do simple things and no longer require a physical mouse (or something horrible like mousekeys) to make them work. So we have, today, something like an activation event (called click, and fairly messy around the edges). But we still have things like onmousemove and the CSS :hover pseudo-class that are pretty much mouse-only, and no apparent push to fix them.
The question now is can they be fixed? It would take implementation in probably half a dozen browsers to push any change, and then it requires teaching authors to change - and there are zillions of those. With agreement, goodwill, and an understanding that we are currently dividing applications into different platforms (except the enormous ones that try to cover everything), I think it is possible. I believe we could change the Web in under a decade. But it will take a lot of work to get there.
"accesskey" mode?
(I use that name because I don't have a good one. It relies on reinterpreting the accesskey attribute, more like HTML5 now does it than like HTML4 did it, because although the markup required by HTML4 was OK, the specification was stupid - and yet people followed it and made stupid implementations, which gave the whole thing including the useful bits a bad reputation it hasn't yet recovered).
The idea starts by fixing accesskey itself. Instead of being primarily a request to listen for a particular key (in this respect SVG's version is even worse than HTML 4's), it is primarily a statement that some control is important enough that you should be able to directly activate it (rather than moving through the page to it). There are lots of ways to do this - you can use the letter to associate a key with it, as iCab used to do. You can use the letter, or a word, to associate a voice command. You can associate it to a mouse gesture, or any other input mechanism available to the user. This should be determined by the user agent, although information supplied by the author is helpful (memorable letters/words, a role or rel attribute, etc).
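A rough sketch of what "determined by the user agent" could mean for the keyboard case: the author only says which controls matter and optionally hints a letter, and the user agent picks bindings that do not collide with each other or with keys it has reserved. The Control shape, the assignKeys function and the fallback order (hint, then letters of the label, then digits) are all assumptions made for illustration, not a description of what Opera or iCab actually do.

```typescript
// Hypothetical model of an author-marked control; `hint` stands in for
// whatever the author put in the accesskey attribute.
interface Control {
  id: string;
  label: string;
  hint?: string;
}

// Assign one key per control, skipping keys that are reserved or already used.
// Returns a map from control id to the key chosen by the (imaginary) user agent.
function assignKeys(controls: Control[], reserved: string[] = []): Map<string, string> {
  const taken = new Set(reserved.map((k) => k.toLowerCase()));
  const bindings = new Map<string, string>();
  for (const control of controls) {
    // Preference order: the author's hint, letters of the label, then digits.
    // Anything that is not a single letter or digit (e.g. a word) is ignored here.
    const candidates = [control.hint ?? ""]
      .concat(control.label.split(""), "1234567890".split(""))
      .map((c) => c.toLowerCase())
      .filter((c) => /^[a-z0-9]$/.test(c));
    const key = candidates.find((c) => !taken.has(c));
    if (key !== undefined) {
      taken.add(key);
      bindings.set(control.id, key);
    }
  }
  return bindings;
}
```

The same list of important controls could just as well feed a voice-command table or a gesture menu; only the assignment step would differ.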
For simple page interaction, or common applications, this is pretty much enough. And we implemented something that handles it (more or less) in Opera, a few years ago. Entering accesskey mode pops up a menu of the accesskeys defined - and you can either select the relevant letter, or navigate the menu with the mouse.
(You should be able to navigate the menu with the keyboard, it should handle the case where the attribute is more than a single letter, it should do a better job of explaining what happens, etc, but those things are in development still. It also makes little sense that shift-esc is the way to make this happen - I reassign that to the "." key, but there are other alternatives. Finally, we need to make this work on Opera Mobile and Opera Mini too. Which is all a matter of getting those bits of work done).
But this still leaves out one important use case. A pop-up menu is fine when you're interacting with a complex spreadsheet application, or an internal business system. But it really isn't the way to go for games, air traffic control systems, and other cases where you want continuous interaction. In that case, something that tells the system to "keep the accesskey mode on" (that is, to listen for the accelerated controls until told not to) is probably required. The WAI-ARIA role application is sort-of used to do that (I think the current explanation of it is too simplistic, but it points to something like what I describe), and it doesn't seem impossible (or even especially difficult).
The "until told not to" is important. The problem with plugins that provide their own user interface is their inability to return the user to what they were doing before they dealt with the plugin. But again, this is a problem that can be (and occasionally has been) solved.
So where now? I don't know. Thoughts welcome.


inkel # Wednesday, March 31, 2010 11:34:15 AM
The only idea I have at the moment is to analyze some web application (perhaps a simple one, say Twitter) and try to come up with a proposal for how such an application should interact with the keyboard.
With that, I would then start to think about a solution, at least from a developer perspective.
Roland Frankasfera # Wednesday, March 31, 2010 12:35:24 PM
where single key presses allow the intended access. On text it is natural to let the text itself be the "access key" and use autocompletion to support the selection process - as long as completions are definite and do not break the flow of user interactions.
Already a directional-pad or the arrow keys should suffice to perform selections.
The following demo gives easy access to about 20,000 wiki page titles.
http://www.taipu.de/ida.htm is a prototype implementation of the ideas mentioned.
Ben Buchanan (cheshrkat) # Wednesday, March 31, 2010 12:47:33 PM
click, hit enter = "activate"
esc, click outside target element = "stop" (or exit)
hover/mouseover/keyboard focus = unified "focus" or "attention"
mouseout/blur = "blur"
Perhaps they could be replaced with theoretically grouped attention/activation events:
attention-start
(element has left inactive/non-focussed state and now has some level of attention)
attention-active
(focus - has user attention and is active)
attention-inactive
(hover - it has user attention but is not activated)
attention-end
(element leaves attention states and returns to inactive state)
Then activation:
activate-start
(the user has triggered the device's "go!")
activate-held
(click and hold, unreleased event, activation event is essentially continuing)
activate-modify
(activated, not released, then some further input received; capture initial status of element and track input for changes)
activate-cancel
(user has initiated an event, then performed an action that cancels it - eg. click, then hold, then drag outside element and release outside the event activation area; tap fingertip, then swipe off mobile screen; whatever the device specifies)
activate-end
(user has finished initiating the event)
activate-wait
(the state between activate-end and response/action/results)
Both sets could then be shortcut:
attention (do this while element has any form of attention)
activate (immediately go start->end->wait)
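The activation half of the proposal above can be sketched as a small state machine. The Signal names (press, hold, input, release-inside, release-outside) are invented here to stand in for whatever the device reports; only the activate-* event names come from the list above.

```typescript
// Hypothetical device-level signals; the real set would depend on the device.
type Signal = "press" | "hold" | "input" | "release-inside" | "release-outside";

type ActivationEvent =
  | "activate-start"
  | "activate-held"
  | "activate-modify"
  | "activate-cancel"
  | "activate-end"
  | "activate-wait";

// Turn a stream of device signals into the proposed activation events.
function activationEvents(signals: Signal[]): ActivationEvent[] {
  const events: ActivationEvent[] = [];
  let pressed = false;
  for (const signal of signals) {
    if (signal === "press") {
      if (!pressed) {
        pressed = true;
        events.push("activate-start");
      }
      continue;
    }
    if (!pressed) continue; // ignore anything outside an activation
    switch (signal) {
      case "hold":
        events.push("activate-held");
        break;
      case "input":
        events.push("activate-modify");
        break;
      case "release-outside":
        events.push("activate-cancel");
        pressed = false;
        break;
      case "release-inside":
        // A normal release ends the activation and enters the waiting state.
        events.push("activate-end", "activate-wait");
        pressed = false;
        break;
    }
  }
  return events;
}
```

A plain press and release yields start, end, wait in one go, which is the "activate" shortcut; a press that is dragged off the element yields start followed by cancel.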
Of course it's easy to think them up, harder to get them implemented.
The other option would be to get browser vendors to agree on a set of crossover mappings. It could work based on combinations of events specified by authors:
1) mouse event defined but no keyboard event defined - also respond to logical keyboard event. eg. click/hitting enter
2) both events defined - follow author instructions
People who'd intentionally ignored keyboard input would have to explicitly set keyboard events to return false; but I think that's probably the smallest potentially affected group. Anecdotally most people who've done mouse events but left out keyboard events simply didn't know, remember or care about keyboard events.
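The crossover rule could look something like the sketch below. It assumes a deliberately simplified model in which an element has at most one mouse handler and one keyboard handler; Authored and withCrossover are hypothetical names, not browser APIs.

```typescript
type Handler = () => string;

// Simplified, hypothetical view of what the author attached to an element.
interface Authored {
  mouse?: Handler;
  keyboard?: Handler;
}

// Compute the handlers the browser would actually use under the crossover rule.
function withCrossover(authored: Authored): Authored {
  // Case 1: mouse handler only, so the logical keyboard event reuses it.
  if (authored.mouse && !authored.keyboard) {
    return { mouse: authored.mouse, keyboard: authored.mouse };
  }
  // Case 2 (both defined) and anything else: follow the author's instructions.
  return authored;
}
```

Under this model, an author who really wants no keyboard response opts out by defining an explicit keyboard handler that does nothing, which then counts as "both events defined".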
...not sure if any of that was helpful. I've no doubt many people have come up with the same ideas before.
I suspect you're right, it'll take years; but better to have a good abstracted event model coming down the pipeline than rest on the current mess. Maybe focus on getting libraries like jQuery to change to the new scheme to give things a kick start. Cross-device issues might get better coverage with stuff like tablets and touch interfaces getting more popular.
Denis Boudreau (dboudreau) # Friday, April 2, 2010 1:46:32 PM
Apple's phone has opened a new world of possibilities and shaken a metaphor that has been with us since the 80s, but it's only the beginning.
I, too, believe the mouse will be a relic of the past soon enough. Pointers probably will be there for a long while still, but touch-screens will soon invade the desktop world.
I believe your idea of accesskey mode is brilliant and I've been in favor of it for HTML5 ever since John Foliot explained it to me back in December. It shows amazing potential and feeds on the idea of screen readers' list extractions.
However, I do not understand why it was implemented the way it's been in Opera. Instead of prompting the assigned accesskey and URL, it seems to me it should have taken into account the page title element.
For one, URLs aren't always that descriptive, so they may not be significant enough for users to actually understand where these keys will lead.
Agreed, the value of the <title> element is not always relevant (as Google results for searches such as "untitled document" show --> 28,000,000 this very morning) but when it is used correctly, it becomes quite useful indeed.
Therefore, something like this would seem more appropriate to me:
(key) - Page Title - URL
But then again, what do I know... ;p
Silvia Pfeiffer (silviapfeiffer) # Saturday, April 3, 2010 1:10:44 AM
Maybe "accesskey mode" can also use ESC (or something else) to put you into "keyboard-only control mode". Or there could even be a browser setting that puts you into that mode forever. Then it would need to be required by browsers and by Web apps to provide support for "keyboard-only control mode".
If browsers have a setting, it would even be relatively simple to test. One could almost automatically extract a list of all the interactions possible on a page and then test whether they could all be executed in "keyboard-only control mode". Such a validator could be provided by the W3C and then approve Web pages. Not easy, but also not impossible.
Leif Halvard Silli (syngespil) # Sunday, April 4, 2010 7:01:02 PM
It allows me to browse the Web like a VIM ;-)
@Charles: Have you looked at Vimperator?
Leif
Charles McCathieNevile (chaals) # Monday, April 5, 2010 8:18:55 PM
@Ben, you could be channeling the thinking from a decade ago about how to do it. And yeah, I think it would be worth looking into it more deeply to see if we could get it right. But it has already died one death, as far as I can tell - the extensibility part on the "namespaces are bad, M'kay?" hill and the original events simply by being separated from "the real deal" in some accessibility ghetto. So it would take some clear thinking to figure out how to make it work. The point about libraries is well taken though - at that time there was really just a handful of functions prefixed MM_ that were treated as a library.
@Denis, the fact that Opera gives access to a URI rather than link content or similar is a filed bug. You're right - it isn't helpful. Currently our accesskey implementation is not very good - but sadly that makes it much better than the competition ;(
@Silvia, yeah I am a vi user and that has influenced my thinking - but I am also aware that many people in the world hate vi for one reason or another. So I'm wary about just assuming it is a solution. There's also the example of
@Leif, not really, since I don't really use Firefox. But I'll play with it and see what I learn...
Right now it seems I am travelling too much, and want to get some more focus on this stuff... because there are also other things that need to be fixed on the Web. And hopefully I will at least get some time to write up more of them.
tin1tun # Tuesday, April 6, 2010 10:54:48 AM
One of the events that I miss sometimes is "environment-change". This would trigger things like:
- Enable/disable "high contrast" in the OS
- Inverting/changing colours/zoom with a magnification tool
- Enable/disable voice input/output
- Resize/change orientation of browser window
- Enable/disable images/style sheets/plugins
- Activating user style sheets
We could use these events to dynamically adapt the interface to the user's needs, which are not always static.