Paste & Cite

I was recently asked by somebody to speculate about generalizable application features that might help researchers in their work. I responded to them directly, but thought it might be worth repeating part of my response here.

Since the early 1990s I’ve wished that the OS (any OS) would support a “Paste & Cite” feature and, now that I’m involved with CrossRef and its linking and (nascent) plagiarism detection initiatives, I am even more convinced that such a feature would be immensely valuable to anybody who does research. The basic idea behind the feature would be that the clipboard would also copy “provenance” information whenever somebody chose to copy something. Then, when the user decided to paste the content someplace else, it would offer an optional “Paste & Cite” menu item.

This is similar to Ray Ozzie’s concept of the Live Clipboard- but I think it is simpler and has a different emphasis. The goal here is not to copy structured data around- it is to keep track of where it came from in the first place. In the simplest case, “Paste & Cite” would just paste in a URI pointing to the origin of the content (e.g. a local file, a file on an SMB share or a web page). This alone would help immensely with those situations where one “loses track” of where quoted text, copied pictures, etc. came from. Apparently a large number of semi-plagiarism cases stem from authors inadvertently losing track of the provenance of material that they copy and paste (with the best intentions of citing the material). In more sophisticated scenarios, the system would be opportunistic and “Paste & Cite” might make use of Dublin Core + PRISM metadata embedded in HTML or XMP in PDFs/images or ID3 in MP3s, etc. Again the idea would be to give people a simple (possibly even simplistic) way of keeping track of the provenance of something. And of course- if a DOI were present, the provenance information could make use of it in order to ensure that the URI doesn’t break.
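To make the idea a little more concrete, here is a minimal Python sketch of what a provenance-aware clipboard entry might look like. Everything in it is an assumption made for illustration: the `ClipboardEntry` structure, its field names, and the `paste_and_cite` function are hypothetical and do not correspond to any real OS clipboard API. The only outside fact it relies on is that a DOI can be turned into a link by prefixing it with `https://doi.org/`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ClipboardEntry:
    """Copied content plus the provenance captured at copy time (hypothetical)."""
    content: str
    source_uri: str                       # e.g. a local file, SMB share, or web page
    doi: Optional[str] = None             # filled in only if the source exposes a DOI
    metadata: dict = field(default_factory=dict)  # e.g. embedded Dublin Core fields
    copied_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def paste(entry: ClipboardEntry) -> str:
    """Ordinary paste: the content only."""
    return entry.content


def paste_and_cite(entry: ClipboardEntry) -> str:
    """Paste the content followed by a pointer to where it came from.

    A DOI-based link is preferred when a DOI is present, because it is
    less likely to break than the raw source URI.
    """
    link = f"https://doi.org/{entry.doi}" if entry.doi else entry.source_uri
    title = entry.metadata.get("title")
    citation = f"{title}, {link}" if title else link
    return f"{entry.content} [{citation}]"
```

In the simplest case only `source_uri` is known and the citation is just that URI; the metadata and DOI fields are the “opportunistic” extras, used when the source happens to provide them.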

I hate the number 255

I hated it in Pascal and I hate it now in This might even force me to stop using


Of course, it isn’t the number that I really hate- it’s the programmers who, rather than think of the realistic use cases for a column called “notes”, just settle for the default “biggish computer number” that pops into their head. You’d think they would have at least upgraded to 512 or 1024 by now.


Brain Subscription And Trust Circles

Jon Udell and Ross Mayfield are talking about the use of social software and trust-circles as tools to find relevant and authoritative content on the web. Sounds familiar. I’ve long thought trust circles (amongst other trust metrics) are key to addressing the “Internet Trust Anti-Pattern“.

It may sound incredibly un-hip and reactionary, but to hell with the wisdom of crowds. Watching the crowd might be entertaining, but when I need to work, I can get far better results if I constrain that crowd to a few people whose opinions I have reason to respect. I’d use the word “authority” again, but the word is overloaded. Just as the open access community struggles with “free as in beer” and “free as in freedom”, the user-generated-content crowd struggles with “authority” as in “power” and “authority” as in “expertise.”

(Take deep breath. Wipe foam from my mouth. Stop goose-stepping)
Anyway, Jon concludes that he had been optimistic about the progress that would be made in exploiting trust circles inferred from social software tools. He says:

“Before we can search transitively across trust circles, we’ve got to be able to search within them”

I would just add that, before we can search within them, we need to be able to identify the channels of information being generated by each member of the trust circle. I’ve talked about this before, but I want to be able to go to somebody’s blog page or email signature, click on a button like this:


…and automatically subscribe to that person’s, Flickr, iCal, and blog feeds. “Brain subscription” seems like a prerequisite to trust-circle nirvana.

The F-Word

Will implementing a good information architecture destroy your Alexa rating? Mike Davidson has done a brief analysis of MySpace which basically shows that “Page Views” could be the new “Line Count” in stupid metrics.

I’ve often wondered if part of the attraction of MySpace is the air of “authenticity” conveyed by the hideously amateurish interface(s)? And now I can wonder how many marketers will take Davidson’s observations and perversely conclude that the more unnecessary page views they can get people to go through, the better. Usability be damned.

I already fret about what conclusions web site designers are likely to draw from the research that shows that users scan web pages in an “F” pattern. Will we see a new crop of web site designs that look like this?


And will we adjust our scan pattern to this?


Van Morrison, Crank and Google Scholar

In a Guardian article dated Saturday July 8 2006, Pico Iyer talks about how Google and other search engines have distorted the literary interview. He describes how interviewers prepare themselves by researching their subjects online and how search results tend to artificially highlight and emphasize interesting, but effectively trivial information about the interviewee. The author describes how he once, in some long-since forgotten interview, had mentioned Van Morrison as being an influence on his work and how almost every interviewer since has found this tidbit of information and incorporated it into their own interview. This repeated citation of the same fact has served only to exaggerate the actual importance of Van Morrison to the author’s work. Of course, as these interviews also go online, the problem only gets worse. His Guardian article will make things worse. This blog entry will make things worse. Pico Iyer and Van Morrison are becoming forever entwined.

This is just one of many examples of the peculiar side-effects of Google’s page ranking algorithms. In Google Scholar (GS) researchers can find both of GS’s ranking algorithms frustrating. The first one, based largely on the number of citations an article receives (a more scholarly version of PageRank), has the annoying habit of listing all of the articles that are the most well-known at the top of search results. While this might be a great default behavior for a casual user or a student, it is sometimes irritating to the specialist researcher who presumably already knows the most important articles in their field. GS’s alternative is to list the articles in reverse chronological order, which effectively strips out any pretense of “importance.” I’m sure Google will eventually fix these GS eccentricities and introduce a ranking based on “citation velocity” or some other metric that effectively mixes currency and influence. In the meantime Google and Google Scholar have become a sort of network effect methamphetamine.
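As a rough illustration of how a metric might mix currency and influence, the Python sketch below ranks articles by citations per year since publication. That definition of “citation velocity” is an assumption made for this example, not something Google Scholar is known to use, and the `Article` type and the sample numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    year: int
    citations: int


def citation_velocity(article: Article, current_year: int) -> float:
    """Citations per year since publication (an assumed definition)."""
    # Count the publication year itself so a brand-new article has age 1.
    age = max(current_year - article.year + 1, 1)
    return article.citations / age


def rank_by_velocity(articles: list[Article], current_year: int) -> list[Article]:
    """Sort articles so the fastest-accumulating citations come first."""
    return sorted(
        articles,
        key=lambda a: citation_velocity(a, current_year),
        reverse=True,
    )


# Invented sample data, viewed from 2006.
papers = [
    Article("Well-known classic", 1990, 340),   # 340 / 17 years = 20 per year
    Article("Recent hot paper", 2005, 60),      # 60 / 2 years = 30 per year
    Article("Brand-new paper", 2006, 2),        # 2 / 1 year = 2 per year
]
ranked = [a.title for a in rank_by_velocity(papers, current_year=2006)]
```

With these made-up numbers the recent paper outranks the classic even though the classic has far more citations in total, which is the behavior a specialist would presumably want from such a ranking.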

As we get used to the peculiarities of the Internet, we sub-cognitively adjust our use of it accordingly. I remember in the late 1990s a colleague showed me some site that he had recently started to consult for statistics and data of some sort. I glanced at the site and, though it looked official enough, I almost immediately said to my colleague that I thought the site was bogus and that he’d better be deeply skeptical of its contents. Eventually he confirmed that the content on the site was utter bilge and he came to ask me how I had guessed that it would be. I looked at the site again and tried to figure out what tipped me off. As I said, the site itself looked official and my assessment certainly wasn’t based on the data (the nature of which I’ve since forgotten but that I certainly wasn’t qualified to assess), but something about it had made me uneasy. After a few puzzling minutes I realized what had made me suspicious- there was a tilde (~) in the URL. For those who never knew or have since forgotten, a tilde in a URL is a good indication that the URL in question is pointing to some individual’s private home directory on a *NIX based machine. The URL “” might look like it is official content from “somewellknownorg”, when it is actually pointing to the home directory of somebody named “Ted” who happens to have an account on the somewellknownorg machine. One doesn’t often see such URLs these days, but back in those days they were fairly common. Somehow I had managed to subconsciously learn that a “tilde” in the URL should make me pause and since that incident I’ve confirmed with some of my geekier friends that they too had developed this unarticulated heuristic for determining the relative “authority” of content. We probably all have other such URL-based heuristics. I doubt many people trust URLs that have IP addresses in them. And we each have a notion of the relative trustworthiness of domain name endings (.COM, .CO.UK, .EDU, .NET, .RL), though we may not be actively aware of it.
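Two of these URL-based heuristics, the tilde and the bare IP address, are mechanical enough to write down. The Python sketch below is only an illustration: the function name `url_warning_signs` and the example URLs are made up, and the checks are a crude approximation of a gut feeling rather than any real measure of trustworthiness. It uses only the standard library (`urllib.parse` and `ipaddress`).

```python
import ipaddress
from urllib.parse import urlsplit


def url_warning_signs(url: str) -> list[str]:
    """Return human-readable reasons to pause before trusting a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    signs = []

    # A path segment starting with "~" conventionally maps to a user's
    # home directory on a *NIX web server.
    if any(segment.startswith("~") for segment in parts.path.split("/")):
        signs.append("tilde in path: probably a personal home directory")

    # A bare IP address where a domain name would normally be.
    try:
        ipaddress.ip_address(host)
        signs.append("IP address instead of a domain name")
    except ValueError:
        pass  # not an IP address, so nothing to flag

    return signs


# Hypothetical URLs, for illustration only.
personal = url_warning_signs("http://www.example.org/~someuser/stats.html")
numeric = url_warning_signs("http://192.0.2.10/data.csv")
plain = url_warning_signs("https://www.example.org/reports/")
```

Domain name endings are left out on purpose: as noted above, everyone carries their own private ranking of those, so any table of weights would be an invention.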

A conversation at a recent conference made me realize that I’ve started to develop heuristics for dealing with the distorting effects of search engines. A colleague casually mentioned that he no longer looks at the first few search results returned by Google. He found the first three or four results to be of generally lower quality than those a little lower down in the result set. As soon as he said this, I realized that I had been doing the same thing for the past year or so. I find myself “starting” to look at Google search results about one third of the way down the page, skipping the first several results. Like my colleague, I’ve found that the first results seem to have an oddly distorted relevance ranking. I suspect that this is a side-effect of PageRank. Items that are more “interesting” filter to the top and “interesting” is not quite the same as “accurate”, “thorough” or “authoritative”. This, of course, is what Pico Iyer has encountered as he has become inexorably linked to Van Morrison.

Early Social Bookmarking

I was recently pondering the characteristics of so-called “cult fiction” and was trying to remember how it was that I learned about certain cult authors back before this thing called the Internet existed. How did I learn about Vonnegut, Pynchon, Roth? As I dredged through my memories I realized that I most probably ran across these authors whilst using an early analog social bookmarking system- the library checkout card.


For those who have never seen one of these things, they were little index-cards inserted into a sleeve that was glued to the inside back cover of library books. When you checked out a book, you would sign your name on a line on the card and the librarian would stamp the due date next to your name on the card and then file it. This was how they kept track of who had which books out and when they were due. When the book was returned, the card would be reinserted in the sleeve and the book would be re-shelved.

The beauty of this system was that you could judge the popularity of a book by removing it from the shelf, flipping it open to the back inside cover and seeing how many times it had been signed out, when it was signed out and by whom. I now remember scouring through these things to see what might be worth reading. I remember looking for:

  1. Multiple cards. My school was very small and not very old, so most of the books had only one card with only a few names on it. When you found a book that had been checked out so many times that it required multiple cards, you were almost certainly onto a winner. The exception, of course, was when the book had clearly been assigned to a class. In those cases the presence of several cards was usually a false positive.
  2. The names of older students that I respected. Don’t ask me why they had to be older, they just did. Not only that, but they had to be older than me *at the time that they checked out the book*. In other words, when I was thirteen, I wasn’t interested in what sixteen-year-old “Ben” had read when he was twelve, though I might give consideration to what he had checked out and read when he was fourteen or fifteen. Insane, in retrospect, but most of my behavior at the time now seems insane.
  3. The names of students or teachers that I didn’t respect. The presence of such a name virtually killed the chance that I might read the book. I suspect that I missed reading Douglas Adams in high-school because the checkout card listed a name that I didn’t approve of. Again- barking mad, but true.
  4. Multiple sequential checkouts by a person. This was a sign that the book might be harder to read. Of course, to me that meant “better”. Sigh.
  5. Multiple non-sequential checkouts by the same respected person. I interpreted this to be a sign that the book might even be worth re-reading.

Anyway- I vividly remember finding “BREAKFAST OF CHAMPIONS” (Kurt Vonnegut) and being astounded that it had four or five check-out cards stuffed into the back sleeve. The cards were a who’s-who of the most interesting seniors. Clearly I needed to read it.


I was relatively late in learning of the term “backchannel”. It describes a phenomenon that I have been fumbling to explain to people as being *one* of several reasons for them to use instant messaging (IM) as a regular tool in the office. Whereas the term backchannel seems to be most often used to describe how tech conference attendees use IRC, Wikis and blogs to carry on parallel conversations and commentary during conference sessions, I have observed the phenomenon in the office, in meetings and conference calls. I just didn’t know, until late last year, that it had a name.

At my current job I have been busy evangelizing various collaborative technologies that I had found useful in the past, so one of the first things that I did was try to get everybody using instant messaging regularly. My colleagues are not geeks, but they are technologically savvy and more than willing to experiment with new tools. Being responsible business-headed sorts, they did ask me what practical use IM would be. They also wondered why one would use IM when one could use email instead.

I warned them that my answer would be idiosyncratic, but I listed the following reasons in descending order of importance:

  • Presence indication
  • Backchanneling. (Though, obviously, since at the time I didn’t know the term, I described it to them with some hand-wavy and not-too-convincing blather.)
  • Lightweight interrupt checking

Note that the list mentioned nothing of “chatting”, though in the new environment of my new job I think that I might end up adding a new item to the list:

  • Whispering

Presence Indication

I admit it- I was once an IM skeptic and I was constantly berated by my IM-using colleagues (hi Jessica, hi Andrew, hi Leigh!) for never being online. Worse than being an IM skeptic, I was actually an IM boor- I would only launch IM when I wanted to get hold of somebody who I knew used IM and who was otherwise engaged on the phone or something. Fortunately my colleagues finally called me on my IM boorishness and even convinced me to support getting a company Jabber server installed. Once we had everybody in the company using Jabber, I grew to appreciate the importance of IM as a presence indicator. My old employer had 100+ employees scattered across four offices in the US and UK. The technical group that I ran also had a large contingent of telecommuting employees, so the Jabber presence list became, for me at least, the most tangible daily reminder of community that existed in the company. A year after we launched our in-house IM server, I ran across Apophenia’s entry on IM presence and recognized the described cultural divide all too well. Her article is essential reading for IM skeptics and *particularly* for IM boors.

But my current environment is a little different. How much use is IM presence indication in a company of four people huddled together in a room the size of a garage? At first, my new colleagues grumbled a little at my IM evangelism, but they humored me and ran the software despite their understandable misgivings. After a few months of people traveling and working from home, I’m happy to say that IM presence has become an accepted substitute for actually being in the office. I knew that IM had really “made it” when one of my co-workers, who knew I was working from home on a particular day, called me up to berate me for not having launched my IM client.


Backchanneling

In my old job, backchanneling, or holding parallel IM conversations during meetings and conference calls, became a critical tool for making meetings shorter and more productive. There are two areas where I found backchanneling to be particularly effective:

  • Wallflower inclusion
  • Tactical coordination

Wallflower Inclusion

Having grown up in Puerto Rico, there is little I like better than getting in a nice argument featuring much hand-waving and gesticulating. I actively seek out people who will engage in this kind of head-butting and, in my youth, I avoided anybody who couldn’t hold their own in such an encounter.

Over the years, I have been shocked to learn that there are actually mega-bright, quiet types out there (imagine!). I’ve also learned that these people don’t thrive in meetings, conference calls or other venues that favor the verbal. I was always chagrined when one of these people would send me email *after* a meeting with insights that could have easily changed the outcome of said meeting. I’m not sure how it started, but I eventually found myself using IM during meetings, to query the opinions of those who I knew were unlikely to speak up. Eventually, some of them started using IM as a kind of realtime text-to-speech gateway- channeling their opinions through the more verbal members of the group. I know this sounds bizarre and possibly dysfunctional, but it worked and it helped to make sure that even the shyest people were able to get their points across when it really mattered- not hours after a decision had already been made.

Tactical Coordination

Like the “wallflower inclusion”, I’m not sure how this began. I remember that occasionally, when on really long conference calls, some of the participants in the calls would start IM-ing each other commentary on the proceedings- usually snide remarks. Eventually this commentary turned into actual tactical coordination of the call: discussions of how to avoid known contentious issues, how to bring a particular thread to a close, when to take issues off-line and, most importantly, when people were starting to reach consensus. I am sure that our ability to carry on these parallel conversations allowed us to shorten conference call length and increase their utility. At worst, they at least allowed us to stay sane during interminable calls.

Lightweight Interrupt Checking

“Got a sec?” is probably the most common IM that I send and receive. The ability to quickly check to see if somebody is willing to engage in a conversation is wonderful. A quick IM is far less obtrusive than a phone call, or worse, wandering into somebody’s office or cubicle. But in order for this technique to succeed you must:

  • Not take offense when somebody replies “not just now”
  • Likewise, not feel obliged to reply “yes” to such queries
  • Only use it when the subject that you want to discuss really warrants a real-time conversation and can’t just be handled via an asynchronous method like email or voicemail


Whispering

For the past ten years I have either had an office or a fairly isolated cubicle and I didn’t have to worry much about distracting people with my blather. Now I’m sharing a small office with three people. On the whole I enjoy working with people in an open-plan office, but there are times when one of us is in crunch-mode on some project, and it really helps if interoffice banter and queries can be kept as unobtrusive as possible. This is where I’m learning the third great corporate use of IM- whispering. I suspect that my office-mates think that I still don’t use it this way often enough.

Beginning, middle, end

In the early days of the web, buzzword coinage relied on prefixes. Add an “e” or an “i” to any word or phrase and you had yourself a brand new business to flog.


A few years later- the infix became the basis for the buzz-worthy. The numeral “2” became de rigueur.


And now it is the turn of the suffix and again it is numeric.

Web 2.0
Science 2.0
Reading 2.0
Sloth 2.0

User Behavior As A Music Rating Cue

The “My Rating” feature on iTunes has always felt a little clumsy. First of all, I hardly ever listen to music on iTunes itself- I listen to most of my music on my iPod. Secondly, I don’t want to have to *do* anything convoluted or extra in order to register that I like or dislike a song. I am surprised that Apple, given its user interface prowess, hasn’t managed to take better advantage of natural user behavior in order to more effectively drive the ratings system. In short:

  • If I play a song multiple times in a row, it probably means I like it. Increment the rating.
  • When I repeatedly turn up the volume on a song- it probably means I like it. Increment the rating.
  • When I repeatedly skip a song, I probably don’t like it. Decrement the rating.

I have taken to quoting Bradley Horowitz’s observation that “the act of consumption is itself an act of creation”. The iTunes/iPod UI team should be well positioned to exploit this phenomenon.
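The three rules above are simple enough to sketch in a few lines of Python. This is purely illustrative and has nothing to do with any actual iTunes or iPod API: the `BehaviorRater` class, its method names, the neutral starting rating of 3 and the threshold of two events for “repeatedly” are all assumptions. The only thing borrowed from iTunes is the rating scale of zero to five stars.

```python
from collections import defaultdict

MIN_RATING, MAX_RATING = 0, 5   # star scale
START_RATING = 3                # assumed neutral starting point
REPEAT_THRESHOLD = 2            # assumed meaning of "multiple" / "repeatedly"


class BehaviorRater:
    """Adjust song ratings from listening behavior (hypothetical sketch)."""

    def __init__(self):
        self.ratings = defaultdict(lambda: START_RATING)
        self._last_played = None
        self._plays_in_a_row = 0
        self._volume_ups = defaultdict(int)
        self._skips = defaultdict(int)

    def _adjust(self, song, delta):
        # Clamp so the rating never leaves the star scale.
        new_rating = self.ratings[song] + delta
        self.ratings[song] = max(MIN_RATING, min(MAX_RATING, new_rating))

    def played(self, song):
        """Rule 1: several plays in a row increment the rating."""
        if song == self._last_played:
            self._plays_in_a_row += 1
        else:
            self._last_played, self._plays_in_a_row = song, 1
        if self._plays_in_a_row >= REPEAT_THRESHOLD:
            self._adjust(song, +1)
            self._plays_in_a_row = 0

    def volume_up(self, song):
        """Rule 2: repeatedly turning up the volume increments the rating."""
        self._volume_ups[song] += 1
        if self._volume_ups[song] >= REPEAT_THRESHOLD:
            self._adjust(song, +1)
            self._volume_ups[song] = 0

    def skipped(self, song):
        """Rule 3: repeatedly skipping a song decrements the rating."""
        self._skips[song] += 1
        if self._skips[song] >= REPEAT_THRESHOLD:
            self._adjust(song, -1)
            self._skips[song] = 0
```

The point of the sketch is that none of these methods require the listener to do anything extra; they would be called from events the player already sees.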

The Internet Trust Anti-Pattern

I am afraid that the Wikipedia is a classic case of what I’ve come to term “the internet trust anti-pattern”. It goes something like this:

  1. A communication/collaboration system is started by a self-selecting core group of high-trust technologists (or specialists of some sort).
  2. Said system is touted as authority-less, non-hierarchical, etc. But this is not true (see 1).
  3. The general population starts using the system.
  4. The system nearly breaks under the strain of untrustworthy users.
  5. Regulatory controls are instituted to restore order. Sometimes they are automated, sometimes not.
  6. If the regulatory controls work, the system survives and is again touted as authority-less, non-hierarchical, etc. But this is not true (see 5).
  7. If the regulatory controls don’t work, the system becomes marginalized or dies.

Think of Usenet, think of IRC, think of email, think of P2P networks- they’ve all gone through this cycle. Some have survived and others have effectively died.

I’ve been speaking (large PDF) and writing publicly about the issue of trust and the internet for two years now. This recent article on the Wikipedia in The Guardian sounds familiar. Tim Bray has already attacked it. “Sophmore philosophy” [sic] seems a bit rich given that the same can be said of most pro-Wikipedia philosophizing.

The Register also has a piece on Wikipedia. Not exactly balanced, but I doubt that this debate is ever going to be.

I keep delaying posting this and keep running across more informed discontent- this time by Apophenia and Jason Scott.

Of course, the internet trust anti-pattern applies to more than just the Wikipedia. Kuro5hin is having problems.

I’ve written about Outfoxed as a possible model for a generalizable way of dealing with the anti-pattern. The Higgins project looks promising too.

Anyway, I’d like to be wrong on Wikipedia.