Tools Across the Research Lifecycle
Reflections on tools and the process of research.
Introduction
<slide>
A couple of days ago George Loper, who runs a website about politics in Charlottesville, Virginia, reprinted an essay Stéfan Sinclair and I had written comparing Barack Obama’s speech on race with a speech to the NAACP by his minister and former mentor, Jeremiah Wright.
<slide>
The original essay, “Now, Analyze That,” was posted on the Text Analysis Developers Alliance wiki and was the result of an experiment in what we are calling Extreme Text Analysis.
We are not experts on race or politics in America, though like most in Canada we are avid spectators. Our essay was instead the second round of a series of experiments with a number of goals, including:
1. First, it was an experiment in the rhetoric of text analysis: an experiment in how we could exhibit the analysis and tools within an essay without distracting from the essay and its conclusions. I am not going to talk much about this problem, but it is a very real one. Most of us who use computers to analyze texts get caught either hiding the methods and tools or overburdening our essays with long methodological excursions that end up overwhelming both the text analyzed and our conclusions about it. This experiment was run to try different ways of exhibiting the methods and tools in an online essay that lets readers recapitulate our results while still preserving the flow of the essay.
<link or slides about the integrated elements>
<loper slide>
Loper’s reposting of our essay, alas, did not reproduce any of the interactivity of the original. That proves something: perhaps that he found the interactivity distracting, or that it was very hard to recreate the interactivity on another site, a problem I will return to.
<slide>
2. Second, it was an experiment in how tools, in this case the tools we have been developing in the TAPoR project, integrate into the lifecycle of research. We wanted to try using the tools in a real, though abbreviated, project, going from conception to publishing something online that we could step back from and look at. This being extreme text analysis, we gave ourselves two days locked in a lab. As it turned out, we went for lunch on the second day, had a few glasses of wine, and ended up writing the essay the following week.
<slide>
This brings me to the trajectory of this Institute lecture, which is not about our experiment but about how tools fit into the lifecycle of research.
What I propose to do is to:
1. First, propose a model research cycle.
2. Then use that model to look at the types of tools used in research.
3. I then want to turn to a particular challenge with tools, which is the challenge of transition – how tools help or hinder us as we transform information over the lifecycle of research.
But first a word about some of the assumptions behind modeling research cycles. Some of my assumptions are that:
• There are different phases of activity in the research cycle.
• Different aides and tools are needed for different types of activity, so it is useful to review the types of tools in terms of the activities they support.
• Tools need not only to aid a type of activity but also to help researchers move gracefully to other activities.
• None of the tools out there support all the phases of activity, nor should they.
A Model of the Humanities Research Lifecycle
We don't know much about how humanists do research, but we can hypothesize a model research cycle. The proposed model is loosely based on research about humanities research activities (in particular I have drawn on a report titled “Scholarly Work in the Humanities and the Evolving Information Environment”), but this model is primarily meant to frame an examination of activities, tools and transitions. The model simplifies things into four phases:
1. The first phase, *Wide Browsing*, is the wandering, general reading and browsing that researchers do outside particular projects. This includes browsing shelves in the bookstore, reading blogs and reading journals in their field. It is reading that is not purposeful in the sense that it is not aimed at a particular project. Generally researchers don't record such reading: they don't take notes or gather the readings for future consultation. This is the reading we do for "fun" and breadth.
2. The second phase, *Gathering and Note Taking*, starts when a researcher conceives of a *project* and begins to gather materials for it. Projects begin differently: some are provoked by an external invitation that calls for a shift from wide reading for pleasure to thinking about a project, while others grow out of an interest that develops slowly and shifts subtly into project work. Once we frame a project, we begin to gather things specifically for it, both for thinking it through and for later reference, and to take notes about what we gather.
3. The third phase I call *Writing and Thinking*. At a certain point in a project the researcher shifts to activities aimed at a sharable outcome, which usually involves writing. It could be writing an abstract and then a conference paper, writing a paper for publication, or preparing a PowerPoint deck for a presentation, but the point is that once there is an anticipated outcome (and deadline), the activities of research change to support a specific writing project. Often gathering and note taking stop in the late-night rush to get the paper done, as the paper itself becomes the holder of the ideas. The specificity of the writing also narrows what is gathered and noted: once you are writing, you only need to read what supports your paper. More importantly, for humanists writing is a thinking through. Obviously thinking happens throughout the process, but in the humanities writing something for others projects your thinking and forces you to reflect on what you think you want to say. This is when one is forced to think through and analyze all the gathered evidence. In other disciplines writing is more a matter of reporting; in the humanities the writing is the research reflection.
4. Finally, *Sharing and Publishing* is what happens at the end, once the writing is more or less finished and we concentrate on sharing the ideas in different ways, including publishing them. To some extent sharing happens in all phases: humanists may read and write alone, but they share ideas broadly by e-mail, at symposia, and at conferences. Publication, however, is a more formal sharing in which the original research is submitted to some sort of scrutiny (including peer review): you end up editing it as you get comments, checking page proofs, or rewriting it when it is rejected. This last phase is often the most time-consuming and is often not considered research at all, but rather the drudgery of publishing in order to have your thoughts entered into a more permanent record.
I call this a cycle because after each sharing another research project might emerge, and it is from reading what other people share that new cycles start. Once the work is published, the researcher may turn again to broad reading, or another researcher may be inspired to respond to the publication with their own project, creating a cycle of conversation that characterizes our research culture.
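For readers who think in code, the idealized model can be written down as a tiny state machine, which makes its central simplification explicit: each phase has exactly one successor, and sharing loops back to wide browsing. This Python sketch is purely illustrative; `Phase` and `next_phase` are hypothetical names and are not part of any tool discussed in this lecture.

```python
from enum import Enum


class Phase(Enum):
    """The four phases of the model research cycle, in their ideal order."""
    WIDE_BROWSING = 1
    GATHERING_AND_NOTE_TAKING = 2
    WRITING_AND_THINKING = 3
    SHARING_AND_PUBLISHING = 4


def next_phase(phase: Phase) -> Phase:
    """Return the phase that ideally follows; sharing wraps back to browsing."""
    members = list(Phase)  # Enum iteration preserves definition order
    return members[(members.index(phase) + 1) % len(members)]


# Walk once around the cycle; the fifth step is WIDE_BROWSING again.
phase = Phase.WIDE_BROWSING
for _ in range(5):
    print(phase.name)
    phase = next_phase(phase)
```

The tidiness of this loop is exactly what the caveats that follow call into question.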
This model has all sorts of problems that I want to mention, because it is just a model; it is not THE WAY THINGS ARE.
• First of all, this model leaves out the phase I started this lecture with – the phase that is out of your control when people start reposting your essays, or, as Jerry McGann discusses it, the inevitable deformations that take place once you let go. Where is George Loper’s transformation of our essay, which dropped the exhibited technology that for us was the point?
• Second, and obviously, research activities vary significantly across the humanities, and this model privileges a very conservative, or Enlightenment, view of research as the work of a lone theory fellow. To give just one example that doesn’t fit, take archaeology, where excavation is a research activity.
• This model also doesn’t really accommodate a lot of what you all have been doing over this week. Where would text editing and encoding fit? Where would multimedia development fit?
• This model presents research as a tidy circle of phases that neatly follow one another, when in fact it is more of a hairball with messy ends sticking out all over. I sometimes think real research is more like Stephen Jay Gould’s punctuated equilibrium, with bursts of new projects and then die-offs of unfinished projects that are never really abandoned.
But, in my defense, this model is just that, a model. Of course, real research is much messier and lacks such clear phases, but this is meant to be something like the “scientific method” in its ideal presentation of what happens. It is not what really happens; it is an ideal progression of phases that we can use, as I will in a moment.
No doubt you can think of other problems with this model, but for me its usefulness as a model (as in a sculptor’s little model of clay and sticks showing what could be) is that it can be used to look again at research aides and tools.
Phased Tools
Having apologized enough, I hope, for the simplicity of the model, I want to use it to review the types of tools that are used in different phases:
<Slide – Table of Phases and Aides>
| *Phase* | *Tools and aides* |
| Wide Browsing | You would think search engines are useful when reading broadly, but it is more likely to be edited sources like academic blogs, book reviews, digests, discussion lists and essays that provoke broad reading. Recommendation engines can be useful, as they are a form of crowd editing. The footnote and the link are also powerful tools: when reading a blog, if it links to something that sounds interesting and I have the time, I'll browse over. |
| Gathering and Note Taking | This is where search engines like Google and indexes like the MLA Index are useful for finding things. There are a number of bibliographic management tools like EndNote that can be used to keep references and notes. One of the most useful is Zotero, which works well with the web. For that matter, notes can be kept in tools like Google Notebook or just a plain text editor. What is missing are tools for gathering full texts from different sources to create study collections, though Synergies promises something along these lines. Zotero comes the closest, since it lets you take a snapshot of a web page, but too many online collections, like Google Books, are closed. They let you read, but not gather. <a href="http://portal.tapor.ca" title="http://portal.tapor.ca">TAPoR</a> lets you gather texts into a myTexts library that you can analyze, and it has a research log, a blog-like environment for storing results and notes. |
| Writing and Thinking | Word processors are probably the most used tool for writing and, for that matter, note taking. What word processors don't really support is thinking through. They don't connect well with notes or gatherings of study texts. This is where text analysis environments like <a href="http://portal.tapor.ca" title="http://portal.tapor.ca">TAPoR</a> try to provide a thinking aide akin to the concordance, but fall short. |
| Sharing and Publishing | Tools that let you share your thoughts online help in this last stage, starting with basic web servers and extending to wikis and online journal systems like <a href="http://pkp.sfu.ca/?q=ojs" title="http://pkp.sfu.ca/?q=ojs">Open Journal Systems</a>. Encoding guidelines like the TEI are also important here, along with the broad community of people developing systems for publishing XML. The TEI doesn't do the publishing for you, but it guides more scholarly publishing. |
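Because the concordance keeps coming up as the model thinking aide, a small sketch may make the idea concrete. On a computer the concordance usually takes the form of a Key Word in Context (KWIC) listing: every occurrence of a sought word is shown with a few words of context on either side, with the keyword lined up down the middle. The Python below is a minimal illustration under our own assumptions (simple word tokenization, case-insensitive matching, three words of context); the function `kwic` is hypothetical and is not meant to reproduce TAPoR's tools.

```python
import re


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return one Key Word in Context line per occurrence of keyword.

    Assumptions for this sketch: words are runs of letters, digits,
    underscores or apostrophes; matching ignores case; `width` words of
    context are kept on each side of the keyword.
    """
    words = re.findall(r"[\w']+", text)
    target = keyword.lower()
    lines = []
    for i, word in enumerate(words):
        if word.lower() == target:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Right-align the left context so the keywords form a column
            # (as long as the left context fits in 30 characters).
            lines.append(f"{left:>30} [{word}] {right}")
    return lines


sample = ("A concordance gathers passages that agree. "
          "Each concordance entry shows where a word appears.")
for line in kwic(sample, "concordance"):
    print(line)
```

A printed concordance does the same gathering by hand, usually for every content word rather than one sought word, which is why the computer version feels like a natural extension of an old reading practice.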
Other Uses of the Model
This rapid survey of some of the tools we use in different phases suggests to me a number of points that I think are worth making about the research lifecycle and technology.
The Problem with Search
First of all, it is worth pointing out that Google is nowhere near supporting research. The way research is framed in public sets it up as a problem of accessing and finding knowledge, as if all we did was search for things and all we needed was access to ever more material to search across ever faster. It is worth repeating what we all know:
More data is not more information
More information is not more knowledge
Access to information does not make us wiser
I would love to take some time to think about how searching excess became confused with research, but I am going to limit myself to pointing out that we do not start out searching: that comes later, somewhere in the gathering and writing. We start by foraging and browsing widely, and we do that in all sorts of traditional ways that depend on trusted sources of information. I believe that when foraging we are more likely to follow trails through highly edited (or recommended) materials. There is too much to read, so we browse where there is quality and where someone we trust leads us to surprises.
In other words, Google will only get you so far. Google Notebook lets you gather links and notes. Google search, including Google Scholar and Google Books, lets you find works and note them. Google Docs lets you write documents and share them. All of these tools are brilliantly simple to use and stable. They aren't, however, really up to scholarly work. They don't gather bibliographic references as Zotero tries to do. They don't let you gather collections of texts, just collections of passages. They don't let you write academic papers, and they don’t help you publish.
As an aside, we are at a moment when our relationship with Google is fraught. On the one hand, Google has legitimized what we do by doing it on a previously unimaginable scale and by showing the commercial potential of text concording. On the other hand, Google threatens to consign us to the dustbin of what Julia has called craft projects, and in doing so it could lower standards and crowd out scholarly projects. Google doesn’t want to do that, but it does want to make money, and we could end up marginalized.
There have always been great tools
A second point this model helps me see is that it reminds us of all the excellent research aides we already have and use. Another way to put this is that it reminds us of all the important tools we already have that should not be sacrificed to the Goddess of Global Search.
Now, Socrates would argue that writing itself is a technology: a memory aid (and a problematic one at that). But four types of aides we have are worth recognizing because they have shaped the academy:
• *Concordance* The concordance has been a way of rereading and thinking through an issue across a text or texts.
• *Commentary* The commentary, whether written into the margin of a manuscript, or separately connected to a work, has been a useful way to record and share the best annotations about a work. The commentary has long been a way of provoking the thinking through of a primary text and recording a tradition of interpretation. And it is these traditions of interpretation that make schools and academies.
• *Journal* The journal of essays has long been a way of sharing ideas in a community and linking people at a distance. Diderot and Grimm’s Correspondance littéraire created a network of intellectuals across Europe long before the web was a figment of our ethereal imagination.
• *Library* Lastly, the library is a brilliantly evolved space for gathering and thinking. As much as I like online resources, I love to go to the shelves and browse materials gathered by others. Once you are there, browsing is fast, high-resolution, kinesthetic, and multimodal.
There was a time when the hyperventilated rhetoric of magazines like Wired announced the “inevitable” end of all these traditional aides, but now we know better. The new tools don’t replace; they simulate and augment. We don’t have to build tools to replace good stuff, just as, to paraphrase Alan Kay, we don’t want to program computers to do what we like to do. Instead the challenge is to create aides that mind the gaps.
Transitional Tools
But I have not come to lecture you on the excellence of our existing aides. I want to return to computer aides and now ask how tools can help us make the transition from one phased activity to another. This is not my question. John Bradley has been asking it in different ways for a while, and he has been developing a tool called Pliny that is designed to assist in the shift from annotation to writing. Here is the conclusion he draws from Douglas Engelbart’s strange fiction about a user in Engelbart’s report on the NLS, Augmenting Human Intellect:
So, what are the “little things” that scholars do when doing their research, and can any of them be usefully assisted by the computer? One aspect of research would be notetaking, and the gradual transformation of materials originally recorded as notes into a new research idea, and then into a scholarly article that describes it. (Bradley, Text Technology, p. 14, 2005, no. 2)
Bradley noticed in Engelbart, in the (very sparse) literature about what we do, and in watching colleagues, that a lot of time was spent retyping quotes and notes into a word processor for transformation into writing. The word processor, never a particularly good note-taking tool, was being used because that is what our colleagues know and, perhaps more importantly, that is what they use to create what they publish: finished papers with all the quotations in the right place, all the notes worked in, and all the references properly cited. It helps that word processors also have search tools, outlining tools and good interfaces to bibliographic tools like EndNote. John feels that we can do a better job of helping researchers make the transition from gathering, annotating and note taking to the transformation of writing. In Pliny he is trying to develop an aide that helps with the transformation of information that occurs as one shifts from one research phase to another.
That is the challenge before us now. We have reasonably good tools for the different tasks, but how can they help us make the transitions and transformations through the entire research lifecycle?
TAPoR and Experiments in Text Analysis
I want now to return to the Extreme Text Analysis experiment I started with and talk about TAPoR, the Text Analysis Portal for Research. The TAPoR portal is meant to support the analytical phase of research, and to that end Stéfan Sinclair and I set out to test it in a series of humbling experiments that we call Extreme Text Analysis.
The “extreme” in the title refers to Extreme Programming, a type of agile programming that has recently become popular, but it has come to stand for the extreme humiliation one feels when trying to get tools that were designed for one purpose to help in a cycle of research, and realizing how hard it is to get results out of a pretty display and into an essay.
Our experiments, which are documented online, constantly struggled with our own tools. We spent a lot of our time posting bug reports, hacking things, and generally wishing we had lots more programming support to keep on streamlining the tools.
Normally, when asked to talk about TAPoR I like to sing its praises, but today I want to eat some humble pie as a way of showing you what could be.
Conclusions
*To Come*
------
---++ Bibliography
<b>W. S. Brockman, L. Neumann, C. L. Palmer and T. J. Tidline</b>. <i>Scholarly Work in the Humanities and the Evolving Information Environment</i>. Digital Library Federation and Council on Library and Information Resources, 2001. <a href="http://www.clir.org/pubs/reports/pub104/contents.html" title="http://www.clir.org/pubs/reports/pub104/contents.html">http://www.clir.org/pubs/reports/pub104/contents.html</a>
<b>J. Bradley</b>. <i>Thinking Differently About Thinking: Pliny and Scholarship in the Humanities</i>. <i>Digital Humanities 2007</i>. University of Illinois, Urbana-Champaign, 2007. <a href="http://www.digitalhumanities.org/dh2007/abstracts/xhtml.xq?id=124" title="http://www.digitalhumanities.org/dh2007/abstracts/xhtml.xq?id=124">http://www.digitalhumanities.org/dh2007/abstracts/xhtml.xq?id=124</a>
-- Main.GeoffreyRockwell - 25 May 2008