Workshops

Others:

CWRCshop2 (Ryerson): Using Voyant for Analyzing Texts

This is a script for a workshop on using Voyant for the CWRC community. It is available at http://hermeneuti.ca/workshops/cwrc2

1.0 Introduction

2.0 Using a single Voyant Tool: Cirrus

Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Jane Austen's Persuasion.

Cirrus (Austen's Persuasion): http://voyeurtools.org/tool/Cirrus/?corpus=JaneAusten&docIndex=5&stopList=stop.en.taporware.txt&toolFlow=simple (backup)
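
The link above is an ordinary URL with query parameters, so similar links can be assembled by hand or by a short script. Here is a minimal sketch in Python. The parameter names (corpus, docIndex, stopList, toolFlow) are copied from the link, but what each one does on the server is an assumption here, and build_tool_url is a hypothetical helper that is not part of Voyant.

```python
from urllib.parse import urlencode

def build_tool_url(base, tool, **params):
    """Join a base URL, a tool name and query parameters into one link.

    Hypothetical helper for illustration; it only formats a string and
    does not contact any server.
    """
    return f"{base}/tool/{tool}/?{urlencode(params)}"

url = build_tool_url(
    "http://voyeurtools.org",
    "Cirrus",
    corpus="JaneAusten",               # corpus identifier, as in the link above
    docIndex=5,                        # presumably selects one document in the corpus
    stopList="stop.en.taporware.txt",  # stop-word list named in the link
    toolFlow="simple",
)
print(url)
```

Changing the corpus or stopList value gives a link to a different corpus or stop list, provided the server recognizes those values.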

The Cirrus tool shows you a word cloud of high frequency words. A word cloud is a visual presentation of keywords drawn from a text, visually differentiated based on their position and frequency of use in that text. Some questions to ask yourself:

  • What words did you expect? What words are missing? What words are interesting?
  • How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?
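
To think about these questions it can help to see the kind of counting that sits behind a word cloud. The sketch below is not Voyant's implementation. It assumes a very small stand-in stop list and uses a few invented sentences (not taken from Persuasion) as sample input.

```python
import re
from collections import Counter

# A tiny stand-in stop list; real stop lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "was", "her", "she", "had"}

def top_words(text, stop_words=STOP_WORDS, limit=5):
    """Return the most frequent words in text, skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(limit)

# Invented sample sentences, used only to exercise the function.
sample = (
    "Anne had been persuaded. Anne was not persuaded now. "
    "The captain and Anne spoke of the sea and the captain smiled."
)
for word, count in top_words(sample, limit=3):
    print(word, count)
```

A word cloud would then draw each word at a size related to its count. Removing a word from the stop list changes which words appear, which is one reason the choice of stop list matters.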

Here are some more Cirrus visualizations to consider:

These types of word clouds are prevalent from academia to advertising – they quickly provide an intriguing representation of a text, as demonstrated by this example of studying gendered languages in toy advertising. But their ability to rapidly convey a picture with words comes at the cost of information reduction, and some are highly critical of word clouds as hermeneutical tools. What do you think?

Try It: Try clicking on a word. It will launch a second tab or window with a list of the texts in the corpus with the frequency of the word you clicked on.

Try It: Now try double-clicking on one of the texts. This should launch another tab or window with a Key Word In Context (KWIC) display of the word in that text. A concordance or KWIC is usually represented as a list of occurrences of a word with some limited context shown (words to the left and words to the right).
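
A keyword-in-context listing is simple enough to sketch in a few lines. The following is an illustration only, not how Voyant builds its display. The function name kwic and the width parameter are made up for this example, and the sample input is three lines from A Midsummer Night's Dream.

```python
import re

def kwic(text, keyword, width=3):
    """List each occurrence of keyword with `width` words on either side."""
    words = re.findall(r"[\w']+", text)
    rows = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            rows.append((left, word, right))
    return rows

# Three lines from A Midsummer Night's Dream, used only as sample input.
text = (
    "Four nights will quickly dream away the time; "
    "Swift as a shadow, short as any dream; "
    "Ay me, for pity! what a dream was here!"
)
for left, word, right in kwic(text, "dream"):
    print(f"{left:>25} | {word} | {right}")
```

Note that this sketch ignores line and sentence boundaries, so the context for one occurrence can run into the next line of the play.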

3.0 Using a Reading Skin

Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Austen corpus in a simple skin:

http://voyeurtools.org/?corpus=JaneAusten&stopList=stop.en.taporware.txt (backup)

In this skin clicking in one window will often (but not always) update other windows. Try the following:

  • Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
  • Changing Settings: Try changing the settings for the Cirrus by clicking on the small gear icon. Try playing with the Word Trends.
  • Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.

When in doubt just restart the session by hitting refresh.

4.0 Using Voyant on Your Own Text

Voyant Tools can be used on your own text or corpus. To do that you go to the simple URL for the tool:

Voyant: http://voyeurtools.org

Just the Cirrus tool in Voyant: http://voyeurtools.org/tool/Cirrus/

Backup version: http://beta.voyant-tools.org/

You will get a panel that asks you for a text. You can provide:

  • One or more URLs to texts on the web
  • Upload a text or a zipped collection of texts
  • Upload plain text, HTML, or XML texts
  • Upload a PDF (and Voyant will try to extract the text)
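
If your corpus is a folder of separate files, you will need to zip it before uploading. Most operating systems can do this from the file manager. As an alternative, here is a minimal Python sketch. The function name zip_texts and the folder and file names in the example call are hypothetical.

```python
import zipfile
from pathlib import Path

def zip_texts(folder, archive_path):
    """Bundle every .txt file directly inside folder into one zip archive."""
    folder = Path(folder)
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.glob("*.txt")):
            archive.write(path, arcname=path.name)  # store without the folder prefix
            names.append(path.name)
    return names

# Example call, with hypothetical folder and file names:
# zip_texts("my_corpus", "my_corpus.zip")
```

The function returns the names of the files it added, which is a quick way to check that nothing was left out.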

Voyant is forgiving, but there are nonetheless bugs.

Note that you can create a persistent URL for your corpus – that way your link can be shared or bookmarked and you won't need to reload the texts into Voyant. Click the save icon in the blue bar at the top and the first URL will be the link for your Voyant corpus.

5.0 Other Stuff

CWRCshop: Using Voyant for Analyzing Texts

This is a script for a workshop on using Voyant for the CWRC community. It is available at http://hermeneuti.ca/node/211

1.0 Introduction

  • The workshop leaders will introduce themselves:
    • Geoffrey Rockwell, University of Alberta, geoffrey (dot) rockwell (at) ualberta (dot) ca, http://www.geoffreyrockwell.com
    • Susan Brown, University of Alberta, University of Guelph, sbrown (at) uoguelph (dot) ca
  • Overview
    Voyant is currently a beta release by Stéfan Sinclair and Geoffrey Rockwell. It was previously called "Voyeur" so do not be confused if that name is used. Voyant is the next generation in a series of text analysis tools that include HyperPo and TAPoRware. It provides tables and graphs related to word use across a single document or a collection. Voyant adds, among other things, the ability to handle much larger files than the previous tools could.
  • Outline
    In this workshop we will:
    • First, look at how to use a single Voyant tool, Cirrus, with a small corpus of Austen texts. 
    • Then learn how to use the normal skin of Voyant with a single text.
    • Finally, show how to load your own text into Voyant.
  • Now make sure you can connect to the wireless.
  • Help
    If you need help, connect to Hermeneuti.ca and explore the resources there. Here are some useful links:

2.0 Using a single Voyant Tool: Cirrus

Voyant Tools has a number of different tools that can be composed into skins or used individually. We will start with just one tool called Cirrus that can then spawn other tools. We will try it with Mary Shelley's Frankenstein. Click on this link to open.

Cirrus (Frankenstein): http://dev.voyeurtools.org:8080/tool/Cirrus/?corpus=1317355585427.2492&stopList=stop.en.taporware.txt

For a backup go here: http://voyeur.hermeneuti.ca/tool/Cirrus/ and enter text http://www.gutenberg.org/cache/epub/84/pg84.txt

The Cirrus tool shows you a word cloud of high frequency words. Some questions to ask yourself:

  • What words did you expect? What words are missing? What words are interesting?
  • How does the tool arrange words and choose colours? Is there any correspondence between size and frequency?

Try It: Try clicking on a word. It will launch a second tab or window with a list of the texts in the corpus with the frequency of the word you clicked on.

Try It: Now try double-clicking on one of the texts. This should launch another tab or window with a Key Word In Context (KWIC) display of the word in that text.

3.0 Using a Reading Skin

Voyant Tools can also be composed into "skins" that combine tools as panels so that they can be used interactively. Here is the same Frankenstein text in a simple skin:

Frankenstein: http://dev.voyeurtools.org:8080/?corpus=1317355585427.2492&skin=simple&event=corpusTypeSelected

In this skin clicking in one window will often (but not always) update other windows. Try the following:

  • Triggering: Click on words in the Cirrus word cloud. Then click on a text in the Word Trends and play with the KWIC.
  • Changing Settings: Try changing the settings for the Cirrus by clicking on the small gear icon. Try playing with the Word Trends.
  • Showing and Hiding Panels: Try showing and hiding panels using the small up and down arrows in the upper-right of the panels.

When in doubt just restart the session by hitting refresh.

4.0 Using Voyant on Your Own Text

Voyant Tools can be used on your own text or corpus. To do that you go to the simple URL for the tool:

Voyant: http://voyeurtools.org

Just the Cirrus tool in Voyant: http://voyeurtools.org/tool/Cirrus/

Backup older version: http://voyeur.hermeneuti.ca

You will get a panel that asks you for a text. You can provide:

  • One or more URLs to texts on the web
  • Upload a text or a zipped collection of texts
  • Upload plain text, HTML, or XML texts
  • Upload a PDF (and Voyant will try to extract the text)

Voyant is forgiving, but there are nonetheless bugs.

5.0 Other Stuff

Here are some corpora and skins:

DH 2011 Visualization for Literary History (TAPoRware and Voyeur)

This is an outline for a workshop on visualization with Voyeur. It is based on a workshop given at DH 2010 in London, England.

1.0 Introduction

Here is a list of links for the Visualization for Literary History:

Here are the tools to try for the full Voyeur interface:

Now let's try the full text again in the full Voyeur:

For a list of tools see: http://entry.tapor.ca

DH 2011 Visualization for Literary History (Visualization with Voyeur)

This is an outline for a workshop on visualization with Voyeur. It is based on a workshop given at DH 2010 in London, England.

1.0 Introduction

2.0 Visualizing a Single Text

In the first part of the Workshop we will show you how to use Voyeur to visualize a single text as a way of learning the interface. We will work with the Introduction, Preface, Chapter 1 and Chapter 2 of Mary Shelley's Frankenstein. The plain text is here:

http://taporware.ualberta.ca/sampleDocs/plainText.txt - This is just a couple of chapters

http://www.gutenberg.org/cache/epub/84/pg84.txt - This is the Gutenberg version of the full text

In order to focus on each tool independently, we will open each Voyeur tool separately.

  • First we will look at Cirrus: http://voyeurtools.org/tool/Cirrus/  
  • Cirrus is a visualization tool that displays a word cloud relating to the frequency of words appearing in one or more documents. One can click on any word appearing in the cloud to obtain detailed information about its frequency. The larger the word, the more frequent the term.
    • Show how to load a text by copying one of the Frankenstein URLs into the "Add Texts" box
    • Show how hovering over the words reveals a number showing the word count of the current word in the corpus. 
    • Show how clicking on a word produces a textual set of results as a list on a new page.  These results include a count, a relative count, and a trend graph.
  • Next we will look at Links: http://voyeurtools.org/tool/Links/
  • Links finds collocates for words and displays the links between them using a force-directed graph. It shows the frequencies of terms in proximity to a keyword as a web of terms. Once you arrive at Links, insert or upload your content and let the tool perform its analysis. You will be presented with a web-like visualization. You may hover over words to find data pertaining to that word within your corpus. You may also double-click on any word to find a more detailed analysis. Clicking and dragging allows you to rearrange the terms. If there are multiple documents within the corpus, they will be coloured differently.
  • Load a text by copying one of the Frankenstein URLs into the "Add Texts" box
  • If you hover over a term, Voyeur will tell you its linkage within the corpus documents.
  • Try dragging and dropping terms to organize them.
  • If you would like to manipulate the visualization, right-click on any of the terms and choose 'Stick/unstick' or 'Remove'. 'Stick/unstick' pins the term in place so that it is not moved when other terms are moved. 'Remove' simply removes the term from the visualization.
  • Clicking on the options button (the button that looks like a gear) will launch a dialog box with various options pertaining to the Links tool. The stop words list lets you exclude words from the visualization (usually words such as 'a', 'the', and 'and'). 'Node size determined by type frequency' is the default, and sizes terms by how often they appear in the documents. Choosing 'Node links' instead will result in terms appearing larger if they are heavily linked with other terms. 'Autofit graph on screen' sizes the graph depending on the size of your browser window. 'Remove orphans' will remove terms which are not linked to any other term in the visualization.
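
The idea of a collocate, as used in the Links description above, can be sketched as counting the words that fall within a few words of a keyword. This is an illustration under stated assumptions, not the method Links itself uses: the window size of two words and the function name collocates are choices made for this example, and the sample sentence is invented.

```python
import re
from collections import Counter

def collocates(text, keyword, window=2):
    """Count words appearing within `window` words of each keyword occurrence."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word == keyword:
            start = max(0, i - window)
            neighbours = words[start:i] + words[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented sample sentence, used only to exercise the function.
sample = "the monster fled and the monster wept while the creator slept"
print(collocates(sample, "monster").most_common(3))
```

Each (keyword, collocate) pair with its count could serve as a weighted link in a graph of terms. Without a stop list the most common collocates tend to be function words, which is why a stop words option matters in a tool of this kind.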
Now we will look at Word Trends: http://voyeurtools.org/tool/TypeFrequenciesChart/
  • Term Frequencies Chart shows how terms are distributed across document(s) in a corpus (documents are shown in the order in which they were added).  Every charted line represents one word common throughout the entire corpus. If you hover over specific points it will give you specific information about that word in a specific document.
  • When you analyze a corpus with Term Frequencies Grid, you will initially have common words at the top of the chart with colour codes. You will see lines within the graph which are coloured according to those words. If you click on one of the terms at the top, it will omit that term from the graph.
  • When we hover over the segment points, we can see the frequency of that term in that segment. If you click on the point, Voyeur will open a new window with detailed information about that segment and term within its Document KWICs tool.
  • If you click and drag on a section of the chart it will zoom in to that section. To reset the chart to its original state, click on “reset zoom”.

  • If you would like to see fewer or more segments on the chart, simply click on “Segments” at the bottom left of the chart to choose the desired segments.
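
A trend line of the kind described above comes down to one number per document (or per segment): how often the term occurs relative to the length of that document. The sketch below is a simplified illustration, not the tool's own code. The function name relative_frequencies is hypothetical and the three short "documents" are invented.

```python
import re

def relative_frequencies(documents, term):
    """Return the share of words equal to term in each document, in order."""
    trend = []
    for text in documents:
        words = re.findall(r"[a-z']+", text.lower())
        trend.append(words.count(term) / len(words) if words else 0.0)
    return trend

# Invented miniature documents, listed in the order they were "added".
docs = [
    "letters letters and more letters",   # 3 of 5 words
    "a storm at sea",                     # 0 of 4 words
    "letters from the sea",               # 1 of 4 words
]
print(relative_frequencies(docs, "letters"))
```

Using relative rather than raw counts keeps long and short documents comparable on the same chart.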

Other Things
  • We will look at how to get help (Mention Quick Guide)
  • Some things to try:
    • Experiment with the Options (like the Stop Word list)
    • Create a Favorites list for a theme and explore that list
    • Search for phrases

3.0 Analyzing a Corpus

In the second part of the Workshop we will look at working with a corpus or collection of many texts. We will use Voyeur on the archives of HUMANIST from 1987 to 2008 (21 documents.) The Voyeur index is at:

http://voyeurtools.org/?corpus=humanist

 

  • Bubblelines is a visualization tool that helps you understand patterns of word repetition in one or more documents. Each document is represented as a horizontal line and each search term is represented as a bubble – the bubble represents the frequency of the term in the corresponding segment of text (the text is divided into segments of equal length). The larger the bubble, the more frequent the term.
  • Load a text by copying one of the Frankenstein URLs into the "Add Texts" box
  • Hovering over a bubble, or set of bubbles, will cause a box to appear that displays the frequency counts for that segment of text.

  • Similarly, hovering over the number at the end of the line will cause a box to appear that summarizes the frequency for the entire document.

  • When Bubblelines first loads a corpus, you may see terms that have been pre-selected and included in the URL or embedded page. If no terms are specified, Bubblelines automatically fetches the five most frequent terms and displays bubbles based on those.

  • You can remove the default terms by clicking on the "Clear Terms" button.

  • You can add additional terms to be displayed using the "Find Term" box. Note that available terms will appear as you type and you can pick an item from the list to have it added.

  • In addition to adding and removing terms, you can toggle the display of the terms that have been loaded. To do so simply click on the term (active terms are underlined).

  • ScatterPlot creates a scatter plot graph of terms, spaced by their variation from one another. Once you arrive at ScatterPlot, insert or upload your content and let the tool perform its analysis. You may hover over these dots and click on them for more information.
  • When you first load ScatterPlot, you will see a variety of terms plotted on a graph. If you hover over the terms, you will see their variation explained by each component on the x and y axis. If you click on any of these terms, it will bring you to the Document KWICs tool for further analysis.

  • ScatterPlot offers options for changing the plot. The terms button allows you to choose how many terms should be displayed. The dimensions button lets you switch between a two- or three-dimensional graph. Toggle labels simply removes or adds labels for the terms on the graph.

  • Some other things to try:
    • Set stoplists.  You may want to exclude common words.  To do this, click on the "Options" button, represented by a gear icon in the upper-right. 
    • Manage multiple documents.  
    • Show how to group results
    • Show how to compare documents
  • Try looking for trends yourself using the different tools
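
The Bubblelines description above rests on dividing a text into segments of equal length and counting a term in each one. Here is a minimal sketch of that step. It is an illustration, not Bubblelines' own code: the function name segment_counts, the default of five segments, and the sample sentence are all assumptions made for this example.

```python
import re

def segment_counts(text, term, segments=5):
    """Split text into roughly equal runs of words and count term in each."""
    words = re.findall(r"[a-z']+", text.lower())
    size = max(1, -(-len(words) // segments))  # ceiling division
    counts = [words[i:i + size].count(term) for i in range(0, len(words), size)]
    # Pad with zeros so short texts still yield the requested number of segments.
    return counts + [0] * (segments - len(counts))

# Invented sample sentence, used only to exercise the function.
sample = "dream on dream away the night and the day then dream"
print(segment_counts(sample, "dream", segments=4))
```

Each number in the returned list would correspond to one bubble along the document's line, with larger counts drawn as larger bubbles.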

4.0 Using your own text

  • Now you can try your own text. There are different ways of providing Voyeur a text:
    • Typing a text or pasting it in
    • Typing in one or more URLs (as we have done above)
    • Uploading a text, using the "upload" button
  • For uploading, there are a number of formats of texts that will work:
    • file formats: text, HTML, XML, RSS, TEI, PDF, MS Word, RTF
  • Finally, we will discuss caching and so on.
  • Now try your own text.

5.0 Exporting Data and Quoting Analytics

We will now show how to export data and quote analytical results:

  • How to export tab-separated values that can be copied and pasted into Excel
  • How to export XML results from KWICs (for instance)
  • How to quote an analytical result in TADA.
  • Go to http://tada.mcmaster.ca/Sandbox/VoyeurWorkshop to try it yourself.
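
Tab-separated values are plain text with one row per line and a tab between columns, which is why they paste cleanly into a spreadsheet. The sketch below shows the format using Python's csv module. It does not reproduce Voyeur's actual export: the column names and the two rows are placeholder values, and to_tsv is a hypothetical helper.

```python
import csv
import io

def to_tsv(rows, header=("term", "count")):
    """Serialize (term, count) rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

# Placeholder rows, not real results from any corpus.
print(to_tsv([("anne", 3), ("captain", 2)]))
```

Saving the returned string to a file with a .tsv or .txt extension gives something most spreadsheet programs can open directly.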

6.0 Advanced and Other

7.0 To Prepare

  • Make sure we have Voyeur running with a backup
  • Sort out how participants can get on wireless
  • Powerbars for laptops
  • Preindex texts

DH 2011 Voyeur Tools

This outline is for a workshop offered at the Digital Humanities 2011 conference at Stanford.

Please note that the main server for Voyeur Tools (voyeurtools.org) may be inaccessible so we have created a backup installation (dev.voyeurtools.org:8080). They should function very similarly, but the corpora loaded into the development server may not be accessible after the workshop.

1.0 Introduction

2.0 Introduction to text analysis using individual Voyeur tools

After introductions we will show you how to use individual tools in Voyeur to analyze a single text as a way of thinking about techniques in text analysis. We will work with Mary Shelley's Frankenstein, the Humanist discussion list corpus and a collection of Austen novels. The plain text to Frankenstein is here: http://www.gutenberg.org/cache/epub/84/pg84.txt - This is the Gutenberg version of the full text

Here are the tools we will try:

  1. Cirrus (with Austen): http://voyeur.hermeneuti.ca/tool/Cirrus/?corpus=1308408654248.9846&stopList=stop.en.taporware.txt
  2. Word Trends (with Humanist): http://dev.voyeurtools.org:8080/tool/TypeFrequenciesChart/?corpus=humanist
  3. Links (Collocates): http://dev.voyeurtools.org:8080/tool/Links/?corpus=1308459917755.5623&mode=document&stopList=stop.en.taporware.txt

We will discuss the standard controls for a panel and how you can cite and embed panels (with their texts).

3.0 Distant Reading: Analyzing a Single Text

In the third part of the Workshop we will show you how to use Voyeur to analyze a single text as a way of learning the interface.

  • We will open Voyeur:
    • Show how to load a text (including XML options)
    • Show the different panels that appear initially
      • Discuss the order they open and the Summary panel
      • Go over the Words in the Entire Corpus panel (Options, Columns, Search, Favorites)
    • Discuss the full set of panels
    • Show how to manage panels
    • Discuss trigger order of panels (flow within Voyeur)
    • Show how to get help (Mention Quick Guide)
    • Show how to make a list of favorite words by searching for words and saving them in Favorites
  • Now you should try Voyeur with your text or the Frankenstein text above. To open the Frankenstein click here:

http://dev.voyeurtools.org:8080/tool/Cirrus/?corpus=1308459917755.5623&stopList=stop.en.taporware.txt

  • Some things to try:
    • Experiment with the Options (like the Stop Word list)
    • Create a Favorites list for a theme and explore that list
    • Search for phrases

4.0 Distant Reading: Analyzing a Corpus with Correspondence Analysis

In the fourth part of the Workshop we will look at working with a corpus using a different skin and the Correspondence Analysis tool. We will use Voyeur on the archives of HUMANIST from 1987 to 2008 (21 documents.)

We will use the Humanist Corpus with a different skin or arrangement of panels:

http://dev.voyeurtools.org:8080/?corpus=humanist&skin=scatter&stopList=stop.en.taporware.txt

We will discuss how to use the Correspondence Analysis tool to explore themes in a diachronic corpus. For more on CA see http://stefansinclair.name/correspondence-analysis

Some of the features to look at:

  • Controlling the visualization (labels, words, etc.)
  • Using the list of words (selecting multiple words)
  • Controlling panels

5.0 Using your own text

  • Now you can try your own text. We will show the different ways of providing Voyeur a text:
    • Typing a text or pasting it in
    • Typing in one or more URLs
    • Uploading a text
  • We will then discuss the formats of texts that will work, and what will happen to them:
    • file formats: text, HTML, XML, RSS, TEI, PDF, MS Word, RTF
    • Finally, we will discuss caching and so on

Try your own text now.

6.0 Exporting Data and Quoting Analytics

There are different ways to export data and quote analytical results:

  • You can export tab-separated values to copy and paste into Excel
  • You can export XML results from KWICs (for instance)
  • You can embed live tool snippets (in a blog post, TADA, etc.)
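
Exported tab-separated values can also be read by a short script instead of Excel. The following is a minimal sketch in Python, assuming a hypothetical export with `term` and `count` columns; the actual column names in an export may differ, and the sample values are invented for illustration.

```python
import csv
import io

# Hypothetical sample standing in for an exported tab-separated file.
# The column names "term" and "count" and all values are assumptions.
SAMPLE_TSV = "term\tcount\nalpha\t3\nbeta\t7\ngamma\t5\n"


def read_term_counts(tsv_text):
    """Parse tab-separated term/count rows into a dict of term -> int."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["term"]: int(row["count"]) for row in reader}


counts = read_term_counts(SAMPLE_TSV)
# Pick the term with the highest count.
top_term = max(counts, key=counts.get)
print(top_term, counts[top_term])
```

To use it on a real export, read the file's text and pass it to `read_term_counts`, adjusting the column names to match the file's header row.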

7.0 Wrap-Up

  • other aspects: skins, tool browser
  • how to give feedback
  • future: Voyeur Notebooks, new TAPoR
  • thanks!

DH2010 Introduction to Voyeur

This is an outline for a workshop on Voyeur. It was developed for a workshop before DH 2010 in London, England.

1.0 Introduction

2.0 Analyzing a Single Text

In the first part of the Workshop we will show you how to use Voyeur to analyze a single text as a way of learning the interface. We will work with the Introduction, Preface, Chapter 1 and Chapter 2 of Mary Shelley's Frankenstein. The plain text is here:

http://taporware.ualberta.ca/sampleDocs/plainText.txt - This is just a couple of chapters

http://www.gutenberg.org/cache/epub/84/pg84.txt - This is the Gutenberg version of the full text

  • We will open Voyeur:
    • Show how to load a text
    • Show the different panels that appear initially
      • Discuss the order they open and the Summary panel
      • Go over the Words in the Entire Corpus panel (Options, Columns, Search, Favorites)
    • Discuss the full set of panels
    • Show how to manage panels
    • Discuss trigger order of panels (flow within Voyeur)
    • Show how to get help (Mention Quick Guide)
    • Show how to make a list of favorite words, so you can explore searching for words and saving them in Favorites
  • Now you should try Voyeur with your text or the Frankenstein text above. To open the Frankenstein text, click here:

http://voyeurtools.org/?corpus=1278409278561.646

  • Some things to try:
    • Experiment with the Options (like the Stop Word list)
    • Create a Favorites list for a theme and explore that list
    • Search for phrases
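
To see why the Stop Word list option matters, here is a minimal sketch in Python of counting word frequencies with and without a stop list. It is not how Voyeur itself is implemented; the tiny `STOP_WORDS` set and the sample sentence are assumptions made for illustration, and a real stop list is much longer.

```python
import re
from collections import Counter

# A tiny illustrative stop list (an assumption; real lists are much longer).
STOP_WORDS = frozenset({"the", "and", "in"})


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(text, stop_words=frozenset()):
    """Count tokens, skipping any that appear in stop_words."""
    return Counter(t for t in tokenize(text) if t not in stop_words)


# Invented sample sentence, not taken from any workshop text.
sample = "The cat saw the dog and the dog saw the cat in the garden."

raw = word_frequencies(sample)
filtered = word_frequencies(sample, STOP_WORDS)
print(raw.most_common(1))
print(sorted(filtered.items()))
```

Without the stop list, the function word "the" dominates the counts; with it, only the content words remain.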

3.0 Analyzing a Corpus

In the second part of the Workshop we will look at working with a corpus or collection of many texts. We will use Voyeur on the archives of HUMANIST from 1987 to 2008 (21 documents.) The Voyeur index is at:

http://voyeurtools.org/?corpus=humanist

  • We will show you how to:
    • Set various options, like stoplists
    • Hide and show columns
    • Manage multiple documents
    • Group results
    • Compare documents
  • Try looking for trends yourself

4.0 Using your own text

  • Now you can try your own text. We will show the different ways of providing Voyeur a text:
    • Typing a text or pasting it in
    • Typing in one or more URLs
    • Uploading a text
  • We will then discuss the formats of texts that will work, and what will happen to them:
    • file formats: text, HTML, XML, RSS, TEI, PDF, MS Word, RTF
    • Finally, we will discuss caching and so on
  • Now try your own text.

5.0 Exporting Data and Quoting Analytics

We will now show how to export data and quote analytical results:

  • How to export tab-separated values, which can be copied and pasted into Excel
  • How to export XML results from KWICs (for instance)
  • How to quote an analytical result in TADA.
  • Go to http://tada.mcmaster.ca/Sandbox/VoyeurWorkshop to try it yourself.

6.0 Advanced and Other

7.0 To Prepare

  • Make sure we have Voyeur running with a backup
  • Sort out how participants can get on wireless
  • Powerbars for laptops
  • What texts will we use?
  • Preindex texts and create a Workshop web page on Hermeneuti.ca

Dublin 2011: From Metadata to Linked Data

This workshop outline is for a Summer School at Trinity College Dublin. See http://dho.ie/summerschool2011 for the full description. This outline is for Day 3 on Generating Textual Data:

Day 3: Generating Textual Data, Tobias Blanke and Geoffrey Rockwell

Based on the results of Day II, participants will dig deeper into the details of generating textual data using text and data mining techniques. Participants will learn methods to algorithmically create textual data while critically evaluating existing tools, methods, and solutions as well as their future potential. They will gain insights on how generic services need to be modified to serve the needs of humanities research. Finally, we will investigate how generated output can be reused in the emerging web of data.


1.0 Introduction

2.0 TAPoRware: A Simple Recipe for Studying Themes in a Text

In the second part of the Workshop we will show you how to use TAPoRware to analyze a single text as a way of thinking about techniques in text analysis. We will work with the Introduction, Preface, Chapter 1 and Chapter 2 of Mary Shelley's Frankenstein. The plain text is here:

http://taporware.ualberta.ca/sampleDocs/plainText.txt - This is just a couple of chapters

http://www.gutenberg.org/cache/epub/84/pg84.txt - This is the Gutenberg version of the full text

We will also be using some TAPoRware tools and Recipes for TAPoRware. The Tools and Recipes are here:

List Words: http://taporware.ualberta.ca/~taporware/textTools/listword.shtml - Use short Frankenstein

Concordance Tool: http://taporware.ualberta.ca/~taporware/textTools/findtext.shtml - Use short Frankenstein

Weighted Centroid: http://taporware.ualberta.ca/~taporware/otherTools/wcentroid.shtml - Use short Frankenstein

Principal Component Analysis: http://taporware.ualberta.ca/~taporware/betaTools/pca.shtml - Use short Frankenstein
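
For a sense of what a concordance tool like the one listed above produces, here is a minimal Key Word In Context (KWIC) sketch in Python. It is not the TAPoRware implementation; the function name `kwic` and the `width` parameter are assumptions made for illustration. The sample text uses two lines from A Midsummer Night's Dream.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left, keyword, right) tuples with `width` words of context."""
    tokens = re.findall(r"\w+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            # Slice out up to `width` words on each side of the hit.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


text = ("Four nights will quickly dream away the time. "
        "Swift as a shadow, short as any dream.")

for left, word, right in kwic(text, "dream", width=2):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>15} | {word} | {right}")
```

Punctuation is dropped by the tokenizer here, which keeps the sketch short; a fuller tool would preserve the original spacing and punctuation in the context.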

2.2 Using Voyeur Simple Tools

Cirrus Word Cloud (Frankenstein): http://dev.voyeurtools.org:8080/tool/Cirrus/?corpus=1309937516546.6692&query=&stopList=stop.en.taporware.txt

Cirrus Word Cloud (Austen): http://voyeur.hermeneuti.ca/tool/Cirrus/?corpus=1308408654248.9846&stopList=stop.en.taporware.txt

Other tools from Voyeur can be found here: http://hermeneuti.ca/voyeur/tools

3.0 Distant Reading: Analyzing a Single Text

In the third part of the Workshop we will show you how to use Voyeur to analyze a single text as a way of learning the interface.

  • We will open Voyeur:
    • Show how to load a text (Frankenstein: http://www.gutenberg.org/cache/epub/84/pg84.txt). Discuss different types of texts that can be loaded.
    • Show the different panels that appear initially
      • Discuss the order they open and the Summary panel
      • Discuss common features to panels
      • Go over the Words in the Entire Corpus panel (Options, Columns, Search, Favorites)
    • Show how to manage panels
    • Discuss trigger order of panels (flow within Voyeur)
    • Show how to get help (Mention Quick Guide)
    • Show how to make a list of favorite words, so you can explore searching for words and saving them in Favorites
  • Now you should try Voyeur with your text or the Frankenstein text above. To open the Frankenstein text, click here:

http://voyeur.hermeneuti.ca/?corpus=1309937028026.8131

  • Some things to try:
    • Experiment with the Options (like the Stop Word list)
    • Create a Favorites list for a theme and explore that list
    • Search for phrases

4.0 Distant Reading: Analyzing a Corpus

In the fourth part of the Workshop we will look at working with a corpus or collection of many texts. We will use Voyeur on the archives of HUMANIST from 1987 to 2008 (21 documents.) The Voyeur index is at:

http://voyeurtools.org/?corpus=humanist&skin=scatter&stopList=stop.en.taporware.txt

  • We will discuss:
    • Different skins with different panels
    • Correspondence analysis and the exploration of a large corpus
  • Try looking for trends yourself
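
One simple way to think about a trend across a corpus is the relative frequency of a word in each document, taken in order. The sketch below illustrates that idea in Python; it is not Voyeur's own method, the function names are hypothetical, and the two short "documents" are invented placeholders rather than HUMANIST data.

```python
import re


def relative_frequency(text, word):
    """Share of the document's tokens that equal `word` (case-insensitive)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return tokens.count(word.lower()) / len(tokens)


def trend(documents, word):
    """Map ordered (label, text) pairs to (label, relative frequency) pairs."""
    return [(label, relative_frequency(text, word)) for label, text in documents]


# Invented placeholder documents, labeled by hypothetical year.
documents = [
    ("year 1", "computing and text"),
    ("year 2", "text text analysis tools"),
]

for label, freq in trend(documents, "text"):
    print(label, round(freq, 3))
```

Relative rather than raw counts are used so that longer documents do not look like peaks simply because they contain more words.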

5.0 Using your own text

  • Now you can try your own text. We will show the different ways of providing Voyeur a text:
    • Typing a text or pasting it in
    • Typing in one or more URLs
    • Uploading a text
  • We will then discuss the formats of texts that will work, and what will happen to them:
    • file formats: text, HTML, XML, RSS, TEI, PDF, MS Word, RTF
    • Finally, we will discuss caching and so on
  • Now try your own text.

6.0 Exporting Data and Quoting Analytics

We will now show how to export data and quote analytical results:

  • How to export tab-separated values, which can be copied and pasted into Excel
  • How to export XML results from KWICs (for instance)
  • How to quote an analytical result in TADA.
  • Show how to go to http://tada.mcmaster.ca/Sandbox/VoyeurWorkshop to insert a panel.

7.0 Skinning Voyeur

We will now look at how you can develop a different skin.

  • Open a corpus like http://voyeurtools.org/?corpus=1309931394540.8106
  • Click on the Export button (the disk button in upper right) and export to layout builder
  • Drag panels into the blank area to create a custom skin (Warning: many combinations won't work)