
Transcription protocol
The transcription should capture enough detail for tallying canonical tokens while maintaining readability.
 * 1) **++**Teacher's first word of the utterance. [[image:https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcRVLNLh_jXC9mMJxiJm77iuywdcXFTzO5mGBVO95UrPSHKR7jc1 align="right" caption="Courtroom equipment gov't photo" link="@http://www.tned.uscourts.gov/docs/0708history.pdf"]]
 * 2) **--**Student's first word of the utterance.
 * 3) Almost all utterances or turns start with an upper-case letter.
 * 4) **33**A third person speaks up.
 * 5) **00**This utterance continues from the previous line because it is beyond 46 words.
 * 6) **@** laugh **@@** longer laughter **@@@** even longer laughter
 * 7) **//New-Years-Day//** a multi-word proper noun, which will count as one lexical item when looking for tokens
 * 8) **I-see**. **You-know** a fixed expression in which the verb's functional meaning is figurative; counts as one token
 * 9) **[XX]** unintelligible
 * 10) **[chatter]** multi-word unintelligible utterance, produced by one or more speakers
 * 11) //**WED**nesday// Upper case indicates a louder syllable in the conversation, while still counting the word as a Wednesday token.
 * 12) * repeats the previous token without modification
 * 13) **[dysfluency-a-]//-around//** when the speaker repeats a syllable of a word.
 * 14) **, ** Comma indicates my impression of a pause. For a more rigorous categorization, I could view the acoustic waveform to measure the existence and length of the putative pause.
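The conventions above can be applied mechanically when tallying tokens. What follows is a minimal sketch in Python, not the MS Access tokenizer mentioned in the notes below. The names `SPEAKER_CODES`, `parse_line`, `tokenize`, and `count_repeats` are hypothetical. The sketch assumes that bracketed annotations and laughter marks are excluded from counts, and that an asterisk repeat is excluded unless `count_repeats` is set; the protocol does not state which is intended.

```python
import re

# Speaker codes from the protocol: ++ teacher, -- student,
# 33 third person, 00 continuation of the previous line.
SPEAKER_CODES = {
    "++": "teacher",
    "--": "student",
    "33": "third",
    "00": "continuation",
}


def parse_line(line):
    """Split a transcript line into (speaker label, utterance text)."""
    line = line.strip()
    code = line[:2]
    if code in SPEAKER_CODES:
        return SPEAKER_CODES[code], line[2:].strip()
    return None, line


def tokenize(text, count_repeats=False):
    """Return lower-cased tokens from one utterance.

    Assumptions (not stated in the protocol):
    - [XX], [chatter], [dysfluency-...] are dropped from counts;
    - runs of @ (laughter) are dropped;
    - '*' is dropped unless count_repeats is True, in which case
      it counts as one more instance of the previous token.
    Hyphenated items such as you-know stay a single token.
    """
    # Remove every bracketed annotation.
    text = re.sub(r"\[[^\]]*\]", " ", text)
    tokens = []
    for raw in text.split():
        word = raw.strip(".,?!")
        if not word:
            continue
        if set(word) == {"@"}:
            continue  # laughter
        if word == "*":
            if count_repeats and tokens:
                tokens.append(tokens[-1])
            continue
        tokens.append(word.lower())
    return tokens
```

For example, `parse_line("--Fourth assignment. @@")` yields the speaker label `"student"` with the text `"Fourth assignment. @@"`, and `tokenize` on that text keeps only `fourth` and `assignment`.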

Transcription Notes
I labelled dysfluencies to cull out the extra syllables from word counts and to preserve information for future analysis. I used an asterisk ** * ** notation in an attempt to re-create dysfluency when the reader tries to follow the transcription. This sacrifice of readability produces what I assert is a more authentic sounding transcript. **//the the next thing//** becomes **//the//** ** // * // ** **//next thing//**

I transcribed the following phrases thusly because the repeated words differ in intonation, lacking the false-start sense characteristic of word-level and phrase-level dysfluencies. In one case, I heard a repetition without an intonational distinction, but I resisted characterizing it as a dysfluency because of the pause, indicated by a comma.
 * // I guess. I guess you //**
 * // OK. OK. Um. OK. //**
 * // That’s a, that’s a pity. // **

Some dysfluencies incorporated modified repetition of the first syllable's vowel: **//A-a//** (in IPA **[e ə]**). For other unusual pronunciations, I maintained the word form rather than use a narrow transcription. For example, ** //Wez-day// ** becomes ** //Wednesday// **. This allows for word-frequency counts with **Wednesday** as the sole token.
 * clen-clinic **

Tagliamonte here advises against narrow transcription of phatics so that word counts are immune from allophonic differences. I therefore refrained from distinguishing among **Uh-huh**, **Uh-hm**, and **Uh-hum** and used only the first form when transcribing. Nevertheless, when I felt that there was a difference in meaning, I transcribed a vocalic elongation: I distinguished between **//OK//** and //**Oh-kay**// and between //**ah**// and **//aaah//**.
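The phatic convention amounts to a small lookup table. A minimal sketch, assuming the three spellings named above are the only variants to collapse; `PHATIC_VARIANTS` and `normalize_phatic` are hypothetical names, and forms that carry a difference in meaning (**OK** versus **Oh-kay**) are deliberately left alone.

```python
# Variant spellings collapsed to the single form used in the transcript.
# Keys are lower-cased so the lookup ignores capitalization.
PHATIC_VARIANTS = {
    "uh-huh": "Uh-huh",
    "uh-hm": "Uh-huh",
    "uh-hum": "Uh-huh",
}


def normalize_phatic(token):
    """Map Uh-hm and Uh-hum to Uh-huh; return any other token unchanged."""
    return PHATIC_VARIANTS.get(token.lower(), token)
```

With this table, `normalize_phatic("Uh-hum")` returns `"Uh-huh"`, while `normalize_phatic("Oh-kay")` returns `"Oh-kay"` unchanged.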

Utterance-initial lower case emphasizes when a speaker starts an utterance mid-sentence. The MS Access tokenizer that I wrote ignores the upper-case, lower-case distinction: each counts as an instance of the same token. Nevertheless, I assert that the upper-case, lower-case distinction lends authenticity to the transcript, while also preserving information should one later elect to re-analyze the transcript more narrowly using a case-sensitive tokenizer.
 * ++Starting from the um you-know when * you got up in the morning to**
 * --Ah.**
 * ++** um **the activities.**
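The difference between a case-insensitive and a case-sensitive tally can be illustrated briefly. This is a sketch in Python rather than the MS Access tokenizer itself; `count_tokens` and its `case_sensitive` flag are hypothetical names.

```python
from collections import Counter


def count_tokens(words, case_sensitive=False):
    """Tally tokens, folding case unless case_sensitive is True."""
    if not case_sensitive:
        words = [w.lower() for w in words]
    return Counter(words)


# An utterance-initial "Um" and a mid-sentence "um".
sample = ["Um", "the", "activities", "um"]
folded = count_tokens(sample)                      # one entry for "um"
strict = count_tokens(sample, case_sensitive=True)  # "Um" and "um" kept apart
```

In the folded tally the two forms count as two instances of one token; in the strict tally they remain separate entries, which is the information the lower-case convention preserves for later re-analysis.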


 * --Everyday, I keep a note diary a memo **
 * ++Right **
 * --Yeah. to * not to confuse **
 * ++Right. **
 * -- ** uh **, everyday life. **
 * ++OK. **


 * ++** Um **. Great. So, um, you recently finished your first assignment, [chatter]**
 * --Fourth assignment. @@**
 * ++Sorry, your fourth assignment. Excuse me.**
 * ++** Um, **and that was showing the American military men [dysfluency-a-] around. Um. How many days were you doing that?**

Elsewhere, I noted a comma at the end of an utterance for what seemed like a pause. That pause lasted long enough for the other speaker to interject a phatic and the first speaker to continue the utterance.
 * --So, after the lecture and ah it was a party, so I joined them, **
 * ++Uh-huh.**
 * -- um at the um at the [dysfluency-a] Assembly-Hall. **
 * ++Uh-huh **
 * --And there are many ah important people there, some American ambassadors, or ah some actually Diet dieting Diet members, **
 * ++Oh, really. **
 * -- came to the Tokyo. **
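For analysis, an utterance interrupted in this way can be rejoined. A minimal sketch, assuming turns are already held as `(speaker, text)` pairs and that a trailing comma followed by exactly one turn from another speaker marks a continuation; `merge_continued_turns` is a hypothetical helper, and the interjection is kept, placed after the rejoined utterance, rather than discarded.

```python
def merge_continued_turns(turns):
    """Join utterances split by another speaker's interjection.

    turns: list of (speaker, text) pairs in transcript order.
    An utterance ending in a comma is joined with the same speaker's
    next utterance when exactly one turn by someone else intervenes.
    """
    merged = []
    i = 0
    while i < len(turns):
        speaker, text = turns[i]
        interjections = []
        while (
            text.rstrip().endswith(",")
            and i + 2 < len(turns)
            and turns[i + 1][0] != speaker
            and turns[i + 2][0] == speaker
        ):
            interjections.append(turns[i + 1])
            text = text.rstrip() + " " + turns[i + 2][1].strip()
            i += 2
        merged.append((speaker, text))
        merged.extend(interjections)
        i += 1
    return merged
```

Whether a rejoined utterance should count as one turn or several is an analytic choice; the sketch only shows that the comma convention keeps enough information to do either.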

(Let's take a look at these phatic-rich interchanges through an @aizuchi lens.)

I could not resist using a narrow transcription for the student who over-generalized the past tense marker with **then** to get **thend.**

(Thanks to CM for providing the reference to APPENDIX C. TRANSCRIPTION PROTOCOL. Tagliamonte, Sali A. Last updated on: September 10th, 2004, which auto-magically downloads here.) Here is a 2011 transcription protocol from Penn, which addresses dysfluencies/disfluencies but ends up with coded symbols. These, in Appendix A, p. 7, are from SDSU on the West Coast, in Prof. Wayne A. Beach's **Conversation Analytic Methods in Communication Research – Acknowledgement Tokens & Related Social Activities**