Transcription protocol

The transcription should capture enough detail for tallying canonical tokens while maintaining readability.
  1. ++Teacher's first word of the utterance.
  2. --Student's first word of the utterance.
  3. Almost all utterances or turns start with an upper-case letter.
  4. 33A third person speaks up.
  5. 00This utterance continues from the previous line because it is beyond 46 words.
  6. @ laugh; @@ longer laughter; @@@ even longer laughter
  7. New-Years-Day a multi-word proper noun, which counts as one lexical item when tallying tokens
  8. I-see. You-know fixed expressions in which the verb's functional meaning is figurative; each counts as one token
  9. [XX] unintelligible
  10. [chatter] multi-word unintelligible utterance, produced by one or more speakers
  11. WEDnesday Upper case indicates a louder syllable; the word still counts as a Wednesday token.
  12. * repeats the previous token without modification
  13. [dysfluency-a-]-around when the speaker repeats a syllable of a word.
  14. , Comma indicates my impression of a pause. For a more rigorous categorization, I could view the acoustic waveform to confirm the putative pause and measure its length.
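The speaker prefixes in the list above lend themselves to mechanical parsing. The following Python sketch is illustrative only: parse_line and wrap_utterance are hypothetical names, a 00 line is assumed to inherit the previous line's speaker, and the 46-word figure is assumed to be a per-line maximum.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping of the protocol's line prefixes to speaker roles.
# "00" marks a continuation line, so it has no role of its own.
PREFIXES = {
    "++": "teacher",
    "--": "student",
    "33": "third speaker",
    "00": "continuation",
}


def parse_line(line: str, previous_role: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Split a transcript line into (speaker role, utterance text).

    A "00" continuation line inherits previous_role.
    A line without a recognised prefix returns (None, stripped line).
    """
    role = PREFIXES.get(line[:2])
    if role is None:
        return None, line.strip()
    if role == "continuation":
        role = previous_role
    return role, line[2:].strip()


def wrap_utterance(prefix: str, text: str, limit: int = 46) -> List[str]:
    """Break one utterance into lines of at most `limit` words.

    The first line keeps the speaker prefix; later lines get "00".
    """
    words = text.split()
    chunks = [words[i:i + limit] for i in range(0, len(words), limit)] or [[]]
    lines = [prefix + " ".join(chunks[0])]
    lines += ["00" + " ".join(chunk) for chunk in chunks[1:]]
    return lines
```

For example, parse_line("++Sorry, your fourth assignment. Excuse me.") returns ("teacher", "Sorry, your fourth assignment. Excuse me."), and a 50-word student utterance passed to wrap_utterance("--", ...) becomes one 46-word "--" line followed by a 4-word "00" line.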

Transcription Notes

I labelled dysfluencies to cull the extra syllables from word counts while preserving the information for future analysis.
I used the asterisk (*) notation to re-create the dysfluency for a reader following the transcript. This sacrifice of readability produces what I assert is a more authentic-sounding transcript.
the the next thing becomes the * next thing
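The asterisk and bracket conventions can be sketched as a small Python tokenizer. This is not the MS Access tokenizer mentioned later; it is a minimal sketch assuming that bracketed material ([XX], [chatter], [dysfluency-a-]) and laughter markers are left out of word counts and that each * stands for one more instance of the previous token. Whether a dysfluent repeat should count toward the tally is a judgment call, so the hypothetical count_repeats flag leaves it open. The dysfluency labels themselves stay in the transcript for later analysis.

```python
import re
from typing import List

# Assumed conventions, following the protocol list:
#   [XX], [chatter], [dysfluency-a-]  bracketed material, left out of counts
#   @, @@, @@@                         laughter, left out of counts
#   *                                  repeats the previous token
BRACKETED = re.compile(r"\[[^\]]*\]")
LAUGHTER = re.compile(r"^@+$")
EDGE_CHARS = ".,?!;:\"'()-"  # punctuation and stray hyphens trimmed from word edges


def tokenize(utterance: str, count_repeats: bool = True) -> List[str]:
    """Tokenize one utterance whose speaker prefix (++, --, 33, 00) is already removed.

    Each * becomes a copy of the previous token when count_repeats is True
    and is dropped otherwise. Internal hyphens survive, so New-Years-Day
    and you-know stay single tokens.
    """
    text = BRACKETED.sub(" ", utterance)
    tokens: List[str] = []
    for raw in text.split():
        word = raw.strip(EDGE_CHARS)
        if not word or LAUGHTER.match(word):
            continue
        if word == "*":
            if count_repeats and tokens:
                tokens.append(tokens[-1])
            continue
        tokens.append(word)
    return tokens
```

Applied to the example line above, tokenize expands "the * next" to "the", "the", "next", so the line yields nine tokens; with count_repeats=False the asterisk is simply dropped.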

I transcribed the following phrases as plain repetitions, without the asterisk, because the repeated words differ in intonation and lack the false-start quality characteristic of word-level and phrase-level dysfluencies.
I guess. I guess you
OK. OK. Um. OK.
In one case, I heard a repetition without an intonational distinction, but I resisted characterizing it as a dysfluency because of the pause, indicated by a comma.
That’s a, that’s a pity.

Some dysfluencies incorporated modified repetition of the first syllable's vowel.
A-a (in IPA [e ə])
For other unusual pronunciations, I kept the standard word form rather than using a narrow transcription. For example, Wez-day becomes Wednesday, so word-frequency counts treat Wednesday as the sole token.

Tagliamonte's protocol advises against narrow transcription of phatics so that word counts are immune to allophonic differences. I therefore refrained from distinguishing among Uh-huh, Uh-hm, and Uh-hum and used only the first form when transcribing. Nevertheless, where I felt there was a difference in meaning, I transcribed a vocalic elongation, distinguishing between OK and Oh-kay and between ah and aaah.

Utterance-initial lower case marks an utterance that the speaker starts mid-sentence. The MS Access tokenizer that I wrote ignores the distinction between upper and lower case, so both forms count as instances of the same token. Nevertheless, I assert that the case distinction lends authenticity to the transcript while preserving information, should one later elect to re-analyze the transcript more narrowly with a case-sensitive tokenizer.
++Starting from the um you-know when * you got up in the morning to
++um the activities.

--Everyday, I keep a note diary a memo
--Yeah. to * not to confuse
--uh, everyday life.

++Um. Great. So, um, you recently finished your first assignment, [chatter]
--Fourth assignment. @@
++Sorry, your fourth assignment. Excuse me.
++Um, and that was showing the American military men [dysfluency-a-] around. Um. How many days were you doing that?
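The case-insensitive tally described above might look like the following Python sketch. It is not the MS Access tokenizer; token_counts and its case_sensitive flag are hypothetical, tokens are assumed to be separated only by whitespace, and speaker prefixes are assumed to be stripped beforehand. Setting case_sensitive=True corresponds to the narrower re-analysis the case distinction is meant to preserve.

```python
from collections import Counter
from typing import Iterable

EDGE_CHARS = ".,?!;:\"'()"  # punctuation trimmed from the edges of each token


def token_counts(lines: Iterable[str], case_sensitive: bool = False) -> Counter:
    """Tally whitespace-separated tokens across transcript lines.

    By default case is folded, so WEDnesday, Wednesday and wednesday are
    one token; case_sensitive=True keeps them apart. Hyphenated items such
    as New-Years-Day or you-know stay whole because only whitespace splits.
    """
    counts: Counter = Counter()
    for line in lines:
        for raw in line.split():
            word = raw.strip(EDGE_CHARS)
            if not word:
                continue
            counts[word if case_sensitive else word.casefold()] += 1
    return counts
```

For example, token_counts(["Um. Great.", "um, at the um"]) counts um three times, because Um and um fold together.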

Elsewhere, I placed a comma at the end of an utterance to mark what seemed like a pause. The pause lasted long enough for the other speaker to interject a phatic before the first speaker continued the utterance.
--So, after the lecture and ah it was a party, so I joined them,
--um at the um at the [dysfluency-a] Assembly-Hall.
--And there are many ah important people there, some American ambassadors, or ah some actually Diet dieting Diet members,
++Oh, really.
--came to the Tokyo.

(Let's take a look at these phatic-rich interchanges through an aizuchi lens.)

I could not resist using a narrow transcription for the student who over-generalized the past-tense marker to then, producing thend.

(Thanks to CM for providing the reference: Tagliamonte, Sali A. Appendix C. Transcription Protocol. Last updated September 10, 2004.)
A 2011 transcription protocol from Penn also addresses dysfluencies/disfluencies, but it ends up with coded symbols.
Another set of conventions appears in Appendix A (p. 7) of Prof. Wayne A. Beach's Conversation Analytic Methods in Communication Research – Acknowledgement Tokens & Related Social Activities, from SDSU on the West Coast.