|
Post by Mike Maune on Aug 24, 2018 8:18:02 GMT -5
I've had some success in using the auto-coder to tag inscribed Engagement. It doesn't catch everything, and for some of these, I have to go into the search results and deselect a few errors, but it has sped up the Engagement tagging considerably. Any suggestions on how to improve either the auto-codes pictured or auto-coding Engagement in general?
|
|
|
Post by Mike Maune on Aug 24, 2018 10:24:02 GMT -5
My dataset is undergraduate argumentative history essays--which is why some of these rules have worked so well. I think a best practice for discussing research with UAMCT is to share the zipped project folder when possible--unfortunately, our IRB won't allow that in this case. But as with most troubleshooting, I think it's important to share as much contextual information as possible.
|
|
|
Post by Beth on Aug 24, 2018 19:41:41 GMT -5
Hi Mike, I have similar issues when using UAMCT and other corpus tools. I'm wondering whether it's possible to auto-code Appraisal resources at all with a corpus tool. The Appraisal framework operates at the semantics stratum, which does not have a one-to-one relationship with the realisations at the lexicogrammar level. So the cause of the issue might not be any defect in the tool's algorithm, but rather that there can never be an exhaustive list of words realising Engagement or the other Appraisal subsystems, and that the meaning of a word is conditioned and constrained by its context rather than by its spelling. Even if we aim to auto-code inscribed Engagement only, it might not be easy for a computer to decide whether the meaning of a word, and the dialogic positioning it realises, is stable across contexts.
However, since your dataset, as you mentioned, comes from one register, and one register only, I would assume it might be possible to improve the accuracy with a "bag of words" method or a machine-learning approach.
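Just to make the "bag of words" / machine-learning idea a bit more concrete, here is a very rough Python sketch (using scikit-learn; the example clauses, labels, and category names are all invented for illustration--this is not your data and not anything inside UAMCT):

```python
# A toy bag-of-words classifier for inscribed Engagement.
# The training clauses and labels below are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical clause-level training data: spans already hand-coded
# for an Engagement category (or "none").
train_texts = [
    "Perhaps the treaty was doomed from the start.",
    "Historians claim the reforms failed.",
    "The reforms failed.",
    "However, the evidence points the other way.",
]
train_labels = ["entertain", "attribute", "none", "counter"]

# Bag-of-words features (unigrams and bigrams) feeding a simple classifier.
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

# Propose Engagement codes for new, unseen clauses.
print(model.predict(["Maybe the senate never intended to ratify it."]))
```

With a real training set of hand-coded clauses from your essays, something like this could at least propose candidate codes for a human to confirm or reject, which sounds close to your current deselect-the-errors workflow.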
What do you think?
|
|
|
Post by Mike Maune on Aug 24, 2018 23:41:50 GMT -5
Welcome to the forum, Beth, and thank you for getting our first conversation underway. I think it's possible right now, given a specific corpus, to take a chunk out of the inscribed Engagement with the autocoder. And I think it MIGHT be possible, at some point, to tag most texts for Engagement algorithmically.
First, re: the stratified model. Yeah, there's not a one-to-one relationship between lexicogrammar and discourse semantics. But then, there isn't a one-to-one relationship between the graphological stratum and the lexicogrammatical stratum either, and the UAMCT does a reasonably good job tagging lexicogrammar. It makes errors sometimes--a colleague of mine found it gets tripped up on phrasal verbs, for example, and it doesn't do Behavioral configurations. But given the choice between manually coding with a certain amount of human error and autocoding with a certain amount of machine error, I think it's worth considering whether I'm willing to tolerate the machine error for my purposes.
A second point regarding the stratified model: the lexicogrammar realizes discourse semantics, but register doesn't realize discourse semantics. If realization is a kind of "encoding" (Martin & Rose, 2008... ugh... I dunno if this is a good convention to carry over to a message board lol), then it seems like, if it IS possible to capture some amount of Engagement automatically, it would be through a series of search rules over the lexicogrammatical stratum--potentially a kind of manifestation of the actual realization "rules."
I don't disagree that meaning is shaped and constrained by context of situation and culture, but the model also indicates it's shaped--in fact realized--in the other direction as well, which is what the UAMCT is set up to handle. So your point is well taken--this approach is not going to catch the effect context has on discourse semantic meaning. So I either tolerate that error or find a way to factor it in--the latter is preferable to the former, but I do not have a solution at the moment, and welcome suggestions.
You've pointed out other limitations of the approach: it's "bag of words" (BOW henceforth?) and doesn't take the logogenetic perspective into account--but at the moment, the UAMCT is primarily taking a BOW approach, unless you tag genre stages, for example. I think synoptic and dynamic perspectives can both be useful and accurate descriptions of the data, even if a dynamic, logogenetic approach is definitely going to improve the accuracy of Engagement tagging. At the same time, BOW relative frequency analysis is also an informative and accurate description of a corpus--and my current dataset suggests it can be used to inform discourse semantic autocoding rules, though not without a certain amount of error.
I don't know if autotagging is EVER going to get EVERYTHING right. But I do think it's possible for the autocoder to get some of it right--which is why I think it's worth trying and worth improving the autocoding rules. Thoughts?
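To make the "series of search rules" idea a bit more concrete--and to be clear, this is NOT the UAMCT's actual CQL or autocode syntax, just a rough Python sketch with invented word lists--here's the general shape of a rule-based pass over the lexicogrammar:

```python
import re

# Toy rule set: each inscribed Engagement category is paired with a pattern
# over wordings that prototypically realize it. The lists are illustrative,
# not Martin & White's actual inventories and not my project's rules.
RULES = {
    "deny":      re.compile(r"\b(not|never|no longer)\b", re.I),
    "counter":   re.compile(r"\b(however|yet|although|but)\b", re.I),
    "entertain": re.compile(r"\b(perhaps|maybe|possibly|might|may)\b", re.I),
    "attribute": re.compile(r"\b(according to|argues?|claims?|suggests?)\b", re.I),
}

def autocode(segment: str) -> list[str]:
    """Return every category whose pattern matches the segment."""
    return [cat for cat, pattern in RULES.items() if pattern.search(segment)]

print(autocode("Perhaps the senate never intended to ratify the treaty."))
# -> ['deny', 'entertain']  (both rules fire; a human still confirms or deselects)
```

The obvious weakness is exactly the one you raised: a "may" of permission or a "but" that isn't countering will fire a rule regardless of context, so the human confirm/deselect step stays in the loop.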
|
|
|
Post by Mike Maune on Aug 24, 2018 23:50:00 GMT -5
One last point, since you brought up machine learning--it's entirely possible that the autocoder in UAMCT is a crude instrument for the task, so machine-learning might be better. At the moment, my coding skills are limited, and I haven't written a machine-learning program yet--but it's on my to-do list, and maybe that will be part of the equation later on?
|
|
|
Post by Beth on Aug 28, 2018 0:24:47 GMT -5
Hi Mike, thanks for your response. Yep, it's absolutely true that neither human error nor machine error is entirely preventable. I'm manually annotating my data at the moment, but I tend to change my annotations now and then.
BTW, I have very limited knowledge of programming, but I'm curious to learn the auto-coding rules in UAMCT. Is there any way for me to see the algorithms or the rules the software uses? Thanks!
|
|
|
Post by Mike Maune on Aug 28, 2018 2:03:58 GMT -5
Yes! If you go to the Help section of UAMCT and check out the Autocode section, that should help--as will the CQL & Concordance sections. The rules I write are mostly based on CQL searches, but you can also use the buttons to generate the searches.
I've included a few of the rules in the first post in this thread. They are based on the prototypical realizations that Martin & White include in their chapter on Engagement.
|
|
|
Post by Mike Maune on Aug 28, 2018 2:07:00 GMT -5
I'm also going to cover autocoding in the webinar later in the semester.
|
|
|
Post by Mike Maune on Sept 6, 2018 8:37:53 GMT -5
Just an update: I realized my auto-code for Entertain was WAY too broad. It coded all modality as Entertain, when in fact Martin & White say that Entertain is typically realized by Modality of "likelihood." That narrows the semantics quite a bit. I'll be revising that rule in my dataset soon.
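Roughly, the change amounts to shrinking the trigger list--something like the Python sketch below, where the word lists are invented stand-ins for my actual rule, not Martin & White's inventory:

```python
# Old rule (too broad): any modal wording triggered an Entertain code.
all_modality = {"can", "could", "may", "might", "must", "should",
                "will", "would", "certainly", "probably", "perhaps"}

# Revised rule (narrower): only wordings construing likelihood,
# following the point above that Entertain is typically realized
# by Modality of "likelihood."
likelihood_only = {"may", "might", "could", "perhaps", "probably",
                   "possibly", "presumably", "likely"}

def entertain_old(token: str) -> bool:
    return token.lower() in all_modality

def entertain_revised(token: str) -> bool:
    return token.lower() in likelihood_only

# "must" is in the broad list but not the likelihood-only list,
# so only the old rule fires on it.
print(entertain_old("must"), entertain_revised("must"))  # True False
```

The context problem from earlier in the thread is still there, of course--"must" can also construe probability in some clauses--so the confirm/deselect step doesn't go away.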
|
|