[Tlhingan-hol] Certification Test Woes

d'Armond Speers, Ph.D. speersd at georgetown.edu
Tue Apr 1 09:05:54 PDT 2014


On Tue, Apr 1, 2014 at 1:26 AM, Lieven <levinius at gmx.de> wrote:

>
> On 01.04.2014 at 04:39, d'Armond Speers, Ph.D. wrote:
>
>> questions on any given test by weight/content wouldn't interfere with
>> the randomization, making certain questions more or less likely to
>> appear on a test.  I'm open to suggestions on whether this is an issue
>
> You have probably thought about it already, but couldn't the weight of a
> question be tied to the number of words or syllables in the answer?
>
> e.g.
> translate "shoe" - answer = 1 point
> translate "where is the bathroom" - answer = 6 points
>
> This may not always work, but it could make some difference.


Well, if we were to go with an approach like this, I wouldn't count
syllables, but morphemes.  No reason that "bathroom" should have a higher
value than "loo".  The questions are (a) how do you reliably calculate this
value; and (b) what do you do with it?
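
To make the morpheme idea concrete, here's a minimal sketch in Python of
what (a) might look like.  The affix lists are a tiny subset I picked for
illustration; a real version would need an actual Klingon morphological
parser, and naive affix-stripping like this will miscount some roots:

    # Sketch: estimate the content value of an expected answer by
    # counting morphemes instead of syllables.
    TOY_PREFIXES = {"qa", "cho", "vI", "Da"}     # a few verb prefixes
    TOY_SUFFIXES = {"be'", "pu'", "wI'", "Daq"}  # a few suffixes

    def morpheme_count(word: str) -> int:
        """Rough estimate: one root, plus any affixes we recognize."""
        count = 1
        for p in TOY_PREFIXES:
            if word.startswith(p) and len(word) > len(p):
                count += 1
                break
        stripped = True
        while stripped:  # peel recognizable suffixes off the end
            stripped = False
            for s in TOY_SUFFIXES:
                if word.endswith(s) and len(word) > len(s):
                    word = word[: -len(s)]
                    count += 1
                    stripped = True
                    break
        return count

    def content_value(expected_answer: str) -> int:
        """Total morphemes across all words in the expected answer."""
        return sum(morpheme_count(w) for w in expected_answer.split())

So {qaleghpu'} would score 3 (qa- + legh + -pu'), while a one-word direct
answer scores 1, matching the intuition that translation answers carry
more content.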

For (a) how do you reliably calculate the content value, the problem is
that we're talking about the content value of the expected answer, not the
question itself.  I didn't want to just have every question be a "translate
this sentence" type of question; we also have "fill in the blank" and other
types of direct questions ("what is the subject and object indicated by
this verb prefix?"), which are typically low-content-value answers.  For
the translation-type questions, the student is free to translate however
they like, even if the possibilities are still fairly limited on the Level
1 test.  Just because I think they may use 4 morphemes in their answer
doesn't preclude them from using 8 (with twice as many opportunities for
errors).  And each question isn't just testing a single grammar point; some
test multiple topics at once.  We do define the expected answer (as an aid
to the person grading the test, not as a hard-and-fast right/wrong key), so
we could just use that as an estimate and call it good enough.  Or should
we also take into account the number of topics associated with each
question?  See, this is complicated.
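
If we did settle for that estimate, the per-question bookkeeping might look
something like the sketch below.  The field names are hypothetical, not
anything from the actual test bank, and it reuses content_value() from the
sketch above:

    from dataclasses import dataclass, field

    @dataclass
    class Question:
        """Illustrative test-bank entry; field names are made up."""
        prompt: str
        kind: str                 # "translation", "fill-in", "direct", ...
        topics: list[str] = field(default_factory=list)
        expected_answer: str = ""

        def estimated_value(self) -> int:
            # Morphemes in the expected answer, plus one point per topic
            # touched -- one arbitrary way to fold topic count in.
            return content_value(self.expected_answer) + len(self.topics)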

For (b) what do you do with it, there are two general ways to approach
this.  (Well, three, if you count the way we did it, which is to allow
randomization to take care of it.)  You can make all expected answers have
the same amount of content (very hard, and doesn't permit direct
questions), or you can measure the content of expected answers and
establish a heuristic for how many questions of each content value to
include on a test, ensuring that each test has the same total content
value across its 20 questions.  You would probably accomplish this by
ensuring that each topic listed in the guidelines for that level had the
same number of questions at each content value, which would mean greatly
expanding the size of the test bank.  Defining that heuristic is an
empirical question of test design (I remember doing this stuff in college,
and it was tedious!), so it's not a task I'd prefer to undertake.
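
To illustrate that second option, here's a rough sketch of assembling a
test by rejection sampling.  The target, tolerance, and attempt limit are
made-up parameters, and it assumes the Question sketch above:

    import random

    def draw_test(bank, n=20, target=60, tolerance=2, attempts=1000):
        """Randomly draw n questions until their summed content value
        lands within `tolerance` of `target`."""
        for _ in range(attempts):
            picked = random.sample(bank, n)
            total = sum(q.estimated_value() for q in picked)
            if abs(total - target) <= tolerance:
                return picked
        raise ValueError("no draw hit the target; grow the bank "
                         "or loosen the tolerance")

That keeps the plain randomization we rely on now, but near-target draws
have to be reasonably common for it to terminate quickly, which circles
right back to needing a much larger bank.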

My preference was/is not to make all of the questions uniform.  I could
easily come up with a test bank of 500 questions just asking to translate
each vocabulary term, but that's (a) boring and (b) not really measuring
their practical skill with the language.  Some questions are directly about
grammar, some are translation (simultaneously evaluating their grasp of
grammar and vocabulary), and some are about a specific topic, like the
distinction between {ghobe'} and {Qo'}.  Honestly, when considering how to
create questions that were meaningful, interesting, and sufficiently varied
to cover the range of language use, while maintaining balance across the
topics we identified in the guidelines, I didn't think it was practical to
account for the content value of the expected answer as well, not without
increasing the size of the test bank considerably.  Building the bank was
already a huge effort.

Ah, memories.  :)  veqlargh is always in the details.  Am I over-thinking
this?  Is there a simpler way?  And if not, is the original problem serious
enough to warrant this level of test re-design?

--Holtej