Reviewer II concludes (yes, there is, in fact, a conclusion to their critique) with the second half of my introduction:
This study, however, is based on faulty assumptions that undermine both its rationale and its conclusions: assumptions about test performance; about prerequisites for certain linguistic skills; about a disconnect between speech-based vs. typing-based communication skills; about what it would take for someone to point to letters based on cues from the person holding up the board; and about the study’s central premise: the agency of eye gaze.
Reviewer II views this not as a thesis statement to be supported in subsequent paragraphs, but as an accusation to be dissected as is:
These are very strong accusatory words. Let’s dissect them here:
And off he goes, beginning with my accusation that the authors make faulty assumptions about the prerequisites for certain linguistic skills:
This is not a study of linguistic [sic] but rather a study of how utilizing eye signals and pointing behavior combined whenever possible with vocalization of letters, it is possible to form phrases in response to questions. The goals of this study are far more modest than examining language. The level of performance is at the level of quantifiable volitional motor control and dyadic interactions.
Apparently, if the goals of a study aren’t to examine language, the study can make any assumptions it wants to about language.
Or perhaps Reviewer II thinks the study’s conclusions about eye signals, pointing behavior, volitional motor control, and dyadic interactions have nothing to do with language. And I suppose one can’t really rule out the possibility that Jaswal et al aren’t claiming that their experimental subjects are actually intentionally communicating. Indeed, perhaps the authors agree with me that, e.g., when one of their subjects types “That is hard. I feel like world is waiting on me not the other way around”, that person isn’t actually intentionally communicating a linguistic response to “Can you think of something you have to wait for?”, but simply moving their index finger in response to cues.
Re my second accusation, namely, that the study makes faulty assumptions about a disconnect between speech-based vs. typing-based communication skills, Reviewer II claims that:
The very point of the paper is to show how we have been missing that connection.
Ah: since the point of the paper involves making that assumption, I shouldn’t be criticizing it in the first place.
Reviewer II goes on to explain that:
Information flow exists in both domains [speaking and typing] but are hidden to the naked eye of the observer. The paper quantifies the type of communication that took place between two people, using several layers of data. This is a basic science study with clear reproducible methods. It is not a study to explain how language is acquired in humans.
I’m not sure what Reviewer II’s point is here, but with phrases like “quantifies”, “layers of data”, and “clear reproducible methods”, it must be something important.
When Reviewer II moves on to my third accusation, re faulty assumptions about what it would take for someone to point to letters based on cues from the person holding up the board, his wording works itself into a veritable frenzy of mind-bendingly impressive phrases and concepts:
To point to letters based on cues from a person holding up a board, it takes in the first place, to process all the sensory information relevant to the task; determine the goal; coordinate the huge number of degrees of freedom of the body in motion, so the brain launches the intended motor command and receives the feedback from the sensory signals that the action generates; consider that signal in relation to the original mental intent and distributes it across the different systems of the eyes in head, head in trunk, arm-hand in trunk standing upright against gravity (operating at different time scales and frequencies) to execute appropriate sensory-motor (coordinates) transformations, so the proper forces are generated to propel the hand in a controlled manner to the letter on the board (without knocking it down or never reaching it) and to do all of that all over again under different stochastic (subtle) variations, because the person holding the board moved too. I have derived the partial differential equation that models that whole act geometrically, without the forces. It is a feat to do this and no robot can yet do it truly autonomously as these children/adolescents did.
Perhaps if Reviewer II had read my paper forwards rather than backwards, and had parsed the section under dissection as a thesis statement, they would have realized, as they proceeded through the subsequent supporting paragraphs, that my point had nothing to do with the complexity of sensory-motor activity, or with the partial differential equations involved in models thereof, or with the achievements of contemporary robotics.
Rather, it was a response to Jaswal et al’s claim that:
On a cueing account of a letterboard user’s performance, the assistant would need to deliver a cue that identified which of 26 letters to point to, and the user would need to detect, decode, and act upon that cue. Each of these steps would take time and would be subject to error, especially given the subtlety of the cues the assistant is hypothesized to deliver and the 26 cue-response alternatives.

(Jaswal et al, Eye-tracking reveals agency in assisted autistic communication, https://www.nature.com/articles/s41598-020-64553-9)
My response was:
Completely overlooked here is the most obvious means of cuing: board movements that bring specific letters closer to the person’s extended finger. This involves no decoding whatsoever and speeds up rather than slows down the typing.
Readers, thanks for bearing with me through this convoluted, multi-directional dissection of a dissection. In my next post, I will post my critique of Jaswal et al in its entirety, followed by the comments from Reviewers I and II, so that those of you who want to can parse things in whatever way you’d like to: on surfaces moving or stationary, with or without eye-trackers mounted on your heads, and with or without the help of dyadic interactivity or differential equations.