I’m not a dolphin communication expert, but I have concerns with the recent paper by Vyacheslav Ryabov in the St. Petersburg Polytechnical University Journal: Physics and Mathematics.
It’s garnered a huge amount of extremely positive press coverage, but my takeaway from the paper is less enthusiastic: all we actually see is a 30-second snippet of captive dolphin sounds.
One of the simplest claims the author makes is that the dolphins demonstrate turn-taking, and that this mirrors human conversation. Many researchers have developed ways to quantify turn-taking, and it’s important to demonstrate that the apparent turn-taking is not just what would happen if two dolphins produced sounds at random. Unfortunately, there is no such analysis in this paper. The author has included only one 30-second waveform and doesn’t provide any information about the total recording duration, nor do they show that this particular snippet is representative of the recordings as a whole. It’s therefore impossible to rule out that they’ve cherry-picked a 30-second recording that looks like turn-taking out of what could be hundreds of hours of recordings.
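To make that concrete, here is a minimal sketch of the kind of check I’d want to see. It is my own illustration and not anything from the paper: the call times are made up, the function names are hypothetical, and the circular-shift null model is just one of several reasonable choices. It asks how often two call sequences overlap as little as the observed ones do when one dolphin’s calls are slid to a random offset in time.

```python
import random


def total_overlap(calls_a, calls_b):
    """Total time (s) during which a call from A and a call from B coincide."""
    return sum(
        max(0.0, min(a_end, b_end) - max(a_start, b_start))
        for a_start, a_end in calls_a
        for b_start, b_end in calls_b
    )


def circular_shift(calls, offset, duration):
    """Shift every call by `offset` seconds, wrapping past the end of the recording."""
    shifted = []
    for start, end in calls:
        new_start = (start + offset) % duration
        new_end = new_start + (end - start)
        if new_end <= duration:
            shifted.append((new_start, new_end))
        else:
            # The call straddles the end of the recording: split it in two.
            shifted.append((new_start, duration))
            shifted.append((0.0, new_end - duration))
    return shifted


def overlap_p_value(calls_a, calls_b, duration, n_shifts=2000, seed=0):
    """Fraction of random shifts of B whose overlap with A is no larger than observed."""
    rng = random.Random(seed)
    observed = total_overlap(calls_a, calls_b)
    as_low = 0
    for _ in range(n_shifts):
        offset = rng.uniform(0.0, duration)
        shifted_b = circular_shift(calls_b, offset, duration)
        if total_overlap(calls_a, shifted_b) <= observed:
            as_low += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (as_low + 1) / (n_shifts + 1)


# Hypothetical (start, end) call times in seconds for a 30-second snippet
# with two changes of "speaker": A, A, B, B, A. These are NOT the paper's data.
dolphin_a = [(1.0, 4.0), (5.0, 8.0), (20.0, 23.0)]
dolphin_b = [(10.0, 13.0), (14.0, 17.0)]

observed, p_value = overlap_p_value(dolphin_a, dolphin_b, duration=30.0)
print(f"observed overlap: {observed:.1f} s, p = {p_value:.3f}")
```

A small p-value would suggest the dolphins avoid calling over each other more than chance timing would produce. With only a handful of calls in 30 seconds, a test like this has very little power, which is part of why a single short snippet can’t settle the question.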
There are two sides to every conversation: perception and production. The author only looks at production in this study and doesn’t investigate whether the receiver dolphin was actually listening, or what it might have perceived or understood from the sender dolphin. One fundamental part of human language is categorical perception. We categorise every language sound we hear, even when the sounds actually occur on a continuous scale. For example, “ra” and “la” lie on a continuum and you can make a sound that’s halfway between them, but native English speakers draw a line somewhere on that continuum to divide it into “ra” and “la”. I always hear it as being one or the other and never a combination. Speakers of some East Asian languages don’t draw that line; they perceive the whole continuum as one phoneme (a small chunk of sound) where I perceive it as two phonemes.
This matters because the author states that “Each pulse … that is produced by dolphins is different from another by its appearance in the time domain and by the set of spectral components in the frequency domain. In this regard, we can assume that each pulse represents a phoneme or a word of the dolphin’s spoken language.” For one, the author doesn’t show us the “spectral components in the frequency domain” at all. Most papers with data like this would include a spectrogram, a plot of frequency (how high or low a sound is) against time, but this paper only shows sound pressure level (effectively loudness) against time. So we’re missing the data we need to determine whether the author’s statement is true.
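As a toy illustration of why the missing frequency information matters, here is a small sketch with two synthetic tones. Everything in it is an assumption of mine (the sample rate, the frequencies, the function names) and none of it comes from the paper’s recordings. The two “pulses” have the same overall level, so a loudness-style summary cannot separate them, yet they differ completely in frequency content.

```python
import cmath
import math

SAMPLE_RATE = 96_000  # Hz, hypothetical
N_SAMPLES = 256       # length of each synthetic pulse


def tone(freq_hz, sample_rate, n_samples):
    """Unit-amplitude sine wave, standing in for a single pulse."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]


def rms(samples):
    """Root-mean-square amplitude, a rough proxy for level."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the largest bin of a naive discrete Fourier transform."""
    n = len(samples)
    best_bin, best_magnitude = 0, 0.0
    for k in range(1, n // 2):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)
        )
        if abs(coefficient) > best_magnitude:
            best_bin, best_magnitude = k, abs(coefficient)
    return best_bin * sample_rate / n


pulse_a = tone(6_000, SAMPLE_RATE, N_SAMPLES)
pulse_b = tone(12_000, SAMPLE_RATE, N_SAMPLES)

print("RMS level:", rms(pulse_a), rms(pulse_b))
print("dominant frequency (Hz):",
      dominant_frequency(pulse_a, SAMPLE_RATE),
      dominant_frequency(pulse_b, SAMPLE_RATE))
```

Real dolphin pulses are far more complex than pure tones, and a real analysis would use a proper spectrogram rather than a single transform per pulse. The point is only that a claim about spectral components needs a frequency-domain plot to back it up.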
But more importantly, the author claims that each pulse represents a phoneme or a word because the pulses are all different. That they differ is unsurprising; every time I say a word, it produces a very slightly different waveform. We just don’t know whether the dolphins perceive the pulses as different. In order to conclude that the pulses are equivalent to phonemes, the author would need to show that the dolphins have categorical perception for these pulses, meaning that they categorise some of them as the same even though they’re slightly different. That would involve intensive behavioural work that was not conducted for this paper.
The author also goes through a list of characteristics of human language and attempts to demonstrate that dolphin communication has those characteristics too. All of these are difficult to demonstrate, and the author has not conducted any experiments that explicitly address any of them. One example is duality of patterning (i.e. meaningless phonemes make up meaningful words, and words make up messages). The author states that dolphin communication has this duality of patterning, but does not provide any evidence for it. Even in the earlier claim that the pulses equate to parts of human language, the author could not determine whether the pulses equate to meaningless phonemes or to meaningful words. There’s no evidence that the pulses are meaningless (even very simple animal calls have meaning) or that the pulses build up into a meaningful message.
Ultimately, the paper really only shows one 30-second snippet in which the dolphins did not produce sounds at the same time, with only two changes in “speaker”. It’s hard to draw much more of a conclusion from this study than that the author recorded dolphins for an unknown duration and found one short example of what superficially looks like a conversation.