# [Proposal] Representations of natural sound categories


Given the abundance of information available in our immediate acoustic environment, one would expect perceptual representations of our surroundings to be deeply embedded in that physical space. Sounds can be described with a myriad of tools and measures, so determining their category membership from that information should be straightforward: there is sufficient acoustic information to discriminate between them. Nevertheless, it remains unclear whether humans use this information in their perceptual judgements and whether it is reflected in the perceptual structure of everyday sounds. Categories are formed based not only on the perception of the stimuli’s physical features but also on more abstract types of information, such as meaning. Research suggests that these categories have some acoustic grounding, even though acoustics might be only a contributing factor rather than a decisive one. Our research aims to examine whether, and to what degree, categorization and similarity judgements of natural sounds rely on acoustic versus semantic information.


Auditory categorization is a computational process that allows humans and animals to extract, manipulate and respond to sounds heard in the environment based on their acoustic features and other available information (Tsunada & Cohen, 2014). Research demonstrates that sounds’ categorical representations are linked to their underlying acoustics, and suggests that these categories are formed based not only on the perception of the stimuli’s physical features but also on more abstract types of information, such as meaning. Evidence from imaging studies further supports the hypothesis that environmental sounds and the words describing them share a common conceptual space, showing priming effects of words on sounds and vice versa (e.g., Orgs, Lange, Dombrowski, & Heil, 2006; Aramaki, Marie, Kronland-Martinet, Ystad, & Besson, 2010) and shared neural resources for processing language and environmental sounds (e.g., Dick et al., 2007; Leech & Saygin, 2011). Moreover, ecological approaches to listening posit that people do not perceive the acoustic environment in terms of the acoustic features of its constituents, but rather in terms of sound-producing sources and the events or contexts in which they are embedded (Gaver, 1993; for a discussion of what we hear, see O’Callaghan, 2008, 2011).

Classical theories of categorization describe it as a process based on a conjunction of necessary and sufficient conditions that must be met in order to classify an object as belonging to a specific category (for a review, see Smith & Medin, 1981). Under this view, an object either possesses a particular feature or it does not, and category membership is decided on an all-or-none basis. However, this approach assumes that categories are exhaustive and mutually exclusive, which is not always true for auditory objects present in the environment. People may rely on different types of principles to group sounds, and those sounds may belong to one or more semantically separate groups (i.e., cross-categorization). This suggests that grouping relies on a recognized set of similarities or dissimilarities and varies depending on the chosen criteria. Furthermore, it may involve comparing each entity with either a prototype (Rosch, 1975) or more or less typical exemplars (Nosofsky, 1992) to delineate the boundaries between categories. At the neural level, categorization based on matching pattern templates has been proposed as a putative mechanism accounting for perceptual invariance (Engineer et al., 2013). Therefore, the similarity between exemplars will be a dominant theme in our discussion of perceptual processes in categorization.

By necessity, the similarity between sounds is defined with respect to the dimensions used in making similarity judgments. Sounds might be similar according to one dimension (e.g., pitch) but dissimilar according to another (e.g., amplitude). However, it is difficult to disentangle their contributions, as various properties often co-occur; for example, sounds with similar pitch might also have similar amplitude. Acoustic differences can also be overridden entirely by the sounds’ meaning or common context. Thus, it is vital to assess the degree to which different object properties determine perceived similarity. Building on the reviewed literature, we want to extend this framework by investigating the representations of natural sound categories and the various dimensions underlying those representations.



The stimuli are sounds selected as the best exemplars of 53 unique categories of natural sounds (e.g., barking dog, honking car or crying baby). Initially, each category was represented by 10 acoustically different recordings to reflect within- and between-category variability. Based on results from our previous behavioural studies involving over 1000 respondents, one recording per category was chosen as its most representative exemplar.

Similarity judgements

The spatial arrangement task allows participants to express multiple sound similarities at once by placing the sounds at relative distances from one another in a two-dimensional space. The experiment consists of multiple trials presenting various combinations and subsets of sounds, and it stops as soon as a participant’s arrangements reach a required level of stability. Subsequent trials include fewer stimuli so that participants can refine their judgements. On each trial, participants will see circle icons with embedded sounds, initially presented in a circular arrangement around the arena. They can click on each icon, listen to each sound and arrange the sounds inside the circular arena according to their perceived similarity, so that similar sounds are close together and different sounds are far apart.
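
To illustrate how a single arrangement could be read as similarity data, here is a minimal sketch that computes the pairwise Euclidean distances between icon positions. The function name, the coordinates and the choice of sounds are hypothetical; this is not the platform’s own procedure, and it ignores how evidence from multiple trials would be combined.

```python
import math
from itertools import combinations


def pairwise_distances(positions):
    """Return {(a, b): Euclidean distance} for every unordered pair of sounds."""
    return {
        (a, b): math.dist(positions[a], positions[b])
        for a, b in combinations(sorted(positions), 2)
    }


# Hypothetical (x, y) icon positions from one trial (arbitrary units).
trial = {
    "barking dog": (0.0, 0.0),
    "crying baby": (3.0, 4.0),
    "honking car": (6.0, 8.0),
}

# Larger distance = sounds judged as less similar on this trial.
distances = pairwise_distances(trial)
```

In this toy arrangement, the barking dog sits twice as far from the honking car as from the crying baby, so it would be read as less similar to the car.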

The experiment was designed and is hosted on the Meadows Research platform (www.meadows-research.com), which can be easily integrated with Prolific.

This experimental paradigm was thoroughly piloted (N=30), and completing the task takes approximately 90 minutes.

Sample size

A data simulation indicated that we need 100 participants with usable data to cover the whole stimulus matrix with stable similarity representations. To account for participants excluded due to poor performance, we plan to recruit 120 participants in total. The average rejection rate, calculated from previous studies, is 20%.


Minimum pay rate: £7.50 per hour

| Time (total) | Participants’ cost | VAT (22%) | Prolific fee (33%) | Total |
|---|---|---|---|---|
| 90 minutes | £1350.00 | £90.00 | £450.00 | £1890.00 |
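
The participants’ cost follows from the planned sample, the session length and the pay rate. The sketch below reproduces that arithmetic; variable names are illustrative, and the VAT and Prolific fee are taken as quoted in the budget rather than recomputed from the percentages.

```python
# Figures taken from the proposal; variable names are illustrative.
n_participants = 120
duration_hours = 1.5   # 90 minutes per participant
pay_rate_gbp = 7.5     # minimum pay rate per hour

participant_cost = n_participants * duration_hours * pay_rate_gbp

# VAT and Prolific fee are used as quoted in the budget, not recomputed.
vat_gbp = 90.0
prolific_fee_gbp = 450.0

total_gbp = participant_cost + vat_gbp + prolific_fee_gbp
print(f"Participants' cost: £{participant_cost:.2f}, total: £{total_gbp:.2f}")
```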

Commitment to open science practices

We plan to publish the results and their interpretations in a peer-reviewed scientific journal in open access and make the preprint available on https://psyarxiv.com/. Any scripts used to extract and analyze the data will be published in an online repository together with the fully anonymized data. All the stimuli will also be made publicly available.



Thank you for reading my proposal! I know it’s not perfect, but it has the potential to uncover the dimensions underlying the perception of natural sound categories. This information will help us learn more about how we listen to and understand our surrounding acoustic world. Planned analyses are outlined in the preregistration document. The experiment has also been thoroughly piloted and is ready to go. You can help make this happen by clicking “Vote”! :pray: If you are interested in the project or have any questions, I’m happy to chat about it. Thanks!
