Scott D. Gest, gest@psu.edu
Pennsylvania State University [1]
James Moody, jmoody77@soc.duke.edu
Duke University
Kelly L. Rulison, klr250@psu.edu
Pennsylvania State University
Abstract: Despite cross-disciplinary interest in social influence among adolescent peer groups, significant variations in collecting and analyzing peer network data have not been explored, so it is difficult to disentangle substantive and methodological differences in peer influence studies. We analyze two types of network data (self-reported friendships and multi-informant reports of children who “hang around together a lot”) with three methods of identifying group structures (two graph theoretic approaches and principal components analysis) to explore substantive differences in results. We then link these differences back to underlying features of the networks, allowing greater insight into the general problem of identifying groups in network data. We find that different analytic approaches applied to the same network data produced moderately concordant group solutions, with higher concordances for multi-informant data. The same analytic approaches applied to different relational data (on the same nodes) produced weaker concordance, suggesting that the underlying data structure may be more salient than analytic approach in accounting for different results across studies. Behavioral similarity among group members was greatest for approaches that rest directly on density of direct ties.
Sociological and psychological research on adolescent peer groups has often proceeded along parallel tracks, exploring similar phenomena but within distinct traditions for collecting and analyzing peer network data. Building upon a rich tradition of general social network analysis theory and methods (Doreian, Kapuscinski, Krackhardt, & Szczypula, 1996; Freeman, 2003; Friedkin & Cook, 1990; Moody, 2001a) sociologists have studied the structure of adolescent peer groups and their dynamic change over time (Doreian et al., 1996; Hallinan and Tuma, 1978; Hallinan, 1978; Haynie, 2001; Holland and Leinhardt, 1977; Moody, 2001b) as well as social influence and diffusion processes (Cohen, 1977; Giordano, Cernkovich, Groat, Pugh & Swinford, 1998; Jussim & Osgood, 1989). Similarly, psychologists have built upon theories emphasizing peers as contexts for individual development (Hartup, 1996; Kindermann, 1996; Sullivan, 1953) to study structure and change in dyadic and group networks (Berndt & Hoyle, 1985; Cairns, Leung, Buchanan & Cairns, 1995; Farmer, Estell, Bishop, O’Neal, & Cairns, 2003; Urberg, Degirmencioglu, Tolson, & Halliday Scher, 1995) and the influence of peers on individual adaptation (Berndt, 1982, 1992; Cairns & Cairns, 1994; Hanish, Martin, Fabes, Leonard & Herzog, 2005; Kindermann, 1993).
Despite this parallel interest, there are relatively few cross-citations in the major sociological and developmental journals concerned with peer group processes. This is unfortunate because different data collection and analytic traditions have emerged in the two fields, making it difficult to integrate findings and slowing the transfer of insights and innovations from one field to the other. Our goal in this paper is to contribute to a productive integration of these traditions by using unique data from a single setting to explore the comparability of peer groups identified when two common adolescent peer network data collection procedures are analyzed with three common group identification algorithms.
In the peer context, data collection procedures typically vary along three dimensions: the substantive meaning of a social tie (friendship/affection vs. interaction), the level of analysis (dyad vs. group) and the informant (self-report vs. multi-informant). These three dimensions allow for many distinct measurement strategies, but for conceptual and practical reasons two measurement strategies have gained widespread use: self-reports of dyadic friendships and multi-informant reports of interaction-based groups. Similarly, while the number of grouping algorithms found in the literature is large, identifying principled axes of difference is more difficult. Two general approaches common in the literature are density-based graph-theoretic algorithms from the social network tradition and algorithms based on correlated patterns of social ties from the developmental studies tradition.
While others have studied a broader set of grouping algorithms (Freeman, 2003), we focus on these core disciplinary approaches to help foster comparability across a wide literature gap and to help link group comparisons directly to features of the network structure. Comparing grouping algorithms poses a difficult research design trap: if each approach is effectively maximizing its specific group-definition, one runs a clear risk of simply comparing incompatible definitions – that is, there is no clear external indicator of the true solution. However, in the absence of an external metric, being able to first compare different solutions then link those differences to underlying graph patterns helps flesh out the substantive meaning of otherwise implicit definitional differences embedded in grouping algorithms. In the adolescent peer group context explored here, we expect that differences in data type will affect the transitivity, density and structural cohesion (path structure) of the graph, and thereby lead to differences in how the three algorithms assign nodes to groups. Substantively, we hope these comparisons will provide a first step in establishing the degree to which studies of “peer networks” within different measurement and analysis traditions identify similar phenomena.
Self-reports of friendship dyads. Asking adolescents to name their friends is perhaps the most common measurement procedure in both sociology and psychology. Because friendships are typically defined as voluntary relationships based on liking, this procedure can be seen as a special case of defining meaningful social ties in terms of closeness, affection or liking, which has long roots in both sociological (Homans, 1950; Sampson, 1969) and developmental research (Bukowski, Newcomb and Hartup, 1996). Some researchers underscore this point by asking adolescents to name their “best” or “closest” friends or by asking adolescents to name classmates they like or feel close to. Developmental theorists have long held that feelings of friendship or closeness motivate attempts to understand and accommodate the friend’s concerns, thus providing one process of peer influence (Hartup, 1996; Newcomb and Bagwell, 1995; Sullivan, 1953). Because feelings of liking or affection are inherently subjective, self-reports are seen as the definitive method for identifying adolescents’ friendship preferences.
There is considerable variability both within and across disciplines in the way researchers analyze self-reports of friendships. Psychologists typically focus on friendship dyads and for theoretical reasons often restrict attention to reciprocated friendship choices (Berndt and Murphy, 2002; Hartup, 1996), although some also consider non-reciprocated nominations (Hektner, August & Realmuto, 2000; Mrug, Hoza & Bukowski, 2004; Snyder, Horsch, & Childs, 1997) and larger group structures (Urberg et al., 1995). In contrast, sociologists often focus on group structures and typically look to asymmetries in nominations as indicators of group hierarchy and status, although some also remain focused on dyads (Hallinan and Tuma, 1978; Hallinan, 1978) or on reciprocated nominations (Coleman, 1961).
Multi-informant, interaction-based groups. A second measurement procedure that is increasingly being used in psychological research involves asking every adolescent in a social network to identify classmates who “hang around together a lot” (Cairns, Perrin and Cairns, 1985; Cairns, Cairns, Neckerman, Gest and Gariepy, 1988). As with self-reported friendships, this procedure embodies a particular perspective on the nature of social ties, the relevant level of analysis and the most suitable informant. Asking adolescents to identify peers who “hang around together a lot” means that social ties are defined in terms of interaction frequency. This makes sense from the perspective of social learning theories (Cairns, 1979; Patterson, 1974, 1982), which suggest that social behaviors are established, maintained and changed through repeated instances of modeling and reinforcement that occur within social interactions. For example, the quantity of preschool girls’ interactions with aggressive peers predicted increases over time in their own problem behavior (Hanish et al., 2005); and the quantity of delinquent adolescent boys’ friendship conversations that involved a well-organized focus on antisocial activities predicted the persistence of antisocial patterns (Dishion, Nelson, Winter, and Bullock, 2004).
The visible nature of social interactions suggests that reports may be obtained from any individuals with access to the relevant interaction settings. Certainly self-reports of interaction patterns are feasible and face-valid (Bagwell, Coie, Terry, and Lochman, 2000). Direct researcher observations can also be very effective with young children (Hanish et al., 2005; Ladd, 1983; Strayer & Santos, 1996; Vaughn and Waters, 1981), but are expensive to gather and have two disadvantages during adolescence: some important interaction settings may be inaccessible to researchers (e.g., hallways, buses), and those that are available (e.g., classrooms) may be misleading due to the strong constraints they impose on interaction patterns (Feld, 1981). In contrast, peers can be seen as expert participant-observers in the adolescent social network with unique access to a range of relevant settings. In a procedure developed by Cairns (described in detail below), all peers in a network are prompted to identify classmates who “hang around together a lot,” and the multiple reports are summarized in a symmetric “co-nomination matrix.” The use of information from multiple informants to construct a global network was independently developed in the line of research on cognitive social structures (CSS; Krackhardt, 1987). The Cairns method differs from the CSS approach in that informants (“perceivers” in CSS terms) are not limited to reporting on common group membership, but rather are allowed to inform on any relation connecting others in the network.
Self-reports of friendship dyads and multi-informant reports of interaction-based groups are conceptually and operationally distinct ways of assessing adolescent peer networks. The two approaches differ in how they define the basis of social ties (closeness vs. interaction), the level of analysis at which data collection occurs (dyad vs. group) and the informant (self- vs. multi-informant). The resulting data structures are quite different: self-reports of friendships produce a directed adjacency matrix whereas multi-informant social groups produce a symmetric co-nomination matrix. These data differences often result in different degrees of density and transitivity. The group-basis of multiple-informant data results in graphs similar to the one-mode projection of two-mode graphs, with significantly more closed triads than self-reported graphs, which tend to be sparser. Each approach is a conceptually coherent strategy for identifying “adolescent peer groups”, but it is not at all obvious that groups derived from subjectively perceived, dyadic friendship ties are equivalent to those derived from consensually perceived, visible group interaction patterns. When researchers use these two different strategies to identify “peer groups”, are they studying the same thing?
Similarities in patterns of ties. There is a long tradition of grouping together individuals who share similar patterns of social ties. Early social network researchers used principal components analysis or centroid factor analysis to identify groups (factors) from interaction (e.g. Wright and Evitts, 1961) and nomination matrices (e.g., Bock and Husain, 1952; MacRae, 1960). More recently, a growing number of developmental researchers have used correlation-based algorithms to identify peer groups from multi-informant reports (Boivin & Hymel, 1997; Cairns et al., 1985, 1988; Estell, Cairns, Farmer & Cairns, 2002; Farmer et al., 2003; Rodkin, Farmer, Pearl, & Van Acker, 2000; Xie, Cairns & Cairns, 1999). One group has used principal axis factoring to identify groups from an adjacency matrix (Bagwell et al., 2000). Principal components analysis (PCA) has also been applied to co-nomination matrices (Gest, Rulison and Welsh, 2005). These approaches share the premise that groups can be conceptualized as individuals whose patterns of received friendship nominations or whose profile of co-nominations with peers are similar (i.e., correlated). These approaches have clear links to the block-modeling traditions rooted in CONCOR (White, Boorman, and Breiger, 1976), where actors are classified as similar if they have similar nomination patterns to/from others in the network. One of the potential advantages of the PCA approach, as will become evident below, is that an element of structural equivalence informs the construction of primary groups, allowing one to identify groups that are both internally dense and similarly situated in the graph at large.
Direct approaches. The social network field has identified many approaches for finding primary groups in networks (Frank, 1995; Fershtman, 1997; Burt, 1978; Freeman, 1992; Richards, 1995; Seidman and Foster, 1978). A basic division in such methods is between those that identify exact graph theory features and those that search the graph to identify a solution iteratively. Many graph-theoretic methods for finding primary groups are challenged in settings where data are messy, resulting in assignments that are not robust to the kinds of data that analysts typically encounter (see Moody, 2001a for a review). These methods also often identify groups that heavily overlap. Recent work on structural cohesion has taken this feature as a strength of the model, in that k-connected components have a strictly defined and interpretable overlap structure and are more robust to data quality as k-cohesion increases.[2]
The alternative approach has been to identify groups based on a search and clustering process, using algorithms that attempt to generate clusters with relatively high in-group density. The exact algorithms vary significantly. One line of work makes many assignments of nodes to groups in attempts to minimize a cost function (Borgatti, Everett, and Freeman, 1999; Guimera and Amaral, 2005). Much of the research on group detection algorithms has been to identify ways to seed or speed these types of searches, with some very sophisticated pattern recognition approaches being most popular (Richards, 1995; Fershtman, 1997). While often successful in small groups, these iterative solutions can be very time-consuming on large networks. Recent work has attempted to identify search processes either directly on graphs, such as extensions of simulated annealing processes (Guimera and Amaral, 2005) or on summary statistics generated by the network structure (Moody, 2001a) that allow searches of very large networks. Finally, a third line of research has taken a statistical modeling approach, using guided search algorithms based on a tie probability model (Frank, 1995). These models work on the logic that groups should focus on ties, so the probability of a tie between i and j (pij) is a function of a parameter on the group partition, and nodes are juggled across partitions until that parameter is maximized.
To our knowledge, within the literature on adolescent peer social networks, there are no empirical reports comparing the group solutions obtained when factor-analytic and graph-theoretic grouping algorithms are applied to two of the most common types of peer network data. To begin linking these different data collection and analytic traditions, we use a single data set to identify adolescent peer groups based on two types of peer network data (self-reported friendships and multi-informant interaction-based groups) with each of three group identification methods (principal components analysis and two graph-theoretic algorithms).
Data were provided by 134 (62 girls, 72 boys) of the 150 students (89%) enrolled in the 6th grade at a middle school serving a small, working-class community in central Pennsylvania. These data permitted us to describe the peer networks of 148 (68 girls, 80 boys) of the 150 students (see below). Students at the school scored near the statewide average on tests of achievement, although rates of poverty in the community exceeded the state average. Almost all students (99%) were Caucasian, reflecting the demographics of the community. This project was a component of a Safe Schools / Healthy Students grant obtained by the school district from the U.S. Departments of Education, Justice and Health and Human Services. Prior to the October student survey, parents were mailed letters describing the project with a form to sign if they did not wish their child to participate. Students whose parents did not return a form exempting them from the project were asked to complete a group-administered survey lasting approximately 45 minutes. Students were free to decline to participate in the survey.
Self-reported friendships. We construct friendship groups from students’ reports of friendships. Students were asked: “Some kids have a lot of friends, some kids have one friend and some kids don’t have a friend. What about you? List the names of any friends you have in your grade.” Students were provided a roster containing the names of all students in the 6th grade, organized by homeroom. Space was provided for students to list up to ten names, although some students listed several more than that (range: 0 to 31 nominations). These data were organized into an adjacency matrix. For the principal component analyses, we entered ones along the diagonal (MacRae, 1960).
Multi-informant groups. We construct multi-informant groups using Cairns’ Social-Cognitive Map (SCM) method. Students were asked: “Are there some kids in your grade who hang around together a lot? List the names of the kids in each of the different groups in your grade. Try to think of as many groups as you can.” Space was provided for students to list up to nine groups with up to ten individuals per group and students were free to list themselves in a group. Two observational studies confirm that the frequency of being named to the same group is correlated with observable interaction rates (Cairns et al., 1985; Gest, Farmer, Cairns & Xie, 2003). For example, 4th and 7th grade students interacted with members of their multi-informant groups at rates three to four times higher than with other same-sex peers (Gest et al., 2003). In the present study, all nominations were organized into a symmetric co-nomination matrix in which off-diagonal cells indicated the total number of times two individuals were named to the same group. Values along the diagonal indicated the total number of times a given child was named to any social group. Students were not required to classify all peers into groups, so there was variability in how often different adolescents were named to groups.
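The construction of the co-nomination matrix described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SCM software itself: the function name, the student labels and the toy reports are all hypothetical, and the reports of all informants are assumed to be pooled into a single list of groups.

```python
from itertools import combinations

def conomination_matrix(students, reported_groups):
    """Build a symmetric co-nomination matrix from pooled group reports.

    students: ordered list of student ids (hypothetical labels).
    reported_groups: list of groups, each a list of ids, pooled over informants.
    """
    index = {s: i for i, s in enumerate(students)}
    n = len(students)
    matrix = [[0] * n for _ in range(n)]
    for group in reported_groups:
        members = sorted({index[s] for s in group})  # ignore duplicate names
        for i in members:
            matrix[i][i] += 1      # diagonal: times named to any group
        for i, j in combinations(members, 2):
            matrix[i][j] += 1      # off-diagonal: times named to the same group
            matrix[j][i] += 1
    return matrix

# Hypothetical toy data: three reported groups among four students.
students = ["A", "B", "C", "D"]
reports = [["A", "B", "C"], ["A", "B"], ["C", "D"]]
cm = conomination_matrix(students, reports)
for row in cm:
    print(row)
```

In this toy example students A and B are named together twice, so their off-diagonal cell is 2, while D is named to only one group and has a diagonal value of 1.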
Social behavior, educational attitudes and achievement. We examine group homogeneity with respect to four measures of social behavior and educational attitudes and achievement. Following standard procedures in the developmental literature on peer relations (Coie, Dodge & Coppotelli, 1982), we asked each adolescent to name the peers s/he liked the most and the peers s/he liked the least. The number of times each adolescent was named as liked most and least was tallied and standardized within gender. The difference between each adolescent’s standardized liked-most and standardized liked-least scores was computed as an index of peer social preference, and this score itself was standardized within gender (M = 0, SD = 1, Skew = .04). Aggression was measured with five items rated by teachers on a 5-point scale (α = .92; 1 = low, 5 = high). To better capture the highly skewed scores on aggression, each child was classified as Non-Aggressive (76.6% of sample with mean scores below 2.0 on the 5-point scale), Moderately Aggressive (14.6% with mean scores between 2.0 and 3.0) or Highly Aggressive (8.8% with mean scores greater than 3.0). Liking for school was measured with a single item measured on a 5-point Likert scale (“I like going to school”; M = 3.31, SD = 1.32, Skew = -.29). Grade Point Average (GPA) was calculated as the average of students’ grades in Reading, Social Studies, Math and Science during the 1st grading period (M = 3.40, SD = .66, Skew = -1.01).
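The social preference index described above (standardize liked-most and liked-least tallies within gender, take the difference, then standardize again within gender) can be illustrated as follows. This is a sketch only: the function names and the toy tallies are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

def zscores(values):
    """Standardize a list of numbers (sample SD; an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def social_preference(liked_most, liked_least, gender):
    """Return the within-gender standardized social preference per student."""
    result = [0.0] * len(gender)
    for g in set(gender):
        idx = [i for i, x in enumerate(gender) if x == g]
        z_most = zscores([liked_most[i] for i in idx])
        z_least = zscores([liked_least[i] for i in idx])
        diff = [a - b for a, b in zip(z_most, z_least)]
        for i, z in zip(idx, zscores(diff)):   # re-standardize the difference
            result[i] = z
    return result

# Hypothetical nomination tallies for six students.
liked_most = [5, 2, 0, 4, 1, 3]
liked_least = [0, 3, 4, 1, 2, 2]
gender = ["f", "f", "f", "m", "m", "m"]
scores = social_preference(liked_most, liked_least, gender)
print([round(s, 2) for s in scores])
```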
We applied principal component analysis to both types of peer network data. First, we extracted all factors[3] that had eigenvalues greater than 1.0, resulting in 39 factors for the self-reported friendship data and 38 factors for the multi-informant network data. Factors with eigenvalues less than 1 were not extracted because these factors explain less variance in the solution than a single variable. Second, we applied a Varimax rotation and then determined whether each factor was defined by at least three individuals whose primary loading (>.30) was on that factor. We required three individuals per factor because the theoretical definition of a group requires at least three members and we required factor loadings above .30 to ensure that each individual shared at least 9% of their variance with the group.[4] When one or more factors did not meet these criteria, we re-ran the PCA extracting one fewer factor, resulting in 24-factor (group) solutions for both types of network data. This process, along with using Varimax rotation, allowed us to obtain maximum differentiation while still identifying empirically reliable and conceptually meaningful groups. Some adolescents had significant loadings on more than one factor, which could be interpreted as reflecting membership in more than one group, but for purposes of comparing grouping solutions across methods, we assigned such “dual-members” to the group on which they had the highest loading.
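The assignment rule in this procedure can be sketched in Python. The example covers only the step that follows extraction and rotation: it takes an already rotated loading matrix (hypothetical values here), assigns each individual to the factor with the highest loading if that loading exceeds .30, and checks that every factor is defined by at least three individuals. The function name is illustrative, and comparing raw rather than absolute loadings is an assumption.

```python
def assign_groups(loadings, threshold=0.30, min_members=3):
    """Assign individuals to factors from a rotated loading matrix.

    loadings[i][f] is the loading of individual i on factor f.
    Returns (assignment, valid): assignment[i] is the factor with the highest
    loading for individual i, or None if that loading is not above threshold;
    valid is True when every factor has at least min_members such individuals.
    """
    n_factors = len(loadings[0])
    assignment = []
    for row in loadings:
        best = max(range(n_factors), key=lambda f: row[f])
        assignment.append(best if row[best] > threshold else None)
    counts = [assignment.count(f) for f in range(n_factors)]
    valid = all(c >= min_members for c in counts)
    return assignment, valid

# Hypothetical rotated loadings for seven individuals on two factors.
loadings = [
    [0.80, 0.10], [0.70, 0.20], [0.60, 0.35],
    [0.10, 0.90], [0.20, 0.75], [0.40, 0.50],
    [0.10, 0.20],
]
assignment, valid = assign_groups(loadings)
print(assignment, valid)
```

When `valid` is False, the procedure in the text would re-run the analysis with one fewer factor and apply the same check again. The third and sixth individuals load above .30 on both factors and are assigned to the factor with the higher loading, as with the "dual-members" described above.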
We use two social-network based group detection methods to compare with the PCA routine: Moody’s (2001a) Recursive Neighborhood Means (RNM) approach and UCINET VI’s FACTIONS (FAC) routine (Borgatti et al., 1999). The RNM approach was chosen because of its theoretical link to the substantive problems of peer effects and the FAC routine because it is commonly available and thus likely to be used by others. Like the PCA routine, both of these approaches are “indirect,” in that they do not search for a particular graph theoretic pattern (like cliques), but instead use the observed network to generate a cost / similarity score that is clustered or maximized. These types of indirect routines are useful, as many of the direct graph theoretic approaches (such as searching for cliques or k-cores) are either very slow algorithmically or have substantive difficulties identifying primary groups.
Moody’s RNM routine was originally designed as an efficient means to cluster very large (>10,000 node) networks, but its theoretical foundation in peer influence models (Friedkin, 1998; Friedkin and Cook, 1990) suggests that it should be substantively useful for settings where peer influence is the central concern. The RNM routine uses a two-step procedure. In the first step, one simulates a peer influence process on k random variables. The peer influence simulation then adjusts each person’s score on each random variable to equal the (tie-strength weighted) mean of the people to whom they are connected. Because the original variables are uncorrelated, dense clusters of nodes will come to occupy unique positions in the k-dimensional space defined by the resulting distribution of random variables. In the second step, one uses cluster analysis (here we use Ward’s minimum variance method) to identify groups based on the resulting influence variables. The number of groups is determined by examining changes in fit statistics (here we used Freeman’s (1972) segregation index as our guide), such that two initially distinct groups are joined if doing so significantly improves the fit for both groups. In addition, any small or disconnected groups were examined manually to see if nodes would be better classified by placing these nodes in a “between” group position.[5]
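The first step of this procedure, the simulated peer influence process, can be illustrated with a short Python sketch. This is not Moody's implementation: the function names, the number of iterations, the handling of isolates and the toy network are all assumptions, and the second step (Ward clustering of the resulting scores) is omitted.

```python
import random

def influence_step(weights, scores):
    """Replace each node's scores with the tie-weighted mean of its neighbors'."""
    n, k = len(scores), len(scores[0])
    new_scores = []
    for i in range(n):
        total = sum(weights[i])
        if total == 0:                      # isolate: keep its own scores
            new_scores.append(list(scores[i]))
            continue
        new_scores.append([
            sum(weights[i][j] * scores[j][d] for j in range(n)) / total
            for d in range(k)
        ])
    return new_scores

def simulate_influence(weights, scores, steps=20):
    """Apply the neighborhood-mean update a fixed number of times."""
    for _ in range(steps):
        scores = influence_step(weights, scores)
    return scores

# Hypothetical network: two disconnected triangles (nodes 0-2 and 3-5).
weights = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
rng = random.Random(0)
start = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(6)]  # k = 3 variables
final = simulate_influence(weights, start)
```

After repeated averaging, members of the same triangle end up with nearly identical scores on every variable, which is what allows a clustering step to recover them as a group. The number of steps is a tuning choice in this sketch: on a connected network, too many iterations would pull all nodes toward a common value.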
The FAC routine searches for groups with a “clique-like” structure. A perfectly clique-like structure would have groups that are completely connected internally (everyone tied to everyone else) and no ties outside of the groups. Thus, the routine counts null dyads within groups and ties outside of groups as deviations from the ideal, and adjusts group boundaries to minimize the number of such deviations. As with many of the group detection algorithms, one must determine the number of factions initially. Initial examinations of these data showed that the RNM approach was finding fewer groups than the PCA approach, so we chose 20 groups as a number that “split the difference” between the other two approaches.
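The deviation count that this kind of routine minimizes can be written down directly. The sketch below is not UCINET's code: it only evaluates the cost of a given partition on a binary, symmetric network, and the function name and toy data are hypothetical. The search over partitions is not shown.

```python
from itertools import combinations

def faction_cost(adj, groups):
    """Count deviations from an ideal clique structure.

    adj: symmetric 0/1 adjacency matrix; groups[i]: group label of node i.
    """
    missing_within = 0   # null dyads inside a group
    ties_between = 0     # ties that cross a group boundary
    for i, j in combinations(range(len(adj)), 2):
        same = groups[i] == groups[j]
        if same and not adj[i][j]:
            missing_within += 1
        elif not same and adj[i][j]:
            ties_between += 1
    return missing_within + ties_between

# Hypothetical network: a triangle (0, 1, 2), a dyad (3, 4), and a bridge 2-3.
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
print(faction_cost(adj, [0, 0, 0, 1, 1]))  # only the bridge tie deviates
print(faction_cost(adj, [0, 0, 1, 1, 1]))  # a worse partition
```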
For both the RNM and the FAC routines, we treated the data as symmetric, but weighted reciprocated ties more than asymmetric ties.[6] For the multi-informant nomination data, we used the number of times each pair was nominated as being in the same group as the basis for the tie weight. Initial FAC runs suggested that the predominance of single, often non-concordant co-nominations was distorting the results, so we limited analysis to pairs with 2 or more co-nominations.
Part of the difficulty in finding primary groups in networks is defining exactly what features represent a primary group. While theoretical and algorithmic advances have been made in identifying particular aspects of network structure that clarify our understanding of primary groups [such as structural cohesion (Moody and White, 2003), tie strength (Freeman, 1992), clustering and distance (Holland and Leinhardt, 1970; Holland and Leinhardt, 1971; Watts, 1999) and the ratio of in-group to out-group ties (Fershtman, 1997; Guimera and Amaral, 2005)], there is no unified agreement on what counts as a “clique-like” subgroup. In the substantive setting of interest here, we expect primary peer groups to be small and tight-knit. In general, we also expect them to be largely distinct,[7] with relations / interaction falling disproportionately within the primary group. We use six measures to examine how “tight-knit” and distinct the group solutions are for both types of data.
Tight-knit primary groups are likely to be relatively dense and have many closed triads that hold the local group together. In general, network density is the average value of relations taken over all possible dyads. We measure relative density as the density of ties falling within group divided by the density of ties that fall outside of groups. To account for group structure as well as volume (Freeman, 1992), we use two triad-based measures. Closed triads capture cases where friends of friends are friends (transitive relations), and we expect substantively that primary friendship groups will be characterized by relatively high numbers of closed triads. The transitivity ratio is defined as the proportion of all potentially closed triads that are actually closed. It is calculated as the proportion of all two-step paths (i→j, j→k) that are also direct paths (i→k). We define the relative transitivity ratio as the transitivity ratio calculated only among within-group dyads over the transitivity ratio of the entire network. Ideally, groups should enclose closed triads, thus any case of a group boundary separating a closed triad is a deviation from the ideal-type model. We thus measure the proportion of all closed triads (T300) that fall entirely within group to capture how often group solutions encapsulate closed triads.[8]
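Two of the measures defined above, relative density and the transitivity ratio, can be sketched in Python for a binary network. The function names and the toy network are hypothetical, and valued ties are ignored for simplicity.

```python
def transitivity_ratio(adj):
    """Proportion of two-step paths (i->j, j->k) that also have a tie i->k."""
    n = len(adj)
    two_paths = closed = 0
    for i in range(n):
        for j in range(n):
            if i == j or not adj[i][j]:
                continue
            for k in range(n):
                if k in (i, j) or not adj[j][k]:
                    continue
                two_paths += 1
                closed += 1 if adj[i][k] else 0
    return closed / two_paths if two_paths else 0.0

def relative_density(adj, groups):
    """Within-group tie density divided by between-group tie density."""
    n = len(adj)
    within_ties = within_pairs = between_ties = between_pairs = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if groups[i] == groups[j]:
                within_pairs += 1
                within_ties += adj[i][j]
            else:
                between_pairs += 1
                between_ties += adj[i][j]
    within = within_ties / within_pairs
    between = between_ties / between_pairs
    return within / between if between else float("inf")

# Hypothetical network: a triangle (0, 1, 2), a dyad (3, 4), and a bridge 2-3.
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
groups = [0, 0, 0, 1, 1]
print(transitivity_ratio(adj), relative_density(adj, groups))
```

In the toy network half of all two-step paths are closed, and ties are six times as dense within groups as between them.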
The distinctiveness of a group is measured by how often relations fall within rather than between groups. We use Freeman’s segregation index (1972), the proportion of all ties that fall outside of groups, and the modularity index (Newman and Girvan, 2004) as three measures of group distinction. Freeman reasoned that if a group partition was irrelevant, then relations should be distributed randomly across the group boundaries. Freeman’s network segregation index is thus calculated as the number of randomly expected cross-group ties minus the number of observed cross-group ties, divided by the number of randomly expected cross-group ties. When the value is 1.0, all relations fall within separate groups. When the value is 0, then relations are distributed randomly across groups. The modularity statistic (Newman and Girvan, 2004) follows a similar logic and will be 0 if ties are distributed randomly. The advantage of the modularity score is that the measure reaches a clear maximum value when ties are more likely to fall within groups, making it ideal for comparing group distinctiveness across solutions. Finally, the proportion of ties that fall outside of groups provides a readily interpretable (though not gauged against random chance) metric for the sheer volume of cross-group ties.
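The three distinctiveness measures can be computed together for a binary, undirected network. The sketch below takes the segregation index as expected minus observed cross-group ties over expected, so that it equals 1.0 when no ties cross groups, as the text describes. Applying the index to more than two groups by pooling all cross-group pairs is an assumption, as are the function name and the toy data. Modularity follows the usual Newman and Girvan form for unweighted graphs.

```python
from itertools import combinations

def group_distinctiveness(adj, groups):
    """Return (segregation index, modularity, proportion of ties outside groups)."""
    n = len(adj)
    pairs = list(combinations(range(n), 2))
    ties = [(i, j) for i, j in pairs if adj[i][j]]
    m = len(ties)
    cross_pairs = sum(1 for i, j in pairs if groups[i] != groups[j])
    observed_cross = sum(1 for i, j in ties if groups[i] != groups[j])
    # Cross-group ties expected if the m ties were spread randomly over pairs.
    expected_cross = m * cross_pairs / len(pairs)
    segregation = (expected_cross - observed_cross) / expected_cross
    modularity = 0.0
    for g in set(groups):
        internal = sum(1 for i, j in ties if groups[i] == g and groups[j] == g)
        degree_sum = sum(sum(adj[i]) for i in range(n) if groups[i] == g)
        modularity += internal / m - (degree_sum / (2 * m)) ** 2
    return segregation, modularity, observed_cross / m

# Hypothetical network: a triangle (0, 1, 2), a dyad (3, 4), and a bridge 2-3.
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
seg, mod, outside = group_distinctiveness(adj, [0, 0, 0, 1, 1])
print(round(seg, 3), round(mod, 3), outside)
```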
Group size enters into our consideration both substantively and methodologically. Substantively, children’s primary groups tend to be small (Rubin, Bukowski and Parker, 1998) and thus any solution that generates very large groups lacks a certain level of face validity. However, we also expect a group to have a certain extra-individual character that extends beyond any single individual member (Simmel, 1950; Moody and White, 2003). The smallest collection that can exist independent of any single actor is the triad, and thus groups are typically defined as having 3 or more nodes. Methodologically, the distribution of group sizes affects all of the other metrics used to define groups. On the one hand, if all nodes were partitioned into a single group, then there would be no out-of-group ties and all triads would fall within the group (there would, of course, be no data reduction here either!). On the other hand, if everyone were assigned to a single closed triad, then in-group density would be perfect, leading to very high relative densities and relative transitivity ratios.[9]
Finally, we use two versions of the Rand statistic (Rand, 1971) to compare the different primary group solutions statistically and “Shadow plots” (Batagelj and Mrvar, 2001) to evaluate overlap qualitatively. Measures for comparing nominal distributions, such as Kappa, are ineffectual if the number of groups differs across solutions. The Rand statistic, in contrast, allows us to compare clustering solutions with any number of clusters. Substantively, the Rand statistic measures the proportion of pairs classified similarly in two solutions. A pair is similarly classified if they were put in the same cluster in both solutions, or classified as being in different clusters in both solutions. A value of 1.0 means that the two partitions are substantively identical. While intuitive, the raw Rand statistic does not distinguish observed matching from matching expected by chance.
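The raw Rand statistic defined above can be sketched in a few lines of Python. The function name and the two toy partitions are hypothetical, and the chance-corrected version is not shown.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Proportion of pairs classified the same way by two partitions."""
    agree = total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b   # together in both, or apart in both
        total += 1
    return agree / total

# Two hypothetical group solutions for the same five nodes.
solution_a = [0, 0, 0, 1, 1]
solution_b = [0, 0, 1, 1, 1]
print(rand_index(solution_a, solution_b))
```

Because only pairwise co-membership is compared, the group labels themselves do not matter, and the two solutions may contain different numbers of groups.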
Since each pair has to be placed in some cluster, a proportion of pairs will be similarly classified due simply to chance alone. We use the adjusted Rand statistic proposed by Morey and Agresti (1984) to correct for chance overlap. This measure captures the percent difference from chance in the likelihood that a pair of actors is similarly classified. Shadow plots are schematic images of an adjacency matrix, with the rows and columns sorted to help see the structure in the graph. Cell values are shaded (“shadowed”) proportional to the strength of the ij cell, and allow us to visually compare the overlap of the group solutions to the underlying tie-distribution.
Reports of friends and groups. On average, adolescents listed 9.72 friends (SD = 3.95) and identified 3.76 groups (SD = 2.01) with 4.60 (SD = 2.10) individuals per group (i.e., a total of 17.26 group members). More than half of all self-reported friends (59.4%) and more than half of all peers nominated to social groups (51.1%) were outside of the adolescent’s own homeroom, confirming that the social network is appropriately considered at the level of the entire grade.
Graph statistics. The multi-informant group data demonstrated a high density of ties (.486) and transitivity (.571). This suggests ample “clusteredness” to be exploited by each grouping method, which is to be expected, since the data generate ties between all pairs named as members of the same group. In contrast, the self-reported friendship data had lower density (.121) and transitivity (.302) scores. This will make it more challenging to find groups consistently than with the multi-informant data, because there will be less clustering for the algorithms to exploit.
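As a rough illustration of the two graph statistics reported above, the following sketch computes density and a transitivity ratio for a small toy network. It assumes binary, undirected ties and defines transitivity as the share of connected triples (two-paths) that are closed. The function names and the toy data are hypothetical, and the exact operationalization used for the weighted and directed data in this study may differ.

```python
from itertools import combinations


def density(nodes, edges):
    """Share of all possible undirected ties that are actually present."""
    n = len(nodes)
    return len(edges) / (n * (n - 1) / 2)


def transitivity(nodes, edges):
    """Closed two-paths divided by all two-paths (assumed definition)."""
    neighbors = {v: set() for v in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    closed = total = 0
    for v in nodes:
        # Every pair of neighbors of v forms a two-path centered on v.
        for a, b in combinations(sorted(neighbors[v]), 2):
            total += 1
            if b in neighbors[a]:
                closed += 1
    return closed / total if total else 0.0


# Toy network (hypothetical): one closed triad A-B-C plus a tail C-D-E.
toy_nodes = ["A", "B", "C", "D", "E"]
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]

toy_density = density(toy_nodes, toy_edges)
toy_transitivity = transitivity(toy_nodes, toy_edges)
```

In the toy network, half of the possible ties are present and half of the two-paths are closed, so both statistics equal 0.5.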
Sociograms. Next we constructed sociograms illustrating each type of network data. In each sociogram, position in the xy plane is determined by a force-directed automatic layout algorithm implemented in PAJEK (Batagelj and Mrvar, 2001). For these layouts, social ties are analogous to springs, with stronger values indicating a stronger pull between the nodes. As such, two nodes that are connected will tend to be close to each other, while nodes that are disconnected will be further apart. In an ideal-typical sense, if the network were composed of very distinct groups (and the nominations reflected these groups), then the figure would contain distinct “clumps” for each group.
Figure 1. Self-reported Friendship Nominations
Thick blue lines are reciprocated friendship nominations; thin gray lines are asymmetric nominations. Letters identify particular nodes to compare with Figure 2.
In Figure 1, each node represents a student and each line represents a friendship nomination. For the present analyses, asymmetric ties (thinner lines) count less than symmetric ties (thicker lines). This figure shows that friendship nominations among 6th graders are heavily conditioned by sex. Beyond this strong sex segregation, the network does not suggest many small groups, particularly among the males. Instead, both the male and female sides of the network have a “core-periphery” structure, with a small number of individuals who have no reciprocated ties, and a large cluster of individuals who are strongly connected. Females in the network are slightly more differentiated, with what appear to be two or three overlapping “clumps” stretching along the “north-south” axis. In addition, there are two small groups in the “south-east” portion of the figure with no reciprocated connections to the rest of the network, but one link between them. These were students who spent part of their day in a Special Education classroom: despite being “mainstreamed” in General Education classrooms for much of the school day, these children’s friendships were largely separate from the rest of the grade.

In Figure 2, each line indicates the number of times that two students (nodes) were named as being members of the same group. The thickness and shade of the line correspond to the frequency of co-nomination to the same group. Although the number of co-nominations linking individual nodes ranged from 1 to 33, for clarity these values were grouped into six ranges. There are three immediate impressions given by this figure. First, there are clear clusters with very strong agreement (thick lines), particularly among the females, indicating substantial consensus among students regarding the interaction patterns of their peers. Second, there are large individual differences in the maximum number of co-nominations linking a given student to other students: the maximum number of co-nominations is centered around 10 (Mean = 11.1, Median = 9.0), but 39 (27.1%) students were never named more than 5 times with any peer, while 18 (12.5%) students reached over 20 co-nominations with at least one of their peers. Third, the wide linkages at low levels (the very thin lines connecting a wide body of nodes across the graph) suggest that some people provide idiosyncratic reports of groups that are at odds with the group consensus.
In general, the two sociograms correspond quite closely in terms of the overall shape, the separation of males and females, and the location of individual nodes (fourteen of which are labeled, A through N, on each graph).[10] The two groups of special education students in the southern portion of the graph (including nodes G, J and K), for example, contain nearly identical members. In addition, 3 of the 4 male nodes in the “female” section of the friendship graph (including nodes E and I) are similarly more closely associated with the female side of the multi-informant graph. In both graphs, boys L and M occupy similar positions in clusters outside the main clump of boys, while girls H-F and B-C are located in parallel positions within the relatively core groups of girls. Boys N and D reside on the periphery of both graphs, whereas girl A is peripheral in the friendship graph but closer to the core of the multi-informant graph.
Correspondence between friendship and multi-informant nominations. We tested the degree to which the number of times two people were nominated as hanging out together predicted friendship nominations. We model the likelihood of a friendship nomination, controlling for network and group involvement measures including: number of friends named, number of friendship nominations received, number of group nominations received and the sex composition of the dyad. Because the dependent variable is dichotomous (nominated or not), we use a logistic regression model. The results (Table 1) clearly show that the number of times a dyad is nominated to the same group strongly predicts a friendship nomination.
Table 1. Logistic Regression of Friendship Nomination on Multi-informant Co-nominations (odds ratio in parentheses)
| Variable | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| Intercept | -4.87 | -6.31 | -6.54 | -6.28 |
| # of friends named by ego (ODG) | 0.138 (1.15) | 0.145 (1.16) | 0.144 (1.16) | 0.145 (1.16) |
| # of times alter was named as a friend (IDG) | 0.146 (1.16) | 0.144 (1.16) | 0.144 (1.16) | 0.153 (1.17) |
| # of times ego named as a group member (ego visibility) | -0.015 (0.989) | -0.011 (0.989) | -0.01 (0.989) | -0.01 (0.989) |
| # of times alter named as a group member (alter visibility) | -0.008 (0.992) | -0.008 (0.992) | -0.007 (0.993) | -0.009 (0.991) |
| Same sex dyad | | 1.84 (6.355) | 2.05 (7.76) | |
| Both Male | | | | 1.74 (5.71) |
| Both Female | | | | 2.04 (7.69) |
| Number of Co-Nominations | 0.603 (1.83) | 0.508 (1.66) | 1.09 (2.99) | 0.506 (1.66) |
| Group x Same Sex | | | -0.610 (0.544) | |
| Pseudo R² | 0.37 | 0.423 | 0.428 | 0.424 |
Note. All variables are statistically significant at the .0001 level.
After controlling for sex composition of the dyad, the odds of a friendship nomination increase by a factor of 1.66 for each time the pair is said to belong to the same group. This effect differs by gender composition (Model 3). Specifically, a co-nomination is more likely to predict a friendship when the dyad is cross-sex, though the relative rarity of these nominations makes this finding somewhat less important. As expected, the controls for network expansiveness (ODG) and attractiveness (IDG) are also important. While statistically significant, simple visibility of either party matters little (the odds ratios are close to 1.0). To simplify the interpretation of the co-nomination coefficient, Figure 3 plots the predicted probability of a friendship nomination for a same-sex dyad, by the number of times they are nominated as being in the same group (estimates based on Model 2).[11] This figure indicates that (in these data) the likelihood of a self-reported friendship reaches 50% as the number of multi-informant interaction co-nominations reaches around 7, and exceeds 95% when the number of co-nominations reaches around 13.
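The relationship between the coefficients and the odds ratios in Table 1 can be sketched as follows. The coefficient values are copied from Table 1; the variable and function names are hypothetical. Because the published coefficients are rounded, the exponentiated values may differ slightly from the odds ratios printed in the table. The sketch does not attempt to reproduce Figure 3, since the covariate values used for those predictions are not restated here.

```python
import math


def odds_ratio(coefficient):
    """A logit coefficient b multiplies the odds by exp(b) per unit increase."""
    return math.exp(coefficient)


# Model 2: co-nomination effect, controlling for same-sex dyad.
b_conom_model2 = 0.508
or_model2 = odds_ratio(b_conom_model2)  # about 1.66 per co-nomination

# Model 3: the main effect applies to cross-sex dyads; for same-sex dyads
# the interaction term (Group x Same Sex) is added to the slope.
b_conom_model3 = 1.09
b_interaction = -0.610
or_cross_sex = odds_ratio(b_conom_model3)
or_same_sex = odds_ratio(b_conom_model3 + b_interaction)

# Effects compound multiplicatively: k co-nominations scale the odds
# by exp(k * b), holding the other covariates fixed.
odds_multiplier_5 = odds_ratio(5 * b_conom_model2)
```

Under Model 3, each co-nomination multiplies the odds of a friendship nomination by roughly 3.0 for cross-sex dyads but only by about 1.6 for same-sex dyads, which is the sense in which a co-nomination is more predictive for cross-sex pairs.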

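Table 2. Rand matching coefficients across the six combinations of network data and group identification algorithm
| | Self Nominations: RNM | Self Nominations: FAC | Self Nominations: PCA | Multi-Informant Nominations: RNM | Multi-Informant Nominations: FAC | Multi-Informant Nominations: PCA |
|---|---|---|---|---|---|---|
| Self Nom: RNM | --- | 0.328 | 0.362 | 0.417 | 0.392 | 0.316 |
| Self Nom: FAC | 0.861 | --- | 0.475 | 0.464 | 0.498 | 0.465 |
| Self Nom: PCA | 0.873 | 0.949 | --- | 0.584 | 0.572 | 0.465 |
| Mult Nom: RNM | 0.881 | 0.942 | 0.958 | --- | 0.687 | 0.695 |
| Mult Nom: FAC | 0.872 | 0.947 | 0.957 | 0.965 | --- | 0.665 |
| Mult Nom: PCA | 0.862 | 0.948 | 0.948 | 0.969 | 0.966 | --- |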
Note. Values above the diagonal are the chance-adjusted Rand statistic (Morey and Agresti, 1984). Values below the diagonal are the simple Rand statistic, unadjusted for chance. The interpretation of the Rand statistic is the probability that a randomly chosen pair will be similarly classified by the two partitions. The interpretation for the adjusted Rand is the percent difference between the number of observed agreements and the number of chance agreements.
We next examined the similarities between grouping algorithms for both types of data. Table 2 contains the Rand matching coefficients describing the comparability of the node partitions across the six different combinations of network data and group identification algorithm. The positive coefficients across all comparisons indicate that the partitions are significantly correlated, but the differences that occur are systematic. Overall agreement is higher across the three multi-informant solutions (mean Rand = 0.97; mean adjusted Rand = 0.68) than across the three self-reported friendship solutions (mean Rand = 0.89; mean adjusted Rand = 0.39). This main effect of network data is quite large, and is due largely to the clear clustering evident in the multi-informant matrix. Effectively, subgroups in the multi-informant network are much easier targets to hit than in the less clustered friendship network, so that differences in grouping algorithms are less likely to lead to divergent grouping solutions.
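The pair-counting logic behind the Rand statistic can be sketched as below. The raw statistic follows the definition given in the text (the proportion of pairs classified similarly by two partitions). For the chance correction, the sketch substitutes the widely used Hubert and Arabie form of the adjusted Rand index, which is not the Morey and Agresti (1984) adjustment reported in Table 2, so its values would not match the table exactly. Function names and the toy partitions are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(a, b):
    """Proportion of node pairs that both partitions classify the same way."""
    agree = pairs = 0
    for i, j in combinations(range(len(a)), 2):
        same_in_a = a[i] == a[j]
        same_in_b = b[i] == b[j]
        agree += same_in_a == same_in_b
        pairs += 1
    return agree / pairs


def adjusted_rand_hubert_arabie(a, b):
    """Chance-corrected agreement (Hubert-Arabie form, a stand-in here).

    Undefined (division by zero) when both partitions are trivial.
    """
    n = len(a)
    together_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    together_a = sum(comb(c, 2) for c in Counter(a).values())
    together_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    return (together_both - expected) / (maximum - expected)


# Two toy partitions of six nodes with different numbers of groups.
solution_1 = [0, 0, 1, 1, 2, 2]   # three groups
solution_2 = [0, 0, 1, 1, 1, 1]   # two groups

raw = rand_index(solution_1, solution_2)
adjusted = adjusted_rand_hubert_arabie(solution_1, solution_2)
```

In this toy case 11 of the 15 pairs are classified similarly (raw Rand of about 0.73), while the chance-corrected value is about 0.44. The gap mirrors the pattern in Table 2, where raw values cluster near 0.9 even when adjusted values are far lower.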
Within the friendship network, the RNM solution is less similar to the other two (AR: RNM, FAC = 0.33; RNM, PCA = 0.36) than they are to each other (FAC, PCA = 0.48). This suggests that the three methods differ in their basic strategies which, as we will see below, trade off larger, distinctive groupings (RNM) and smaller groupings with greater in-group density (FAC, PCA). Overall, the groups derived from the self-reported friendship data with a particular algorithm were as similar to groups derived from the multi-informant data (upper right quadrant of Table 2; median AR = .47) as they were to each other (AR range .33 to .48).
From the perspective of comparing results across different combinations of data collection techniques and analytic strategies, these results send a mixed signal. On the one hand, the significant chance-adjusted agreement across all six solutions indicates that investigators using a wide range of methods (measures*algorithms) are indeed describing similar phenomena. On the other hand, compared to self-reported friendship data, the more clustered multi-informant data produce much more consistent groupings across several analytic methods. We next turn to the details of the kinds of groups identified by each approach.
Table 3. Group structure statistics by type of network data and group identification procedure
| Type of Network Data | Recursive Neighborhood Means | FAC | Principal Components Analysis | Gender x Homeroom |
|---|---|---|---|---|
| Self-Reported Friendships (Density = .12, Transitivity Ratio = .57) | | | | |
| # of Groups | 10 + 14 between | 20* | 24 + 3 unclassified | |
| Size: M (SD) | 13.3 (11.6) | 7.3 (1.9) | 6.04 (2.22) | |
| Min - Max | 5 - 43 | 5 - 12 | 3 - 12 | |
| Groups of size = 3 | 0 | 0 | 3 | |
| Relative Density | 9.76 | 11.43 | 13.61 | 7.84 |
| Relative Transitivity | 1.58 | 2.56 | 2.60 | 2.14 |
| Prop. closed triads in same group | 0.60 | 0.27 | 0.24 | 0.21 |
| Freeman Segregation | 0.57 | 0.32 | 0.33 | 0.311 |
| Modularity | 0.46 | 0.30 | 0.31 | 0.41 |
| Proportion of ties out-of-group | 0.346 | 0.641 | 0.636 | 0.636 |
| Multi-informant Groups (Density = .49, Transitivity Ratio = .57) | | | | |
| # of Groups | 20 + 9 between | 20* | 24 + 1 unclassified | |
| Size: M (SD) | 6.95 (3.03) | 7.4 (2.6) | 6.13 (2.44) | |
| Min - Max | 4 - 14 | 5 - 13 | 3 - 11 | |
| Groups of size = 3 | 0 | 0 | 4 | |
| Relative Density | 27.22 | 30.83 | 29.72 | 10.87 |
| Relative Transitivity | 1.54 | 1.66 | 1.67 | 1.54 |
| Prop. closed triads in same group | 0.41 | 0.52 | 0.31 | 0.21 |
| Freeman Segregation | 0.56 | 0.58 | 0.53 | 0.394 |
| Modularity | 0.52 | 0.53 | 0.50 | 0.36 |
| Proportion of ties out-of-group | 0.399 | 0.383 | 0.439 | 0.556 |
* Number of groups is definitional
Table 4. Within-Group Behavioral Homogeneity
| Group Identification Procedure | Group structure Index | Self-Reported Friendships | Multi-Informant Groups |
|---|---|---|---|
| Recursive Neighborhood Means | Like Going to School | .080 / .066 a | .228 |
| | Peer Social Preference | .142 / .143 | .281** |
| | Teacher-rated Aggression | .048 / .088 | .204 |
| | Grade Point Average | .080 / .226* | .224 |
| FAC | Like Going to School | .126 | .220 |
| | Peer Social Preference | .197 | .342*** |
| | Teacher-rated Aggression | .171 | .229* |
| | Grade Point Average | .247+ | .246* |
| Principal Components Analysis | Like Going to School | .235 | .276* |
| | Peer Social Preference | .366*** | .317** |
| | Teacher-rated Aggression | .403*** | .439*** |
| | Grade Point Average | .332** | .289* |
* p < .05. ** p < .01. *** p < .001.
Note. Effects of group membership (partial eta-squared) after controlling for gender. For the RNM self-reported friendship solution, values after the slash are the partial eta-squared values after removing group 1.
Table 3 contains the group structure statistics for the six clustering solutions and, for comparison, the statistics for a simple attribute clustering based on sex and homeroom. Table 4 contains estimates of group behavioral homogeneity. Below we briefly summarize results for the multi-informant networks before examining reasons for the variability in solutions for the friendship data.
Multi-informant Group Structures. The three methods produced similarly sized groups from the multi-informant data. The average group sizes were very similar for RNM and FAC (6.95 vs. 7.40) with a similar distribution of group size (range 4 to 14 for RNM; 5 to 13 for FAC), and PCA groups were only somewhat smaller (M = 6.13, range 3 to 11). Group tight-knittedness was very similar across solutions, with the relative density of in-group to out-group ties roughly three times higher for the RNM, FAC and PCA solutions (27.22, 30.83, 29.72) than for a partition reflecting the split by gender and homeroom (10.87). Group differentiation was also similar across solutions.
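To make the relative density comparison concrete, the sketch below computes a ratio of in-group tie density to out-group tie density for a given partition. This operationalization is an assumption based on the phrase "relative density of in-group to out-group ties"; the measure as defined earlier in the paper may differ in detail (for example, in how tie weights are handled). The function name and toy data are hypothetical, and ties are treated as binary and undirected.

```python
from itertools import combinations


def relative_density(group_of, ties):
    """Ratio of in-group tie density to out-group tie density (assumed form).

    group_of: dict mapping each node to its group label.
    ties: set of frozensets, one per undirected tie.
    """
    in_ties = in_pairs = out_ties = out_pairs = 0
    for u, v in combinations(sorted(group_of), 2):
        present = frozenset((u, v)) in ties
        if group_of[u] == group_of[v]:
            in_pairs += 1
            in_ties += present
        else:
            out_pairs += 1
            out_ties += present
    return (in_ties / in_pairs) / (out_ties / out_pairs)


# Toy partition: two groups of three, joined by a single bridging tie.
toy_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
toy_ties = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("a", "c"),
                                   ("d", "e"), ("e", "f"), ("c", "d")]}

toy_relative_density = relative_density(toy_groups, toy_ties)
```

Here five of six in-group pairs are tied, against one of nine out-group pairs, giving a ratio of 7.5. The ratio is undefined when a partition leaves no out-group ties, as in the single-group case discussed earlier.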
The estimates of group behavioral homogeneity were generally reliable and moderate in magnitude for each solution. Given the similarity in the partitions, it was surprising that homogeneity was consistently higher for the PCA groups than for the RNM groups, with homogeneity of FAC groups at intermediate levels. The substantially higher levels of similarity in aggressive behavior for the PCA groups were due largely to a subset of 13 boys, 5 of whom were highly aggressive (representing nearly half of the highly aggressive students in the entire grade). RNM and FAC placed all 13 individuals in the same group, whereas PCA separated them into a group of 8 that contained all 5 highly aggressive individuals and a group of 5 non-aggressive boys. The most likely explanation for the modest differences in similarity for the other behaviors is that PCA produced slightly smaller groupings, but we examine this issue more fully in the context of the friendship network solutions, which differed more substantially in additional ways. At this stage, the most noteworthy feature of the multi-informant data was that three distinct group identification procedures produced solutions that were highly comparable in terms of structural characteristics, the placement of individuals into groups, and estimates of behavioral homogeneity.
Size. As with the multi-informant network data, the RNM and FAC solutions tended to produce larger friendship groups than the PCA solution. FAC and PCA produced group sizes that were very similar to those obtained when the comparable method was applied to the multi-informant data (FAC: M = 7.3 for friendship data, M = 7.4 for multi-informant data; PCA: M = 6.0 and M = 6.1, respectively). The RNM friendship solution generated one male group of 43 nodes. This large cluster dominates the RNM solution, and appears to drive a number of the differences in group structure reported below. When this group is excluded from consideration, the RNM solution for friendship data produces groups only a bit larger than those for multi-informant data (M = 8.5 vs. M = 6.95).
Internal density. The PCA solution produced groups with the highest density and internal transitivity, with the FAC groups being quite similar. More