Language Attitudes of Twitter Users Toward New York City English

Nathan LaFave
New York University, United States of America
nathan.lafave@nyu.edu

2016-03-07T04:54:00Z

Maciej Eder, Pedagogical University in Krakow
Jan Rybicki, Jagiellonian University
Institute of Polish Studies, Pedagogical University, ul. Podchorazych 2, 30-084 Krakow, Poland
maciej.eder@ijp-pan.krakow.pl

Paper: Short Paper
Keywords: language attitudes, New York City English, Twitter, text mining, text analysis, corpora and corpus activities, linguistics, social media, data mining / text mining, English

New York City English (NYCE) has long been a stigmatized variety of English. In his seminal research on the New York City dialect, the sociolinguist William Labov referred to New York City as “a great sink of negative prestige” (Labov, 1966), a characterization that reflected the negative view of NYCE speech shared by non-New Yorkers and New Yorkers alike. Decades later, Preston (2003) elicited extremely low ratings of the New York City dialect on scales of both “correctness” and “pleasantness” from participants across the US. While these studies present strong evidence of the prevalence of negative language attitudes toward NYCE speech, a more complete picture of linguistic ideology would include what speakers say about NYCE when they are not participating in an academic study. This project seeks to provide that picture by examining linguistic ideology with respect to NYCE as espoused by users of the social networking service Twitter.

Twitter has been recognized as an important resource for humanists and social scientists alike. Scholars have collected and analyzed Twitter messages (tweets) in order to investigate numerous textual and linguistic phenomena, such as lexical variation (differences in the use of synonymous words and phrases, such as pop vs. soda vs. coke). Russ (2012) in particular (see also Bamman, 2011) illustrates the utility of Twitter for examining regionally defined lexical variation by comparing the geographic distribution of word choices in geotagged tweets (tweets tagged with the GPS coordinates from which they originated) with more traditionally collected dialectology data. All related research has focused on differences in production. However, I argue that Twitter represents an untapped resource for the investigation of perceptions of language use, particularly language attitudes toward regional dialects and differences in their phonetic features (which can be identified by non-standard orthography). Using Twitter solves a primary quandary for language attitude researchers: how to acquire naturally occurring data, given that participation in research decreases naturalness.

Tweets containing attitudes and ideology were collected using a range of strategies, including text mining for words—and, crucially, spellings—that reference individual features. To do this, however, it is necessary to determine which features get noted and then which lexical items—and which spellings—are used to signal them. For instance, cawfee (also, cawffee) is a common orthographic representation of the word “coffee” as pronounced with a raised-THOUGHT vowel, one of the signature dialect features of NYCE. Widely used spellings that reflect r-vocalization, another key feature of the NYCE dialect, include New Yawk and fuhgeddaboudit. In addition to collecting tweets containing orthographic representations of nonstandard features, Twitter search parameters included over 20 terms related to possible names for the dialect itself (e.g., New York accent, Manhattan dialect, Brooklynese). These were included in part to determine the extent to which the general public perceives a distinction among speakers from the five boroughs (a distinction which has not been borne out by linguistic analysis).
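The term-based selection described above can be sketched as follows. This is a minimal illustration, not the script used in the study: the term lists contain only the spellings and dialect names quoted in the text (the full set of 20+ search terms is not given), and the names `compile_terms` and `classify` are hypothetical.

```python
import re

# Only the terms quoted in the text; the study's full search list is not given.
FEATURE_SPELLINGS = ["cawfee", "cawffee", "New Yawk", "fuhgeddaboudit"]
DIALECT_NAMES = ["New York accent", "Manhattan dialect", "Brooklynese"]


def compile_terms(terms):
    """Build one case-insensitive pattern matching any term as a whole word or phrase."""
    # Allow any run of whitespace between the words of a multi-word term.
    parts = [r"\s+".join(re.escape(word) for word in term.split()) for term in terms]
    return re.compile(r"(?<!\w)(?:" + "|".join(parts) + r")(?!\w)", re.IGNORECASE)


FEATURE_RE = compile_terms(FEATURE_SPELLINGS)
NAME_RE = compile_terms(DIALECT_NAMES)


def classify(tweet):
    """Return the feature spellings and dialect names found in a tweet."""
    return {
        "feature_spellings": [m.group(0) for m in FEATURE_RE.finditer(tweet)],
        "dialect_names": [m.group(0) for m in NAME_RE.finditer(tweet)],
    }


if __name__ == "__main__":
    print(classify("GAWGEOUS idea she said in her New Yawk accent"))
    print(classify("I need my cawfee, said in a thick new york accent"))
```

Keeping feature spellings and dialect names in separate patterns makes it possible to tell tweets that name the dialect apart from tweets that only imitate one of its features.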

Repeated automated text mining of Twitter using a Python script to interact with the Twitter API yielded 6,384 tweets that match the aforementioned criteria. Elimination of retweets that did not introduce additional linguistic content and inspection to ensure the tweets reference NYCE produced a final corpus of 1,773 tweets. Relative frequencies of the borough-specific and pan-regional terms in the 1,315 tweets that explicitly reference NYCE by some name reveal that Twitter users most frequently refer to NYCE as the New York accent (N=805; 61.2%), though Brooklyn accent (N=359; 27.4%) accounts for more than a quarter, with Bronx accent (N=54), Queens accent (N=29, tied with Brooklynese—the most frequent -ese moniker), and Staten Island accent (N=10) being used much less often. Whether New York accents and Brooklyn accents are perceived as linguistically or socially distinct, or two names for the same dialect region, will be explored in the paper.
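The two processing steps in this paragraph, dropping retweets that add no new text and computing relative frequencies of dialect names, could look roughly like the sketch below. It rests on assumptions that are not from the study: that a bare retweet appears as text of the form "RT @user: ...", that a retweet with a comment in front of "RT" counts as new linguistic content, and that each tweet has already been labeled with the dialect name it uses. The function names and sample data are hypothetical.

```python
import re
from collections import Counter

# Assumption: a bare retweet has the textual form "RT @user: original text".
RETWEET_RE = re.compile(r"^RT\s+@\w+:?\s*(.*)$", re.IGNORECASE | re.DOTALL)


def drop_redundant_retweets(tweets):
    """Keep one copy of each distinct text; drop bare retweets of text already seen."""
    seen = set()
    kept = []
    for tweet in tweets:
        match = RETWEET_RE.match(tweet.strip())
        # A bare retweet is compared on the quoted text, anything else on the whole tweet.
        core = match.group(1) if match else tweet
        key = " ".join(core.lower().split())
        if key not in seen:
            seen.add(key)
            kept.append(tweet)
    return kept


def relative_frequencies(labels):
    """Map each label to (count, percentage of all labels, rounded to one decimal)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}


if __name__ == "__main__":
    sample = [
        "I love a New York accent",
        "RT @someone: I love a New York accent",      # bare retweet, dropped
        "lol RT @someone: I love a New York accent",  # adds a comment, kept
    ]
    print(drop_redundant_retweets(sample))
    print(relative_frequencies(
        ["New York accent", "New York accent", "Brooklyn accent", "Bronx accent"]
    ))
```

The filter keeps whichever copy of a text it meets first, so the order of the input decides whether the original tweet or a bare retweet of it survives.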

All tweets were manually coded to determine their sentiment with respect to NYCE.

POSITIVE: I swear girls from New York accent sound so sexy

NEUTRAL: GAWGEOUS idea she said in her New Yawk accent

NEGATIVE: If you have a Brooklyn accent I automatically want to punch you.

Almost half of these tweets were neutral in sentiment (N=584, 44.4%); 378 were positive (28.7%) and 200 negative (15.2%). However, 154 tweets were classified as UNCLEAR (8.7%). Many of these are ambiguous as to whether they evaluate the accent itself or an imitation of it, as when describing an actor’s performance (a common topic among tweets of this type):

UNCLEAR: his New York accent is so bad /:

Examples such as these pose significant obstacles to automated sentiment analysis, which has been extended to Twitter data (see for instance Pak and Paroubek, 2010), particularly when the target is language attitudes. An automatic method would most likely code such a tweet as negative without recognizing the need to disambiguate what is being evaluated. It is noteworthy, however, that even if every UNCLEAR tweet were in fact expressing negative sentiment, there would still be a greater number of tweets with positive opinions of NYCE speech than negative ones. Furthermore, when Twitter users reference a specific NYCE feature, their evaluation of it is more likely to be positive, regardless of whether they use standard (N=79) or nonstandard (N=568) orthography to represent the feature.
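The worst-case comparison can be checked directly from the reported counts. The snippet below is only a worked version of that arithmetic; the function name is hypothetical, and the counts are those given in the text.

```python
# Sentiment counts as reported in the text for the manually coded tweets.
counts = {"NEUTRAL": 584, "POSITIVE": 378, "NEGATIVE": 200, "UNCLEAR": 154}


def worst_case_negative(counts):
    """Negative total if every UNCLEAR tweet were in fact negative."""
    return counts["NEGATIVE"] + counts["UNCLEAR"]


worst = worst_case_negative(counts)
margin = counts["POSITIVE"] - worst

print(worst)   # 354
print(margin)  # 24
```

Even under this least favorable reading, positive tweets outnumber negative ones by 24.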

These findings portray a broader range of reactions to NYCE than the language attitudes speakers have presented when engaged in academic research. The paper will include discussion of both negative and positive language attitudes that Twitter users espouse concerning the dialect features associated with NYCE. For instance, any tweets with positive sentiment will be examined to determine if they represent instances of “covert prestige” (Labov, 1966), whereby speakers use stigmatized varieties for in-group identification and solidarity. Additional discussion will focus on which regional features evoke the most meta-commentary. Furthermore, I will explore the extent to which Twitter users draw (additional) attention to non-standard forms they employ through capitalization (NEW YAWK), hashtags (#newyawk), and other orthographic means.
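One way the two orthographic devices named here, capitalization and hashtags, might be detected automatically is sketched below. This is an illustrative assumption rather than the paper's procedure: the function `emphasis_markers` is hypothetical, "capitalization" is taken to mean that the whole form is written in upper case, and a hashtag is taken to be the form with its spaces removed.

```python
import re


def emphasis_markers(tweet, spelling):
    """Return the set of emphasis devices used to mark `spelling` in `tweet`."""
    words = spelling.split()
    # The plain form, allowing any whitespace between its words.
    phrase = r"\s+".join(re.escape(word) for word in words)
    # The hashtag form: the same words run together after '#'.
    hashtag = "#" + "".join(re.escape(word) for word in words)

    markers = set()
    # Plain occurrences (not part of a hashtag): check for all-caps spelling.
    for m in re.finditer(r"(?<![\w#])" + phrase + r"(?!\w)", tweet, re.IGNORECASE):
        if m.group(0).isupper():
            markers.add("capitalization")
    # Hashtag occurrences, in any letter case.
    if re.search(hashtag + r"(?!\w)", tweet, re.IGNORECASE):
        markers.add("hashtag")
    return markers


if __name__ == "__main__":
    print(emphasis_markers("she said in her NEW YAWK accent", "New Yawk"))
    print(emphasis_markers("only in #newyawk", "New Yawk"))
    print(emphasis_markers("my New Yawk accent", "New Yawk"))
```

Other orthographic means, such as letter repetition or quotation marks, would need their own checks and are not covered by this sketch.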

Bibliography

Bamman, D. (2011). Lexicalist. http://www.lexicalist.com/ (accessed 30 August 2015).

Labov, W. (1966). The Social Stratification of English in New York City. Washington, D.C.: Center for Applied Linguistics.

Pak, A. and Paroubek, P. (2010). Twitter as a Corpus for Sentiment Analysis and Opinion Mining. Proceedings of Language Resource and Evaluation Conference (LREC), Valletta, Malta.

Preston, D. (2003). Language with an attitude. In J. K. Chambers, Peter Trudgill and Natalie Schilling-Estes (eds), The Handbook of Language Variation and Change. Oxford: Wiley-Blackwell, pp. 40-66.

Russ, B. (2012). Examining large-scale regional variation through online geotagged corpora. Presented at the 2012 American Dialect Society Annual Meeting.