In closed-off, inaccessible academic circles, linguist brohs openly hate on computer brohs for creating models of language that are based on “probability” and that don’t take into account the “underlying structure of language”. The hardcore nativist bros believe that every language has the same underlying structure and that corpus-based language software is useless.
Researchers that use “machine learning algorithms” (a fancy way to say ‘teaching a computer to predict future behavior based on previous behavior’, statistically) to understand something in the world without giving a shit about the ‘meaning of that behavior’ are hated on by the people that study that meaning 4 a living. When a person comes along and tries to model the behavior of something they do not have a deep understanding of, the community that studies the behavior will get upset. It is a good and maybe healthy reaction.
The models generated by the math and computer bros have had success in terms of producing viable irl products. To troll the linguist people, computer guys are now getting kinda good at looking at the “underlying structure” of language (mainly, the syntax of sentences) and coupling it with their statistical models to understand language better.
Take for example this simple sentence structure in tree-format:
Instead of deciding statistically that “happy” follows before a word like “linguists”, a model relying on sentence structure allows for “dynamic”-y modelling of language. A *very* simple model, with the only structure as the one above, can still produce some cool things if used correctly. Rules:
S -> NP VP
NP -> Adj N
VP -> V N
So, using these very simple rules, I created a script to generate short rhyming rap bursts following this sentence structure ((combining rhyme algorithm stuffs w/ language modelling stuffs)). Here are some examples I generated from the rap corpus + publicly available n-gram models:
classic mouth pull the plow //
fat red round buy the cow //
late night crowd make the rounds //
open routes find the lounge //
good stream down beat the count //
sparkling crowd buy the cow //
heavy clouds produce a growl //
common grounds give a cow //
vocal booth cut the loops //
next man do divide the schools //
certain lieu cut the jews //
expensive route pull the school //
major news combine the attributes //
several pews preserve the rule //
own lust to get some value //
long arc through give the proofs //
bad tone to send a group //
your own tissue ensure the use //
so many crew bring the groups //
private pools see no two //
All of these follow the general rule S -> NP VP (NP -> Adj N, VP -> V N). And they rhyme! Some are nonsensical and this is a long way off from anything that would be *really* interesting but it’s a world I want to learn about and do cool shit in.
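Here is a minimal sketch of how generating from those three rules could look. It is not the actual script: the real thing pulled words from a rap corpus and n-gram models, while this one assumes a tiny hand-written lexicon where each noun is hand-tagged with its vowel sound, so the rhyme check is just "do the two nouns share a tag". The "the" in the verb phrase is added only to match the look of the sample bars. Every name here is made up for illustration.

```python
import random

# Toy lexicon (hypothetical). Nouns carry a hand-tagged vowel sound so the
# rhyme check needs no pronunciation dictionary.
ADJECTIVES = ["late", "heavy", "vocal", "major"]
VERBS = ["buy", "cut", "find", "make"]
NOUNS = {
    "crowd": "AW", "cow": "AW", "clouds": "AW", "lounge": "AW",
    "booth": "UW", "loops": "UW", "news": "UW", "rule": "UW",
}


def noun_phrase(rng):
    # NP -> Adj N
    return [rng.choice(ADJECTIVES), rng.choice(sorted(NOUNS))]


def verb_phrase(rng):
    # VP -> V N, with "the" inserted to mirror the sample output
    return [rng.choice(VERBS), "the", rng.choice(sorted(NOUNS))]


def sentence(rng):
    # S -> NP VP
    return noun_phrase(rng) + verb_phrase(rng)


def rhymes(words):
    # words is [Adj, N, V, "the", N]; compare the vowel tags of the two nouns
    first, last = words[1], words[-1]
    return first != last and NOUNS[first] == NOUNS[last]


def generate_bar(rng, max_tries=1000):
    # Keep sampling sentences until one rhymes (or give up)
    for _ in range(max_tries):
        words = sentence(rng)
        if rhymes(words):
            return " ".join(words)
    return None


rng = random.Random(0)
for _ in range(3):
    print(generate_bar(rng))
```

Generate-then-filter is the laziest possible design, but with a grammar this small it is plenty fast, and it keeps the grammar and the rhyme check completely separate.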
If you want to fuck around with the thing, visit this link.
So, as a part of a new project I’m working on it’s been important to go through lines in rap that have lots of rhymes and don’t use big words. Here are some random bars from random songs that fit these criteria and sound nice.
1. Benzino – Bang Ta Dis
blunts bitches clips guns bars bricks whips funds
———————————-
Syllables Per Word: 1.125
Rhyme Density: 0.78
———————————
2. JayZ – Parkin Lot Pimpin
big thangs thick chains aint shit changed get brain in the four dot six range
———————————-
Syllables Per Word: 1
Rhyme Score: 0.67
———————————-
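The "Syllables Per Word" number is just total syllables divided by total words. A quick sketch, assuming hand-supplied syllable counts (a real script would presumably look them up in a pronunciation dictionary, which isn't specified here). The formula behind the rhyme number isn't given, so it is not reproduced. Function and variable names are hypothetical.

```python
def syllables_per_word(words, syllable_counts):
    """Average syllables per word; words not listed count as 1 syllable."""
    total = sum(syllable_counts.get(w, 1) for w in words)
    return total / len(words)


# Hand-supplied counts (assumption): only multi-syllable words need listing.
COUNTS = {"bitches": 2}

benzino = "blunts bitches clips guns bars bricks whips funds".split()
jay_z = ("big thangs thick chains aint shit changed get brain "
         "in the four dot six range").split()

print(syllables_per_word(benzino, COUNTS))  # 9 syllables / 8 words = 1.125
print(syllables_per_word(jay_z, COUNTS))    # 15 syllables / 15 words = 1.0
```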
Chance The Rapper, a young artist out of Chicago, has released a new mixtape that’s pretty good and “soulful”. It’s fun to listen to and there are a few really high points that make the listener feel good, I think. He is a good rapper, in technical terms. I think he is supposed to be “psychedelic” but listenable. Inoffensive/un-weird enough to listen to around a group of people with mainstream-y tastes but cool enough (for now, at least), to be proud of liking him.
The intro song Good Ass Intro has about 12 instances of Chance pronouncing the rhotic coda consonant {looser, another, brother, mother, glitter, clutter, colour, baller, butter, stainer, studded, mater}. In other words, pronouncing the ‘r’ in words that end with ‘r’. Not pronouncing the coda (last consonant) ‘r’ is something common to Non-Rhotic accents. African American Vernacular English (AAVE) is a non-rhotic accent.
Like other non-rhotic varieties, the rhotic consonant /r/ is usually dropped when not followed by a vowel;
Non-Rhotic accents are generally stigmatized and speakers ‘embarrassed’ of it will converge to Standard English via pronouncing the ‘r’. Unlike G-Dropping rates, which have lots of good literature, I have no idea what a common ‘R-Drop’ rate for AAVE is or what its relation to socioeconomic conditions is. There are some unsubstantiated theories about non-rhotic consonants re: class warfare.
Another event that may have influenced Southern dialectical patterns, particularly desegregation, which was accompanied by turmoil in the South from the 1950s through the 1970s. The civil rights struggle seems to have caused both African Americans and southern Whites to stigmatize linguistic variables associated with the other group.
Later on in the mixtape, Chance mentions that he hates Fox News and that Matt Lauer isn’t properly reporting the problems in Chicago. Which is cool. And mirrors the sensibilities of the young rap fan with opinions on politics probably. But to me, even though they are technically sloppy, the other kids out of Chicago do a way better job of capturing the desperation and chaos Matt Lauer isn’t reporting. Chance’s use of the rhotic consonant probably says a lot about his relationship to the stuff happening in Chicago, I think. It almost doesn’t matter that he is from there, at least musically. Seems like he is pretty amorphous geographically.
Chief Keef is a famous rapper out of Chicago that started off making paranoia-driven gangster raps and quickly transitioned to molly-induced happy party songs after he got paid. Keef’s early stuff – BANG & Back From the Dead – is especially great as a look into the paranoia that drives divisions between these kids. These differences being partially geographical and partially ‘merit-based’ {If you ain’t poppin pistols, I ain’t rocking wit ya}.
After a lengthy Gawker profile and two excellent tapes, Keef became thinkpiece fodder. The central thesis by outraged black people with rap/culture opinions was that “white people shouldn’t/aren’t equipped to discuss violent rap”.
“Brian “B.Dot” Miller, who is black, and an editor at Rap Radar, took Sargent to task directly, tweeting at him to “please stop writing about MY culture,” bemoaning “cultural tourists writing about the music of MY culture” and “outsiders like yourself in hipster media that get a hard-on by overanalyzing black music.”
Whatever. In the widely circulated New Republic article, a (black) blogger raised similar concerns re: white hipsters writing about Chief Keef:
“Motherfuckers see us as ONE fucking unit and THAT is what we want ‘white bloggers’ to understand. Someone sees Waka and then kills Treyvon … Y’all don’t know that fuckin’ struggle of being judged based on someone else’s actions and you NEVER will … You will never understand. Never feel the pain, shame, guilt … You get to be just you. But in America no matter how hard I try someone is ALWAYS judgin based on my skin and when the Chief Keefs appear, people are thinking OMG look at what years of oppression and demoralization have done to a group. They think: niggers.”
The guy who killed Treyvon[sic] probably hated black people way before Flockaveli dropped. But, whatever. Both of these people are implicitly assuming that (white) bloggers *made* Chief Keef. That the sustained interest in Keef’s work was due to white people that write on the internet. Without the interest of these *bloggers*, Keef would not matter. The fear being, I assume, that the *power* held by these White Bloggers could create an ecosystem of similar rappers “without artistic merit” (representing the worst of the worst of black culture) being given a platform. The truth is that Keef built his fanbase in a wholly organic way by getting listeners similar to him, and that white bloggers DO NOT matter in terms of creating a “street rapper” like Chief Keef.
I believe that people who comment on Youtube videos, good or bad, provide a solid, measurable way to understand content. Social media data mining may or may not be a bullshit thing to study but I think if the results (especially ones on either extremes) make sense and pass the ‘eye test’, it’s probably something worth exploring further. One thing that I think makes sense to look at, especially for videos from ‘street’ rappers, is to see if the people commenting type in some sort of unique, measurable way.
Ebonics is a rule-governed language that can sometimes be studied on paper. For example, something like G-Dropping can be looked at on text. Another rule of ‘Ebonics’ has to do with word-initial fricatives. This is a fancy way to say that words like {This} get pronounced {dis} in spoken language. And sometimes, because this is an example where the spelling of the word goes with the general rules of sound and stuff, people will actually write {dis}.
Writing a little script to extract comments from a YouTube video, we can find how often users use words like [da, dis, dat] instead of [the, this, that]. It ends up working really well to distinguish artists, stylistically. The chart below shows rappers that have a high ‘da’ score (street rappers, generally), medium ‘da’ scores (mixed fan bases) and low ‘da’ scores (generally backpackers or ‘barely rap’ bros):
| High /da/ | Medium /da/ | Low /da/ |
|---|---|---|
| Soulja Slim | GZA | Ras Kass |
| Guerilla Maab | Esham | Atmosphere |
| Beanie Sigel | Missy Elliott | Beastie Boys |
| Eightball & MJG | RZA | Brother Ali |
| Pastor Troy | Ghostface | Lupe Fiasco |
Stylistically, the High /da/ guys seem to be similar and might be classified under the umbrella term ‘street rapper’, although maybe unfairly. It turns out that as a quick separator of styles this metric works really well. It relies on a phonological rule of AAVE. It probably is ‘wrong’ to call the [th] -> [d] phenomenon a “spelling mistake”. This is a systematic rule of AAVE (just like any other phonological rule in any language) that in this instance happens to be data-mineable through text.
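A rough sketch of what a /da/ score could look like in code. The exact formula isn't spelled out above, so this assumes the simplest reading: out of every place a commenter wrote one of [the, this, that] or one of [da, dis, dat], what fraction were the d-forms. The comments below are invented toy strings, not real YouTube data, and pulling the comments off YouTube is left out entirely.

```python
import re

D_FORMS = {"da", "dis", "dat"}
TH_FORMS = {"the", "this", "that"}


def da_score(comments):
    """Share of d-forms among all d-forms plus th-forms (assumed definition)."""
    d_count = th_count = 0
    for comment in comments:
        # Lowercase and split into rough word tokens
        for token in re.findall(r"[a-z']+", comment.lower()):
            if token in D_FORMS:
                d_count += 1
            elif token in TH_FORMS:
                th_count += 1
    total = d_count + th_count
    return d_count / total if total else 0.0


# Invented toy comments, for illustration only
comments = [
    "dis my song",
    "that beat is crazy",
    "da best in the city",
    "I love this",
]
print(da_score(comments))  # 2 d-forms out of 5 total = 0.4
```

One thing a counter like this can't tell apart is "dis" as in "this" and "dis" as in disrespect, so the number is noisy on small comment sections.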
If the ‘hipster media theory’ holds up and Keef’s fan base was cultivated through white people blogosphere link sharing, his initial work should NOT have a High /da/ score. However, this is not the case. The BANG mixtape has consistently High /da/ scores which indicates that it was probably kids similar to Keef that listened to him first. That while the people writing about him online now may be mostly white nerds, the people that fell in love with him initially were black kids like Chief Keef.
The High /da/ guys have a median score of ~0.15 and Low /da/ guys have a median score of ~0.01. Keef’s BANG mixtape has a weighted average /da/ score of ~0.18. This allows us to classify him as a ‘street rapper’. His song Setz Up has a /da/ score of 0.13. Looking through the responses to this song, we find this particular comment below which has an instance of /da/ AND /dat/. It also specifically explains in detail one of the gang references in the song:
A song riddled with gang stuffs appealing to kids that are hyper-aware of these references. The ‘hipster media’ didn’t make these kids care about Keef or understand these references. Most likely, Keef’s music initially represented a reality to the kids in his city.
I think it is important to relate these High /da/ scores to actual lyrical content from songs. We see that the fans’ response for High /da/ rappers seems to follow a general trend. High /da/ score rappers are all generally ‘street rappers’. We need to find a way to link the lyrical content from these songs to the particular responses. Ideally, we should find that High /da/ scores in YouTube commentary are correlated with some sort of particular word-usage in songs.
There have to be certain trends in word usage that can be measured? For example, I’m sure the word {‘nigger‘} is almost exclusively limited to songs by black guys. Not sure if there are any exclusive ‘white’ words, since white artists probably don’t own any kind of similar exclusivity to lexical items.
No reason we can’t look at this scientifically. All you really have to do is get good enough datasets for ‘white’ raps and ‘black’ raps. Mathematically, of course, {Black} ∩ {White} = ∅ ⇔ One-drop Rule. So, once we have these two datasets we can run some cool machine learning algorithms to train a computer to identify specific ‘white’ and ‘black’ characteristics.
We know from the earlier chart that Pastor Troy is a High /da/ score guy and that Atmosphere is a Low /da/ score guy. Ideally, using the text classification tool, Pastor Troy should score as more ‘black’ and less ‘white’ than Atmosphere. It turns out he does. Considerably.
With average scores:
| Artist | Black | White |
|---|---|---|
| Atmosphere | 7.62 | 27.45 |
| Pastor Troy | 47.56 | 1.81 |
The data supports our intuition with regards to Pastor Troy. It seems that Pastor Troy, a High /da/ guy, also has High ‘Black’ scores and Low ‘White’ scores. Does this data extend to other ‘street’ rappers? If we use an arbitrary cutoff of 0.05 (about 25% of the songs we mined in a 1500+ song dataset) we see that High /da/ scores generally correlate to Low ‘White’ Scores. That is, how the fans are talking about an artist is directly correlated to the actual lyrical content. A pretty sweet discovery.
We see that Low ‘White’ scores (0-15) correlate with High /da/ scores (>0.05). That is, there is a 92% chance that a song with a /da/ score greater than 0.05 will have a White Score less than 15. Pretty great evidence that the way fan bases discuss a street artist is a predictor of the kind of lyrical content an artist has. Without even listening to a song, we can know what kind of song we are dealing with just by how the fans are interacting with the work.
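The "92% chance" figure is a conditional share: take only the songs over the /da/ cutoff, then count how many of those fall under the White-score cutoff. A small sketch of that arithmetic, with the two cutoffs taken from the text and the song pairs completely invented for illustration (the real dataset is not shown). The function name is hypothetical.

```python
def share_below(songs, da_cutoff=0.05, white_cutoff=15):
    """Among songs with /da/ > da_cutoff, the fraction with White score < white_cutoff."""
    high_da = [(da, white) for da, white in songs if da > da_cutoff]
    if not high_da:
        return None  # nothing to condition on
    hits = sum(1 for _, white in high_da if white < white_cutoff)
    return hits / len(high_da)


# Invented (da_score, white_score) pairs, not real measurements
songs = [
    (0.18, 2.0), (0.13, 9.5), (0.07, 21.0),
    (0.10, 4.0), (0.01, 27.0), (0.02, 12.0),
]
print(share_below(songs))  # 3 of the 4 high-/da/ songs qualify = 0.75
```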
The blogosphere simply cannot ‘break’ a street artist. Any shit-talk to the contrary is without merit.
The event is scheduled to take place in Gasson Hall Rm. 305 at Boston College on March 14, 2013. It will begin at 6:30 PM. The event is hosted by the Boston College Economics Association. If you are in Boston or around Boston please come by and listen to me talk about RapMetrics. I will talk about past projects and potential new ones.
This is exciting for me and these kinds of things let me know that this project isn’t useless and that it is worth doing.
Sometimes Eminem’s verse in Renegade is used by racists to prove that rap is “complex” and that it should be looked at as “real art”. Probably stems from the fact that these ppl started off by listening to things that were not rap and they didn’t have the ability to contextualize rap music critically by putting value on things like “creativity” or “concepts”, historically/artistically etc. Then the value of the music comes in the form of “technicality” and guys like Eminem who complain (in Renegade at least) about being too famous, become examples of “great rap music” or something. These racists show HOW COMPLEX Eminem is by showing his “intricate” rhyme patterns like the ones below:
A 5-syllable rhyme pattern that spans multiple words {lucrative lyrics, youth in hysterics, views in his marriage, you shouldn’t hear it, food for the spirit…}. A bunch of internal rhymes and assonance and other cool TECHNICAL things that don’t matter because Eminem is complaining about dumb things like being too famous. It IS hard to reproduce as a rhyme pattern because of how many words are used and it takes HARD MATH and other algorithms to try and reproduce it. It gave me incentive to try and venture into 5,6,7 syllable rhymes (still very new into that realm) and I’m attempting to try and do that. I have a bunch of rhymes (~7k) at www.rapmetrics.com/suckmydick.txt if you want to see. Here are some:
Gang references in rap are not new. Rap beef is not new. Elements of rap beef crossing over to real life aren’t new. But the specific intersection of the 3, with the ability to view/understand it all digitally, might be. The YouTube-wave of Chicago rappers shouting out blocks and gangs in the songs, and their ability to gain fans within the city that are hyper-aware of all the references, has created an interesting opportunity.
With good algorithms to detect these references and aggregate them, YouTube comments (!) end up containing a ton of interesting data. Gang shootouts with respective positive and negative affiliation explicitly referenced, street corners mentioned, coded terminology thrown around and the ability to cross-reference a user’s posting history with particular songs they liked or commented on. It’s weird new territory with weird ethical questions that need to be considered. Perhaps better than I have already.
I wrote about it all at the link at the top. Please read and share.
I guess ever since Volume 5 of the And1 series I’ve been a big fan of Styles P. For one summer, I think this was the song we listened to the most. Of course, most of the ideas meant nothing to me then (even now?) but it came out at a special time when streetball tapes were important and the music off those tapes reflected what was ‘hot’. All of this before Professor ruined And1 basketball, streetball and the delicate intersection of the two for me.
All of that being not interesting, Jada and Styles both rhyme in cool ways. My favorite thing they do is the 4-syllable rhymes with unstressed middle syllables where, usually, almost all of the words are 1-syllable in length. For example:
I got shit that could wake up the deaf
that’ll knock down the door and break up the steps
Certain words like {up} and {the} are not stressed because they work mostly as function words (I guess). This is essentially a 2-syllable rhyme {wake+deaf, break+steps} and the middle words just carry rhythm and sound cool. Important to note that all of these words are monosyllabic. I think this is harder to do and actually sounds better. Computationally, it takes a lot more work to combine four words together like this and for it to sound good/’make sense’. It is easy to find instances of 2-word combinations that match the rhyme pattern above.
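A toy sketch of that idea: treat a 4-word phrase as a rhyme "skeleton" made of the vowel in the first word and the vowel in the last word, and only accept it if the two middle words are unstressed function words. The vowel tags are hand-written ARPAbet-style labels for a handful of words (an assumption; a real version would need a pronunciation dictionary), and this is not the RapMetrics algorithm, just an illustration with made-up names.

```python
# Small hand-written tables (hypothetical, for illustration only)
FUNCTION_WORDS = {"up", "the", "and", "a", "in", "to", "or"}
VOWELS = {
    "wake": "EY", "break": "EY", "shake": "EY",
    "deaf": "EH", "steps": "EH", "door": "AO",
}


def skeleton(phrase):
    """Return (first vowel, last vowel) for a 4-word phrase, else None."""
    words = phrase.lower().split()
    if len(words) != 4:
        return None
    first, mid1, mid2, last = words
    # The middle two words must be unstressed function words
    if mid1 not in FUNCTION_WORDS or mid2 not in FUNCTION_WORDS:
        return None
    if first not in VOWELS or last not in VOWELS:
        return None
    return VOWELS[first], VOWELS[last]


def phrases_rhyme(a, b):
    """Two phrases rhyme here if both have a skeleton and the skeletons match."""
    sa, sb = skeleton(a), skeleton(b)
    return sa is not None and sa == sb


print(phrases_rhyme("wake up the deaf", "break up the steps"))  # True
print(phrases_rhyme("wake up the deaf", "break up the door"))   # False
```

Checking whether two given phrases match is the easy half. The expensive half is the search: picking four words out of a big vocabulary that match a skeleton and still make sense together.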
4-word combinations are going to require some boring programming and probably some math and stuff like that to do in a non-shitty way. Here are some Jadakiss/Styles P 4-syllable rhymes and what the COMPLEX RapMetrics algorithm returns as possible rhyme results:
1.
Everyday I need an ounce and a half
S.P.; the only flower that you know, with a bounce in a half
One of the great opening bars from one of my favorite albums ever. Again, same idea here; the rhyme is essentially {ounce+half, bounce+half} with two words in the middle to carry rhythm and make it a 4-syllable rhyme. It is one of my favorite rhyme schemes and an interesting one because the OW sound is pretty rare in general. Here are some computer generated rhymes for that scheme. Notice, again, there are no 4-word combinations. It is because it is actually harder to do computationally, and, in my opinion, harder to do as a writer.
Motherfucker understand it’s full service to you
I don’t smoke the weed if it ain’t, purple or blue
Again, from ‘Get High’, same 4-syllable rhyme with 2 unstressed syllables in the middle. This time the first word in this ‘rhyme scheme’ is 2-syllables long instead of one. Which is cool. This is what the very complex RapMetrics algorithm returned as possible results for rhymes:
This is a cool ‘technique’ and there are some really cool things that can be done with it. It is actually harder to do 4-syllable rhymes of only monosyllabic words computationally and I think it is also harder to do it well as a writer. I guess that is the main point of this ‘article’. Sometimes having to simplify language and use smaller words to convey meaning/rhythm is actually way harder to do. The simplest Styles P lines have stuck with me for the longest:
Bitch think I don’t smile ’cause my tooth chipped// Bitch, I don’t smile cause my heart chipped//