Linguistic Features, grouped by family

  • These methods are not meant to be called directly by the user.

  • Passing a feature key to the LFTK Extractor object makes LFTK call these functions internally to extract that feature.

class lftk.foundation.entity.Entity

Parent class for features that are in the ‘entity’ family.

total_number_of_named_entities() int

returns the number of named entities

total_number_of_named_entities_art() int

returns the number of named entities that are WORK_OF_ART (Titles of books, songs, etc.)

total_number_of_named_entities_cardinal() int

returns the number of named entities that are CARDINAL (Numerals that do not fall under another type.)

total_number_of_named_entities_date() int

returns the number of named entities that are DATE (Absolute or relative dates or periods.)

total_number_of_named_entities_event() int

returns the number of named entities that are EVENT (Named hurricanes, battles, wars, sports events, etc.)

total_number_of_named_entities_fac() int

returns the number of named entities that are FAC (Buildings, airports, highways, bridges, etc.)

total_number_of_named_entities_gpe() int

returns the number of named entities that are GPE (Countries, cities, states.)

total_number_of_named_entities_language() int

returns the number of named entities that are LANGUAGE (Any named language.)

total_number_of_named_entities_law() int

returns the number of named entities that are LAW (Named documents made into laws.)

total_number_of_named_entities_loc() int

returns the number of named entities that are LOC (Non-GPE locations, mountain ranges, bodies of water.)

total_number_of_named_entities_money() int

returns the number of named entities that are MONEY (Monetary values, including unit.)

total_number_of_named_entities_norp() int

returns the number of named entities that are NORP (Nationalities or religious or political groups)

total_number_of_named_entities_ordinal() int

returns the number of named entities that are ORDINAL (“first”, “second”, etc.)

total_number_of_named_entities_org() int

returns the number of named entities that are ORG (Companies, agencies, institutions, etc.)

total_number_of_named_entities_percent() int

returns the number of named entities that are PERCENT (Percentage, including “%”.)

total_number_of_named_entities_person() int

returns the number of named entities that are PERSON (People, including fictional)

total_number_of_named_entities_product() int

returns the number of named entities that are PRODUCT (Objects, vehicles, foods, etc., but not services.)

total_number_of_named_entities_quantity() int

returns the number of named entities that are QUANTITY (Measurements, as of weight or distance.)

total_number_of_named_entities_time() int

returns the number of named entities that are TIME (Times smaller than a day.)
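A minimal sketch of how these counts could be produced, assuming each entity is available as a `(text, label)` pair that uses the label names above (the same names spaCy's English NER models emit). The helper `count_entities` is hypothetical and not part of LFTK.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def count_entities(entities: Iterable[Tuple[str, str]], label: Optional[str] = None) -> int:
    """Count all entities, or only those whose label equals `label`."""
    labels = Counter(lab for _, lab in entities)
    if label is None:
        return sum(labels.values())
    # Counter returns 0 for labels that never occur.
    return labels[label]


ents = [("Paris", "GPE"), ("France", "GPE"), ("2023", "DATE"), ("Hamlet", "WORK_OF_ART")]
print(count_entities(ents))         # 4
print(count_entities(ents, "GPE"))  # 2
```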

class lftk.foundation.partofspeech.PartOfSpeech

Parent class for features that are in the ‘partofspeech’ family.

total_number_of_adjectives() float

returns the number of adjectives

total_number_of_adpositions() float

returns the number of adpositions

total_number_of_adverbs() float

returns the number of adverbs

total_number_of_auxiliaries() float

returns the number of auxiliaries

total_number_of_coordinating_conjunctions() float

returns the number of coordinating conjunctions

total_number_of_determiners() float

returns the number of determiners

total_number_of_interjections() float

returns the number of interjections

total_number_of_nouns() float

returns the number of nouns

total_number_of_numerals() float

returns the number of numerals

total_number_of_particles() float

returns the number of particles

total_number_of_pronouns() float

returns the number of pronouns

total_number_of_proper_nouns() float

returns the number of proper nouns

total_number_of_punctuations() float

returns the number of punctuations

total_number_of_spaces() float

returns the number of spaces

total_number_of_subordinating_conjunctions() float

returns the number of subordinating conjunctions

total_number_of_symbols() float

returns the number of symbols

total_number_of_unique_adjectives() float

returns the number of unique adjectives

total_number_of_unique_adpositions() float

returns the number of unique adpositions

total_number_of_unique_adverbs() float

returns the number of unique adverbs

total_number_of_unique_auxiliaries() float

returns the number of unique auxiliaries

total_number_of_unique_coordinating_conjunctions() float

returns the number of unique coordinating conjunctions

total_number_of_unique_determiners() float

returns the number of unique determiners

total_number_of_unique_interjections() float

returns the number of unique interjections

total_number_of_unique_nouns() float

returns the number of unique nouns

total_number_of_unique_numerals() float

returns the number of unique numerals

total_number_of_unique_particles() float

returns the number of unique particles

total_number_of_unique_pronouns() float

returns the number of unique pronouns

total_number_of_unique_proper_nouns() float

returns the number of unique proper nouns

total_number_of_unique_punctuations() float

returns the number of unique punctuations

total_number_of_unique_spaces() float

returns the number of unique spaces

total_number_of_unique_subordinating_conjunctions() float

returns the number of unique subordinating conjunctions

total_number_of_unique_symbols() float

returns the number of unique symbols

total_number_of_unique_verbs() float

returns the number of unique verbs

total_number_of_verbs() float

returns the number of verbs
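The total and unique counts can be sketched as follows, assuming tokens arrive as `(text, pos)` pairs tagged with spaCy-style coarse POS tags such as NOUN, VERB, or PROPN. This section does not say whether LFTK compares unique items by surface form or by lemma; the sketch uses the lowercased surface form, and `pos_counts` is a hypothetical helper.

```python
from typing import Iterable, Tuple


def pos_counts(tokens: Iterable[Tuple[str, str]], tag: str) -> Tuple[int, int]:
    """Return (total, unique) counts for tokens carrying `tag`."""
    matches = [text.lower() for text, pos in tokens if pos == tag]
    return len(matches), len(set(matches))


tokens = [("The", "DET"), ("dog", "NOUN"), ("saw", "VERB"),
          ("the", "DET"), ("Dog", "NOUN"), (".", "PUNCT")]
print(pos_counts(tokens, "NOUN"))  # (2, 1)
print(pos_counts(tokens, "DET"))   # (2, 1)
```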

class lftk.foundation.worddiff.WordDiff

Parent class for features that are in the ‘worddiff’ family.

total_brysbaert_age_of_acquistion_of_words() float

returns the total Brysbaert score (the sum of all words’ Brysbaert difficulty scores)

total_kuperman_age_of_acquistion_of_words() float

returns the total Kuperman score (the sum of all words’ Kuperman difficulty scores)

total_subtlex_us_zipf_of_words() float

returns the total SubtlexUS frequency score (the sum of all words’ Zipf scores from SubtlexUS)
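Each total is a plain sum of per-word lexicon scores. A minimal sketch, assuming the norms are loaded into a dict keyed by lowercased word and that words missing from the lexicon are skipped (an assumption; LFTK may treat them differently). The values in `toy_norms` are made up for illustration and are not real Brysbaert, Kuperman, or SubtlexUS figures; `total_lexicon_score` is a hypothetical helper.

```python
from typing import Dict, Iterable


def total_lexicon_score(words: Iterable[str], lexicon: Dict[str, float]) -> float:
    """Sum the lexicon score of every word found in `lexicon`; unknown words add nothing."""
    total = 0.0
    for word in words:
        score = lexicon.get(word.lower())
        if score is not None:
            total += score
    return total


toy_norms = {"the": 3.0, "cat": 4.5}  # hypothetical values, not real norms
print(total_lexicon_score(["The", "cat", "purred"], toy_norms))  # 7.5
```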

class lftk.foundation.wordsent.WordSent

Parent class for features that are in the ‘wordsent’ family.

total_number_of_characters() int

returns the total number of characters

total_number_of_punctuations() int

returns the number of punctuations

total_number_of_sentences() int

returns the total number of sentences

total_number_of_stop_words() int

returns the number of stop words

total_number_of_syllables() int

returns the number of syllables

total_number_of_unique_words() int

returns the number of unique lemmatized words

total_number_of_unique_words_no_lemma() int

returns the number of unique words (without lemmatization)

total_number_of_words() int

returns the number of words

total_number_of_words_more_than_three_syllables() int

returns the number of words with more than three syllables

total_number_of_words_more_than_two_syllables() int

returns the number of words with more than two syllables
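The syllable-based counts depend on a syllable counter. The sketch below swaps in a naive vowel-group heuristic (LFTK's own syllable counting is not described here and may give different results) and applies the strict "more than N syllables" comparison used by the last two features. `count_syllables` and `words_with_more_than` are hypothetical helpers.

```python
import re
from typing import Iterable

_VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (including y), with a floor of 1."""
    return max(1, len(_VOWEL_GROUPS.findall(word.lower())))


def words_with_more_than(words: Iterable[str], n: int) -> int:
    """Count words whose estimated syllable count is strictly greater than n."""
    return sum(1 for w in words if count_syllables(w) > n)


words = ["cat", "banana", "education"]
print([count_syllables(w) for w in words])  # [1, 3, 4]
print(words_with_more_than(words, 2))       # 2
print(words_with_more_than(words, 3))       # 1
```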

class lftk.derivation.avgentity.AvgEntity

Parent class for features that are in the ‘avgentity’ family.

average_number_of_named_entities_art_per_sentence() float

returns the value of (total number of named entities that are WORK_OF_ART / total number of sentences) -> Titles of books, songs, etc.

average_number_of_named_entities_art_per_word() float

returns the value of (total number of named entities that are WORK_OF_ART / total number of words) -> Titles of books, songs, etc.

average_number_of_named_entities_cardinal_per_sentence() float

returns the value of (total number of named entities that are CARDINAL / total number of sentences) -> Numerals that do not fall under another type.

average_number_of_named_entities_cardinal_per_word() float

returns the value of (total number of named entities that are CARDINAL / total number of words) -> Numerals that do not fall under another type.

average_number_of_named_entities_date_per_sentence() float

returns the value of (total number of named entities that are DATE / total number of sentences) -> Absolute or relative dates or periods.

average_number_of_named_entities_date_per_word() float

returns the value of (total number of named entities that are DATE / total number of words) -> Absolute or relative dates or periods.

average_number_of_named_entities_event_per_sentence() float

returns the value of (total number of named entities that are EVENT / total number of sentences) -> Named hurricanes, battles, wars, sports events, etc.

average_number_of_named_entities_event_per_word() float

returns the value of (total number of named entities that are EVENT / total number of words) -> Named hurricanes, battles, wars, sports events, etc.

average_number_of_named_entities_fac_per_sentence() float

returns the value of (total number of named entities that are FAC / total number of sentences) -> Buildings, airports, highways, bridges, etc.

average_number_of_named_entities_fac_per_word() float

returns the value of (total number of named entities that are FAC / total number of words) -> Buildings, airports, highways, bridges, etc.

average_number_of_named_entities_gpe_per_sentence() float

returns the value of (total number of named entities that are GPE / total number of sentences) -> Countries, cities, states.

average_number_of_named_entities_gpe_per_word() float

returns the value of (total number of named entities that are GPE / total number of words) -> Countries, cities, states.

average_number_of_named_entities_language_per_sentence() float

returns the value of (total number of named entities that are LANGUAGE / total number of sentences) -> Any named language.

average_number_of_named_entities_language_per_word() float

returns the value of (total number of named entities that are LANGUAGE / total number of words) -> Any named language.

average_number_of_named_entities_law_per_sentence() float

returns the value of (total number of named entities that are LAW / total number of sentences) -> Named documents made into laws.

average_number_of_named_entities_law_per_word() float

returns the value of (total number of named entities that are LAW / total number of words) -> Named documents made into laws.

average_number_of_named_entities_loc_per_sentence() float

returns the value of (total number of named entities that are LOC / total number of sentences) -> Non-GPE locations, mountain ranges, bodies of water.

average_number_of_named_entities_loc_per_word() float

returns the value of (total number of named entities that are LOC / total number of words) -> Non-GPE locations, mountain ranges, bodies of water.

average_number_of_named_entities_money_per_sentence() float

returns the value of (total number of named entities that are MONEY / total number of sentences) -> Monetary values, including unit.

average_number_of_named_entities_money_per_word() float

returns the value of (total number of named entities that are MONEY / total number of words) -> Monetary values, including unit.

average_number_of_named_entities_norp_per_sentence() float

returns the value of (total number of named entities that are NORP / total number of sentences) -> Nationalities or religious or political groups

average_number_of_named_entities_norp_per_word() float

returns the value of (total number of named entities that are NORP / total number of words) -> Nationalities or religious or political groups

average_number_of_named_entities_ordinal_per_sentence() float

returns the value of (total number of named entities that are ORDINAL / total number of sentences) -> “first”, “second”, etc.

average_number_of_named_entities_ordinal_per_word() float

returns the value of (total number of named entities that are ORDINAL / total number of words) -> “first”, “second”, etc.

average_number_of_named_entities_org_per_sentence() float

returns the value of (total number of named entities that are ORG / total number of sentences) -> Companies, agencies, institutions, etc.

average_number_of_named_entities_org_per_word() float

returns the value of (total number of named entities that are ORG / total number of words) -> Companies, agencies, institutions, etc.

average_number_of_named_entities_per_sentence() float

returns the value of (total number of named entities / total number of sentences)

average_number_of_named_entities_per_word() float

returns the value of (total number of named entities / total number of words)

average_number_of_named_entities_percent_per_sentence() float

returns the value of (total number of named entities that are PERCENT / total number of sentences) -> Percentage, including “%”.

average_number_of_named_entities_percent_per_word() float

returns the value of (total number of named entities that are PERCENT / total number of words) -> Percentage, including “%”.

average_number_of_named_entities_person_per_sentence() float

returns the value of (total number of named entities that are PERSON / total number of sentences) -> People, including fictional

average_number_of_named_entities_person_per_word() float

returns the value of (total number of named entities that are PERSON / total number of words) -> People, including fictional

average_number_of_named_entities_product_per_sentence() float

returns the value of (total number of named entities that are PRODUCT / total number of sentences) -> Objects, vehicles, foods, etc., but not services.

average_number_of_named_entities_product_per_word() float

returns the value of (total number of named entities that are PRODUCT / total number of words) -> Objects, vehicles, foods, etc., but not services.

average_number_of_named_entities_quantity_per_sentence() float

returns the value of (total number of named entities that are QUANTITY / total number of sentences) -> Measurements, as of weight or distance.

average_number_of_named_entities_quantity_per_word() float

returns the value of (total number of named entities that are QUANTITY / total number of words) -> Measurements, as of weight or distance.

average_number_of_named_entities_time_per_sentence() float

returns the value of (total number of named entities that are TIME / total number of sentences) -> Times smaller than a day.

average_number_of_named_entities_time_per_word() float

returns the value of (total number of named entities that are TIME / total number of words) -> Times smaller than a day.
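Every average in this family is a ratio of two totals. A minimal sketch, assuming the counts are already available and that a zero denominator yields 0.0 (LFTK's behavior for zero sentences or words is not documented here). `safe_ratio` is a hypothetical helper.

```python
def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


# Example: 3 GPE entities in a text of 2 sentences and 24 words.
n_gpe, n_sentences, n_words = 3, 2, 24
print(safe_ratio(n_gpe, n_sentences))  # 1.5   (per sentence)
print(safe_ratio(n_gpe, n_words))      # 0.125 (per word)
```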

class lftk.derivation.avgpartofspeech.AvgPartOfSpeech

Parent class for features that are in the ‘avgpartofspeech’ family.

average_number_of_adjectives_per_sentence() float

returns the value of (total number of adjectives / total number of sentences)

average_number_of_adjectives_per_word() float

returns the value of (total number of adjectives / total number of words)

average_number_of_adpositions_per_sentence() float

returns the value of (total number of adpositions / total number of sentences)

average_number_of_adpositions_per_word() float

returns the value of (total number of adpositions / total number of words)

average_number_of_adverbs_per_sentence() float

returns the value of (total number of adverbs / total number of sentences)

average_number_of_adverbs_per_word() float

returns the value of (total number of adverbs / total number of words)

average_number_of_auxiliaries_per_sentence() float

returns the value of (total number of auxiliaries / total number of sentences)

average_number_of_auxiliaries_per_word() float

returns the value of (total number of auxiliaries / total number of words)

average_number_of_coordinating_conjunctions_per_sentence() float

returns the value of (total number of coordinating conjunctions / total number of sentences)

average_number_of_coordinating_conjunctions_per_word() float

returns the value of (total number of coordinating conjunctions / total number of words)

average_number_of_determiners_per_sentence() float

returns the value of (total number of determiners / total number of sentences)

average_number_of_determiners_per_word() float

returns the value of (total number of determiners / total number of words)

average_number_of_interjections_per_sentence() float

returns the value of (total number of interjections / total number of sentences)

average_number_of_interjections_per_word() float

returns the value of (total number of interjections / total number of words)

average_number_of_nouns_per_sentence() float

returns the value of (total number of nouns / total number of sentences)

average_number_of_nouns_per_word() float

returns the value of (total number of nouns / total number of words)

average_number_of_numerals_per_sentence() float

returns the value of (total number of numerals / total number of sentences)

average_number_of_numerals_per_word() float

returns the value of (total number of numerals / total number of words)

average_number_of_particles_per_sentence() float

returns the value of (total number of particles / total number of sentences)

average_number_of_particles_per_word() float

returns the value of (total number of particles / total number of words)

average_number_of_pronouns_per_sentence() float

returns the value of (total number of pronouns / total number of sentences)

average_number_of_pronouns_per_word() float

returns the value of (total number of pronouns / total number of words)

average_number_of_proper_nouns_per_sentence() float

returns the value of (total number of proper nouns / total number of sentences)

average_number_of_proper_nouns_per_word() float

returns the value of (total number of proper nouns / total number of words)

average_number_of_punctuations_per_sentence() float

returns the value of (total number of punctuations / total number of sentences)

average_number_of_punctuations_per_word() float

returns the value of (total number of punctuations / total number of words)

average_number_of_spaces_per_sentence() float

returns the value of (total number of spaces / total number of sentences)

average_number_of_spaces_per_word() float

returns the value of (total number of spaces / total number of words)

average_number_of_subordinating_conjunctions_per_sentence() float

returns the value of (total number of subordinating conjunctions / total number of sentences)

average_number_of_subordinating_conjunctions_per_word() float

returns the value of (total number of subordinating conjunctions / total number of words)

average_number_of_symbols_per_sentence() float

returns the value of (total number of symbols / total number of sentences)

average_number_of_symbols_per_word() float

returns the value of (total number of symbols / total number of words)

average_number_of_verbs_per_sentence() float

returns the value of (total number of verbs / total number of sentences)

average_number_of_verbs_per_word() float

returns the value of (total number of verbs / total number of words)

class lftk.derivation.avgworddiff.AvgWordDiff

Parent class for features that are in the ‘avgworddiff’ family.

average_brysbaert_age_of_acquistion_of_words_per_sentence() float

returns value of (total Brysbaert score / total number of sentences)

average_brysbaert_age_of_acquistion_of_words_per_word() float

returns value of (total Brysbaert score / total number of words)

average_kuperman_age_of_acquistion_of_words_per_sentence() float

returns value of (total Kuperman score / total number of sentences)

average_kuperman_age_of_acquistion_of_words_per_word() float

returns value of (total Kuperman score / total number of words)

average_subtlex_us_zipf_of_words_per_sentence() float

returns value of (total SubtlexUS Zipf score / total number of sentences)

average_subtlex_us_zipf_of_words_per_word() float

returns value of (total SubtlexUS Zipf score / total number of words)

class lftk.derivation.avgwordsent.AvgWordSent

Parent class for features that are in the ‘avgwordsent’ family.

average_number_of_characters_per_sentence() float

returns value of (total number of characters / total number of sentences)

average_number_of_characters_per_word() float

returns value of (total number of characters / total number of words)

average_number_of_punctuations_per_sentence() float

returns value of (total number of punctuations / total number of sentences)

average_number_of_punctuations_per_word() float

returns value of (total number of punctuations / total number of words)

average_number_of_stop_words_per_sentence() float

returns value of (total number of stop words / total number of sentences)

average_number_of_stop_words_per_word() float

returns value of (total number of stop words / total number of words)

average_number_of_syllables_per_sentence() float

returns value of (total number of syllables / total number of sentences)

average_number_of_syllables_per_word() float

returns value of (total number of syllables / total number of words)

average_number_of_words_per_sentence() float

returns value of (total number of words / total number of sentences)
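These averages can be sketched from pre-split sentences, assuming each sentence is a list of word strings with punctuation already removed and that characters are counted as the characters of each word (both assumptions; LFTK's tokenization rules are not stated here). `sentence_averages` is a hypothetical helper.

```python
from typing import Dict, List


def sentence_averages(sentences: List[List[str]]) -> Dict[str, float]:
    """Compute words per sentence and characters per word from tokenized sentences."""
    n_sentences = len(sentences)
    words = [w for sent in sentences for w in sent]
    n_words = len(words)
    n_chars = sum(len(w) for w in words)
    return {
        "words_per_sentence": n_words / n_sentences if n_sentences else 0.0,
        "characters_per_word": n_chars / n_words if n_words else 0.0,
    }


doc = [["The", "cat", "sat", "down"], ["Dogs", "ran", "home", "fast"]]
print(sentence_averages(doc))  # {'words_per_sentence': 4.0, 'characters_per_word': 3.5}
```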

class lftk.derivation.lexicalvariation.LexicalVariation

Parent class for features that are in the ‘lexicalvariation’ family.

corrected_adjectives_variation() float

returns value of (total number of unique adjectives / root(2*total number of adjectives))

corrected_adpositions_variation() float

returns value of (total number of unique adpositions / root(2*total number of adpositions))

corrected_adverbs_variation() float

returns value of (total number of unique adverbs / root(2*total number of adverbs))

corrected_auxiliaries_variation() float

returns value of (total number of unique auxiliaries / root(2*total number of auxiliaries))

corrected_coordinating_conjunctions_variation() float

returns value of (total number of unique coordinating conjunctions / root(2*total number of coordinating conjunctions))

corrected_determiners_variation() float

returns value of (total number of unique determiners / root(2*total number of determiners))

corrected_interjections_variation() float

returns value of (total number of unique interjections / root(2*total number of interjections))

corrected_nouns_variation() float

returns value of (total number of unique nouns / root(2*total number of nouns))

corrected_numerals_variation() float

returns value of (total number of unique numerals / root(2*total number of numerals))

corrected_particles_variation() float

returns value of (total number of unique particles / root(2*total number of particles))

corrected_pronouns_variation() float

returns value of (total number of unique pronouns / root(2*total number of pronouns))

corrected_proper_nouns_variation() float

returns value of (total number of unique proper nouns / root(2*total number of proper nouns))

corrected_punctuations_variation() float

returns value of (total number of unique punctuations / root(2*total number of punctuations))

corrected_spaces_variation() float

returns value of (total number of unique spaces / root(2*total number of spaces))

corrected_subordinating_conjunctions_variation() float

returns value of (total number of unique subordinating conjunctions / root(2*total number of subordinating conjunctions))

corrected_symbols_variation() float

returns value of (total number of unique symbols / root(2*total number of symbols))

corrected_verbs_variation() float

returns value of (total number of unique verbs / root(2*total number of verbs))

root_adjectives_variation() float

returns value of (total number of unique adjectives / root(total number of adjectives))

root_adpositions_variation() float

returns value of (total number of unique adpositions / root(total number of adpositions))

root_adverbs_variation() float

returns value of (total number of unique adverbs / root(total number of adverbs))

root_auxiliaries_variation() float

returns value of (total number of unique auxiliaries / root(total number of auxiliaries))

root_coordinating_conjunctions_variation() float

returns value of (total number of unique coordinating conjunctions / root(total number of coordinating conjunctions))

root_determiners_variation() float

returns value of (total number of unique determiners / root(total number of determiners))

root_interjections_variation() float

returns value of (total number of unique interjections / root(total number of interjections))

root_nouns_variation() float

returns value of (total number of unique nouns / root(total number of nouns))

root_numerals_variation() float

returns value of (total number of unique numerals / root(total number of numerals))

root_particles_variation() float

returns value of (total number of unique particles / root(total number of particles))

root_pronouns_variation() float

returns value of (total number of unique pronouns / root(total number of pronouns))

root_proper_nouns_variation() float

returns value of (total number of unique proper nouns / root(total number of proper nouns))

root_punctuations_variation() float

returns value of (total number of unique punctuations / root(total number of punctuations))

root_spaces_variation() float

returns value of (total number of unique spaces / root(total number of spaces))

root_subordinating_conjunctions_variation() float

returns value of (total number of unique subordinating conjunctions / root(total number of subordinating conjunctions))

root_symbols_variation() float

returns value of (total number of unique symbols / root(total number of symbols))

root_verbs_variation() float

returns value of (total number of unique verbs / root(total number of verbs))

simple_adjectives_variation() float

returns value of (total number of unique adjectives / total number of adjectives)

simple_adpositions_variation() float

returns value of (total number of unique adpositions / total number of adpositions)

simple_adverbs_variation() float

returns value of (total number of unique adverbs / total number of adverbs)

simple_auxiliaries_variation() float

returns value of (total number of unique auxiliaries / total number of auxiliaries)

simple_coordinating_conjunctions_variation() float

returns value of (total number of unique coordinating conjunctions / total number of coordinating conjunctions)

simple_determiners_variation() float

returns value of (total number of unique determiners / total number of determiners)

simple_interjections_variation() float

returns value of (total number of unique interjections / total number of interjections)

simple_nouns_variation() float

returns value of (total number of unique nouns / total number of nouns)

simple_numerals_variation() float

returns value of (total number of unique numerals / total number of numerals)

simple_particles_variation() float

returns value of (total number of unique particles / total number of particles)

simple_pronouns_variation() float

returns value of (total number of unique pronouns / total number of pronouns)

simple_proper_nouns_variation() float

returns value of (total number of unique proper nouns / total number of proper nouns)

simple_punctuations_variation() float

returns value of (total number of unique punctuations / total number of punctuations)

simple_spaces_variation() float

returns value of (total number of unique spaces / total number of spaces)

simple_subordinating_conjunctions_variation() float

returns value of (total number of unique subordinating conjunctions / total number of subordinating conjunctions)

simple_symbols_variation() float

returns value of (total number of unique symbols / total number of symbols)

simple_verbs_variation() float

returns value of (total number of unique verbs / total number of verbs)
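The simple, root, and corrected variations differ only in their denominators. A minimal sketch for any one part-of-speech category, taking its unique and total counts as input and assuming that a category with zero occurrences reports 0.0 (LFTK's handling of that case is not stated here). `pos_variation` is a hypothetical helper.

```python
import math
from typing import Dict


def pos_variation(unique: int, total: int) -> Dict[str, float]:
    """Simple, root, and corrected variation for one part-of-speech category."""
    if total == 0:
        # Assumption: report 0.0 when the category does not occur at all.
        return {"simple": 0.0, "root": 0.0, "corrected": 0.0}
    return {
        "simple": unique / total,                      # unique / total
        "root": unique / math.sqrt(total),             # unique / root(total)
        "corrected": unique / math.sqrt(2 * total),    # unique / root(2 * total)
    }


print(pos_variation(unique=6, total=8))
```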

class lftk.derivation.readformula.ReadFormula

Parent class for features that are in the ‘readformula’ family.

automated_readability_index() float

returns a reading difficulty score that corresponds to a US grade level

coleman_liau_index() float

returns a reading difficulty score that corresponds to a US grade level

flesch_kincaid_grade_level() float

returns a reading difficulty score that corresponds to a US grade level

flesch_kincaid_reading_ease() float

returns the reading difficulty score from the Flesch-Kincaid reading ease formula, interpreted as follows (higher scores mean easier text):

  • 100.0–90.0: 5th grade (very easy to read)
  • 90.0–80.0: 6th grade (easy to read)
  • 80.0–70.0: 7th grade (fairly easy to read)
  • 70.0–60.0: 8th and 9th grade (plain English)
  • 60.0–50.0: 10th to 12th grade (fairly difficult to read)
  • 50.0–30.0: college (difficult to read)
  • 30.0–10.0: college graduate (very difficult to read)
  • 10.0–0.0: professional (extremely difficult to read)

gunning_fog_index() float

returns the Gunning fog index, interpreted as follows:

  • 17: college graduate
  • 16: college senior
  • 15: college junior
  • 14: college sophomore
  • 13: college freshman
  • 12: high school senior
  • 11: high school junior
  • 10: high school sophomore
  • 9: high school freshman
  • 8: eighth grade
  • 7: seventh grade
  • 6: sixth grade

smog_index() float

returns a reading difficulty score that corresponds to a US grade level
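For reference, the commonly cited published forms of these formulas are sketched below from raw counts. This section does not state LFTK's exact constants, syllable counting, or definition of a complex word, so treat the sketch as an approximation rather than a reproduction of LFTK's output. All function names here are hypothetical.

```python
import math


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    # complex_words: here simply words with three or more syllables
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def smog(polysyllables: int, sentences: int) -> float:
    # polysyllables: words with three or more syllables, normalized to 30 sentences
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


def automated_readability(characters: int, words: int, sentences: int) -> float:
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


def coleman_liau(letters: int, words: int, sentences: int) -> float:
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91
print(round(flesch_reading_ease(100, 5, 150), 3))   # 59.635
```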

class lftk.derivation.readtimeformula.ReadTimeFormula

Parent class for features that are in the ‘readtimeformula’ family.

reading_time_for_average_readers() float

returns value of (total number of words / 240)

reading_time_for_fast_readers() float

returns value of (total number of words / 300)

reading_time_for_slow_readers() float

returns value of (total number of words / 175)
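Each reading-time feature divides the word count by a fixed reading speed. Assuming those divisors are speeds in words per minute, the results are estimated reading times in minutes. A minimal sketch; `READING_SPEEDS_WPM` and `reading_time_minutes` are hypothetical names.

```python
READING_SPEEDS_WPM = {"slow": 175, "average": 240, "fast": 300}


def reading_time_minutes(n_words: int, speed: str = "average") -> float:
    """Estimated reading time in minutes, treating the divisor as words per minute."""
    return n_words / READING_SPEEDS_WPM[speed]


print(reading_time_minutes(480))          # 2.0
print(reading_time_minutes(480, "fast"))  # 1.6
```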

class lftk.derivation.typetokenratio.TypeTokenRatio

Parent class for features that are in the ‘typetokenratio’ family.

bilogarithmic_type_token_ratio() float

returns value of (log(total number of unique lemmatized words) / log(total number of words))

bilogarithmic_type_token_ratio_no_lemma() float

returns value of (log(total number of unique words) / log(total number of words))

corrected_type_token_ratio() float

returns value of (total number of unique lemmatized words / root(2*total number of words))

corrected_type_token_ratio_no_lemma() float

returns value of (total number of unique words / root(2*total number of words))

root_type_token_ratio() float

returns value of (total number of unique lemmatized words / root(total number of words))

root_type_token_ratio_no_lemma() float

returns value of (total number of unique words / root(total number of words))

simple_type_token_ratio() float

returns value of (total number of unique lemmatized words / total number of words)

simple_type_token_ratio_no_lemma() float

returns value of (total number of unique words / total number of words)

uber_type_token_ratio() float

returns value of (log(total number of unique lemmatized words)^2 / log(total number of words / total number of unique lemmatized words))

uber_type_token_ratio_no_lemma() float

returns value of (log(total number of unique words)^2 / log(total number of words / total number of unique words))
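The lemmatized and `_no_lemma` variants differ only in which items count as types. A minimal sketch of the simple, root, corrected, and bilogarithmic ratios, assuming the caller supplies parallel lists of surface words and their lemmas (the Uber ratio is left out). `type_token_ratios` is a hypothetical helper.

```python
import math
from typing import Dict, List


def type_token_ratios(tokens: List[str], types_from: List[str]) -> Dict[str, float]:
    """TTR variants where `types_from` supplies the items counted as types
    (lemmas for the default features, surface words for the _no_lemma ones)."""
    n = len(tokens)
    if n == 0:
        raise ValueError("cannot compute type-token ratios for an empty text")
    v = len(set(types_from))
    ratios = {
        "simple": v / n,
        "root": v / math.sqrt(n),
        "corrected": v / math.sqrt(2 * n),
    }
    # log(1) is 0, so the bilogarithmic ratio is undefined for a one-word text.
    ratios["bilogarithmic"] = math.log(v) / math.log(n) if n > 1 else float("nan")
    return ratios


words = ["cats", "cat", "runs", "run"]
lemmas = ["cat", "cat", "run", "run"]
print(type_token_ratios(words, lemmas)["simple"])  # 0.5
print(type_token_ratios(words, words)["simple"])   # 1.0
```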