User Methods

class lftk.lftk.Extractor(docs: Doc | List[Doc])

Main object

Stores the spaCy Doc object(s) from which linguistic features are extracted. Pass the spaCy doc(s) in when you create this object.

Parameters:

docs (Union[spacy.tokens.doc.Doc, List[spacy.tokens.doc.Doc]]) – a spaCy Doc object or a list of spaCy Doc objects

Examples

>>> import spacy
>>> import lftk
>>>
>>> nlp = spacy.load("en_core_web_sm")
>>> doc1 = nlp("I think effects computers have on people are great!")
>>> doc2 = nlp("I like drinking coffee...")
>>>
>>> LFTK = lftk.Extractor(docs = [doc1, doc2])
customize(stop_words: bool = True, punctuations: bool = True, round_decimal: int = 3) → None

Global customization

Customizes all LFTK functions to extract features based on these options. This excludes some special functions that are intentionally designed to override the global options.

Parameters:
  • stop_words (bool (default = True)) – Whether to include stop words in feature extraction

  • punctuations (bool (default = True)) – Whether to include punctuation in feature extraction

  • round_decimal (int (default = 3)) – The maximum number of decimal places in the extracted feature values

Examples

>>> import spacy
>>> import lftk
>>>
>>> nlp = spacy.load("en_core_web_sm")
>>> doc = nlp("I think effects computers have on people are great!")
>>>
>>> LFTK = lftk.Extractor(docs = doc)
>>>
>>> LFTK.customize(stop_words=True, punctuations=False, round_decimal=3)
>>> output = LFTK.extract(features = ["a_word_ps", "a_kup_pw", "n_noun"])
>>> # {'a_word_ps': 9.0, 'a_kup_pw': 5.323, 'n_noun': 3}
>>>
>>> LFTK.customize(stop_words=False, punctuations=False, round_decimal=2)
>>> output = LFTK.extract(features = ["a_word_ps", "a_kup_pw", "n_noun"])
>>> # {'a_word_ps': 5.0, 'a_kup_pw': 9.6, 'n_noun': 3}
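The effect of the stop_words and punctuations options on a word count can be pictured with a small stand-alone sketch. This is not LFTK's implementation: the token list, the STOP_WORDS set, and the count_words helper are hypothetical stand-ins for what spaCy and LFTK would provide, written only to show which tokens each option keeps.

```python
from typing import List, Tuple

# (text, is_punctuation) pairs for the example sentence.
# Hand-written here; in practice spaCy would supply this information.
TOKENS: List[Tuple[str, bool]] = [
    ("I", False), ("think", False), ("effects", False), ("computers", False),
    ("have", False), ("on", False), ("people", False), ("are", False),
    ("great", False), ("!", True),
]

# Hypothetical stop-word set, chosen to cover this sentence only.
STOP_WORDS = {"i", "have", "on", "are"}


def count_words(tokens: List[Tuple[str, bool]],
                stop_words: bool = True,
                punctuations: bool = True) -> int:
    """Count tokens, optionally dropping stop words and punctuation."""
    kept = 0
    for text, is_punct in tokens:
        if is_punct and not punctuations:
            continue  # punctuation excluded
        if text.lower() in STOP_WORDS and not stop_words:
            continue  # stop word excluded
        kept += 1
    return kept


print(count_words(TOKENS))                                        # 10
print(count_words(TOKENS, stop_words=True, punctuations=False))   # 9
print(count_words(TOKENS, stop_words=False, punctuations=False))  # 5
```

For this one-sentence input, the three counts line up with the a_word_ps values shown in the examples on this page (10.0, 9.0, and 5.0), which suggests the options act as simple token filters.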
extract(features: str | list = '*') → Dict[str, float | int] | List[Dict[str, float | int]]

Extract function for select features

Extracts the feature(s) requested by the user from every spaCy doc that was saved when the LFTK Extractor was created. By default, this function extracts all features available in LFTK.

This function only accepts feature key(s) (e.g., “t_word”, “a_kup_pw”).

Parameters:

features (Union[str, list] (default = "*")) – A single feature key or a list of feature keys. Passing “*” extracts all available features.

Returns:

result – A dictionary if a single spaCy doc was passed in when the LFTK Extractor object was created, or a list of dictionaries (one per doc) if a list of docs was passed in.

Return type:

Union[Dict[str, Union[float,int]], List[Dict[str, Union[float,int]]]]

Examples

>>> import spacy
>>> import lftk
>>>
>>> nlp = spacy.load("en_core_web_sm")
>>> doc1 = nlp("I think effects computers have on people are great!")
>>> doc2 = nlp("I like drinking coffee...")
>>>
>>> LFTK = lftk.Extractor(docs = doc1)
>>>
>>> output = LFTK.extract(features = ["a_word_ps", "a_kup_pw", "n_noun"])
>>> # {'a_word_ps': 10.0, 'a_kup_pw': 4.791, 'n_noun': 3}
>>>
>>> LFTK = lftk.Extractor(docs = [doc1, doc2])
>>>
>>> output = LFTK.extract(features = ["a_word_ps", "a_kup_pw", "n_noun"])
>>> # [{'a_word_ps': 10.0, 'a_kup_pw': 4.791, 'n_noun': 3}, {'a_word_ps': 5.0, 'a_kup_pw': 2.284, 'n_noun': 2}]
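Because the return value is a dictionary for one doc and a list of dictionaries for several, downstream code may want a single shape. A minimal sketch, assuming only the return types documented above; as_list is a hypothetical helper, not part of LFTK, and the sample values are copied from the examples above.

```python
from typing import Dict, List, Union

Number = Union[int, float]
FeatureDict = Dict[str, Number]


def as_list(result: Union[FeatureDict, List[FeatureDict]]) -> List[FeatureDict]:
    """Wrap a single feature dictionary in a list; pass lists through."""
    if isinstance(result, dict):
        return [result]
    return list(result)


# Values copied from the documented example outputs.
single = {"a_word_ps": 10.0, "a_kup_pw": 4.791, "n_noun": 3}
multiple = [
    {"a_word_ps": 10.0, "a_kup_pw": 4.791, "n_noun": 3},
    {"a_word_ps": 5.0, "a_kup_pw": 2.284, "n_noun": 2},
]

# Both shapes can now be iterated the same way.
for row in as_list(single) + as_list(multiple):
    print(row["n_noun"])
```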
lftk.lftk.search_features(domain: str = '*', family: str = '*', language: str = '*', return_format: str = 'list_dict') → List[dict] | DataFrame

Search features

Returns the available linguistic features that match the user-specified attributes. Passing “*” for any attribute matches any value. You can use this function to produce a list of feature keys and pass it to the LFTK Extractor’s extract() function (see above).

Parameters:
  • domain (str (default = "*")) – A single domain name (e.g., “surface”, “lexico-semantics”)

  • family (str (default = "*")) – A single family name (e.g., “worddiff”, “avgwordsent”)

  • language (str (default = "*")) – A single supported language (e.g., “general”)

  • return_format (str (default = "list_dict")) – Select how the searched features should be returned. The available options are “list_dict”, “pandas”, or “list_key”.

Returns:

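result – The available features that match the given conditions, returned as a list of dictionaries, a pandas DataFrame, or a list of feature keys, depending on return_format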

Return type:

Union[List[dict], pd.core.frame.DataFrame]

Examples

>>> import lftk
>>>
>>> output = lftk.search_features(domain = 'surface', family = "avgwordsent", language="general", return_format = "list_dict")
>>> # [{'key': 'a_word_ps', 'name': 'average_number_of_words_per_sentence', 'formulation': 'derivation', 'domain': 'surface', 'family': 'avgwordsent', 'language': 'general'}, {'key': 'a_char_ps', 'name': 'average_number_of_characters_per_sentence', 'formulation': 'derivation', 'domain': 'surface', 'family': 'avgwordsent', 'language': 'general'}, {'key': 'a_char_pw', 'name': 'average_number_of_characters_per_word', 'formulation': 'derivation', 'domain': 'surface', 'family': 'avgwordsent', 'language': 'general'}]
>>>
>>> output = lftk.search_features(domain = 'surface', family = "avgwordsent", language="general", return_format = "pandas")
>>> #           key                                       name formulation   domain       family language
>>> # 9   a_word_ps       average_number_of_words_per_sentence  derivation  surface  avgwordsent  general
>>> # 10  a_char_ps  average_number_of_characters_per_sentence  derivation  surface  avgwordsent  general
>>> # 11  a_char_pw      average_number_of_characters_per_word  derivation  surface  avgwordsent  general
>>>
>>> output = lftk.search_features(domain = 'surface', family = "avgwordsent", language="general", return_format = "list_key")
>>> # ['a_word_ps', 'a_char_ps', 'a_char_pw']
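The relationship between the “list_dict” and “list_key” formats can be sketched without LFTK installed. The records below are copied from the “list_dict” example output above; feature_keys is a hypothetical helper, shown only to illustrate how a list of feature keys can be derived from the dictionary records.

```python
from typing import Dict, List

# Records copied from the "list_dict" example output.
features: List[Dict[str, str]] = [
    {"key": "a_word_ps", "name": "average_number_of_words_per_sentence",
     "formulation": "derivation", "domain": "surface",
     "family": "avgwordsent", "language": "general"},
    {"key": "a_char_ps", "name": "average_number_of_characters_per_sentence",
     "formulation": "derivation", "domain": "surface",
     "family": "avgwordsent", "language": "general"},
    {"key": "a_char_pw", "name": "average_number_of_characters_per_word",
     "formulation": "derivation", "domain": "surface",
     "family": "avgwordsent", "language": "general"},
]


def feature_keys(records: List[Dict[str, str]]) -> List[str]:
    """Collect the 'key' field of each feature record, in order."""
    return [record["key"] for record in records]


keys = feature_keys(features)
print(keys)  # ['a_word_ps', 'a_char_ps', 'a_char_pw']
```

A list of keys like this has the shape that extract() accepts for its features parameter, so search results can feed straight into extraction.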