User Methods
- class lftk.lftk.Extractor(docs: Doc | List[Doc])
Main object
Stores the spaCy doc objects from which linguistic features are extracted. Pass in one or more spaCy docs when you create this object.
- Parameters:
docs (Union[spacy.tokens.doc.Doc, List[spacy.tokens.doc.Doc]]) – A spaCy doc object or a list of spaCy doc objects
Examples
>>> import spacy
>>> import lftk
>>>
>>> nlp = spacy.load("en_core_web_sm")
>>> doc1 = nlp("I think effects computers have on people are great!")
>>> doc2 = nlp("I like drinking coffee...")
>>>
>>> LFTK = lftk.Extractor(docs = [doc1, doc2])
- customize(stop_words: bool = True, punctuations: bool = True, round_decimal: int = 3) → None
Global customization
Customizes how all LFTK functions extract features, based on these options. This excludes some special functions that are intentionally designed to override the global options.
- Parameters:
stop_words (bool (default = True)) – Whether to include stop words in feature extraction
punctuations (bool (default = True)) – Whether to include punctuation in feature extraction
round_decimal (int (default = 3)) – The maximum number of decimal places in the returned feature values
Examples
>>> import spacy
>>> import lftk
>>>
>>> nlp = spacy.load("en_core_web_sm")
>>> doc = nlp("I think effects computers have on people are great!")
>>>
>>> LFTK = lftk.Extractor(docs = doc)
>>>
>>> LFTK.customize(stop_words=True, punctuations=False, round_decimal=3)
>>> output = LFTK.extract(features = ["a_word_ps", "a_kup_pw", "n_noun"])
>>> # {'a_word_ps': 9.0, 'a_kup_pw': 5.323, 'n_noun': 3}
>>>
>>> LFTK.customize(stop_words=False, punctuations=False, round_decimal=2)
>>> output = LFTK.extract(features = ["a_word_ps", "a_kup_pw", "n_noun"])
>>> # {'a_word_ps': 5.0, 'a_kup_pw': 9.6, 'n_noun': 3}
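The effect of the stop_words and punctuations switches on a word count can be pictured with a toy filter. The sketch below is not LFTK's implementation: count_words and the hand-written TOKENS list are hypothetical, and the stop-word flags are an assumption chosen so that the counts agree with the a_word_ps values shown in the examples on this page.

```python
# Toy illustration only: this is NOT how LFTK is implemented.
# Each token is a hand-written (text, is_stop, is_punct) triple. The stop-word
# flags are an assumption made to match the example outputs in this page.
TOKENS = [
    ("I", True, False), ("think", False, False), ("effects", False, False),
    ("computers", False, False), ("have", True, False), ("on", True, False),
    ("people", False, False), ("are", True, False), ("great", False, False),
    ("!", False, True),
]


def count_words(tokens, stop_words=True, punctuations=True):
    """Count tokens, dropping stop words and/or punctuation when a flag is False."""
    kept = [
        text
        for text, is_stop, is_punct in tokens
        if (stop_words or not is_stop) and (punctuations or not is_punct)
    ]
    return len(kept)


print(count_words(TOKENS))                                        # 10
print(count_words(TOKENS, punctuations=False))                    # 9
print(count_words(TOKENS, stop_words=False, punctuations=False))  # 5
```

With a single sentence, these counts correspond to the a_word_ps values 10.0, 9.0 and 5.0 in the examples.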
- extract(features: str | list = '*') → Dict[str, float | int] | List[Dict[str, float | int]]
Extract function for selected features
Extracts the feature(s) requested by the user from every spaCy doc that was stored when the LFTK Extractor was created. By default, this function extracts all features available in LFTK.
This function only accepts feature key(s) (e.g., “t_word”, “a_kup_pw”).
- Parameters:
features (Union[str, list] (default = "*")) – A single feature key or a list of feature keys. Passing “*” extracts all available features.
- Returns:
result – A single dictionary or a list of dictionaries (one per doc), depending on how many spaCy docs were passed in when the LFTK Extractor object was created.
- Return type:
Union[Dict[str, Union[float,int]], List[Dict[str, Union[float,int]]]]
Examples
>>> import spacy
>>> import lftk
>>>
>>> nlp = spacy.load("en_core_web_sm")
>>> doc1 = nlp("I think effects computers have on people are great!")
>>> doc2 = nlp("I like drinking coffee...")
>>>
>>> LFTK = lftk.Extractor(docs = doc1)
>>>
>>> output = LFTK.extract(features = ["a_word_ps", "a_kup_pw", "n_noun"])
>>> # {'a_word_ps': 10.0, 'a_kup_pw': 4.791, 'n_noun': 3}
>>>
>>> LFTK = lftk.Extractor(docs = [doc1, doc2])
>>>
>>> output = LFTK.extract(features = ["a_word_ps", "a_kup_pw", "n_noun"])
>>> # [{'a_word_ps': 10.0, 'a_kup_pw': 4.791, 'n_noun': 3}, {'a_word_ps': 5.0, 'a_kup_pw': 2.284, 'n_noun': 2}]
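Because the return value is either a dictionary or a list of dictionaries, downstream code may want a single shape to iterate over. The sketch below shows one way to normalize it. The helper as_list is hypothetical and not part of LFTK, and the sample values are copied from the examples above rather than computed.

```python
from typing import Dict, List, Union

Number = Union[float, int]
FeatureDict = Dict[str, Number]


def as_list(result: Union[FeatureDict, List[FeatureDict]]) -> List[FeatureDict]:
    """Wrap a single-doc result in a list; copy a multi-doc result as-is."""
    # Hypothetical helper, not part of LFTK.
    return [result] if isinstance(result, dict) else list(result)


# Sample values copied from the documented outputs of extract().
single = {"a_word_ps": 10.0, "a_kup_pw": 4.791, "n_noun": 3}
multi = [single, {"a_word_ps": 5.0, "a_kup_pw": 2.284, "n_noun": 2}]

# Both shapes can now be processed by the same loop.
for row in as_list(single) + as_list(multi):
    print(row["a_word_ps"], row["n_noun"])
```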
- lftk.lftk.search_features(domain: str = '*', family: str = '*', language: str = '*', return_format: str = 'list_dict') → List[dict] | DataFrame
Search features
Returns the available linguistic features that match the user-specified attributes. Passing “*” for any attribute matches any value. You can use this function to produce a list of feature keys and pass them into the LFTK Extractor’s extract() function (see above).
- Parameters:
domain (str (default = "*")) – A single domain name (e.g., “surface”, “lexico-semantics”)
family (str (default = "*")) – A single family name (e.g., “worddiff”, “avgwordsent”)
language (str (default = "*")) – A single supported language (e.g., “general”)
return_format (str (default = "list_dict")) – Select how the searched features should be returned. The available options are “list_dict”, “pandas”, or “list_key”.
- Returns:
result – The available features that match the given conditions
- Return type:
Union[List[dict], pd.core.frame.DataFrame]
Examples
>>> import lftk
>>>
>>> output = lftk.search_features(domain = 'surface', family = "avgwordsent", language="general", return_format = "list_dict")
>>> # [{'key': 'a_word_ps', 'name': 'average_number_of_words_per_sentence', 'formulation': 'derivation', 'domain': 'surface', 'family': 'avgwordsent', 'language': 'general'},
>>> #  {'key': 'a_char_ps', 'name': 'average_number_of_characters_per_sentence', 'formulation': 'derivation', 'domain': 'surface', 'family': 'avgwordsent', 'language': 'general'},
>>> #  {'key': 'a_char_pw', 'name': 'average_number_of_characters_per_word', 'formulation': 'derivation', 'domain': 'surface', 'family': 'avgwordsent', 'language': 'general'}]
>>>
>>> output = lftk.search_features(domain = 'surface', family = "avgwordsent", language="general", return_format = "pandas")
>>> #     key        name                                       formulation  domain   family       language
>>> # 9   a_word_ps  average_number_of_words_per_sentence       derivation   surface  avgwordsent  general
>>> # 10  a_char_ps  average_number_of_characters_per_sentence  derivation   surface  avgwordsent  general
>>> # 11  a_char_pw  average_number_of_characters_per_word      derivation   surface  avgwordsent  general
>>>
>>> output = lftk.search_features(domain = 'surface', family = "avgwordsent", language="general", return_format = "list_key")
>>> # ['a_word_ps', 'a_char_ps', 'a_char_pw']
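When the results are already in “list_dict” form, the feature keys needed by extract() can be pulled out with ordinary Python. The following is a minimal sketch: feature_keys is a hypothetical helper, not an LFTK function, and the records are copied from the example output above instead of being fetched from the library.

```python
# Records copied from the documented "list_dict" output of search_features().
records = [
    {"key": "a_word_ps", "name": "average_number_of_words_per_sentence",
     "formulation": "derivation", "domain": "surface",
     "family": "avgwordsent", "language": "general"},
    {"key": "a_char_ps", "name": "average_number_of_characters_per_sentence",
     "formulation": "derivation", "domain": "surface",
     "family": "avgwordsent", "language": "general"},
    {"key": "a_char_pw", "name": "average_number_of_characters_per_word",
     "formulation": "derivation", "domain": "surface",
     "family": "avgwordsent", "language": "general"},
]


def feature_keys(records, **filters):
    """Return the 'key' of every record whose fields equal all given filters."""
    # Hypothetical helper, not part of LFTK.
    return [
        rec["key"]
        for rec in records
        if all(rec.get(field) == value for field, value in filters.items())
    ]


keys = feature_keys(records, domain="surface", family="avgwordsent")
print(keys)  # ['a_word_ps', 'a_char_ps', 'a_char_pw']

# With lftk installed, the keys could then be passed on, for example:
# output = LFTK.extract(features=keys)
```

Requesting return_format = "list_key" directly, as in the last example above, avoids this step when only the keys are needed.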