LTP Documentation

Submodules

ltp.interface module

ltp.interface.LTP(pretrained_model_name_or_path='LTP/small', force_download=False, resume_download=False, proxies=None, use_auth_token=None, cache_dir=None, local_files_only=False, **model_kwargs)[source]
Instantiate a pretrained LTP model from a pre-trained model configuration hosted on huggingface-hub. The model is set in evaluation mode by default using model.eval() (Dropout modules are deactivated). To train the model, first set it back to training mode with model.train().

Parameters:
pretrained_model_name_or_path (str or os.PathLike):
Can be either:
  • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co. Valid model ids are [LTP/tiny, LTP/small, LTP/base, LTP/base1, LTP/base2, LTP/legacy]; the legacy model only supports cws, pos, and ner, but is faster.

  • You can pin a revision by appending @ to the end of the model_id, e.g. dbmdz/bert-base-german-cased@main. The revision is the specific model version to use: a branch name, a tag name, or a commit id. Because models and other artifacts on huggingface.co are stored in a git-based system, revision can be any identifier allowed by git.

  • A path to a directory containing model weights saved using transformers.PreTrainedModel.save_pretrained, e.g., ./my_model_directory/.

  • None if you are providing both the configuration and the state dictionary (with the keyword arguments config and state_dict, respectively).

force_download (bool, optional, defaults to False):

Whether to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.

resume_download (bool, optional, defaults to False):

Whether to delete incompletely received files. Will attempt to resume the download if such a file exists.

proxies (Dict[str, str], optional):

A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.

use_auth_token (str or bool, optional):

The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running transformers-cli login (stored in ~/.huggingface).

cache_dir (Union[str, os.PathLike], optional):

Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.

local_files_only(bool, optional, defaults to False):

Whether to only look at local files (i.e., do not try to download the model).

model_kwargs (Dict, optional):

Keyword arguments passed to the model during initialization.

Tip

Passing use_auth_token=True is required when you want to use a private model.
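
A minimal instantiation sketch, assuming the ltp package is installed and huggingface.co is reachable (weights are downloaded on first use; the cache path below is an arbitrary example):

    from ltp import LTP

    # Download (or reuse from cache) the small neural model.
    ltp = LTP("LTP/small", cache_dir="./hf_cache")

    # The model loads in evaluation mode; for training, switch the
    # underlying torch module (see ltp.nerual below) back first:
    ltp.model.train()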

ltp.legacy module

class ltp.legacy.LTP(cws=None, pos=None, ner=None)[source]

Bases: ModelHubMixin

property version
add_word(word, freq=1)[source]
add_words(words, freq=2)[source]
enable_type_cut(a, b)[source]
enable_type_cut_d(a, b)[source]
enable_type_concat(a, b)[source]
enable_type_concat_d(a, b)[source]
disable_rule(a, b)[source]
disable_rule_d(a, b)[source]
pipeline(*args, tasks=None, raw_format=False, parallelism=True, return_dict=True)[source]
auto_hook(words)[source]
ltp.legacy.main()[source]
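
A hedged sketch of the legacy pipeline; it assumes the LTP/legacy weights are downloadable and, per the note above, uses only the cws, pos, and ner tasks:

    from ltp import LTP

    ltp = LTP("LTP/legacy")
    ltp.add_word("汤姆去", freq=1)  # register a user-dictionary word for segmentation
    result = ltp.pipeline(["他叫汤姆去拿外衣。"], tasks=["cws", "pos", "ner"])
    print(result.cws, result.pos, result.ner)  # per-task outputs, accessed by attribute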

ltp.nerual module

ltp.nerual.no_grad(func)[source]
class ltp.nerual.LTP(config=None, tokenizer=None)[source]

Bases: BaseModule, ModelHubMixin

model
cws_vocab
pos_vocab
ner_vocab
srl_vocab
dep_vocab
sdp_vocab
add_word(word, freq=1)[source]
add_words(words, freq=2)[source]
load_state_dict(state_dict, strict=True)[source]

Copy parameters and buffers from state_dict into this module and its descendants.

If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Warning

If assign is True, the optimizer must be created after the call to load_state_dict.

Parameters
  • state_dict (dict) – a dict containing parameters and persistent buffers.

  • strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True

  • assign (bool, optional) – whether to assign items in the state dictionary to their corresponding keys in the module instead of copying them in place into the module's current parameters and buffers. When False, the properties of the tensors in the current module are preserved; when True, the properties of the tensors in the state dict are preserved. Default: False

Returns

  • missing_keys is a list of str containing the missing keys

  • unexpected_keys is a list of str containing the unexpected keys

Return type

NamedTuple with missing_keys and unexpected_keys fields

Note

If a parameter or buffer is registered as None and its corresponding key exists in state_dict, load_state_dict() will raise a RuntimeError.
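
A short sketch of the loading semantics (the checkpoint path below is hypothetical):

    import torch
    from ltp import LTP

    ltp = LTP("LTP/small")
    state = torch.load("my_checkpoint.pt")  # hypothetical checkpoint file
    result = ltp.load_state_dict(state, strict=False)
    print(result.missing_keys)     # keys this module expected but the checkpoint lacked
    print(result.unexpected_keys)  # checkpoint keys this module does not have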

property version
pipeline(**kwargs)[source]
static get_graph_entities(rarcs, rels, labels)[source]
ltp.nerual.main()[source]
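
An end-to-end sketch of the neural pipeline, assuming network access for the LTP/small checkpoint; the task names mirror the vocab attributes listed above:

    import torch
    from ltp import LTP

    ltp = LTP("LTP/small")
    ltp.add_words(["汤姆"], freq=2)  # optional user-dictionary words
    if torch.cuda.is_available():
        ltp.to("cuda")  # the model is a torch module, so it can be moved to GPU
    output = ltp.pipeline(
        ["他叫汤姆去拿外衣。"],
        tasks=["cws", "pos", "ner", "srl", "dep", "sdp"],
    )
    print(output.cws)  # word segmentation
    print(output.dep)  # dependency arcs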

Module contents

ltp.LTP(pretrained_model_name_or_path='LTP/small', force_download=False, resume_download=False, proxies=None, use_auth_token=None, cache_dir=None, local_files_only=False, **model_kwargs)[source]
Alias of ltp.interface.LTP, re-exported at the package root. See the ltp.interface module above for the full parameter documentation and usage notes.

class ltp.StnSplit()

Bases: object

batch_split(batch_text, threads=8)

Split a batch of texts into sentences.

bracket_as_entity

Get the value of the bracket_as_entity option.

en_quote_as_entity

Get the value of the en_quote_as_entity option.

split(text)

Split the text into sentences.

use_en

Get the value of the use_en option.

use_zh

Get the value of the use_zh option.

zh_quote_as_entity

Get the value of the zh_quote_as_entity option.
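
A minimal sentence-splitting sketch using the API above:

    from ltp import StnSplit

    splitter = StnSplit()
    sents = splitter.split("汤姆生病了。他去了医院。")
    # ['汤姆生病了。', '他去了医院。']
    batch = splitter.batch_split(["他叫汤姆去拿外衣。", "汤姆生病了。他去了医院。"])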