awswrangler.opensearch.index_documents

awswrangler.opensearch.index_documents(client: opensearchpy.client.OpenSearch, documents: Iterable[Mapping[str, Any]], index: str, doc_type: Optional[str] = None, keys_to_write: Optional[List[str]] = None, id_keys: Optional[List[str]] = None, ignore_status: Optional[Union[List[Any], Tuple[Any]]] = None, bulk_size: int = 1000, chunk_size: Optional[int] = 500, max_chunk_bytes: Optional[int] = 104857600, max_retries: Optional[int] = 5, initial_backoff: Optional[int] = 2, max_backoff: Optional[int] = 600, **kwargs: Any) Dict[str, Any]

Index all documents into an OpenSearch index.

Note

Some of the arguments are taken from the opensearch-py client library (bulk helpers): https://opensearch-py.readthedocs.io/en/latest/helpers.html#opensearchpy.helpers.bulk https://opensearch-py.readthedocs.io/en/latest/helpers.html#opensearchpy.helpers.streaming_bulk

If you receive Error 429 (Too Many Requests) on /_bulk, try decreasing the bulk_size value. Also consider changing the cluster size and instance type. Read more here: https://aws.amazon.com/premiumsupport/knowledge-center/resolve-429-error-es/
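To make the effect of bulk_size concrete, the following is a minimal sketch of how an iterable of documents could be split into groups of at most bulk_size items. The helper name iter_batches is hypothetical and is not part of awswrangler; it only illustrates why a smaller bulk_size produces more, smaller _bulk requests.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Mapping


def iter_batches(
    documents: Iterable[Mapping[str, Any]], bulk_size: int
) -> Iterator[List[Mapping[str, Any]]]:
    """Yield lists of at most bulk_size documents (hypothetical helper)."""
    if bulk_size < 1:
        raise ValueError("bulk_size must be a positive integer")
    iterator = iter(documents)
    while True:
        # Pull the next bulk_size items without materialising the whole input.
        batch = list(islice(iterator, bulk_size))
        if not batch:
            return
        yield batch


# 2500 documents with bulk_size=1000 give three groups.
docs = ({"_id": str(i), "value": i} for i in range(2500))
sizes = [len(batch) for batch in iter_batches(docs, bulk_size=1000)]
print(sizes)
```

Halving bulk_size in this sketch doubles the number of groups, so each request carries less data.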

Parameters
  • client (OpenSearch) – instance of opensearchpy.OpenSearch to use.

  • documents (Iterable[Mapping[str, Any]]) – Iterable containing the documents to insert.

  • index (str) – Name of the index.

  • doc_type (str, optional) – Name of the document type (for Elasticsearch versions 5.x and earlier).

  • keys_to_write (List[str], optional) – list of keys to index. If not provided, all keys will be indexed.

  • id_keys (List[str], optional) – list of keys that together form the document's unique id. If not provided, the _id key is used if it exists; otherwise a unique identifier is generated for each document.

  • ignore_status (Union[List[Any], Tuple[Any]], optional) – list of HTTP status codes to ignore (no exception is raised for them)

  • bulk_size (int) – number of docs in each _bulk request (default: 1000)

  • chunk_size (int, optional) – number of docs in one chunk sent to es (default: 500)

  • max_chunk_bytes (int, optional) – the maximum size of the request in bytes (default: 100MB)

  • max_retries (int, optional) – maximum number of times a document will be retried when 429 is received; set to 0 for no retries on 429 (default: 5)

  • initial_backoff (int, optional) – number of seconds to wait before the first retry. Each subsequent retry waits initial_backoff * 2**retry_number seconds (default: 2)

  • max_backoff (int, optional) – maximum number of seconds a retry will wait (default: 600)

  • **kwargs – keyword arguments forwarded to the bulk operation. elasticsearch >= 7.10.2 / opensearch: https://opensearch.org/docs/opensearch/rest-api/document-apis/bulk/#url-parameters elasticsearch < 7.10.2: https://opendistro.github.io/for-elasticsearch-docs/docs/elasticsearch/rest-api-reference/#url-parameters
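As a rough illustration of how max_retries, initial_backoff and max_backoff interact, the sketch below computes the wait before each retry using the formula from the parameter description, capped at max_backoff. The function name backoff_schedule is hypothetical, and the assumption that retry_number starts at 0 is ours; the underlying helper may count attempts differently.

```python
from typing import List


def backoff_schedule(
    max_retries: int = 5, initial_backoff: int = 2, max_backoff: int = 600
) -> List[int]:
    """Return the wait in seconds before each retry (hypothetical helper).

    Assumes retry_number starts at 0, so the first wait equals initial_backoff.
    """
    return [
        # Exponential growth, never exceeding max_backoff.
        min(max_backoff, initial_backoff * 2**retry_number)
        for retry_number in range(max_retries)
    ]


# With the documented defaults the waits double on every retry.
default_waits = backoff_schedule()
print(default_waits)

# With more retries the cap becomes visible.
capped_waits = backoff_schedule(max_retries=10)
print(capped_waits[-1])
```

Under these assumptions the default settings wait about a minute in total across all retries, while a larger max_retries quickly reaches the max_backoff ceiling.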

Returns

Response payload https://opensearch.org/docs/opensearch/rest-api/document-apis/bulk/#response.

Return type

Dict[str, Any]

Examples

Writing documents

>>> import awswrangler as wr
>>> client = wr.opensearch.connect(host='DOMAIN-ENDPOINT')
>>> wr.opensearch.index_documents(
...     client=client,
...     documents=[{'_id': '1', 'value': 'foo'}, {'_id': '2', 'value': 'bar'}],
...     index='sample-index1'
... )
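
The keys_to_write and id_keys parameters can be pictured with a small pure-Python sketch. This is not the library's implementation: the helper shape_document, the "-" separator used to join id values, and the choice to drop _id from the body are all assumptions made for illustration.

```python
import uuid
from typing import Any, Dict, List, Mapping, Optional, Tuple


def shape_document(
    document: Mapping[str, Any],
    keys_to_write: Optional[List[str]] = None,
    id_keys: Optional[List[str]] = None,
    separator: str = "-",  # assumed separator, chosen for readability
) -> Tuple[str, Dict[str, Any]]:
    """Return (document id, body to index) for one document (hypothetical helper)."""
    if id_keys:
        # Compose the id from the values of the selected keys.
        doc_id = separator.join(str(document[key]) for key in id_keys)
    elif "_id" in document:
        doc_id = str(document["_id"])
    else:
        # No id information available: generate a unique identifier.
        doc_id = uuid.uuid4().hex
    body = {
        key: value
        for key, value in document.items()
        if key != "_id" and (keys_to_write is None or key in keys_to_write)
    }
    return doc_id, body


order = {"region": "eu", "order": 7, "note": "x"}
composed = shape_document(order, keys_to_write=["order", "note"], id_keys=["region", "order"])
print(composed)

existing = shape_document({"_id": "1", "value": "foo"})
print(existing)
```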