meilisearch/MeiliSearch

Make the meilisearch integration tests run faster

Open

#4840 opened on Jul 30, 2024

View on GitHub
 (16 comments) (0 reactions) (0 assignees)Rust (20,887 stars) (733 forks)batch import
good first issuehelp wantedmaintenance

Description

Hello, fellow contributors! Thanks for coming here 🌟

In https://github.com/meilisearch/meilisearch/pull/4808, I started some work to make the test faster but didn't apply the techniques implemented to every test since there are so many of them, and that's where you can help us.

There are two tasks to do that I’ll highlight right below.

Patching the .wait_task(:task_id)

A common pattern you'll see in the integration tests is that we create a task and then immediately wait for it by writing .wait_task(0).await; or any other number. Here's an example: https://github.com/meilisearch/meilisearch/blob/abe128476fca60d683725f2598a365027ef5fbec/meilisearch/tests/tasks/mod.rs#L111-L134

This is really easy to get wrong and caused us a lot of flaky tests in the past; that's why we would like to get rid of all these calls and replace them with the actual task ID returned by Meilisearch.

For example, instead of doing:

    let index = server.index("test");
    index.create(None).await;
    index.wait_task(0).await;

We should write:

    let index = server.index("test");
    let (task, _code) = index.create(None).await;
    index.wait_task(task.uid()).await;

Also, did you notice how we're waiting for the task to finish, but then we don't check if it succeeded? When the task was supposed to succeed in the tests, you should also call .succeeded() after waiting for the task:

    let index = server.index("test");
    let (task, _code) = index.create(None).await;
    index.wait_task(task.uid()).await.succeeded();

Using a shared server when possible

A lot of tests are creating a new Meilisearch instance when they could instead share one with all the other tests. The rules to know if a test can use a shared server are:

  • It should use the default configuration; the server is initialized with let server = Server::new().await;
  • It must not modify the global state of the index:
    • Don't add an API-key
    • Don't modify the features
    • Don't access indexes that are not yours
  • ❗ An exception, though, is if the request is expected to fail in the test. For example, if the test try to delete an arbitrary index and then checks for its failure, in this case, we can implement a new method on the shared index, send the task, and ensure it fails; see this method for an example: https://github.com/meilisearch/meilisearch/blob/c45706936732aef16d5a07545e47534d59901a91/meilisearch/tests/common/server.rs#L218-L225

https://github.com/meilisearch/meilisearch/blob/abe128476fca60d683725f2598a365027ef5fbec/meilisearch/tests/search/geo.rs#L41-L63

This test, for example, only creates an index and modifies its content without altering any of the global state of the index. We can update the server initialization from:

    let server = Server::new().await;
    let index = server.index("test");

To:

    let server = Server::new_shared().await;
    let index = server.unique_index(); // we cannot hardcode the name of an index; otherwise, we might have conflicts between tests

Using a shared index when possible

The latest kind of big optimization possible is to reduce the number of times we index the same documents over and over again. For example, if you look at this file: https://github.com/meilisearch/meilisearch/blob/abe128476fca60d683725f2598a365027ef5fbec/meilisearch/tests/search/geo.rs You'll notice a common pattern: All the tests start by doing these operations and then do a single search request:

    let server = Server::new().await;
    let index = server.index("test");

    let documents = DOCUMENTS.clone();
    index.update_settings_filterable_attributes(json!(["_geo"])).await;
    index.update_settings_sortable_attributes(json!(["_geo"])).await;
    index.add_documents(documents, None).await;
    index.wait_task(2).await;

What we should do here is create a shared index between all these tests that index the documents only once and can then be used by everyone. Here's an example of how to do it: https://github.com/meilisearch/meilisearch/blob/c45706936732aef16d5a07545e47534d59901a91/meilisearch/tests/common/mod.rs#L200-L216

In this case, instead of creating a new set of DOCUMENTS and a new kind of shared_index_with_geo we could simply update the original DOCUMENTS set defined here and update then use the shared index function to get an index.

In the end, the test should look like this:

let index = shared_index_with_documents().await;

// The search requests

Once again, this optimization can only be applied to tests that don't modify the content of the index:

The file to updates

Here's an almost exhaustive list of files that need to be updated:

[!TIP] Please open one PR per file updated

What to do if I’m struggling

Please ping me and ask me questions on this issue; this will help others as well.

Contributor guide