# Overview of Project Steps

Below we give a high-level overview of the different steps you need to take to get your project up and running. We start with how to get your project from zero to production and end with how to maintain a successful project.

## Building a new extraction model from scratch

![Process for training the first model from scratch.](https://imgr.whimsical.com/object/S3PHLMgJDtbPJAS8K61vGJ)

1. To start, [Create a new project](/getting-started/project-management/create-a-project.md) in the [Overview](/overview.md).
2. Define the document types and entities you need in [Document types](/project/project-settings/document-types.md) and [Entities](/project/project-settings/entities.md).
3. Create annotation guidelines, taking into account [Guidelines to annotate correctly](/overview-of-project-steps/how-to-annotate-correcty.md).
4. Upload your documents, see [Uploads](/project/training/uploads.md#upload-documents-in-training). We recommend uploading at least 500 documents from the start. You don't need to annotate all of them immediately: Metamaze automatically selects which ones are useful to annotate and adds them to the suggested annotation tasks.
5. Annotate your initial training data from scratch. Best practices are explained in [The data annotation process](/overview-of-project-steps/the-data-annotation-process.md). Annotating about 10-30 documents (for example, one per layout) is enough before triggering the first training.
6. Update the annotation guidelines based on your findings. They should not leave room for interpretation.
7. Create a review task for the training data to make sure it is correct, see [Tasks](/project/training/tasks.md).
8. Train a model for the first time as described in [Model management](/project/training/model-management.md).
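
The numeric recommendations in the steps above can be turned into a small pre-training checklist. The sketch below is only an illustration: the function name, its parameters, and the warning texts are hypothetical and are not part of Metamaze. Only the thresholds (500 uploads, at least about 10 annotated documents, a review task) come from this page.

```python
from typing import List

RECOMMENDED_UPLOADS = 500  # step 4: recommended number of uploaded documents
MIN_ANNOTATED = 10         # step 5: lower end of the 10-30 document range


def first_training_warnings(uploaded: int, annotated: int, reviewed: bool) -> List[str]:
    """Return human-readable warnings; an empty list means the checklist passes."""
    warnings: List[str] = []
    if uploaded < RECOMMENDED_UPLOADS:
        warnings.append(
            f"{uploaded} documents uploaded; at least {RECOMMENDED_UPLOADS} are recommended"
        )
    if annotated < MIN_ANNOTATED:
        warnings.append(
            f"{annotated} documents annotated; about {MIN_ANNOTATED}-30 are suggested"
        )
    if not reviewed:
        warnings.append("training data has not been checked in a review task")
    return warnings


# Hypothetical project state: too few uploads and annotations, no review yet.
for warning in first_training_warnings(uploaded=120, annotated=4, reviewed=False):
    print("WARNING:", warning)
```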

After your first model training, you can use the suggested tasks in the task module, where Metamaze uses automatic misannotation detection and active learning to further improve your model. Active learning selects the documents that add the most value to the model, so you don't waste time annotating documents that are already well supported.
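
This page does not describe how Metamaze ranks documents internally. As a rough intuition for active learning, the sketch below uses a common generic heuristic, uncertainty sampling: the documents on which the current model is least confident are suggested first. The function name, the confidence scores, and the document ids are all hypothetical.

```python
from typing import Dict, List


def select_for_annotation(confidences: Dict[str, float], budget: int) -> List[str]:
    """Return the `budget` document ids with the lowest model confidence."""
    # Sort by ascending confidence; ties are broken by id so the result is deterministic.
    ranked = sorted(confidences.items(), key=lambda item: (item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:budget]]


# Hypothetical per-document confidence scores from the current model.
scores = {"doc-a": 0.98, "doc-b": 0.41, "doc-c": 0.73, "doc-d": 0.55}
suggested = select_for_annotation(scores, budget=2)
print(suggested)  # ['doc-b', 'doc-d']
```

Documents such as `doc-a`, which the model already handles confidently, are left out, which matches the goal of not spending annotation time on well-supported documents.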

![Process for iteratively improving an existing model until it is accurate enough](https://imgr.whimsical.com/object/WPRuGSqpXRDd1HFarvYrVk)

1. Create a suggested review task for training data, see [Tasks](/project/training/tasks.md).
2. Create a suggested annotation task for training data, see [Tasks](/project/training/tasks.md). We recommend retraining the model after you have added about 50 new documents. That way, the model recalculates which documents are the most valuable to add next.
3. Train the model again.
4. If the accuracy is not good enough, go back to step 1 and start another iteration, correcting old annotations and annotating new documents from scratch.
5. Deploy the model once the accuracy is good enough.
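
The iteration above is essentially a loop with a stopping condition. The following sketch shows that control flow under stated assumptions: `train_and_evaluate` is a hypothetical stand-in for reviewing, annotating, and retraining in Metamaze, and `toy_model` is an invented accuracy curve used only to make the example runnable. The batch size of 50 is the recommendation from step 2.

```python
from typing import Callable, List

RETRAIN_BATCH_SIZE = 50  # step 2: retrain after about 50 new documents


def improve_until_accurate(
    train_and_evaluate: Callable[[int], float],
    target_accuracy: float,
    initial_documents: int,
    max_iterations: int = 20,
) -> List[float]:
    """Run review/annotate/train rounds and return the accuracy after each round."""
    history: List[float] = []
    documents = initial_documents
    for _ in range(max_iterations):
        documents += RETRAIN_BATCH_SIZE           # steps 1-2: review and annotate a batch
        accuracy = train_and_evaluate(documents)  # step 3: train the model again
        history.append(accuracy)
        if accuracy >= target_accuracy:           # step 5: good enough to deploy
            break
        # step 4: otherwise start another iteration
    return history


def toy_model(documents: int) -> float:
    # Invented curve for illustration only: accuracy grows with the number of documents.
    return min(0.99, 0.60 + 0.001 * documents)


history = improve_until_accurate(toy_model, target_accuracy=0.90, initial_documents=30)
print(f"rounds needed: {len(history)}, final accuracy: {history[-1]:.2f}")
```

The `max_iterations` cap is a design choice of this sketch so that the loop always terminates, even if the target accuracy is never reached.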

### An example of how accuracy evolves with each project step

![Accuracy evolution on an unstructured document type with no recurring layouts in two languages.](/files/-MdIBeGK0NJVOsNgvEHt)

## Improving the model in production using human-in-the-loop corrections

To make sure your automation rate stays high and improves over time, it's important to maintain the models you have trained by making them learn from corrections.

A typical production process looks like this:

1. You **upload new documents** in the production pipeline. If they are fully automatically processed, typically no action is needed.
2. For the documents that could not be automatically processed, go to the Human Validation section and **perform validations** on predictions to process the production documents.

Documents that required human validation are automatically added as potential training data with the status `Input needed`. For the model to learn from them, they first need to be validated in a review task; only then are they taken into account for training.

If you want to improve your models based on production validations, follow these steps:

1. In the [Tasks](/project/training/tasks.md) module, create a **suggested review task for production data**. This will create a task to verify all documents that required human validation to promote them to "golden" training data.
2. Verify all annotations and add missing ones (do not forget to label all occurrences of an entity value in the relevant context), then mark the documents as Done. They will be included in the next training.
3. After the task has been completed, **retrain the model** in the Model Management module. Depending on the number of documents and pages, this can take anywhere from 30 minutes to more than a day.
4. After training has been completed, check whether the accuracy is acceptable. Since you are only adding the hardest documents, the calculated accuracy might go down even though your production accuracy goes up. You can test a model without deploying it by looking at the newly created suggested tasks. These tasks contain predictions from the most recently trained model.
5. **Deploy the model** to start using it in production.
6. New uploads in production will get better predictions, because the model has learned from past corrections.
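
The remark in step 4 that the calculated accuracy can go down may seem counterintuitive. The worked example below illustrates one way this can happen, assuming (this is an assumption, not a documented Metamaze formula) that accuracy is the share of correct predictions over all evaluated documents. All numbers are invented.

```python
def accuracy(correct: int, total: int) -> float:
    """Share of correctly predicted items."""
    return correct / total


# Hypothetical evaluation data before adding production corrections.
easy_correct, easy_total = 190, 200
# Hypothetical hard documents that needed human validation in production.
hard_correct, hard_total = 30, 50

before = accuracy(easy_correct, easy_total)                             # 190 / 200 = 0.95
after = accuracy(easy_correct + hard_correct, easy_total + hard_total)  # 220 / 250 = 0.88

print(f"before adding hard documents: {before:.2f}")
print(f"after adding hard documents:  {after:.2f}")
```

The model did not become worse on the easy documents. The measured figure dropped only because the evaluation now includes harder cases, which are exactly the cases the retrained model needs to learn to handle in production.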

