Operation Flows and Hooks
This article introduces the important flows in Paperlib, as well as the hooks available in these operation flows.
Introduction to Hooks
In Paperlib, we have placed hooks with different names in different places. A hook extension can register to the corresponding hook point to intervene in the operation flow of Paperlib. There are two types of hook points.
Modify Hook Points
Purpose: This type of hook point is used to modify the arguments passed by the hook point, or the variables within the argument objects, but cannot change the type, and finally returns the modified arguments.
Type of Callback Return Value: Modify Hook requires the return value of the callback function to be an array, each element of which corresponds to the input argument array. For example, if the arguments passed by the hook point are
(arg1: string, arg2: {value: number}), you can modifyarg1to another string andarg2.valueto another number in the callback function of the hook, but you cannot change the type ofarg1to a number, or change the type ofarg2to another type. The return value of the callback function must be an array composed of modified arguments:[arg1, arg2]. Note that even if only one argument is passed in, an array with one element needs to be returned. Because the input arguments are always treated as an arguments array
Transform Hook Points
- Purpose: This type of hook point can modify the data flow in the operation process of Paperlib. It is used to transform the input arguments into other forms of data and then return.
- Callback Return Value Type: It can be other types, but usually different hook points have expected return value types. For example, the
scrapeEntryhook point expects the return value type to be an array ofPaperEntity.
For information on how to register hooks and how to write hook extensions, please refer to Hook extensions.
Paper Import Process
Whether it is imported by dragging files or imported from browser extensions, after forming the corresponding source payload, it will enter the paper import process. The diagram of the paper import process is as follows:
In this process, the main hook points are in the scrapeEntry() and scrapeMetadata() methods of ScrapeService.
scrapeEntry()
The main task of scrapeEntry() is to convert these data from different types of sources into the internal data structure PaperEntity of Paperlib, and fill in the important fields of PaperEntity as much as possible (such as: title, doi, arxivID, etc.) for the subsequent scrapeMetadata() method to search and complete its paper metadata.
After receiving the import of the paper source, we first call scrapeEntry(). Its main argument is an array of SourcePayload, that is, the data of the paper source. source payload contains the type field indicating the type of source, and the data of the source:
interface SourcePayload {
type: "file" | "webcontent";
// For file type, value is the path of the file
// For webcontent type, value is WebContentSourcePayload
value: string | WebContentSourcePayload;
}
interface WebContentSourcePayload {
url: string; // Source page's url
document: string; // Source page's html
cookies?: string; // Some pages may contain cookies
}The return type of scrapeEntry() is an array of PaperEntity, because even if the SourcePayload array only contains one source payload, it may contain multiple papers. For example, when importing by dragging files, the dragged file might be a BibTex file, which contains information about multiple papers.
scrapeMetadata()
The return value of scrapeEntry() will be passed into the scrapeMetadata() method. The main task of scrapeMetadata() is to search for the paper's metadata from various databases on the internet and complete all fields in PaperEntity. The return type of scrapeMetadata() is also an array of PaperEntity.
As shown in the diagram, there are six hook points in this operation flow, five of the Modify type and one of the Transform type.
beforeScrapeEntry
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | At the very beginning of the scrapeEntry() method, before the SourcePayload being converted to PaperEntity |
| Callback arguments | SourcePayload[] |
| Callback Return Value | ArgumentArray<SourcePayload[]> |
scrapeEntry
| Parameter | Value |
|---|---|
| Type | Transform |
| Location | The main hook point of scrapeEntry(), accepts SourcePayload 数组 and outputs PaperEntity 数组 |
| Callback arguments | SourcePayload[] |
| Callback Return Value | PaperEntity[] |
afterScrapeEntry
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | At the end of the scrapeEntry() method, after the SourcePayload being converted to PaperEntity |
| Callback arguments | PaperEntity[] |
| Callback Return Value | ArgumentArray<PaperEntity[]> |
beforeScrapeMetadata
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | At the very beginning of the scrapeMetadata() method, before searching for metadata |
| Callback arguments | paperEntities: PaperEntity[], scrapers: string[], force: boolean |
| Callback Return Value | ArgumentArray<paperEntities: PaperEntity[], scrapers: string[], force: boolean> |
Here, scrapers is an array of strings. If it is not empty, it means that the user has chosen to search with specific scrapers. force indicates whether to force the search. If true, it will ignore the existing metadata in PaperEntity and force the search.
scrapeMetadata
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | The main hook point of scrapeMetadata(), accepts an array of PaperEntity, can modify each property and return |
| Callback arguments | paperEntities: PaperEntity[], scrapers: string[], force: boolean |
| Callback Return Value | ArgumentArray<paperEntities: PaperEntity[], scrapers: string[], force: boolean> |
afterScrapeMetadata
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | At the end of the scrapeMetadata() method, after searching for metadata |
| Callback arguments | paperEntities: PaperEntity[], scrapers: string[], force: boolean |
| Callback Return Value | ArgumentArray<paperEntities: PaperEntity[], scrapers: string[], force: boolean> |
Paper PDF Locating
When a paper does not have a corresponding PDF file, Paperlib will display a button in the detail panel to locate for available PDFs on the internet and download them. After clicking this button, it will enter the paper PDF locating process. The main function of the paper PDF locating process is the locateFileOnWeb() method of FileService. There is one available hook in this process.
locateFile
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | Inside the locateFileOnWeb() method |
| Callback arguments | PaperEntity[] |
| Callback Return Value | ArgumentArray<PaperEntity[]> |
Reference Exporting Process
In general, when exporting references, we need to get the array of PaperEntity to be exported, and then convert it into a citation.js's Cite object. Finally, we convert the Cite object into a string of the corresponding format, such as a BibTex string. For details, please refer to the various functions of ReferenceService in the GitHub code.
beforeExportBibItem
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | At the very beginning of the exportBibItem() method |
| Callback arguments | PaperEntity[] |
| Callback Return Value | ArgumentArray<PaperEntity[]> |
citeObjCreatedInExportBibItem
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | Inside the exportBibItem() method, after the Cite object has been created |
| Callback arguments | cite: Cite, paperEntities: PaperEntity[] |
| Callback Return Value | ArgumentArray<cite: Cite, paperEntities: PaperEntity[]> |
afterExportBibItem
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | At the end of the exportBibItem() method |
| Callback arguments | string |
| Callback Return Value | ArgumentArray<string> |
beforeExportBibTexKey
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | At the beginning of the exportBibTexKey() method |
| Callback arguments | PaperEntity[] |
| Callback Return Value | ArgumentArray<PaperEntity[]> |
citeObjCreatedInExportBibTexKey
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | Inside the exportBibTexKey() method, after the Cite object has been created |
| Callback arguments | cite: Cite, paperEntities: PaperEntity[] |
| Callback Return Value | ArgumentArray<cite: Cite, paperEntities: PaperEntity[]> |
afterExportBibTexKey
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | At the end of the exportBibTexKey() method |
| Callback arguments | string |
| Callback Return Value | ArgumentArray<string> |
beforeExportBibTexBody
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | At the beginning of the exportBibTexBody() method |
| Callback arguments | PaperEntity[] |
| Callback Return Value | ArgumentArray<PaperEntity[]> |
citeObjCreatedInExportBibTexBody
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | Inside the exportBibTexBody() method, after the Cite object has been created |
| Callback arguments | cite: Cite, paperEntities: PaperEntity[] |
| Callback Return Value | ArgumentArray<cite: Cite, paperEntities: PaperEntity[]> |
afterExportBibTexBody
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | At the end of the exportBibTexBody() method |
| Callback arguments | string |
| Callback Return Value | ArgumentArray<string> |
beforeExportPlainText
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | At the beginning of the exportPlainText() method |
| Callback arguments | PaperEntity[] |
| Callback Return Value | ArgumentArray<PaperEntity[]> |
citeObjCreatedInExportPlainText
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | Inside the exportPlainText() method, after the Cite object has been created |
| Callback arguments | cite: Cite, paperEntities: PaperEntity[] |
| Callback Return Value | ArgumentArray<cite: Cite, paperEntities: PaperEntity[]> |
afterExportPlainText
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | At the end of the exportPlainText() method |
| Callback arguments | string |
| Callback Return Value | ArgumentArray<string> |
Fuzzy Metadata Scraping Process
When the user clicks on the fuzzy search, the selected paper will go through this process to get the metadata of the fuzzy search.
In this process, the main hook points are in the fuzzyScrape() method of ScrapeService.
The fuzzyScrape() method accepts an array of PaperEntity and outputs a mapping of the id of each paper to the candidate metadata.
The main available hooks are as follows:
`beforeFuzzyScrape
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | At the very beginning of the fuzzyScrape() method |
| Callback arguments | PaperEntity[] |
| Callback Return Value | ArgumentArray<PaperEntity[]> |
fuzzyScrape
| Parameter | Value |
|---|---|
| Type | Transform |
| Location | The main hook point of fuzzyScrape(), accepts an array of PaperEntity, and outputs an array of PaperEntity arrays |
| Callback arguments | paperEntities: PaperEntity[] |
| Callback Return Value | PaperEntity[][] |
afterFuzzyScrape
| Parameter | Value |
|---|---|
| Type | Modify |
| Location | At the end of the fuzzyScrape() method, after searching for metadata |
| Callback arguments | paperEntityDraftCandidates: PaperEntity[][] |
| Callback Return Value | ArgumentArray<paperEntityDraftCandidates: PaperEntity[][]> |