mimblewimble/grin

Pool reconcile performance problem and optimization

Open

#3003 opened on Aug 27, 2019

View on GitHub
 (0 comments) (0 reactions) (0 assignees)Rust (4,876 stars) (991 forks)batch import
enhancementhelp wanted

Description

On a test on Floonet, when the txpool size is a little bit bigger (for example about 200 transactions), I saw the reconcile function is quite slow, need about 2 seconds. Imagine one day the Grin reaching the Bitcoin transaction level, there could be 10,000 unconfirmed transactions in the transaction pool, then I can't imagine how long it need to call a reconcile function.

The source code refers to here https://github.com/mimblewimble/grin/blob/v2.0.0/pool/src/pool.rs#L286-L301

	pub fn reconcile(
		&mut self,
		extra_tx: Option<Transaction>,
		header: &BlockHeader,
	) -> Result<(), PoolError> {
		let existing_entries = self.entries.clone();
		self.entries.clear();

		let mut extra_txs = vec![];
		if let Some(extra_tx) = extra_tx {
			extra_txs.push(extra_tx);
		}

		for x in existing_entries {
			let _ = self.add_to_pool(x, extra_txs.clone(), header);
		}

		Ok(())
	}

The main reason for this slowness is the design here:

  • Each reconcile will clear all the txs of the pool totally, by self.entries.clear()
  • Re-add these transactions one by one. (add_to_pool will just simply fail if a transaction should be removed from the pool)

Another key point is here:

For example for a stempool reconcile, each add_to_pool will aggregate all added transactions of stem pool and all the transactions from the txpool, to launch a final transaction validation by validate_raw_tx (https://github.com/mimblewimble/grin/blob/v2.0.0/pool/src/pool.rs#L194 ).

To estimate the workload of this reconcile, suppose we have n transactions in stempool and m transactions in txpool, then a reconcile function will need verify transactions totally:

    n*(n+1)/2 + n*m

For giving a real image about the workload, let's give an example as n=100, m=100:

    n=m=100
    n*(n+1)/2 + n*m = 15,050

the workload of 200 transactions is similar as 15,050 transactions. Even the heaviest calculation part (bulletproof verification, kernel signature verification) is released by the VerifierCache, this is still too much over expectation.

And in case n=100 and m=10,000, the workload of a reconcile call is about 1M transaction verification, which could need about 3 minutes for one function call!

I believe we need a big optimization here.

The first simple optimization I can imagine could be just removing the duplicated transactions from stempool if already seen in txpool, instead of clearing the whole stempool and adding back one by one.

Hope to hear your comments on this, thanks~

Contributor guide