| author | Tom Herbert <therbert@google.com> | 2013-01-22 09:50:39 +0000 | 
|---|---|---|
| committer | David S. Miller <davem@davemloft.net> | 2013-01-23 13:44:01 -0500 | 
| commit | 5ba24953e9707387cce87b07f0d5fbdd03c5c11b (patch) | |
| tree | c98e56f8a06f07ff585f85cbe6af8cd9c19f2ca6 /net/ipv6/inet6_hashtables.c | |
| parent | ba418fa357a7b3c9d477f4706c6c7c96ddbd1360 (diff) | |
soreuseport: TCP/IPv6 implementation
Motivation for soreuseport would be something like a web server
binding to port 80 running with multiple threads, where each thread
might have its own listener socket.  This could be done as an
alternative to other models: 1) have one listener thread which
dispatches completed connections to workers. 2) accept on a single
listener socket from multiple threads.  In case #1 the listener thread
can easily become the bottleneck with high connection turn-over rate.
In case #2, the proportion of connections accepted per thread tends
to be uneven under high connection load (assuming a simple event loop:
while (1) { accept(); process() }); wakeup does not promote fairness
among the sockets.  We have seen the disproportion to be as high
as a 3:1 ratio between the thread accepting the most connections and
the one accepting the fewest.  With so_reuseport the distribution is
uniform.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'net/ipv6/inet6_hashtables.c')
| -rw-r--r-- | net/ipv6/inet6_hashtables.c | 19 | 
1 files changed, 16 insertions, 3 deletions
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index dea17fd28e5..32b4a1675d8 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -158,25 +158,38 @@ static inline int compute_score(struct sock *sk, struct net *net,
 }
 
 struct sock *inet6_lookup_listener(struct net *net,
-		struct inet_hashinfo *hashinfo, const struct in6_addr *daddr,
+		struct inet_hashinfo *hashinfo, const struct in6_addr *saddr,
+		const __be16 sport, const struct in6_addr *daddr,
 		const unsigned short hnum, const int dif)
 {
 	struct sock *sk;
 	const struct hlist_nulls_node *node;
 	struct sock *result;
-	int score, hiscore;
+	int score, hiscore, matches = 0, reuseport = 0;
+	u32 phash = 0;
 	unsigned int hash = inet_lhashfn(net, hnum);
 	struct inet_listen_hashbucket *ilb = &hashinfo->listening_hash[hash];
 
 	rcu_read_lock();
 begin:
 	result = NULL;
-	hiscore = -1;
+	hiscore = 0;
 	sk_nulls_for_each(sk, node, &ilb->head) {
 		score = compute_score(sk, net, hnum, daddr, dif);
 		if (score > hiscore) {
 			hiscore = score;
 			result = sk;
+			reuseport = sk->sk_reuseport;
+			if (reuseport) {
+				phash = inet6_ehashfn(net, daddr, hnum,
+						      saddr, sport);
+				matches = 1;
+			}
+		} else if (score == hiscore && reuseport) {
+			matches++;
+			if (((u64)phash * matches) >> 32 == 0)
+				result = sk;
+			phash = next_pseudo_random32(phash);
 		}
 	}
 	/*
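The selection among equally scored listeners in the hunk above can be read as a form of reservoir sampling: the k-th matching socket replaces the current choice when phash falls in the lowest 1/k of the 32-bit range, after which phash is advanced. The user-space sketch below isolates that step under stated assumptions. pick_candidate() is a hypothetical name, candidates are represented only by their index, all of them are assumed to have the same score with reuseport set, and the body of next_pseudo_random32() is an assumed stand-in for the kernel helper of the same name.

```c
#include <stdint.h>

/* Assumed stand-in for the kernel helper: one linear congruential step. */
static uint32_t next_pseudo_random32(uint32_t seed)
{
	return seed * 1664525u + 1013904223u;
}

/*
 * Hypothetical illustration: choose one of n equally scored candidates,
 * seeded by the connection hash phash.  Returns the chosen index, or -1
 * when there are no candidates.
 */
static int pick_candidate(uint32_t phash, int n)
{
	int result = -1;
	int matches = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (matches == 0) {
			/* First match always becomes the current choice. */
			result = i;
			matches = 1;
		} else {
			matches++;
			/* True when phash < 2^32 / matches. */
			if ((((uint64_t)phash * matches) >> 32) == 0)
				result = i;
			phash = next_pseudo_random32(phash);
		}
	}
	return result;
}
```

Because the seed is derived from the connection's addresses and ports, the same 4-tuple maps to the same listener for as long as the set of candidates and their order stay unchanged, while different 4-tuples spread across the listeners.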