on collections sharded by "_id", Mongoid::Locker can't get a lock #19

mepatterson · 2013-09-11T04:47:34Z

At least in my production environment, on any collection sharded by the shard key "_id", this code

2.0.0p247 :001 > i = Item.first
2.0.0p247 :002 > i.with_lock do
2.0.0p247 :003 >     puts i.inspect
2.0.0p247 :004?>   end

throws the exception "Mongoid::Locker::LockError: could not get lock"
from line 148 in lib/mongoid/locker.rb 'lock'

A collection sharded by some other key doesn't seem to have this problem, nor does an unsharded collection.

afeld · 2013-09-11T05:30:26Z

consistently?? iiiiiinteresting. not sure how easy it will be to replicate that in a test environment :-/ will try to set up a couple of sharded mongo instances locally.

just for due diligence, would you mind upping the :retries value and the timeout value and seeing whether that makes any difference?

mepatterson · 2013-09-11T05:34:36Z

Yeah, man. I tried EVERYTHING. The only reason I figured it out is I had 3 collections, two sharded on "_id" and one sharded on some other field. The latter was the only one that didn't throw the locker exception. So I had my ops guy rebuild the other two collections with different shard keys and it started working perfectly, no code changes on my side.

Certainly open to the idea that you might discover something even more insidious going on, but that's what we determined.

I traced it down to your lock() method, where you do the atomic check-and-set: see whether the document is unlocked (or its lock can be acquired) and, if so, take the lock. On my "_id"-sharded collections, that call would fail (return false) on a totally new, totally unlocked document with nils for all the locked* fields. At that point I couldn't see an obvious problem, but you do use "_id" in your atomic query, so perhaps something is going on there when a collection is sharded by _id?
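
For anyone following along, the kind of atomic check described above can be sketched roughly as below. This is an in-memory illustration only, not the code in lib/mongoid/locker.rb and not a real driver call: the FakeCollection class, the acquire_lock method, and the locked_at / locked_until field names are assumptions made for the example (the thread only mentions "locked* fields").

```ruby
# Hypothetical in-memory stand-in for a collection. It only illustrates
# the "match by _id AND unlocked, then set the lock" idea in one step.
class FakeCollection
  def initialize
    @docs = {}
    @mutex = Mutex.new
  end

  def insert(doc)
    @docs[doc[:_id]] = doc
  end

  # Atomically: match the document with this _id only if it is unlocked
  # (or its lock has expired) and, if it matched, write the lock fields.
  # Returns true when the lock was taken, false when nothing matched.
  def acquire_lock(id, timeout, now = Time.now)
    @mutex.synchronize do
      doc = @docs[id]
      return false if doc.nil?

      free = doc[:locked_until].nil? || doc[:locked_until] <= now
      return false unless free

      doc[:locked_at] = now
      doc[:locked_until] = now + timeout
      true
    end
  end
end

items = FakeCollection.new
items.insert({ _id: 1, locked_at: nil, locked_until: nil })

first  = items.acquire_lock(1, 5) # fresh document with nil lock fields
second = items.acquire_lock(1, 5) # same document, lock still held
```

In this sketch a fresh document with nil lock fields always matches on the first attempt. The report above is that the equivalent step returned false for exactly that kind of document when the collection was sharded by "_id".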

mepatterson · 2013-09-11T05:35:17Z

I set the retries to 20 or something and it just spun and spun and then threw the exception.
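
A minimal sketch of why more retries would not help here, assuming the acquire step fails the same way every time rather than because of contention. The LockError class and the acquire_with_retries helper are hypothetical names for this example and are not the gem's API.

```ruby
# Hypothetical error class for this sketch only.
class LockError < StandardError; end

# Calls the block up to retries + 1 times. Returns the number of attempts
# it took once the block returns true; raises LockError if it never does.
def acquire_with_retries(retries:, retry_sleep: 0)
  attempts = 0
  loop do
    attempts += 1
    return attempts if yield
    raise LockError, "could not get lock" if attempts > retries

    sleep retry_sleep
  end
end

# An acquire step that never succeeds: every retry is used, then it raises.
calls = 0
begin
  acquire_with_retries(retries: 20) do
    calls += 1
    false
  end
rescue LockError => e
  error_message = e.message
end
```

Retries only help when a later attempt can succeed, for example once another process releases the lock. If the underlying query never matches, the loop just spends its attempts and then raises.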

afeld · 2013-09-11T05:41:51Z

Sharding/replication are the things about Mongo I know the least about, so I might pop over to the MongoDB office hours they hold in NYC to see if they have ideas.

Just a stab in the dark, but what indexes do you have on that collection that fails? Any compound indexes that include the _id?

mepatterson · 2013-09-11T05:44:36Z

Nope. One of the two troubled collections has a bunch of compound indexes, but none that include _id.
