Understanding performance: pthread_setspecific, TlsGetValue

I'm digging into the performance of some AI code for my game, and I'm seeing interesting things in Instruments that I'd like to understand better. If a MonoTouch AOT guru could share insights, I'd greatly appreciate it.

I have a class that maintains an array, with a method that looks up an element by doing some math on an x/y coordinate and returning the value at the resulting index. My AI code calls this a lot, and Instruments shows the profile in the attached image: a lot of time is spent in pthread_setspecific, called from TlsGetValue, which is in turn called from mono_get_lmf_addr. I'm curious what's going on here, and whether I can speed up this array access.

The call to emul_op_irem_int_int makes me think that maybe I'm triggering something weird here:

    protected static int index (int x, int y) {
        int wx = (x + 1024*SIZE) % SIZE, wy = (y + 1024*SIZE) % SIZE;
        return wy * SIZE + wx;
    }

I suspect this method is getting inlined into FXGrid.get so it doesn't show up in Instruments. The get methods are simply:

    public TileEffect get (Coord coord) {
        return get(coord.x, coord.y);
    }
    public TileEffect get (int x, int y) {
        return _grid[index(x, y)];
    }
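A possible sketch for the "can I speed this up" part: for any coordinate no smaller than `-1024*SIZE`, `index()` wraps it into `[0, SIZE)` exactly like `Math.floorMod`. If the emulated remainder is what's expensive, one hypothetical division-free variant wraps with compare-and-add instead, assuming coordinates usually stay within a few grid widths of the valid range. `SIZE = 7` is inferred from the `ldc.i4.7` in the IL; `WrapSketch`, `wrap` and `indexNoRem` are illustrative names, and whether this is actually faster under MonoTouch AOT is unmeasured.

```java
public class WrapSketch {
    // SIZE = 7 is inferred from the ldc.i4.7 constant in the IL listing
    static final int SIZE = 7;

    // Original formulation: the 1024*SIZE bias keeps the dividend non-negative
    static int index(int x, int y) {
        int wx = (x + 1024 * SIZE) % SIZE, wy = (y + 1024 * SIZE) % SIZE;
        return wy * SIZE + wx;
    }

    // Hypothetical division-free wrap: same result as Math.floorMod(v, SIZE),
    // but each loop runs about |v| / SIZE times, so it only pays off when v is near range
    static int wrap(int v) {
        while (v < 0) v += SIZE;
        while (v >= SIZE) v -= SIZE;
        return v;
    }

    static int indexNoRem(int x, int y) {
        return wrap(y) * SIZE + wrap(x);
    }

    public static void main(String[] args) {
        // Compare both versions (and floorMod) over a range that includes negatives
        for (int x = -50; x <= 50; x++) {
            for (int y = -50; y <= 50; y++) {
                int expected = Math.floorMod(y, SIZE) * SIZE + Math.floorMod(x, SIZE);
                if (index(x, y) != indexNoRem(x, y) || index(x, y) != expected) {
                    throw new AssertionError("mismatch at " + x + "," + y);
                }
            }
        }
        System.out.println("index and indexNoRem agree");
    }
}
```

The trade-off is that the loops cost one compare per grid width of distance, so for arbitrary inputs the original `%` remains the safer choice.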

Excuse the Java code: my app is actually written in Java and converted to CLR bytecode by IKVM, but I presume that isn't the source of this weirdness. Here's the CLR bytecode for the index() method if that's more useful:

    // method line 15540
    .method famorassem static hidebysig
           default int32 index (int32 x, int32 y)  cil managed
    {
        .custom instance void class [IKVM.Runtime]IKVM.Attributes.LineNumberTableAttribute::'.ctor'(unsigned int8[]) =  (01 00 05 00 00 00 9F 70 62 7F 08 00 00 ) // .......pb....

        // Method begins at RVA 0x9d2b4
        // Code size 47 (0x2f)
        .maxstack 4
        .locals init (
                int32   V_0,
                int32   V_1)
        IL_0000:  nop
        IL_0001:  nop
        IL_0002:  ldarg.0
        IL_0003:  ldc.i4 7168
        IL_0008:  add
        IL_0009:  ldc.i4.7
        IL_000a:  dup
        IL_000b:  ldc.i4.m1
        IL_000c:  bne.un.s IL_0013

        IL_000e:  pop
        IL_000f:  pop
        IL_0010:  ldc.i4.0
        IL_0011:  br.s IL_0014

        IL_0013:  rem
        IL_0014:  stloc.0
        IL_0015:  ldarg.1
        IL_0016:  ldc.i4 7168
        IL_001b:  add
        IL_001c:  ldc.i4.7
        IL_001d:  dup
        IL_001e:  ldc.i4.m1
        IL_001f:  bne.un.s IL_0026

        IL_0021:  pop
        IL_0022:  pop
        IL_0023:  ldc.i4.0
        IL_0024:  br.s IL_0027

        IL_0026:  rem
        IL_0027:  stloc.1
        IL_0028:  nop
        IL_0029:  ldloc.1
        IL_002a:  ldc.i4.7
        IL_002b:  mul
        IL_002c:  ldloc.0
        IL_002d:  add
        IL_002e:  ret
    } // end of method FXGrid::index
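The `dup` / `ldc.i4.m1` / `bne.un.s` sequence ahead of each `rem` appears to be IKVM preserving Java semantics: Java defines `Integer.MIN_VALUE % -1` as 0, while ECMA-335 lets the CIL `rem` instruction throw `ArithmeticException` for that operand pair, so IKVM tests for a divisor of -1 and substitutes 0. With a constant divisor of 7 that branch is never taken, and there is no explicit divide-by-zero test anywhere in this IL. A Java rendering of what each guarded `rem` computes, with `GuardedRem` and `guardedRem` as illustrative names:

```java
public class GuardedRem {
    // Mirrors the IL pattern: a divisor of -1 yields 0 without executing rem,
    // any other divisor falls through to an ordinary remainder
    static int guardedRem(int dividend, int divisor) {
        if (divisor == -1) {
            return 0;
        }
        return dividend % divisor;
    }

    public static void main(String[] args) {
        int[] dividends = {Integer.MIN_VALUE, -8, -1, 0, 1, 8, 7175, Integer.MAX_VALUE};
        int[] divisors = {-7, -1, 1, 7};
        for (int a : dividends) {
            for (int b : divisors) {
                // Java's own % already yields 0 for MIN_VALUE % -1; the guard only
                // matters where the underlying rem instruction could trap
                if (guardedRem(a, b) != a % b) {
                    throw new AssertionError(a + " % " + b);
                }
            }
        }
        System.out.println(Integer.MIN_VALUE % -1); // prints 0
    }
}
```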

Thanks in advance!

Posts

  • samskivert (US, Member)

    Looking at other routes into these same calls, I now suspect it has something to do with handling div0 exceptions. Is there a way to disable that for production builds?

  • samskivert (US, Member)

    Hrm, I just dug up /optimize:unsafe, but enabling it does not seem to eliminate these calls in Instruments.

  • MigueldeIcaza (US, Xamarin Team, Xamurai)

    Can you post the full screenshot?

    It is likely that one of the math operations is calling a helper function in C.
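If the C helper in question is the emulated integer remainder (the `emul_op_irem_int_int` frame points that way), one hypothetical workaround is the usual compiler trick for a constant divisor: multiply by a fixed-point reciprocal and shift, so no division instruction is needed. A sketch for divisor 7, assuming non-negative dividends, which `x + 1024*SIZE` is whenever x >= -7168. `0x92492493` is ceil(2^34 / 7); since 7 * 0x92492493 - 2^34 = 5 and 5 * n < 2^34 for every n < 2^31, the quotient is exact over the whole non-negative int range. `Rem7Sketch`, `div7` and `rem7` are illustrative names, and on a 32-bit ARM target the 64-bit multiply might itself be lowered to a helper, so this would need profiling before it replaces `%`.

```java
public class Rem7Sketch {
    // ceil(2^34 / 7); exact for 0 <= n < 2^31 because 7 * M - 2^34 = 5 and 5 * n < 2^34
    private static final long M = 0x92492493L;

    // Quotient n / 7 for non-negative n, via a 64-bit multiply and shift instead of division
    static int div7(int n) {
        return (int) ((n * M) >>> 34);
    }

    // Remainder n % 7 for non-negative n, derived from the quotient
    static int rem7(int n) {
        return n - 7 * div7(n);
    }

    public static void main(String[] args) {
        // Spot-check against the built-in operator at both ends of the non-negative range
        for (int n = 0; n < 1_000_000; n++) {
            if (rem7(n) != n % 7) throw new AssertionError("n=" + n);
        }
        // n++ overflows to a negative value after Integer.MAX_VALUE, ending the loop
        for (int n = Integer.MAX_VALUE - 1_000_000; n > 0; n++) {
            if (rem7(n) != n % 7) throw new AssertionError("n=" + n);
        }
        System.out.println("rem7 matches n % 7 on the sampled range");
    }
}
```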

  • samskivert (US, Member)

    Weird. The screenshot I attached originally no longer seems to be attached to the post. Here's the shot I intended to attach. Are you saying you want to see more of the call stack?

  • samskivert (US, Member)

    Weirder still: now the original screenshot is showing up again, so clearly you wanted to see something else. I'm not sure what else I can show. This is from the Time Profiler, so I'm just looking at the methods where the most time is spent. I can post a full screenshot of the profiler window if that would be more useful.

  • samskivert (US, Member)

    Anyhow, even if Mono is calling into a C helper function, it seems strange that a math emulation function would need to call into pthreads and set thread-local data. That's why I suspected it was related to div0 exception handling, and that /optimize:unsafe would disable it. But either /optimize:unsafe doesn't disable div0 exception handling, or my hunch that this is related to exception handling is wrong. Probably the latter.

  • samskivert (US, Member)

    Oh, I should point out that the screenshot shows an inverted call stack: pthread_setspecific is called by TlsGetValue, which is called by mono_get_lmf_addr, etc.

  • RolfBjarneKvinge (US, Xamarin Team, Xamurai)

    Have you tried enabling LLVM? It might produce better/faster code.

  • RolfBjarneKvinge (US, Xamarin Team, Xamurai)

    It looks very strange that you end up doing TLS access for pure math. If you file a bug in Bugzilla with a test case, we'll have a look at what's going on.
