Xamarin.iOS compiler non-deterministic, builds crashy apps?

I'm trying to understand why builds of my app crash, and the leading hypothesis is that the Xamarin.iOS compiler has race conditions that yield different binaries build-to-build.

I'm curious: is there a way to instruct Xamarin and mtouch to serialize their build steps? Looking at the console output, it appears that the AOT steps are parallelized and happen in random order.

Justification

The nature of this hypothesis is unusual -- bugs are seldom in the compiler. So, here is my justification.

Symptom

The crash happens when the app does interop with a C library entry point that provides an array of strings in an out parameter:

1 strlen
2 mono/object.c mono_string_new
3 mono/object.c mono_string_wrapper
4 MyCompany.MyApp.MyAssembly/NativeMethods:MakeNativeCall

The pointer given to strlen is not readable (access violation); even lldb can't read it.

What doesn't cause the crash

  1. The source code.
    • Without making any changes to any source file, I can build an app that won't crash.
  2. The Mac's hardware.
    • I booted to safe mode and restarted, which performs several core maintenance tasks (http://support.apple.com/en-us/HT201262).
    • I booted to the recovery disk and repaired the partition's folder permissions and the partition itself.
    • I booted to an install DVD to repair the disk and verified that S.M.A.R.T. status is "Verified".
    • I used the Apple Hardware Test utility to run full diagnostics, and no issues were found.

Why it might be a race condition in the compiler

  1. Slightly more than 50% of the builds have this crash.
  2. The stack trace is always the same.
  3. If a build crashes, it will always crash.
  4. Conversely, if a build doesn't crash, it will never crash.
  5. Only one Mac on my team has this problem:
    • Bad Mac: Mac Pro 5,1 with 2x 6-Core Intel Xeon and 16 GB RAM
    • Good Macs: Mac Mini 5,1 with 1x Intel Core i5 and 8 GB RAM
  6. Every Mac has the same toolchain:
    • OS X 10.9.5
    • Xamarin Studio 5.7 (build 661)
    • Mono 3.12.0 (detached/a813491)
    • Xcode 6.1.1 (6611)
    • Xamarin.iOS 8.4.0.47 (Business Edition)
  7. The previous toolchain had this same crash problem:
    • OS X 10.8.5
    • Xamarin Studio 4.2.5
    • Mono 3.2.7
    • Xcode 5.0.2
    • Xamarin.iOS 7.0.7.2 (Business edition)
  8. When I used Instruments to turn off 10 cores on the bad Mac to mimic a good Mac's capabilities, the next 10 consecutive builds were successful.

Summary

By changing one thing, the number of enabled cores on my Mac, I can eliminate the problem.

Answers

  • RolfBjarneKvinge (US, Xamarin Team, Xamurai)

    You don't need to make mtouch serialize its build steps (which isn't possible) in order to confirm your theory: you can verify each build step's output.

    1. Add "-v -v -v -v" to the additional mtouch arguments of the project's iOS Build options
    2. Create a build using the bad Mac (with all the cores), and copy the <project dir>/obj/Debug/mtouch-cache directory somewhere safe. Verify that this is a bad build by running it and watching it crash.
    3. Create a build using the bad mac (with 1 core) and copy the same mtouch-cache directory somewhere else. Verify that this is a good build by running it.

    Now you can compare the two mtouch-cache directories and see if there are any significant differences in any of the files. In particular you're interested in the .a files (this is the ARM assembly the AOT compiler produces), which will likely be the easiest place to find differences. You can use the build log to get a timeline of what happens when in the build process.

    Keep in mind that if you do a clean rebuild in step 3, there will be some expected differences in the output (for instance, GUIDs and timestamps change with every build), so it may be worth creating a second good (clean) build to see what the expected differences are.
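The comparison described above can be scripted. What follows is only a rough sketch, not part of mtouch or any Xamarin tooling: the function names and the three directory arguments are made up for illustration. It hashes every file in three saved copies of the mtouch-cache directory and reports the files that are byte-identical across the two good builds but differ in the bad build.

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def suspicious_files(good_a, good_b, bad):
    """Files that match across both good builds but differ in the bad build."""
    a, b, c = digest_tree(good_a), digest_tree(good_b), digest_tree(bad)
    # A file that already differs between two good builds is expected noise.
    stable = {name for name in a if b.get(name) == a[name]}
    return sorted(name for name in stable if c.get(name) != a[name])
```

Calling `suspicious_files("good1/mtouch-cache", "good2/mtouch-cache", "bad/mtouch-cache")` would list the candidates. Files that vary between the two good builds (GUIDs, timestamps) are skipped entirely, so a real difference hiding in one of those files would be missed; treat the result as a first pass only.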

    All that said, can you show the code for the native API you're calling? There could be other reasons for the apparent randomness you're experiencing.

  • You don't need to make mtouch serialize its build steps (which isn't possible) in order to confirm your theory: you can verify each build step's output.

    1. Add "-v -v -v -v" to the additional mtouch arguments of the project's iOS Build options
    2. Create a build using the bad Mac (with all the cores), and copy the <project dir>/obj/Debug/mtouch-cache directory somewhere safe. Verify that this is a bad build by running it and watching it crash.
    3. Create a build using the bad mac (with 1 core) and copy the same mtouch-cache directory somewhere else. Verify that this is a good build by running it.

    Thanks. I was able to collect this information today. I will look for meaningful differences and report on anything interesting.

    All that said, can you show the code for the native API you're calling? There could be other reasons for the apparent randomness you're experiencing.

    Allow me to look at the differences first.

    Thanks for your response :-)

  • Now you can compare the two mtouch-cache directories and see if there are any significant differences in any of the files. In particular you're interested in the .a files (this is the ARM assembly the AOT compiler produces), which will likely be the easiest place to find differences. You can use the build log to get a timeline of what happens when in the build process.

    More Information

    • The debug builds never crash, even on the bad Mac. It's just release builds, and about 50% of the time.
    • My team updated to Xamarin.iOS 8.6.1.20 today. I'll be monitoring the new builds that follow as well.

    Synopsis

    • All of the assembly files are identical, except for the one that does interop with the native C library.
    • Furthermore, there is a group of differences after every native call that uses a string as one of its parameters.
    • Unfortunately, the other 1-core build has similar differences in similar places.

    I can't make sense of the assembly (it looks like a numeric representation of the binary), so I'd like to open a private bug report, which will let me give you more files and details, and ask for some insight.

    Process

    • I used -v -v -v -v and Xamarin Studio :: Preferences :: Projects :: Build :: Log verbosity: Diagnostic
    • I made two 1-core builds and confirmed they worked
    • I made one 24-core build and confirmed it crashed
    • For each build, I stashed the obj folder and the terminal output.

    Details

    There weren't any .a files, but there were plenty of .s files with matching .o files. The file utility reported that the .s files were assembly as well.

    $ pwd
    /Users/.../obj/iPhone/AppStore/mtouch-cache
    
    $ find . -name "*.a"
    
    $ find . -name "*.s" | wc -l
          20
    
    $ file Xamarin.iOS.dll.armv7.s
    Xamarin.iOS.dll.armv7.s: ASCII assembler program text
    
    $ file Xamarin.iOS.dll.armv7.o
    Xamarin.iOS.dll.armv7.o: Mach-O object arm
    

    Identical files

    These .s files were identical between the good and bad builds:

    Platform Assemblies

    • Mono.Dynamic.Interpreter.dll.armv7.s
    • mscorlib.dll.armv7.s
    • OpenTK-1.0.dll.armv7.s
    • System.Core.dll.armv7.s
    • System.dll.armv7.s
    • System.Numerics.dll.armv7.s
    • System.Runtime.Serialization.dll.armv7.s
    • System.Xml.dll.armv7.s
    • System.Xml.Linq.dll.armv7.s
    • Xamarin.iOS.dll.armv7.s

    My Assemblies

    • MyCompany.Common.dll.armv7.s
    • MyCompany.Controls.dll.armv7.s
    • MyCompany.Controls.Graphs.dll.armv7.s
    • MyCompany.Controls.Style.dll.armv7.s
    • MyCompany.Core.dll.armv7.s

    GUID and Version Number Differences

    These .s files only had GUID and Version Number differences:

    My Assemblies

    • MyCompany.MyApp.Controls.dll.armv7.s
    • MyCompany.MyApp.Resources.dll.armv7.s
    • MyCompany.MyApp.exe.armv7.s

    Significant Differences

    This .s file had 459 diffs:

    My Assemblies

    • MyCompany.MyApp.Model.dll.armv7.s
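For anyone repeating this triage, here is a small sketch of how "only GUID differences" can be separated from "significant differences". It assumes the GUIDs appear in the .s files in the canonical 8-4-4-4-12 hex text form, which I have not confirmed for every file; `changed_lines` and the `<guid>` placeholder are names made up for this example. It masks GUID-shaped tokens and then counts the line positions that still differ.

```python
import difflib
import re

# Assumption: GUIDs show up as canonical 8-4-4-4-12 hex text.
GUID = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def changed_lines(old_text, new_text):
    """Count differing line positions between two texts after masking GUIDs."""
    old = GUID.sub("<guid>", old_text).splitlines()
    new = GUID.sub("<guid>", new_text).splitlines()
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )
```

A result of 0 for a pair of files would put them in the "GUID only" group. Version-number differences would need a second, similar mask, and the count is not expected to match what a particular diff tool reports as its number of diffs.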

    Here is a snippet from the Model assembly where there are differences after a native call using a string:

    Lme_4352:
    .text
        .align 2
        .no_dead_strip _wrapper_managed_to_native_MyCompany_NativeLibrary_NativeMethods_BeginSearch_uint__string_
    _wrapper_managed_to_native_MyCompany_NativeLibrary_NativeMethods_BeginSearch_uint__string_:
    
        .byte 128,64,45,233,13,112,160,225,13,192,160,225,240,95,45,233,176,208,77,226,0,96,160,225,4,16,141,229,0,0,159,229
        .byte 0,0,0,234
        .long _mono_aot_MyCompany_MyApp_Model_got - .
        .byte 0,0,159,231
    bl _pthread_getspecific
    
        .byte 8,0,128,226,8,16,141,226,4,0,129,229,0,192,144,229,0,192,129,229,0,16,128,229,12,208,129,229,20,176,129,229
        .byte 15,192,160,225,16,192,129,229,0,0,160,227,0,0,141,229,0,0,160,227,0,0,141,229,6,0,160,225,13,16,160,225
    bl _p_6389
    
        .byte 0,96,160,225,0,0,159,229,0,0,0,234
        .long _mono_aot_MyCompany_MyApp_Model_got - . + 14792
        .byte 0,0,159,231,0,0,144,229,0,0,80,227,50,0,0,26,0,0,157,229,0,0,80,227,39,0,0,10,96,2,11,227
        .byte 121,9,64,227
    bl _p_6390
    bl _p_6391
    
        .byte 0,32,160,225,0,16,157,229,2,0,160,225,0,32,146,229,0,128,159,229,0,0,0,234
        .long _mono_aot_MyCompany_MyApp_Model_got - . + 27220
        .byte 8,128,159,231,4,224,143,226,36,240,18,229,0,0,0,0,4,16,157,229,0,0,129,229,161,20,160,225,0,32,159,229
        .byte 0,0,0,234
        .long _mono_aot_MyCompany_MyApp_Model_got - . -4
        .byte 2,32,159,231,2,16,129,224,1,32,160,227,0,32,193,229,224,14,8,227,121,9,64,227
    bl _p_6390
    bl _p_6391
    
        .byte 0,32,160,225,0,16,157,229,2,0,160,225,0,32,146,229,0,128,159,229,0,0,0,234
        .long _mono_aot_MyCompany_MyApp_Model_got - . + 27224
        .byte 8,128,159,231,4,224,143,226,16,240,18,229,0,0,0,0,6,0,160,225,8,192,157,229,12,224,157,229,0,192,142,229
        .byte 184,208,141,226,192,31,189,232,4,208,141,226,128,128,189,232
    bl _p_4253
    
        .byte 202,255,255,234
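The `.byte` lines in a snippet like the one above are the raw machine code written out as decimal bytes. As a rough aid for reading them, the sketch below regroups each `.byte` line into 32-bit little-endian words, which is the form a disassembler or the ARM reference expects. It assumes the code is in ARM (A32) mode with 4-byte instructions, and `byte_directives_to_words` is a made-up helper name, not part of any toolchain.

```python
import re
import struct


def byte_directives_to_words(asm_text):
    """Return the 32-bit little-endian words found in `.byte` lines, as hex."""
    words = []
    for line in asm_text.splitlines():
        match = re.match(r"\s*\.byte\s+(.*)", line)
        if not match:
            continue  # labels, .long, bl, and other lines are skipped
        data = bytes(int(tok) for tok in match.group(1).split(","))
        # A32 instructions are 4 bytes each; ignore a trailing partial word.
        for offset in range(0, len(data) - len(data) % 4, 4):
            (word,) = struct.unpack_from("<I", data, offset)
            words.append(f"{word:08x}")
    return words
```

For example, the first eight bytes of the snippet (`128,64,45,233,13,112,160,225`) come out as `e92d4080` and `e1a0700d`, which look like an ordinary function prologue. Not every word is an instruction: some are inline data that the surrounding code loads, so a good and a bad build are better compared word by word than read as a straight instruction stream.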
    
  • RolfBjarneKvinge (US, Xamarin Team, Xamurai)
    edited February 2015

    [...] so I'd like to open a private bug report and ask for some insight which will allow me to give you more files and details.

    You can file private bug reports, which only Xamarin employees can access, here: http://bugzilla.xamarin.com

  • To summarize this bug:

    My app uses a custom marshaller for arrays when interoperating with a native library, and there was a bug in the Xamarin implementation that calls ICustomMarshaler.GetInstance.

    So, for other readers:

    1. Avoid creating and using an ICustomMarshaler until the bug is fixed.
    2. In the meantime, to marshal:
      • strings, use [MarshalAs(UnmanagedType.LPWStr)]
      • arrays of anything, use IntPtr and manage memory explicitly -- see bug 24638 and bug 27614

    (Ed note: man, the CSS needs a tune-up.)
