Bug #320

Budget inconsistencies

Added by Philippe Le Sager over 6 years ago. Updated over 5 years ago.

Status: In Progress
Priority: Low
Category: -
Target version: -
Start date: 02/20/2015
Due date:
% Done: 0%


Description

When comparing a run on 45 processes with a run on 9 processes (both with the reduced grid and a latitudinal-only MPI decomposition), the output files (mmix, restart) are bit-for-bit identical, but the budgets are not:

  check budini ...
      28432 (out of 39168) differences found in budini
      max diff. in : [8, 21, 33] (rel.diff. 0.000000 %)
      where field #1 = 6.9922226493487276e+28
      where field #2 = 6.9922226493487382e+28
  check budadvx ...
      38668 (out of 39168) differences found in budadvx
      max diff. in : [8, 21, 33] (rel.diff. 3500.000000 %)
      where field #1 = 7123649972633.0312
      where field #2 = 256451399014789.22
  check budadvy ...
      38516 (out of 39168) differences found in budadvy
      max diff. in : [8, 21, 33] (rel.diff. -0.000000 %)
      where field #1 = -2.1878826532546108e+27
      where field #2 = -2.1878826532549399e+27

The differences reported as 0% are expected: for aggregated budgets, a different MPI layout changes the order of operations, and therefore the floating-point rounding. The 3500% difference in budadvx, however, is puzzling and needs an explanation.
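
The expected roundoff-level differences can be reproduced in isolation. The sketch below is not the model code: `chunked_sum` and the sample values are hypothetical, and each chunk merely stands in for one MPI rank's partial sum. It shows that accumulating the same numbers with 9 or 45 partial sums agrees only to within rounding, because floating-point addition is not associative.

```python
def chunked_sum(values, nchunks):
    """Sum `values` as `nchunks` partial sums, then add the partials.

    Each chunk stands in for the partial sum held by one rank (assumption).
    """
    n = len(values)
    bounds = [round(i * n / nchunks) for i in range(nchunks + 1)]
    partials = [sum(values[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]
    return sum(partials)


# Hypothetical positive values spanning many orders of magnitude.
values = [1.0e27 / (k + 1) + 1.0e11 * k for k in range(90)]

s9 = chunked_sum(values, 9)    # layout with 9 partial sums
s45 = chunked_sum(values, 45)  # layout with 45 partial sums
rel = abs(s9 - s45) / abs(s45)

print(f"9 chunks : {s9!r}")
print(f"45 chunks: {s45!r}")
print(f"relative difference: {rel:.3e}")
```

Any difference between the two layouts stays at the level of a few units in the last place, far below a percent, which is consistent with the 0% entries above and is why the budadvx discrepancy cannot be attributed to summation order alone.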

History

#1 Updated by Philippe Le Sager over 6 years ago

  • Subject changed from X-advection budget to Budget inconsistencies

Further comparisons between runs on 45 and 22 cores, still for a short 3-hour run, isolate three buggy budgets:

  check budadvx ...
      38320 (out of 39168) differences found in budadvx
      max diff. in : [8, 20, 33] (rel.diff. -114.423077 %)
      where field #1 = -61738299762819.633
      where field #2 = 8904562465791.291

  check budchem ...
      44700 (out of 50796) differences found in budchem
      max diff. in : [9, 22, 33] (rel.diff. 164.438222 %)
      where field #1 = 766920097792.0
      where field #2 = -494189674496.0

  check budconv_cp ...
      30714 (out of 39168) differences found in budconv_cp
      max diff. in : [1, 5, 14] (rel.diff. -6.499508 %)
      where field #1 = -2.2055309866594325e+22
      where field #2 = -2.3488796472526816e+22

Again, j-stat, mmix and restart files are bit-for-bit identical: this is purely a budget issue.
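
To separate roundoff noise from genuine discrepancies, the reported value pairs can be screened against a tolerance. This is a minimal sketch, not the comparison tool used above: the helper names, the symmetric relative difference, and the `1e-10` threshold are all assumptions chosen for illustration, and the tool's own "rel.diff." definition is not known. The value pairs are copied from the reports in this ticket.

```python
def symmetric_rel_diff(a, b):
    """Relative difference scaled by the larger magnitude (assumed definition)."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0.0 else abs(a - b) / scale


def classify(pairs, tol=1e-10):
    """Label each pair 'roundoff' or 'suspect'; `tol` is a hypothetical threshold."""
    return {
        name: "suspect" if symmetric_rel_diff(a, b) > tol else "roundoff"
        for name, (a, b) in pairs.items()
    }


# (field #1, field #2) at the location of maximum difference, from the reports.
pairs = {
    "budini (45 vs 9)": (6.9922226493487276e+28, 6.9922226493487382e+28),
    "budadvy (45 vs 9)": (-2.1878826532546108e+27, -2.1878826532549399e+27),
    "budadvx (45 vs 22)": (-61738299762819.633, 8904562465791.291),
    "budchem (45 vs 22)": (766920097792.0, -494189674496.0),
    "budconv_cp (45 vs 22)": (-2.2055309866594325e+22, -2.3488796472526816e+22),
}

report = classify(pairs)
for name, label in report.items():
    print(f"{name:24s} {label}")
```

With this threshold, budini and budadvy fall in the roundoff class, while budadvx, budchem and budconv_cp are flagged.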

#2 Updated by Philippe Le Sager over 5 years ago

  • Priority changed from Normal to Low
