Releases: open-power/skiboot
v5.9.7
skiboot-5.9.7
skiboot 5.9.7 was released on Friday December 22nd, 2017. It replaces
skiboot-5.9.6 as the current stable release in the 5.9.x series.
Over skiboot-5.9.6, we have two bug fixes:
- phb4: Change PCI MMIO timers
  Currently we have a mismatch between the NCU and PCI timers for MMIO
  accesses. The PCI timers must be lower than the NCU timers, otherwise
  the mismatch may cause checkstops. This changes the PCI timeouts
  controlled by skiboot to 33-50ms. It should be forwards and backwards
  compatible with expected hostboot changes to the NCU timer.
- p8-i2c: Limit number of retry attempts
  Currently we will attempt to start an I2C transaction until it
  succeeds. In the event that the OCC does not release the lock on an
  I2C bus, this results in an async token being held forever, and the
  kernel thread that started the transaction will block forever while
  waiting for an async completion message. Fix this by limiting the
  number of attempts to start the transaction.
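The bounded retry can be sketched as follows. This is a minimal illustration, not skiboot's p8-i2c code: the retry limit, the result enum and the try_start callback are hypothetical, and a fake bus stands in for the OCC lock.

```c
#include <stddef.h>

/* Hypothetical limit; the value skiboot actually uses is not given here. */
#define I2C_MAX_START_ATTEMPTS 8

/* Hypothetical outcome of one attempt to take the bus and start. */
enum i2c_start_result { I2C_STARTED, I2C_BUS_BUSY };

typedef enum i2c_start_result (*i2c_try_start_fn)(void *ctx);

/*
 * Try to start a transaction at most I2C_MAX_START_ATTEMPTS times.
 * Returns the attempt number that succeeded, or -1 if the bus never
 * became free, so the caller can complete the async token with an
 * error instead of leaving the kernel thread waiting forever.
 */
static int i2c_start_with_retries(i2c_try_start_fn try_start, void *ctx)
{
	for (int attempt = 1; attempt <= I2C_MAX_START_ATTEMPTS; attempt++) {
		if (try_start(ctx) == I2C_STARTED)
			return attempt;
		/* A real driver would back off or reschedule here. */
	}
	return -1;
}

/* Hypothetical fake bus: stays busy for `busy_for` attempts. */
struct fake_bus { int busy_for; };

static enum i2c_start_result fake_try_start(void *ctx)
{
	struct fake_bus *bus = ctx;

	if (bus->busy_for > 0) {
		bus->busy_for--;
		return I2C_BUS_BUSY;
	}
	return I2C_STARTED;
}
```

A bus that frees up after two busy attempts starts on the third try; a lock that is never released now yields -1 after the limit rather than an endless loop.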
v5.9.6
skiboot-5.9.6
skiboot 5.9.6 was released on Friday December 15th, 2017. It replaces
skiboot-5.9.5 as the current stable release in the 5.9.x series.
Over skiboot-5.9.5, we have a few bug fixes:
- sensors: occ: Skip counter type of sensors
  Don't add counter type sensors to the device tree, as they don't fit
  into the hwmon sensor interface.
- p9_stop_api updates to support IMC across deep stop states.
- opal/xscom: Add recovery for lost core wakeup scom failures.
  Due to a hardware issue where a core's response to scom was delayed
  by thread reconfiguration, the SCOM logic can be left in a state
  where subsequent scoms to that core get errors. This affects the
  Core PC scom registers in the range 20010A80-20010ABF.
  The solution is that if an xscom timeout occurs on one of the Core PC
  scom registers in the range 20010A80-20010ABF, a clearing scom write
  is done to 0x20010800 with data '0x00000000'. This write will also
  time out, but it clears the scom logic errors. After the clearing
  write is done, the original scom operation can be retried.
  The scom timeout is reported as status 0x4 (Invalid address) in
  HMER[21-23].
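The recovery sequence above can be sketched in C. This is a hedged illustration only: the xscom_write_fn callback, the fake backend and the single retry are assumptions, and the range check compares the literal addresses from these notes (a real implementation may also need to account for which core chiplet is being addressed).

```c
#include <stdbool.h>
#include <stdint.h>

#define CORE_PC_SCOM_FIRST   0x20010A80ULL
#define CORE_PC_SCOM_LAST    0x20010ABFULL
#define SCOM_CLEAR_ADDR      0x20010800ULL
#define XSCOM_STATUS_TIMEOUT 0x4 /* "Invalid address" in HMER[21-23] */

/* Hypothetical backend: returns the 3-bit HMER status, 0 on success. */
typedef int (*xscom_write_fn)(uint64_t addr, uint64_t val);

static bool is_core_pc_scom(uint64_t addr)
{
	return addr >= CORE_PC_SCOM_FIRST && addr <= CORE_PC_SCOM_LAST;
}

static int xscom_write_with_recovery(xscom_write_fn write, uint64_t addr,
				     uint64_t val)
{
	int status = write(addr, val);

	if (status == XSCOM_STATUS_TIMEOUT && is_core_pc_scom(addr)) {
		/* The clearing write is expected to time out as well;
		 * its status is deliberately ignored. */
		(void)write(SCOM_CLEAR_ADDR, 0);
		/* Retry the original operation once. */
		status = write(addr, val);
	}
	return status;
}

/* Hypothetical fake: the first PC register access times out once. */
static int fake_timeouts_left = 1;
static int fake_clear_writes;

static int fake_write(uint64_t addr, uint64_t val)
{
	(void)val;
	if (addr == SCOM_CLEAR_ADDR) {
		fake_clear_writes++;
		return XSCOM_STATUS_TIMEOUT;
	}
	if (is_core_pc_scom(addr) && fake_timeouts_left > 0) {
		fake_timeouts_left--;
		return XSCOM_STATUS_TIMEOUT;
	}
	return 0;
}
```

Timeouts on addresses outside the PC range are returned unchanged, so only the affected registers get the clear-and-retry treatment.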
v5.9.5
skiboot-5.9.5
skiboot 5.9.5 was released on Wednesday December 13th, 2017. It replaces
skiboot-5.9.4 as the current stable release in the 5.9.x series.
Over skiboot-5.9.4, we have a few bug fixes:
- Fix extremely rare race in timer code.
- xive: Ensure VC informational FIRs are masked
  Some HostBoot versions leave those as checkstop; they are harmless
  and can sometimes occur during normal operations.
- xive: Fix occasional VC checkstops in xive_reset
  The current workaround for the scrub bug described in
  __xive_cache_scrub() has an issue in that it can leave dirty
  invalid entries in the cache. When cleaning up EQs or VPs during
  reset, if we then remove the underlying indirect page for these
  entries, the XIVE will checkstop when trying to flush them out of
  the cache.
  This replaces the existing workaround with a new pair of workarounds
  for VPs and EQs:
  - The VP one does the dummy watch on another entry than the one we
    scrubbed (which does the job of pushing old stores out), using an
    entry that is known to be backed by a permanent indirect page.
  - The EQ one switches to a more efficient workaround, which consists
    of doing a non-side-effect ESB load from the EQ's ESe control bits.
- io: Add load_wait() helper
  This uses the standard form twi/isync pair to ensure a load is
  consumed by the core before continuing. This can be necessary under
  some circumstances, for example when having the following sequence:
  - Store reg A
  - Load reg A (ensure above store pushed out)
  - delay loop
  - Store reg A
  i.e., a mandatory delay between 2 stores. In theory the first store is
  only guaranteed to reach the device after the load from the same
  location has completed. However, the processor will start executing
  the delay loop without waiting for the return value from the load.
  This construct enforces that the delay loop isn't executed until the
  load value has been returned.
- xive: Do not return a trigger page for an escalation interrupt
  This is bogus; we don't support them. (Thankfully the callers didn't
  actually try to use this on escalation interrupts.)
- xive: Mark a freed IRQ's IVE as valid and masked
  Removing the valid bit means a FIR will trip if it's accessed
  inadvertently. Under some circumstances, the XIVE will speculatively
  access an IVE for a masked interrupt and trip it. So make sure that
  freed entries are still marked valid (but masked).
- hw/nx: Fix NX BAR assignments
  The NX rng BAR is used by each core to source random numbers for the
  DARN instruction. Currently we configure each core to use the NX rng
  of the chip that it exists on. Unfortunately, the NX can be
  deconfigured by hostboot, and in this case we need to use the NX of a
  different chip.
  This patch moves the BAR assignments for the NX into the normal
  nx-rng init path. This lets us check if the normal (chip local) NX
  is active when configuring which NX a core should use, so that we can
  fall back gracefully.
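A minimal sketch of the fallback choice, assuming a simple per-chip "NX active" flag array and a hypothetical "first active NX" policy for cores whose local NX is deconfigured; skiboot's actual selection logic may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Pick which chip's NX rng a core on chip `core_chip` should use.
 * Prefer the chip-local NX; otherwise fall back to the first chip
 * with an active NX (hypothetical policy). Returns the chip index,
 * or -1 if no NX is active anywhere.
 */
static int pick_nx_for_core(const bool *nx_active, size_t nr_chips,
			    size_t core_chip)
{
	if (core_chip < nr_chips && nx_active[core_chip])
		return (int)core_chip;

	for (size_t i = 0; i < nr_chips; i++)
		if (nx_active[i])
			return (int)i;

	return -1;
}
```

With chips {active, deconfigured, active}, a core on chip 1 is pointed at chip 0's NX, while cores on chips 0 and 2 keep their local one.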
v5.9.4
skiboot-5.9.4
skiboot 5.9.4 was released on Wednesday November 29th, 2017. It replaces
skiboot-5.9.3 as the current stable release in the 5.9.x series.
Over skiboot-5.9.3, we have one NPU2/NVLink2 fix that works around a
potential glitch (the one skiboot-5.9.3 would hard crash on rather than
let a system continue to run until it mysteriously crashed later on).
That fix is in two parts:
- npu2: hw-procedures: Change phy_rx_clock_sel values to recover
  from a potential glitch.
- npu2: hw-procedures: Manipulate IOVALID during training
  Ensure that the IOVALID bit for this brick is raised at the start of
  link training, in the reset_ntl procedure.
  Then, to protect us from a glitch when the PHY clock turns off or
  gets chopped, lower IOVALID for the duration of the phy_reset and
  phy_rx_dccal procedures.
v5.9.3
skiboot-5.9.3
skiboot 5.9.3 was released on Wednesday November 22nd, 2017. It replaces
skiboot-5.9.2 as the current stable release in the 5.9.x series.
Over skiboot-5.9.2, we have one NPU2/NVLink2 fix that causes the machine
to crash hard in the event of hardware error rather than crash
mysteriously later on whenever the NVLink2 links are used.
That fix is:
- npu2: hw-procedures: Add check_credits procedure
  As an immediate mitigator for a current hardware glitch, add a
  procedure that can be used to validate NTL credit values. This will
  be called as a safeguard to check that link training succeeded.
  Assert that things are exactly as we expect, because if they aren't,
  the system will experience a catastrophic failure shortly after the
  start of link traffic.
v5.9.2
skiboot-5.9.2
skiboot 5.9.2 was released on Thursday November 16th, 2017. It replaces
skiboot-5.9.1 as the current stable release in the 5.9.x series.
Over skiboot-5.9.1, we have a few PHB4 (PCI) fixes, an i2c fix for
POWER9 platforms to avoid conflicting with the OCC use and an important
NPU2 (NVLink2) fix.
- phb4: Fix lane equalisation setting
  Fix cut and paste from phb3. The sizes have changed now that we have
  GEN4, so the check here needs to change also.
  Without this we end up with the default settings (all '7') rather
  than what's in HDAT.
- phb4: Fix PE mapping of M32 BAR
  The M32 BAR is the PHB4 region used to map all the non-prefetchable
  or 32-bit device BARs. It's supposed to have its segments remapped
  via the MDT, and Linux relies on that to assign them individual PE#.
  However, we weren't configuring that properly and instead used the
  mode where PE# == segment#, thus causing EEH to freeze the wrong
  device or PE#.
- phb4: Fix lost bit in PE number on config accesses
  A PE number can be up to 9 bits; using a uint8_t won't fly.
  That was causing errors on config accesses to freeze the wrong PE.
- phb4: Update inits
  New init value from HW folks for the fence enable register.
  This clears bit 17 (CFG Write Error CA or UR response) and bit 22
  (MMIO Write DAT_ERR Indication) and sets bit 21 (MMIO CFG Pending
  Error).
- npu2: Move to new GPU memory map
  There are three different ways we configure the MCD and memory map:
  1. Old way (current way): skiboot configures the MCD and puts GPUs
     at 4TB and below.
  2. New way with MCD: hostboot configures the MCD and skiboot puts
     GPUs at 4TB and above.
  3. New way without MCD: no one configures the MCD and skiboot puts
     GPUs at 4TB and below.
  The change keeps option 1 and adds options 2 and 3.
  The different configurations are detected using certain scoms (see
  patch).
  Option 1 will go away eventually, as it's a configuration that can
  cause xstops or data integrity problems. We are keeping it around to
  support existing hostboot.
  Option 2 supports only 4 GPUs and 512GB of memory per socket.
  Option 3 supports 6 GPUs and 4TB of memory but may have some
  performance impact.
- p8-i2c: Don't write the watermark register at init
  On P9 the I2C master is shared with the OCC. Currently the watermark
  values are set once at init time, which is bad for two reasons:
  a) We don't take the OCC master lock before setting it, which may
     cause issues if the OCC is currently using the master.
  b) The OCC might change the watermark levels and we need to reset
     them.
  Change this so that we set the watermark value when a new
  transaction is started rather than at init time.
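Two of the PHB4 fixes above come down to bit widths. The sketch below shows why a uint8_t loses bit 8 of a 9-bit PE number, and how the fence-enable update maps onto a 64-bit register, assuming the bit numbers use IBM MSB-0 numbering (bit 0 is the most significant bit), as is usual for POWER hardware registers. All function names are hypothetical.

```c
#include <stdint.h>

/* IBM MSB-0 numbering: bit 0 is the most significant of 64 bits. */
#define PPC_BIT(n) (0x8000000000000000ULL >> (n))

/* PE numbers on PHB4 can be up to 9 bits wide (0..511). */
#define PHB4_PE_MASK 0x1FFu

/* Buggy: a uint8_t silently drops bit 8, so PE 0x105 becomes 0x05. */
static uint8_t pe_number_buggy(unsigned int pe)
{
	return (uint8_t)pe;
}

/* Fixed: use a type wide enough to hold all 9 bits. */
static uint16_t pe_number_fixed(unsigned int pe)
{
	return (uint16_t)(pe & PHB4_PE_MASK);
}

/*
 * Apply the fence-enable change to an existing register value:
 * clear bit 17 (CFG Write Error CA or UR response) and bit 22
 * (MMIO Write DAT_ERR Indication), set bit 21 (MMIO CFG Pending Error).
 */
static uint64_t update_fence_enable(uint64_t val)
{
	val &= ~(PPC_BIT(17) | PPC_BIT(22));
	val |= PPC_BIT(21);
	return val;
}
```

With the truncated type, a config access for PE 0x105 would be attributed to PE 0x05, which is exactly the "freeze the wrong PE" symptom.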
v5.9.1
skiboot-5.9.1
skiboot 5.9.1 was released on Tuesday November 14th, 2017. It replaces
skiboot-5.9 as the current stable release in the 5.9.x series.
Over skiboot-5.9, we have two NPU2 (NVLink2) fixes and two XIVE bug
fixes:
- npu2: hw-procedures: Refactor reset_ntl procedure
  Change the implementation of reset_ntl to match the latest
  programming guide documentation.
- npu2: hw-procedures: Add phy_rx_clock_sel()
  Change the RX clk mux control to be done by software instead of HW.
  This avoids glitches caused by changing the mux setting.
- xive: Fix ability to clear some EQ flags
  We could never clear "unconditional notify" and "escalate".
- xive: Update inits for DD2.0
  This updates some inits based on information from the HW designers.
  This includes enabling some new DD2.0 features that we don't yet
  exploit.
v5.9
skiboot-5.9
skiboot v5.9 was released on Tuesday October 31st 2017. It is the first
release of skiboot 5.9 and becomes the new stable release of skiboot
following the 5.8 release, first released August 31st 2017. In this cycle
we have had five release candidate releases, mostly centered around bug
fixing for POWER9 platforms.
This release should be considered suitable for early-access POWER9
systems.
skiboot v5.9 contains all bug fixes as of skiboot-5.4.8 and
skiboot-5.1.21 (the currently maintained stable releases). There may be
some 5.9.x stable releases, depending on what issues are found.
For how the skiboot stable releases work, see stable-rules for details.
Over skiboot-5.8, we have the following changes:
New Features
POWER8
- fast-reset by default (if possible)
  Currently, this is limited to POWER8 systems.
  A normal reboot will, rather than doing a full IPL, go through a
  fast reboot procedure. This reduces the "reboot to petitboot" time
  from minutes to a handful of seconds.
POWER9
Since skiboot-5.9-rc3:
- occ-sensors : Add OCC inband sensor region to exports (useful for
debugging)
Two SRESET fixes (see below for feature description):
- core: direct-controls: Fix clearing of special wakeup
  'special_wakeup_count' is incremented on successfully asserting
  special wakeup. So we will never clear the special wakeup if we
  check for 'special_wakeup_count' to be zero. Fix this issue by
  checking whether 'special_wakeup_count' is 1 in
  dctl_clear_special_wakeup().
- core/direct-controls: increase special wakeup timeout on POWER9
  Some instances have been observed where the special wakeup assert
  times out. The current timeout is too short for deeper sleep states.
  Hostboot uses 100ms, so match that.
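The special wakeup fix is a reference-counting detail. A minimal sketch, assuming a simplified per-core structure: only the name special_wakeup_count comes from the notes; the struct, helper names and the boolean standing in for the hardware state are hypothetical.

```c
#include <stdbool.h>

struct core_state {
	unsigned int special_wakeup_count;
	bool hw_wakeup_asserted; /* stands in for the real SCOM state */
};

static void set_special_wakeup(struct core_state *c)
{
	if (c->special_wakeup_count == 0)
		c->hw_wakeup_asserted = true; /* assert in hardware once */
	c->special_wakeup_count++;            /* counted on success */
}

static void clear_special_wakeup(struct core_state *c)
{
	if (c->special_wakeup_count == 0)
		return; /* unbalanced clear: nothing to release */
	/*
	 * The caller releasing the last reference still sees its own
	 * reference counted, so the count is 1, never 0, at this point.
	 * Testing for 0 here would mean the wakeup is never dropped.
	 */
	if (c->special_wakeup_count == 1)
		c->hw_wakeup_asserted = false;
	c->special_wakeup_count--;
}
```

Two nested set/clear pairs keep the wakeup asserted until the second clear, which is the one that observes a count of 1.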
Since skiboot-5.9-rc2:
- cpu: Add OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED
  Add a new CPU reinit flag, "TM Suspend Disabled", which requests that
  CPUs be configured so that TM (Transactional Memory) suspend mode is
  disabled.
  Currently this always fails, because skiboot has no way to query the
  state. A future hostboot change will add a mechanism for skiboot to
  determine the status and return an appropriate error code.
Since skiboot-5.8:
- POWER9 power management during boot
  Less power should be consumed during boot.
- OPAL_SIGNAL_SYSTEM_RESET for POWER9
  This implements OPAL_SIGNAL_SYSTEM_RESET, using scom registers to
  quiesce the target thread and raise a system reset exception on it.
  It has been tested on DD2 with stop0 ESL=0 and ESL=1 shallow power
  saving modes.
  DD1 is not implemented because it is sufficiently different as to
  make support difficult.
- Enable deep idle states for POWER9
  - SLW: Add support for p9_stop_api
    p9_stop_api's are used to set SPR state on a core wakeup from
    a deeper low power state. p9_stop_api uses low level platform
    firmware and self-restore microcode to restore the SPRs to
    requested values.
    Code is taken from:
    https://github.com/open-power/hostboot/tree/master/src/import/chips/p9/procedures/utils/stopreg
  - SLW: Removing timebase related flags for stop4
    When a core enters stop4, it does not lose the decrementer and
    timebase. Hence removing flags OPAL_PM_DEC_STOP and
    OPAL_PM_TIMEBASE_STOP.
  - SLW: Allow deep states if homer address is known
    Use a common variable has_wakeup_engine instead of has_slw to
    tell if the:
    - SLW image is populated in case of power8
    - CME image is populated in case of power9
    Currently we expect CME to be loaded if homer address is known
    (except for simulators).
  - SLW: Configure self-restore for HRMOR
    Make a stop api call using libpore to restore the HRMOR register.
    HRMOR needs to be cleared so that when threads exit stop, they
    arrive at the linux system_reset vector (0x100).
  - SLW: Add opal_slw_set_reg support for power9
    This OPAL call is made from Linux to OPAL to configure values in
    various SPRs after wakeup from a deep idle state.
- PHB4: CAPP recovery
  CAPP recovery is initiated when a CAPP Machine Check is detected.
  The CAPP recovery procedure is initiated via a Hypervisor
  Maintenance interrupt (HMI).
  CAPP Machine Check may arise from either an error that results in a
  PHB freeze or from an internal CAPP error with CAPP checkstop FIR
  action. An error that causes a PHB freeze will result in the link
  down signal being asserted. The system continues running and the
  CAPP and PSL will be re-initialized.
  This implements CAPP recovery for POWER9 systems.
- Add wafer-location property for POWER9
  Extract wafer-location from ECID and add property under xscom node.
  - bits 64:71 are the chip x location (7:0)
  - bits 72:79 are the chip y location (7:0)
  Sample output:
    [root@wsp xscom@623fc00000000]# lsprop ecid
    ecid             019a00d4 03100718 852c0000 00fd7911
    [root@wsp xscom@623fc00000000]# lsprop wafer-location
    wafer-location   00000085 0000002c
- Add wafer-id property for POWER9
  Wafer id is derived from ECID data.
  - bits 4:63 are the wafer id (ten 6 bit fields each containing a
    code)
  Sample output:
    [root@wsp xscom@623fc00000000]# lsprop ecid
    ecid             019a00d4 03100718 852c0000 00fd7911
    [root@wsp xscom@623fc00000000]# lsprop wafer-id
    wafer-id         "6Q0DG340SO"
- Add ecid property under xscom node for POWER9.
  Sample output:
    [root@wsp xscom@623fc00000000]# lsprop ecid
    ecid             019a00d4 03100718 852c0000 00fd7911
- Add ibm,firmware-versions device tree node
  In P8, hostboot provides a mini device tree. It contains the
  /ibm,firmware-versions node, which has various firmware component
  version details.
  In P9, OPAL is building the device tree. This patch adds support to
  parse the VERSION section of PNOR and create the
  /ibm,firmware-versions device tree node.
  Sample output:
    /sys/firmware/devicetree/base/ibm,firmware-versions # lsprop .
    occ              "6a00709"
    skiboot          "v5.7-rc1-p344fb62"
    buildroot        "2017.02.2-7-g23118ce"
    capp-ucode       "9c73e9f"
    petitboot        "v1.4.3-p98b6d83"
    sbe              "02021c6"
    open-power       "witherspoon-v1.17-128-gf1b53c7-dirty"
    ....
    ....
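The ECID-derived properties can be checked against the sample output above. This sketch packs the four ecid cells into two 64-bit halves and extracts the fields using MSB-0 bit numbering. The helper names are hypothetical, and the 6-bit code to character mapping (0-9, then A-Z) is an assumption that happens to reproduce "6Q0DG340SO" from the sample; skiboot's real table may differ for other codes.

```c
#include <stdint.h>

/* Pack the ecid cells into bits 0:63 (hi) and bits 64:127 (lo). */
static void ecid_split(const uint32_t ecid[4], uint64_t *hi, uint64_t *lo)
{
	*hi = ((uint64_t)ecid[0] << 32) | ecid[1];
	*lo = ((uint64_t)ecid[2] << 32) | ecid[3];
}

/* Bits 64:71 are the chip x location, bits 72:79 the chip y location. */
static void ecid_wafer_location(const uint32_t ecid[4], uint32_t *x,
				uint32_t *y)
{
	uint64_t hi, lo;

	ecid_split(ecid, &hi, &lo);
	*x = (uint32_t)(lo >> 56) & 0xff; /* MSB-0 bits 64:71 */
	*y = (uint32_t)(lo >> 48) & 0xff; /* MSB-0 bits 72:79 */
}

/*
 * Bits 4:63 hold ten 6-bit codes; field k ends at MSB-0 bit 9 + 6k,
 * i.e. it sits 54 - 6k bits above the LSB of `hi`.
 * ASSUMPTION: 0-9 -> '0'-'9', 10-35 -> 'A'-'Z', anything else -> '?'.
 */
static void ecid_wafer_id(const uint32_t ecid[4], char out[11])
{
	uint64_t hi, lo;

	ecid_split(ecid, &hi, &lo);
	for (int k = 0; k < 10; k++) {
		unsigned int code = (unsigned int)(hi >> (54 - 6 * k)) & 0x3f;

		if (code < 10)
			out[k] = (char)('0' + code);
		else if (code < 36)
			out[k] = (char)('A' + code - 10);
		else
			out[k] = '?';
	}
	out[10] = '\0';
}
```

For the sample ecid, the third cell 852c0000 gives x = 0x85 and y = 0x2c, matching the wafer-location property.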
POWER9
Since skiboot-5.9-rc5:
- Suppress XSCOM chiplet-offline errors on P9
  Workaround on P9: PRD does operations it knows will fail with this
  error to work around a hardware issue where accesses via the PIB
  (FSI or OCC) work as expected, but accesses via the ADU (what xscom
  goes through) do not. The chip logic will always return all FFs if
  there is any error on the scom.
- asm/head: initialize preferred DSCR value
  POWER7/8 use DSCR=0. The POWER9 preferred value has "stride-N"
  enabled.
Since skiboot-5.9-rc4:
- opal/hmi: Workaround Power9 hw logic bug for couple of TFMR TB
  errors.
- opal/hmi: Fix TB reside and HDEC parity error recovery for power9
Since skiboot-5.9-rc2:
- hw/imc: Fix IMC Catalog load for DD2.X processors
Since skiboot-5.9-rc1:
- xive: Fix VP free block group mode false-positive parameter check
  The check to ensure the buddy allocation idx is aligned to its
  allocation order was not taking into account the allocation split.
  This would result in opal_xive_free_vp_block failures despite
  giving the same value as returned by opal_xive_alloc_vp_block.
  E.g., starting then stopping 4 KVM guests gives the following
  pattern in the host:
    opal_xive_alloc_vp_block(5)=0x45000020
    opal_xive_alloc_vp_block(5)=0x45000040
    opal_xive_alloc_vp_block(5)=0x45000060
    opal_xive_alloc_vp_block(5)=0x45000080
    opal_xive_free_vp_block(0x45000020)=-1
    opal_xive_free_vp_block(0x45000040)=0
    opal_xive_free_vp_block(0x45000060)=-1
    opal_xive_free_vp_block(0x45000080)=0
- hw/imc: pause microcode at boot
  IMC nest counters have both in-band (ucode access) and out-of-band
  access. Since not all nest counter configurations are supported by
  ucode, out-of-band tools are used to characterize other
  configurations.
  So it is preferable to pause the nest microcode at boot to aid the
  nest out-of-band tools. If the ucode is not paused and the OS does
  not have IMC driver support, then out-of-band tools will race with
  the ucode and end up getting undesirable values. This patch checks
  for and pauses the ucode at boot.
  OPAL provides APIs to control IMC counters.
  OPAL_IMC_COUNTERS_INIT is used to initialize these counters at
  boot. OPAL_IMC_COUNTERS_START and OPAL_IMC_COUNTERS_STOP API
  calls should be used to start and pause these IMC engines.
  doc/opal-api/opal-imc-counters.rst details the OPAL APIs and their
  usage.
- hdata/i2c: update the list of known i2c devs
  This updates the list of known i2c devices (as of HDAT spec
  v10.5e) so that they can be properly identified during the hdat
  parsing.
- hdata/i2c: log unknown i2c devices
  An i2c device is unknown if either the i2c device list is outdated
  or the device is marked as unknown (0xFF) in the hdat.
Since skiboot-5.8:
- Disable Transactional Memory on Power9 DD 2.1
  Update pa_features_p9[] to disable TM (Transactional Memory). On
  DD 2.1 TM is not usable by Linux without other workarounds, so
  skiboot must disable it.
- xscom: Do not print error me...
v5.9-rc5
skiboot-5.9-rc5
skiboot v5.9-rc5 was released on Monday October 23rd 2017 approximately
32,000ft above somewhere north of Tucson, Arizona. It is the fifth
release candidate of skiboot 5.9, which will become the new stable
release of skiboot following the 5.8 release, first released August 31st
2017.
skiboot v5.9-rc5 contains all bug fixes as of skiboot-5.4.8 and
skiboot-5.1.21 (the currently maintained stable releases). We do not
currently expect to do any 5.8.x stable releases.
For how the skiboot stable releases work, see stable-rules for details.
The current plan is to cut the final 5.9 very shortly, with skiboot 5.9
being for all POWER8 and POWER9 platforms in op-build v1.20 (Due October
18th, so we're running a bit behind there). This release will be
targeted to early POWER9 systems.
Over skiboot-5.9-rc4, we have the following changes:
- opal/hmi: Workaround Power9 hw logic bug for couple of TFMR TB
  errors.
- opal/hmi: Fix TB reside and HDEC parity error recovery for power9
- phb4: Escalate freeze to fence to avoid checkstop
  Freeze events such as MMIO loads can cause the PHB to lose its
  limited powerbus credits. If all credits are used, a further MMIO
  will cause a checkstop.
  To work around this, we escalate the troublesome freeze events to a
  fence. The fence will cause a full PHB reset, which resets the
  powerbus credits and avoids the checkstop.
- phb4: Update some init registers
  New inits based on the next PHB4 workbook. Increases some timeouts to
  avoid some spurious error conditions.
- phb4: Enable PHB MMIO in phb4_root_port_init()
  Linux EEH flow is somewhat broken. It saves the PCIe config space of
  the PHB on boot, which it then uses to restore on EEH recovery. It
  does this to restore MMIO BARs and some other pieces.
  Unfortunately this save is done before any drivers are bound to
  devices under the PHB. A number of other things are configured in
  the PHB after drivers start, hence some configuration space settings
  aren't saved correctly. These include the bus master and MMIO bits
  in the command register.
  Linux tried to hack around this in this Linux commit:
    bf898ec5cb powerpc/eeh: Enable PCI_COMMAND_MASTER for PCI bridges
  This sets the bus master bit but ignores the MMIO bit.
  Hence we lose MMIO after a full PHB reset. This causes the next MMIO
  access to the device to fail and for us to perform a PE freeze
  recovery, which still doesn't set the MMIO bit, and hence we still
  fail.
  This works around this by forcing MMIO on during
  phb4_root_port_init(). With this we can recover from a PHB fence
  event on POWER9.
- phb4: Reduce link degraded message log level to debug
  If we hit this message we'll retry and fix the problem. If we run
  out of retries and can't fix the problem, we'll still print a log
  message at error level indicating a problem.
- phb4: Fix GEN3 for DD2.00
  In this fix:
    62ac763 phb4: Fix PCIe GEN4 on DD2.1 and above
  We fixed DD2.1 GEN4 but broke DD2.00 as GEN3.
  This fixes DD2.00 back to GEN3. This time for sure!
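The GEN3/GEN4 back-and-forth is a classic version-check bug: the clamp must match DD2.00 exactly rather than every DD2.x part. A hedged sketch with a hypothetical {major, minor} encoding of the DD level; the "buggy" predicate illustrates the over-broad clamp described for eef0e19 and is not the literal skiboot code.

```c
#include <stdbool.h>

/* Hypothetical DD level encoding, e.g. DD2.00 -> {2, 0}, DD2.1 -> {2, 1}. */
struct dd_level {
	unsigned int major;
	unsigned int minor;
};

/* Over-broad form: clamps every DD2.x part, including DD2.1 and later. */
static bool clamp_to_gen3_buggy(struct dd_level dd)
{
	return dd.major == 2;
}

/* Intended form: only DD2.00 parts are limited to PCIe GEN3. */
static bool clamp_to_gen3_fixed(struct dd_level dd)
{
	return dd.major == 2 && dd.minor == 0;
}
```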
v5.9-rc4
skiboot-5.9-rc4
skiboot v5.9-rc4 was released on Thursday October 19th 2017. It is the
fourth release candidate of skiboot 5.9, which will become the new
stable release of skiboot following the 5.8 release, first released
August 31st 2017.
skiboot v5.9-rc4 contains all bug fixes as of skiboot-5.4.8 and
skiboot-5.1.21 (the currently maintained stable releases). We do not
currently expect to do any 5.8.x stable releases.
For how the skiboot stable releases work, see stable-rules for details.
The current plan is to cut the final 5.9 by October 20th, with skiboot
5.9 being for all POWER8 and POWER9 platforms in op-build v1.20 (Due
October 18th, so we're running a bit behind there). This release will be
targeted to early POWER9 systems.
Over skiboot-5.9-rc3, we have the following changes:
- phb4: Fix PCIe GEN4 on DD2.1 and above
  In this change:
    eef0e19 PHB4: Default to PCIe GEN3 on POWER9 DD2.00
  We clamped DD2.00 parts to GEN3 but unfortunately this change also
  applied to DD2.1 and above.
  This fixes this to only apply to DD2.00.
- occ-sensors : Add OCC inband sensor region to exports (useful for
  debugging)
Two SRESET fixes:
- core: direct-controls: Fix clearing of special wakeup
  'special_wakeup_count' is incremented on successfully asserting
  special wakeup. So we will never clear the special wakeup if we
  check for 'special_wakeup_count' to be zero. Fix this issue by
  checking whether 'special_wakeup_count' is 1 in
  dctl_clear_special_wakeup().
- core/direct-controls: increase special wakeup timeout on POWER9
  Some instances have been observed where the special wakeup assert
  times out. The current timeout is too short for deeper sleep states.
  Hostboot uses 100ms, so match that.