commit 24e0e61db3cb86a66824531989f1df80e0939f26 upstream.
In AHCI 1.3.1, the register description for CAP.SSC:
"When cleared to ‘0’, software must not allow the HBA to initiate
transitions to the Slumber state via agressive link power management nor
the PxCMD.ICC field in each port, and the PxSCTL.IPM field in each port
must be programmed to disallow device initiated Slumber requests."
In AHCI 1.3.1, the register description for CAP.PSC:
"When cleared to ‘0’, software must not allow the HBA to initiate
transitions to the Partial state via agressive link power management nor
the PxCMD.ICC field in each port, and the PxSCTL.IPM field in each port
must be programmed to disallow device initiated Partial requests."
Ensure that we always set the corresponding bits in PxSCTL.IPM, such that
a device is not allowed to initiate transitions to power states which are
unsupported by the HBA.
DevSleep is always initiated by the HBA, however, for completeness, set the
corresponding bit in PxSCTL.IPM such that agressive link power management
cannot transition to DevSleep if DevSleep is not supported.
sata_link_scr_lpm() is used by libahci, ata_piix and libata-pmp.
However, only libahci has the ability to read the CAP/CAP2 register to see
if these features are supported. Therefore, in order to not introduce any
regressions on ata_piix or libata-pmp, create flags that indicate that the
respective feature is NOT supported. This way, the behavior for ata_piix
and libata-pmp should remain unchanged.
This change is based on a patch originally submitted by Runa Guo-oc.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Fixes: 1152b2617a ("libata: implement sata_link_scr_lpm() and make ata_dev_set_feature() global")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 75e2bd5f1ede42a2bc88aa34b431e1ace8e0bea0 upstream.
libsas does its own domain based power management of ports. For such
ports, libata should not use a device type defining power management
operations as executing these operations for suspend/resume in addition
to libsas calls to ata_sas_port_suspend() and ata_sas_port_resume() is
not necessary (and likely dangerous to do, even though problems are not
seen currently).
Introduce the new ata_port_sas_type device_type for ports managed by
libsas. This new device type is used in ata_tport_add() and is defined
without power management operations.
Fixes: 2fcbdcb4c8 ("[SCSI] libata: export ata_port suspend/resume infrastructure for sas")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 84d76529c650f887f1e18caee72d6f0589e1baf9 upstream.
Whenever an ATA adapter driver is removed (e.g. rmmod),
ata_port_detach() is called repeatedly for all the adapter ports to
remove (unload) the devices attached to the port and delete the port
device itself. Removing of devices is done using libata EH with the
ATA_PFLAG_UNLOADING port flag set. This causes libata EH to execute
ata_eh_unload() which disables all devices attached to the port.
ata_port_detach() finishes by calling scsi_remove_host() to remove the
scsi host associated with the port. This function will trigger the
removal of all scsi devices attached to the host and in the case of
disks, calls to sd_shutdown() which will flush the device write cache
and stop the device. However, given that the devices were already
disabled by ata_eh_unload(), the synchronize write cache command and
start stop unit commands fail. E.g. running "rmmod ahci" with first
removing sd_mod results in error messages like:
ata13.00: disable device
sd 0:0:0:0: [sda] Synchronizing SCSI cache
sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 0:0:0:0: [sda] Stopping disk
sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Fix this by removing all scsi devices of the ata devices connected to
the port before scheduling libata EH to disable the ATA devices.
Fixes: 720ba12620 ("[PATCH] libata-hp: update unload-unplug")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3b8e0af4a7a331d1510e963b8fd77e2fca0a77f1 upstream.
The function ata_port_request_pm() checks the port flag
ATA_PFLAG_PM_PENDING and calls ata_port_wait_eh() if this flag is set to
ensure that power management operations for a port are not scheduled
simultaneously. However, this flag check is done without holding the
port lock.
Fix this by taking the port lock on entry to the function and checking
the flag under this lock. The lock is released and re-taken if
ata_port_wait_eh() needs to be called. The two WARN_ON() macros checking
that the ATA_PFLAG_PM_PENDING flag was cleared are removed as the first
call is racy and the second one done without holding the port lock.
Fixes: 5ef4108291 ("ata: add ata port system PM callbacks")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 69f2c9346313ba3d3dfa4091ff99df26c67c9021 ]
Commit 2dc0b46b5e ("libata: sata_down_spd_limit should return if
driver has not recorded sstatus speed") changed the behavior of
sata_down_spd_limit() to return doing nothing if a drive does not report
a current link speed, to avoid reducing the link speed to the lowest 1.5
Gbps speed.
However, the change assumed that a speed was recorded before probing
(e.g. before a suspend/resume) and set in link->sata_spd. This causes
problems with adapters/drives combination failing to establish a link
speed during probe autonegotiation. One example reported of this problem
is an mvebu adapter with a 3Gbps port-multiplier box: autonegotiation
fails, leaving no recorded link speed and no reported current link
speed. Probe retries also fail as no action is taken by sata_set_spd()
after each retry.
Fix this by returning early in sata_down_spd_limit() only if we do have
a recorded link speed, that is, if link->sata_spd is not 0. With this
fix, a failed probe not leading to a recorded link speed is retried at
the lower 1.5 Gbps speed, with the link speed potentially increased
later on the second revalidate of the device if the device reports
that it supports higher link speeds.
Reported-by: Marius Dinu <marius@psihoexpert.ro>
Fixes: 2dc0b46b5e ("libata: sata_down_spd_limit should return if driver has not recorded sstatus speed")
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Marius Dinu <marius@psihoexpert.ro>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit ea08aec7e77bfd6599489ec430f9f859ab84575a upstream.
Commit 1527f69204fe ("ata: ahci: Add Green Sardine vendor ID as
board_ahci_mobile") added an explicit entry for AMD Green Sardine
AHCI controller using the board_ahci_mobile configuration (this
configuration has later been renamed to board_ahci_low_power).
The board_ahci_low_power configuration enables support for low power
modes.
This explicit entry takes precedence over the generic AHCI controller
entry, which does not enable support for low power modes.
Therefore, when commit 1527f69204fe ("ata: ahci: Add Green Sardine
vendor ID as board_ahci_mobile") was backported to stable kernels,
it make some Pioneer optical drives, which was working perfectly fine
before the commit was backported, stop working.
The real problem is that the Pioneer optical drives do not handle low
power modes correctly. If these optical drives would have been tested
on another AHCI controller using the board_ahci_low_power configuration,
this issue would have been detected earlier.
Unfortunately, the board_ahci_low_power configuration is only used in
less than 15% of the total AHCI controller entries, so many devices
have never been tested with an AHCI controller with low power modes.
Fixes: 1527f69204fe ("ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile")
Cc: stable@vger.kernel.org
Reported-by: Jaap Berkhout <j.j.berkhout@staalenberk.nl>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bf476fe22aa1851bab4728e0c49025a6a0bea307 ]
In an unlikely (and probably wrong?) case that the 'ppi' parameter of
ata_host_alloc_pinfo() points to an array starting with a NULL pointer,
there's going to be a kernel oops as the 'pi' local variable won't get
reassigned from the initial value of NULL. Initialize 'pi' instead to
'&ata_dummy_port_info' to fix the possible kernel oops for good...
Found by Linux Verification Center (linuxtesting.org) with the SVACE static
analysis tool.
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5399752299396a3c9df6617f4b3c907d7aa4ded8 ]
Samsung' 840 EVO with the latest firmware (EXT0DB6Q) locks up with
the a message: "READ LOG DMA EXT failed, trying PIO" during boot.
Initially this was discovered because it caused a crash
with the sata_dwc_460ex controller on a WD MyBook Live DUO.
The reporter "Tice Rex" which has the unique opportunity that he
has two Samsung 840 EVO SSD! One with the older firmware "EXT0BB0Q"
which booted fine and didn't expose "READ LOG DMA EXT". But the
newer/latest firmware "EXT0DB6Q" caused the headaches.
BugLink: https://github.com/openwrt/openwrt/issues/9505
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c8ea23d5fa59f28302d4e3370c75d9c308e64410 ]
This device is a CF card, or possibly an SSD in CF form factor.
It supports NCQ and high speed DMA.
While it also advertises TRIM support, I/O errors are reported
when the discard mount option fstrim is used. TRIM also fails
when disabling NCQ and not just as an NCQ command.
TRIM must be disabled for this device.
Signed-off-by: Zoltán Böszörményi <zboszor@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit a66307d473077b7aeba74e9b09c841ab3d399c2d upstream.
The ASMedia 1092 has a configuration mode which will present a
dummy device; sadly the implementation falsely claims to provide
a device with 100M which doesn't actually exist.
So disable this device to avoid errors during boot.
Cc: stable@vger.kernel.org
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f971a85439bd25dc7b4d597cf5e4e8dc7ffc884b upstream.
Checking if DMA is enabled should be done via the
ata_dma_enabled helper function, since the init state
0xff indicates disabled.
This meant that ATA_CMD_READ_LOG_DMA_EXT was used and probed
for before DMA was enabled, which caused hangs for some combinations
of controllers and devices.
It might also have caused it to be incorrectly disabled as broken,
but there have been no reports of that.
Cc: stable@vger.kernel.org
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=195895
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a8526a5cd51cf5f070310c6c37dd7293334ac49 upstream.
Many users are reporting that the Samsung 860 and 870 SSD are having
various issues when combined with AMD/ATI (vendor ID 0x1002) SATA
controllers and only completely disabling NCQ helps to avoid these
issues.
Always disabling NCQ for Samsung 860/870 SSDs regardless of the host
SATA adapter vendor will cause I/O performance degradation with well
behaved adapters. To limit the performance impact to ATI adapters,
introduce the ATA_HORKAGE_NO_NCQ_ON_ATI flag to force disable NCQ
only for these adapters.
Also, two libata.force parameters (noncqati and ncqati) are introduced
to disable and enable the NCQ for the system which equipped with ATI
SATA adapter and Samsung 860 and 870 SSDs. The user can determine NCQ
function to be enabled or disabled according to the demand.
After verifying the chipset from the user reports, the issue appears
on AMD/ATI SB7x0/SB8x0/SB9x0 SATA Controllers and does not appear on
recent AMD SATA adapters. The vendor ID of ATI should be 0x1002.
Therefore, ATA_HORKAGE_NO_NCQ_ON_AMD was modified to
ATA_HORKAGE_NO_NCQ_ON_ATI.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=201693
Signed-off-by: Kate Hsuan <hpa@redhat.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20210903094411.58749-1-hpa@redhat.com
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Krzysztof Olędzki <ole@ans.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8a6430ab9c9c87cb64c512e505e8690bbaee190b upstream.
Commit ca6bfcb2f6 ("libata: Enable queued TRIM for Samsung SSD 860")
limited the existing ATA_HORKAGE_NO_NCQ_TRIM quirk from "Samsung SSD 8*",
covering all Samsung 800 series SSDs, to only apply to "Samsung SSD 840*"
and "Samsung SSD 850*" series based on information from Samsung.
But there is a large number of users which is still reporting issues
with the Samsung 860 and 870 SSDs combined with Intel, ASmedia or
Marvell SATA controllers and all reporters also report these problems
going away when disabling queued trims.
Note that with AMD SATA controllers users are reporting even worse
issues and only completely disabling NCQ helps there, this will be
addressed in a separate patch.
Fixes: ca6bfcb2f6 ("libata: Enable queued TRIM for Samsung SSD 860")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203475
Cc: stable@vger.kernel.org
Cc: Kate Hsuan <hpa@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210823095220.30157-1-hdegoede@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95364f36701e62dd50eee91e1303187fd1a9f567 upstream.
In case a driver wants to return an error from qc_prep, return enum
ata_completion_errors. sata_mv is one of those drivers -- see the next
patch. Other drivers return the newly defined AC_ERR_OK.
[v2] use enum ata_completion_errors and AC_ERR_OK.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-ide@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3b5455636fe26ea21b4189d135a424a6da016418 upstream.
All three generations of Sandisk SSDs lock up hard intermittently.
Experiments showed that disabling NCQ lowered the failure rate significantly
and the kernel has been disabling NCQ for some models of SD7's and 8's,
which is obviously undesirable.
Karthik worked with Sandisk to root cause the hard lockups to trim commands
larger than 128M. This patch implements ATA_HORKAGE_MAX_TRIM_128M which
limits max trim size to 128M and applies it to all three generations of
Sandisk SSDs.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Karthik Shivaram <karthikgs@fb.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b5292111de9bb70cba3489075970889765302136 ]
Commit 130f4caf145c ("libata: Ensure ata_port probe has completed before
detach") may cause system freeze during suspend.
Using async_synchronize_full() in PM callbacks is wrong, since async
callbacks that are already scheduled may wait for not-yet-scheduled
callbacks, causes a circular dependency.
Instead of using big hammer like async_synchronize_full(), use async
cookie to make sure port probe are synced, without affecting other
scheduled PM callbacks.
Fixes: 130f4caf145c ("libata: Ensure ata_port probe has completed before detach")
Suggested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: John Garry <john.garry@huawei.com>
BugLink: https://bugs.launchpad.net/bugs/1867983
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 10a663a1b15134a5a714aa515e11425a44d4fdf7 upstream.
device_shutdown() called from reboot or power_shutdown expect
all devices to be shutdown. Same is true for even ahci pci driver.
As no ahci shutdown function is implemented, the ata subsystem
always remains alive with DMA & interrupt support. File system
related calls should not be honored after device_shutdown().
So defining ahci pci driver shutdown to freeze hardware (mask
interrupt, stop DMA engine and free DMA resources).
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8385d756e114f2df8568e508902d5f9850817ffb upstream.
ata_qc_complete_multiple() is called with a mask of the still active
tags.
mv_sata doesn't have this information directly and instead calculates
the still active tags from the started tags (ap->qc_active) and the
finished tags as (ap->qc_active ^ done_mask)
Since 28361c4036 the hw_tag and tag are no longer the same and the
equation is no longer valid. In ata_exec_internal_sg() ap->qc_active is
initialized as 1ULL << ATA_TAG_INTERNAL, but in hardware tag 0 is
started and this will be in done_mask on completion. ap->qc_active ^
done_mask becomes 0x100000000 ^ 0x1 = 0x100000001 and thus tag 0 used as
the internal tag will never be reported as completed.
This is fixed by introducing ata_qc_get_active() which returns the
active hardware tags and calling it where appropriate.
This is tested on mv_sata, but sata_fsl and sata_nv suffer from the same
problem. There is another case in sata_nv that most likely needs fixing
as well, but this looks a little different, so I wasn't confident enough
to change that.
Fixes: 28361c4036 ("libata: add extra internal command")
Cc: stable@vger.kernel.org
Tested-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add missing export of ata_qc_get_active(), as per Pali.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[ Upstream commit 130f4caf145c3562108b245a576db30b916199d2 ]
With CONFIG_DEBUG_TEST_DRIVER_REMOVE set, we may find the following WARN:
[ 23.452574] ------------[ cut here ]------------
[ 23.457190] WARNING: CPU: 59 PID: 1 at drivers/ata/libata-core.c:6676 ata_host_detach+0x15c/0x168
[ 23.466047] Modules linked in:
[ 23.469092] CPU: 59 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc1-00010-g5b83fd27752b-dirty #296
[ 23.477776] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 03/15/2019
[ 23.486286] pstate: a0c00009 (NzCv daif +PAN +UAO)
[ 23.491065] pc : ata_host_detach+0x15c/0x168
[ 23.495322] lr : ata_host_detach+0x88/0x168
[ 23.499491] sp : ffff800011cabb50
[ 23.502792] x29: ffff800011cabb50 x28: 0000000000000007
[ 23.508091] x27: ffff80001137f068 x26: ffff8000112c0c28
[ 23.513390] x25: 0000000000003848 x24: ffff0023ea185300
[ 23.518689] x23: 0000000000000001 x22: 00000000000014c0
[ 23.523987] x21: 0000000000013740 x20: ffff0023bdc20000
[ 23.529286] x19: 0000000000000000 x18: 0000000000000004
[ 23.534584] x17: 0000000000000001 x16: 00000000000000f0
[ 23.539883] x15: ffff0023eac13790 x14: ffff0023eb76c408
[ 23.545181] x13: 0000000000000000 x12: ffff0023eac13790
[ 23.550480] x11: ffff0023eb76c228 x10: 0000000000000000
[ 23.555779] x9 : ffff0023eac13798 x8 : 0000000040000000
[ 23.561077] x7 : 0000000000000002 x6 : 0000000000000001
[ 23.566376] x5 : 0000000000000002 x4 : 0000000000000000
[ 23.571674] x3 : ffff0023bf08a0bc x2 : 0000000000000000
[ 23.576972] x1 : 3099674201f72700 x0 : 0000000000400284
[ 23.582272] Call trace:
[ 23.584706] ata_host_detach+0x15c/0x168
[ 23.588616] ata_pci_remove_one+0x10/0x18
[ 23.592615] ahci_remove_one+0x20/0x40
[ 23.596356] pci_device_remove+0x3c/0xe0
[ 23.600267] really_probe+0xdc/0x3e0
[ 23.603830] driver_probe_device+0x58/0x100
[ 23.608000] device_driver_attach+0x6c/0x90
[ 23.612169] __driver_attach+0x84/0xc8
[ 23.615908] bus_for_each_dev+0x74/0xc8
[ 23.619730] driver_attach+0x20/0x28
[ 23.623292] bus_add_driver+0x148/0x1f0
[ 23.627115] driver_register+0x60/0x110
[ 23.630938] __pci_register_driver+0x40/0x48
[ 23.635199] ahci_pci_driver_init+0x20/0x28
[ 23.639372] do_one_initcall+0x5c/0x1b0
[ 23.643199] kernel_init_freeable+0x1a4/0x24c
[ 23.647546] kernel_init+0x10/0x108
[ 23.651023] ret_from_fork+0x10/0x18
[ 23.654590] ---[ end trace 634a14b675b71c13 ]---
With KASAN also enabled, we may also get many use-after-free reports.
The issue is that when CONFIG_DEBUG_TEST_DRIVER_REMOVE is set, we may
attempt to detach the ata_port before it has been probed.
This is because the ata_ports are async probed, meaning that there is no
guarantee that the ata_port has probed prior to detach. When the ata_port
does probe in this scenario, we get all sorts of issues as the detach may
have already happened.
Fix by ensuring synchronisation with async_synchronize_full(). We could
alternatively use the cookie returned from the ata_port probe
async_schedule() call, but that means managing the cookie, so more
complicated.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit dd957493baa586f1431490f97f9c7c45eaf8ab10 upstream.
We've received a bugreport that using LPM with a SAMSUNG
MZ7TE512HMHP-000L1 SSD leads to system instability, we already have
a quirk for the MZ7TD256HAFV-000L9, which is also a Samsun EVO 840 /
PM851 OEM model, so it seems some of these models have a LPM issue.
This commits adds a NOLPM quirk for the model string from the new
bugeport, to avoid the reported stability issues.
Cc: stable@vger.kernel.org
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1571330
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fd6f32f78645db32b6b95a42e45da2ddd6de0e67 ]
These devices support read zero after trim (RZAT), as they advertise to
the OS. However, the OS doesn't believe the SSDs unless they are
explicitly whitelisted.
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Juha-Matti Tilli <juha-matti.tilli@iki.fi>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 410b5c7b48368317af95f0113692561d01d8144e upstream.
med_power_with_dipm still causes freezes after updating the firmware to
the latest version (DXT04L5Q).
Set model_rev to NULL and blacklist the device.
Cc: stable@vger.kernel.org
Signed-off-by: Diego Viola <diego.viola@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a435ab4f80f983c53b4ca4f8c12b3ddd3ca17670 ]
med_power_with_dipm causes my T450 to freeze with a SAMSUNG
MZ7TD256HAFV-000L9 SSD (firmware DXT02L5Q).
Switching the LPM to max_performance fixes this issue.
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Diego Viola <diego.viola@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jens writes:
"Storage fixes for 4.19-rc5
- Fix for leaking kernel pointer in floppy ioctl (Andy Whitcroft)
- NVMe pull request from Christoph, and a single ANA log page fix
(Hannes)
- Regression fix for libata qd32 support, where we trigger an illegal
active command transition. This fixes a CD-ROM detection issue that
was reported, but could also trigger premature completion of the
internal tag (me)"
* tag 'for-linus-20180920' of git://git.kernel.dk/linux-block:
floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl
libata: mask swap internal and hardware tag
nvme: count all ANA groups for ANA Log page
hen we're comparing the hardware completion mask passed in from the
driver with the internal tag pending mask, we need to account for the
fact that the internal tag is different from the hardware tag. If not,
then we can end up either prematurely completing the internal tag (since
it's not set in the hw mask), or simply flag an error:
ata2: illegal qc_active transition (100000000->00000001)
If the internal tag is set, then swap that with the hardware tag in this
case before comparing with what the hardware reports.
Fixes: 28361c4036 ("libata: add extra internal command")
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=201151
Cc: stable@vger.kernel.org
Reported-by: Paul Sbarra <sbarra.paul@gmail.com>
Tested-by: Paul Sbarra <sbarra.paul@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
With gcc 4.1.2:
drivers/ata/libata-core.c:7396:33: warning: no newline at end of file
Fixes: 2fa4a32613 ("scsi: libsas: dynamically allocate and free ata host")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull libata updates from Tejun Heo:
"Nothing too interesting. Mostly ahci and ahci_platform changes, many
around power management"
* 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (22 commits)
ata: ahci_platform: enable to get and control reset
ata: libahci_platform: add reset control support
ata: add an extra argument to ahci_platform_get_resources()
ata: sata_rcar: Add r8a77965 support
ata: sata_rcar: exclude setting of PHY registers in Gen3
ata: sata_rcar: really mask all interrupts on Gen2 and later
Revert "ata: ahci_platform: allow disabling of hotplug to save power"
ata: libahci: Allow reconfigure of DEVSLP register
ata: libahci: Correct setting of DEVSLP register
ata: ahci: Enable DEVSLP by default on x86 with SLP_S0
ata: ahci: Support state with min power but Partial low power state
Revert "ata: ahci_platform: convert kcalloc to devm_kcalloc"
ata: sata_rcar: Add rudimentary Runtime PM support
ata: sata_rcar: Provide a short-hand for &pdev->dev
ata: Only output sg element mapped number in verbose debug
ata: Guard ata_scsi_dump_cdb() by ATA_VERBOSE_DEBUG
ata: ahci_platform: convert kcalloc to devm_kcalloc
ata: ahci_platform: convert kzallloc to kcalloc
ata: ahci_platform: correct parameter documentation for ahci_platform_shutdown
libata: remove ata_sff_data_xfer_noirq()
...
Pull SCSI updates from James Bottomley:
"This is mostly updates to the usual drivers: mpt3sas, lpfc, qla2xxx,
hisi_sas, smartpqi, megaraid_sas, arcmsr.
In addition, with the continuing absence of Nic we have target updates
for tcmu and target core (all with reviews and acks).
The biggest observable change is going to be that we're (again) trying
to switch to mulitqueue as the default (a user can still override the
setting on the kernel command line).
Other major core stuff is the removal of the remaining Microchannel
drivers, an update of the internal timers and some reworks of
completion and result handling"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (203 commits)
scsi: core: use blk_mq_run_hw_queues in scsi_kick_queue
scsi: ufs: remove unnecessary query(DM) UPIU trace
scsi: qla2xxx: Fix issue reported by static checker for qla2x00_els_dcmd2_sp_done()
scsi: aacraid: Spelling fix in comment
scsi: mpt3sas: Fix calltrace observed while running IO & reset
scsi: aic94xx: fix an error code in aic94xx_init()
scsi: st: remove redundant pointer STbuffer
scsi: qla2xxx: Update driver version to 10.00.00.08-k
scsi: qla2xxx: Migrate NVME N2N handling into state machine
scsi: qla2xxx: Save frame payload size from ICB
scsi: qla2xxx: Fix stalled relogin
scsi: qla2xxx: Fix race between switch cmd completion and timeout
scsi: qla2xxx: Fix Management Server NPort handle reservation logic
scsi: qla2xxx: Flush mailbox commands on chip reset
scsi: qla2xxx: Fix unintended Logout
scsi: qla2xxx: Fix session state stuck in Get Port DB
scsi: qla2xxx: Fix redundant fc_rport registration
scsi: qla2xxx: Silent erroneous message
scsi: qla2xxx: Prevent sysfs access when chip is down
scsi: qla2xxx: Add longer window for chip reset
...
Currently when min_power policy is selected, the partial low power state
is not entered and link will try aggressively enter to only slumber state.
Add a new policy which still enable DEVSLP but also try to enter partial
low power state. This policy is presented as "min_power_with_partial".
For information the difference between partial and slumber
Partial – PHY logic is powered up, and in a reduced power state. The link
PM exit latency to active state maximum is 10 ns.
Slumber – PHY logic is powered up, and in a reduced power state. The link
PM exit latency to active state maximum is 10 ms.
Devslp – PHY logic is powered down. The link PM exit latency from this
state to active state maximum is 20 ms, unless otherwise specified by
DETO.
Suggested-and-reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Defining `ATA_DEBUG` there are a lof of messages like below in the log.
[ 16.345472] ata_sg_setup: 1 sg elements mapped
As that is too verbose, only output these messages in verbose debug.
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
There have been several reports of LPM related hard freezes about once
a day on multiple Lenovo 50 series models. Strange enough these reports
where not disk model specific as LPM issues usually are and some users
with the exact same disk + laptop where seeing them while other users
where not seeing these issues.
It turns out that enabling LPM triggers a firmware bug somewhere, which
has been fixed in later BIOS versions.
This commit adds a new ahci_broken_lpm() function and a new ATA_FLAG_NO_LPM
for dealing with this.
The ahci_broken_lpm() function contains DMI match info for the 4 models
which are known to be affected by this and the DMI BIOS date field for
known good BIOS versions. If the BIOS date is older then the one in the
table LPM will be disabled and a warning will be printed.
Note the BIOS dates are for known good versions, some older versions may
work too, but we don't know for sure, the table is using dates from BIOS
versions for which users have confirmed that upgrading to that version
makes the problem go away.
Unfortunately I've been unable to get hold of the reporter who reported
that BIOS version 2.35 fixed the problems on the W541 for him. I've been
able to verify the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION from an older
dmidecode, but I don't know the exact BIOS date as reported in the DMI.
Lenovo keeps a changelog with dates in their release notes, but the
dates there are the release dates not the build dates which are in DMI.
So I've chosen to set the date to which we compare to one day past the
release date of the 2.34 BIOS. I plan to fix this with a follow up
commit once I've the necessary info.
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Commit 2623c7a5f2 ("libata: add refcounting to ata_host") v4.17+ introduced
refcounting to ata_host and will increase or decrease the refcount when
adding or deleting transport ATA port.
Now the ata host for libsas is embedded in domain_device, and the ->kref
member is not initialized. Afer we add ata transport class, ata_host_get()
will be called when adding transport ATA port and a warning will be
triggered as below:
refcount_t: increment on 0; use-after-free.
WARNING: CPU: 2 PID: 103 at
lib/refcount.c:153 refcount_inc+0x40/0x48 ...... Call trace:
refcount_inc+0x40/0x48
ata_host_get+0x10/0x18
ata_tport_add+0x40/0x120
ata_sas_tport_add+0xc/0x14
sas_ata_init+0x7c/0xc8
sas_discover_domain+0x380/0x53c
process_one_work+0x12c/0x288
worker_thread+0x58/0x3f0
kthread+0xfc/0x128
ret_from_fork+0x10/0x18
And also when removing transport ATA port ata_host_put() will be called and
another similar warning will be triggered. If the refcount decreased to
zero, the ata host will be freed. But this ata host is only part of
domain_device, it cannot be freed directly.
So we have to change this embedded static ata host to a dynamically
allocated ata host and initialize the ->kref member. To use ata_host_get()
and ata_host_put() in libsas, we need to move the declaration of these
functions to the public libata.h and export them.
Fixes: b6240a4df0 ("scsi: libsas: add transport class for ATA devices")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
CC: John Garry <john.garry@huawei.com>
CC: Taras Kondratiuk <takondra@cisco.com>
CC: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull libata updates from Tejun Heo:
- libata has always been limiting the maximum queue depth to 31, with
one entry set aside mostly for historical reasons. This didn't use to
make much difference but Jens found out that modern hard drives can
actually perform measurably better with the extra one queue depth.
Jens updated libata core so that it can make use of full 32 queue
depth
- Damien updated command retry logic in error handling so that it
doesn't unnecessarily retry when upper layer (SCSI) is gonna handle
them
- A couple misc changes
* 'for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
sata_fsl: use the right type for tag bitshift
ahci: enable full queue depth of 32
libata: don't clamp queue depth to ATA_MAX_QUEUE - 1
libata: add extra internal command
sata_nv: set host can_queue count appropriately
libata: remove assumption that ATA_MAX_QUEUE - 1 is the max
libata: use ata_tag_internal() consistently
libata: bump ->qc_active to a 64-bit type
libata: convert core and drivers to ->hw_tag usage
libata: introduce notion of separate hardware tags
libata: Fix command retry decision
libata: Honor RQF_QUIET flag
libata: Make ata_dev_set_mode() less verbose
libata: Fix ata_err_string()
libata: Fix comment typo in ata_eh_analyze_tf()
sata_nv: don't use block layer bounce buffer
ata: hpt37x: Convert to use match_string() helper
Commit 184add2ca2 ("libata: Apply NOLPM quirk for SanDisk
SD7UB3Q*G1001 SSDs") disabled LPM for SanDisk SD7UB3Q*G1001 SSDs.
This has lead to several reports of users of that SSD where LPM
was working fine and who know have a significantly increased idle
power consumption on their laptops.
Likely there is another problem on the T450s from the original
reporter which gets exposed by the uncore reaching deeper sleep
states (higher PC-states) due to LPM being enabled. The problem as
reported, a hardfreeze about once a day, already did not sound like
it would be caused by LPM and the reports of the SSD working fine
confirm this. The original reporter is ok with dropping the quirk.
A X250 user has reported the same hard freeze problem and for him
the problem went away after unrelated updates, I suspect some GPU
driver stack changes fixed things.
TL;DR: The original reporters problem were triggered by LPM but not
an LPM issue, so drop the quirk for the SSD in question.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1583207
Cc: stable@vger.kernel.org
Cc: Richard W.M. Jones <rjones@redhat.com>
Cc: Lorenzo Dalrio <lorenzo.dalrio@gmail.com>
Reported-by: Lorenzo Dalrio <lorenzo.dalrio@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Richard W.M. Jones" <rjones@redhat.com>
While whitelisting Micron M500DC drives, the tweaked blacklist entry
enabled queued TRIM from M500IT variants also. But these do not support
queued TRIM. And while using those SSDs with the latest kernel we have
seen errors and even the partition table getting corrupted.
Some part from the dmesg:
[ 6.727384] ata1.00: ATA-9: Micron_M500IT_MTFDDAK060MBD, MU01, max UDMA/133
[ 6.727390] ata1.00: 117231408 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 6.741026] ata1.00: supports DRM functions and may not be fully accessible
[ 6.759887] ata1.00: configured for UDMA/133
[ 6.762256] scsi 0:0:0:0: Direct-Access ATA Micron_M500IT_MT MU01 PQ: 0 ANSI: 5
and then for the error:
[ 120.860334] ata1.00: exception Emask 0x1 SAct 0x7ffc0007 SErr 0x0 action 0x6 frozen
[ 120.860338] ata1.00: irq_stat 0x40000008
[ 120.860342] ata1.00: failed command: SEND FPDMA QUEUED
[ 120.860351] ata1.00: cmd 64/01:00:00:00:00/00:00:00:00:00/a0 tag 0 ncq dma 512 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x5 (timeout)
[ 120.860353] ata1.00: status: { DRDY }
[ 120.860543] ata1: hard resetting link
[ 121.166128] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 121.166376] ata1.00: supports DRM functions and may not be fully accessible
[ 121.186238] ata1.00: supports DRM functions and may not be fully accessible
[ 121.204445] ata1.00: configured for UDMA/133
[ 121.204454] ata1.00: device reported invalid CHS sector 0
[ 121.204541] sd 0:0:0:0: [sda] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[ 121.204546] sd 0:0:0:0: [sda] tag#18 Sense Key : 0x5 [current]
[ 121.204550] sd 0:0:0:0: [sda] tag#18 ASC=0x21 ASCQ=0x4
[ 121.204555] sd 0:0:0:0: [sda] tag#18 CDB: opcode=0x93 93 08 00 00 00 00 00 04 28 80 00 00 00 30 00 00
[ 121.204559] print_req_error: I/O error, dev sda, sector 272512
After few reboots with these errors, and the SSD is corrupted.
After blacklisting it, the errors are not seen and the SSD does not get
corrupted any more.
Fixes: 243918be63 ("libata: Do not blacklist Micron M500DC")
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Use what the driver provides, which will still be ATA_MAX_QUEUE - 1
at most anyway.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tejun Heo <tj@kernel.org>
Bump the internal tag to 32, instead of stealing the last tag in
our regular command space. This works just fine, since we don't
actually need a separate hardware tag for this. Internal commands
cannot coexist with NCQ commands.
As a bonus, we get rid of the special casing of what tag to use
for the internal command.
This is in preparation for utilizing all 32 commands for normal IO.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tejun Heo <tj@kernel.org>
Some check for the value directly, use the provided helper instead.
Also make it return a bool, since that's what it does.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tejun Heo <tj@kernel.org>
This is in preparation for allowing full usage of the tag space,
which means that our reserved error handling command will be
using an internal tag value of 32. This doesn't fit in a u32, so
move to a u64.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tejun Heo <tj@kernel.org>
Anything that goes to the hardware should use ->hw_tag, anything
related to internal lookup should be using ->tag.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tejun Heo <tj@kernel.org>
Rigth now these are the same, but drivers should be using ->hw_tag
for their command setup and issue.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tejun Heo <tj@kernel.org>
For a successful setting of the device transfer speed mode in
ata_dev_set_mode(), do not print the message
"ataX.XX: configured for xxx" if the EH context has the quiet flag set,
unless the device port is being reset.
This preserves the output of the message during device scan but removes
it in the case of a simple device revalidation such as trigerred by
enabling the NCQ I/O priority feature of the device
e.g. echo 1 > /sys/block/sdxx/device/ncq_iprio_enable
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Sandisk SSDs SD7SN6S256G and SD8SN8U256G are regularly locking up
regularly under sustained moderate load with NCQ enabled. Blacklist
for now.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Cc: stable@vger.kernel.org
Richard Jones has reported that using med_power_with_dipm on a T450s
with a Sandisk SD7UB3Q256G1001 SSD (firmware version X2180501) is
causing the machine to hang.
Switching the LPM to max_performance fixes this, so it seems that
this Sandisk SSD does not handle LPM well.
Note in the past there have been bug-reports about the following
Sandisk models not working with min_power, so we may need to extend
the quirk list in the future: name - firmware
Sandisk SD6SB2M512G1022I - X210400
Sandisk SD6PP4M-256G-1006 - A200906
Cc: stable@vger.kernel.org
Cc: Richard W.M. Jones <rjones@redhat.com>
Reported-and-tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>