Commit Graph

216 Commits

Author SHA1 Message Date
Andrew Kanner
38cbf13b2e ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()
commit 0b63c0e01fba40e3992bc627272ec7b618ccaef7 upstream.

Syzkaller is able to provoke null-ptr-dereference in ocfs2_xa_remove():

[   57.319872] (a.out,1161,7):ocfs2_xa_remove:2028 ERROR: status = -12
[   57.320420] (a.out,1161,7):ocfs2_xa_cleanup_value_truncate:1999 ERROR: Partial truncate while removing xattr overlay.upper.  Leaking 1 clusters and removing the entry
[   57.321727] BUG: kernel NULL pointer dereference, address: 0000000000000004
[...]
[   57.325727] RIP: 0010:ocfs2_xa_block_wipe_namevalue+0x2a/0xc0
[...]
[   57.331328] Call Trace:
[   57.331477]  <TASK>
[...]
[   57.333511]  ? do_user_addr_fault+0x3e5/0x740
[   57.333778]  ? exc_page_fault+0x70/0x170
[   57.334016]  ? asm_exc_page_fault+0x2b/0x30
[   57.334263]  ? __pfx_ocfs2_xa_block_wipe_namevalue+0x10/0x10
[   57.334596]  ? ocfs2_xa_block_wipe_namevalue+0x2a/0xc0
[   57.334913]  ocfs2_xa_remove_entry+0x23/0xc0
[   57.335164]  ocfs2_xa_set+0x704/0xcf0
[   57.335381]  ? _raw_spin_unlock+0x1a/0x40
[   57.335620]  ? ocfs2_inode_cache_unlock+0x16/0x20
[   57.335915]  ? trace_preempt_on+0x1e/0x70
[   57.336153]  ? start_this_handle+0x16c/0x500
[   57.336410]  ? preempt_count_sub+0x50/0x80
[   57.336656]  ? _raw_read_unlock+0x20/0x40
[   57.336906]  ? start_this_handle+0x16c/0x500
[   57.337162]  ocfs2_xattr_block_set+0xa6/0x1e0
[   57.337424]  __ocfs2_xattr_set_handle+0x1fd/0x5d0
[   57.337706]  ? ocfs2_start_trans+0x13d/0x290
[   57.337971]  ocfs2_xattr_set+0xb13/0xfb0
[   57.338207]  ? dput+0x46/0x1c0
[   57.338393]  ocfs2_xattr_trusted_set+0x28/0x30
[   57.338665]  ? ocfs2_xattr_trusted_set+0x28/0x30
[   57.338948]  __vfs_removexattr+0x92/0xc0
[   57.339182]  __vfs_removexattr_locked+0xd5/0x190
[   57.339456]  ? preempt_count_sub+0x50/0x80
[   57.339705]  vfs_removexattr+0x5f/0x100
[...]

Reproducer uses faultinject facility to fail ocfs2_xa_remove() ->
ocfs2_xa_value_truncate() with -ENOMEM.

In this case the comment mentions that we can return 0 if
ocfs2_xa_cleanup_value_truncate() is going to wipe the entry
anyway. But the following 'rc' check is wrong and execution flow do
'ocfs2_xa_remove_entry(loc);' twice:
* 1st: in ocfs2_xa_cleanup_value_truncate();
* 2nd: returning back to ocfs2_xa_remove() instead of going to 'out'.

Fix this by skipping the 2nd removal of the same entry and making
syzkaller repro happy.

Link: https://lkml.kernel.org/r/20241103193845.2940988-1-andrew.kanner@gmail.com
Fixes: 399ff3a748 ("ocfs2: Handle errors while setting external xattr values.")
Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com>
Reported-by: syzbot+386ce9e60fa1b18aac5b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/671e13ab.050a0220.2b8c0f.01d0.GAE@google.com/T/
Tested-by: syzbot+386ce9e60fa1b18aac5b@syzkaller.appspotmail.com
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-17 14:58:07 +01:00
Gautham Ananthakrishna
5c9807c523 ocfs2: reserve space for inline xattr before attaching reflink tree
commit 5ca60b86f57a4d9648f68418a725b3a7de2816b0 upstream.

One of our customers reported a crash and a corrupted ocfs2 filesystem.
The crash was due to the detection of corruption.  Upon troubleshooting,
the fsck -fn output showed the below corruption

[EXTENT_LIST_FREE] Extent list in owner 33080590 claims 230 as the next free chain record,
but fsck believes the largest valid value is 227.  Clamp the next record value? n

The stat output from the debugfs.ocfs2 showed the following corruption
where the "Next Free Rec:" had overshot the "Count:" in the root metadata
block.

        Inode: 33080590   Mode: 0640   Generation: 2619713622 (0x9c25a856)
        FS Generation: 904309833 (0x35e6ac49)
        CRC32: 00000000   ECC: 0000
        Type: Regular   Attr: 0x0   Flags: Valid
        Dynamic Features: (0x16) HasXattr InlineXattr Refcounted
        Extended Attributes Block: 0  Extended Attributes Inline Size: 256
        User: 0 (root)   Group: 0 (root)   Size: 281320357888
        Links: 1   Clusters: 141738
        ctime: 0x66911b56 0x316edcb8 -- Fri Jul 12 06:02:30.829349048 2024
        atime: 0x66911d6b 0x7f7a28d -- Fri Jul 12 06:11:23.133669517 2024
        mtime: 0x66911b56 0x12ed75d7 -- Fri Jul 12 06:02:30.317552087 2024
        dtime: 0x0 -- Wed Dec 31 17:00:00 1969
        Refcount Block: 2777346
        Last Extblk: 2886943   Orphan Slot: 0
        Sub Alloc Slot: 0   Sub Alloc Bit: 14
        Tree Depth: 1   Count: 227   Next Free Rec: 230
        ## Offset        Clusters       Block#
        0  0             2310           2776351
        1  2310          2139           2777375
        2  4449          1221           2778399
        3  5670          731            2779423
        4  6401          566            2780447
        .......          ....           .......
        .......          ....           .......

The issue was in the reflink workfow while reserving space for inline
xattr.  The problematic function is ocfs2_reflink_xattr_inline().  By the
time this function is called the reflink tree is already recreated at the
destination inode from the source inode.  At this point, this function
reserves space for inline xattrs at the destination inode without even
checking if there is space at the root metadata block.  It simply reduces
the l_count from 243 to 227 thereby making space of 256 bytes for inline
xattr whereas the inode already has extents beyond this index (in this
case up to 230), thereby causing corruption.

The fix for this is to reserve space for inline metadata at the destination
inode before the reflink tree gets recreated. The customer has verified the
fix.

Link: https://lkml.kernel.org/r/20240918063844.1830332-1-gautham.ananthakrishna@oracle.com
Fixes: ef962df057 ("ocfs2: xattr: fix inlined xattr reflink")
Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-08 16:19:14 +01:00
Ferry Meng
e2b3d7a9d0 ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry()
[ Upstream commit af77c4fc1871847b528d58b7fdafb4aa1f6a9262 ]

xattr in ocfs2 maybe 'non-indexed', which saved with additional space
requested.  It's better to check if the memory is out of bound before
memcmp, although this possibility mainly comes from crafted poisonous
images.

Link: https://lkml.kernel.org/r/20240520024024.1976129-2-joseph.qi@linux.alibaba.com
Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: lei lu <llfamsec@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08 16:19:03 +01:00
Ferry Meng
b49a786beb ocfs2: add bounds checking to ocfs2_xattr_find_entry()
[ Upstream commit 9e3041fecdc8f78a5900c3aa51d3d756e73264d6 ]

Add a paranoia check to make sure it doesn't stray beyond valid memory
region containing ocfs2 xattr entries when scanning for a match.  It will
prevent out-of-bound access in case of crafted images.

Link: https://lkml.kernel.org/r/20240520024024.1976129-1-joseph.qi@linux.alibaba.com
Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: lei lu <llfamsec@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: af77c4fc1871 ("ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08 16:19:03 +01:00
Kees Cook
b7e389235c treewide: Remove uninitialized_var() usage
commit 3f649ab728cda8038259d8f14492fe400fbab911 upstream.

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11 11:45:01 +02:00
Joseph Qi
b28da9da6e Revert "fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()"
commit 94b07b6f9e2e996afff7395de6b35f34f4cb10bf upstream.

This reverts commit 56e94ea132bb5c2c1d0b60a6aeb34dcb7d71a53d.

Commit 56e94ea132bb ("fs: ocfs2: fix possible null-pointer dereferences
in ocfs2_xa_prepare_entry()") introduces a regression that fail to
create directory with mount option user_xattr and acl.  Actually the
reported NULL pointer dereference case can be correctly handled by
loc->xl_ops->xlo_add_entry(), so revert it.

Link: http://lkml.kernel.org/r/1573624916-83825-1-git-send-email-joseph.qi@linux.alibaba.com
Fixes: 56e94ea132bb ("fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Thomas Voegtle <tv@lio96.de>
Acked-by: Changwei Ge <gechangwei@live.cn>
Cc: Jia-Ju Bai <baijiaju1990@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-01 09:16:10 +01:00
Jia-Ju Bai
c18d860476 fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()
[ Upstream commit 56e94ea132bb5c2c1d0b60a6aeb34dcb7d71a53d ]

In ocfs2_xa_prepare_entry(), there is an if statement on line 2136 to
check whether loc->xl_entry is NULL:

    if (loc->xl_entry)

When loc->xl_entry is NULL, it is used on line 2158:

    ocfs2_xa_add_entry(loc, name_hash);
        loc->xl_entry->xe_name_hash = cpu_to_le32(name_hash);
        loc->xl_entry->xe_name_offset = cpu_to_le16(loc->xl_size);

and line 2164:

    ocfs2_xa_add_namevalue(loc, xi);
        loc->xl_entry->xe_value_size = cpu_to_le64(xi->xi_value_len);
        loc->xl_entry->xe_name_len = xi->xi_name_len;

Thus, possible null-pointer dereferences may occur.

To fix these bugs, if loc-xl_entry is NULL, ocfs2_xa_prepare_entry()
abnormally returns with -EINVAL.

These bugs are found by a static analysis tool STCheck written by us.

[akpm@linux-foundation.org: remove now-unused ocfs2_xa_add_entry()]
Link: http://lkml.kernel.org/r/20190726101447.9153-1-baijiaju1990@gmail.com
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06 13:05:56 +01:00
YueHaibing
7113a1bc19 ocfs2: remove set but not used variable 'last_hash'
[ Upstream commit 7bc36e3ce91471b6377c8eadc0a2f220a2280083 ]

Fixes gcc '-Wunused-but-set-variable' warning:

  fs/ocfs2/xattr.c: In function ocfs2_xattr_bucket_find:
  fs/ocfs2/xattr.c:3828:6: warning: variable last_hash set but not used [-Wunused-but-set-variable]

It's never used and can be removed.

Link: http://lkml.kernel.org/r/20190716132110.34836-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-25 10:47:58 +02:00
piaojun
1119d3c06f ocfs2: use 'osb' instead of 'OCFS2_SB()'
We could use 'osb' instead of 'OCFS2_SB()' to make code more elegant.

Link: http://lkml.kernel.org/r/5A702111.7090907@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:21 -07:00
piaojun
16c8d569f5 ocfs2/acl: use 'ip_xattr_sem' to protect getting extended attribute
The race between *set_acl and *get_acl will cause getting incomplete
xattr data as below:

  processA                                    processB

  ocfs2_set_acl
    ocfs2_xattr_set
      __ocfs2_xattr_set_handle

                                              ocfs2_get_acl_nolock
                                                ocfs2_xattr_get_nolock:

processB may get incomplete xattr data if processA hasn't set_acl done.

So we should use 'ip_xattr_sem' to protect getting extended attribute in
ocfs2_get_acl_nolock(), as other processes could be changing it
concurrently.

Link: http://lkml.kernel.org/r/5A5DDCFF.7030001@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31 17:18:35 -08:00
piaojun
c0a1a6d769 ocfs2/xattr: assign errno to 'ret' in ocfs2_calc_xattr_init()
We need catch the errno returned by ocfs2_xattr_get_nolock() and assign
it to 'ret' for printing and noticing upper callers.

Link: http://lkml.kernel.org/r/5A571CAF.8050709@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Acked-by: Gang He <ghe@suse.com>
Acked-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31 17:18:35 -08:00
alex chen
32ed0bd743 ocfs2: use the OCFS2_XATTR_ROOT_SIZE macro in ocfs2_reflink_xattr_header()
Use the OCFS2_XATTR_ROOT_SIZE macro improves the readability of the
code.

Link: http://lkml.kernel.org/r/5A2E2488.70301@huawei.com
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31 17:18:35 -08:00
Linus Torvalds
1751e8a6cb Rename superblock flags (MS_xyz -> SB_xyz)
This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

    # places to look in; re security/*: it generally should *not* be
    # touched (that stuff parses mount(2) arguments directly), but
    # there are two places where we really deal with superblock flags.
    FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
            include/linux/fs.h include/uapi/linux/bfs_fs.h \
            security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
    # the list of MS_... constants
    SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
          DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
          POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
          I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
          ACTIVE NOUSER"

    SED_PROG=
    for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

    # we want files that contain at least one of MS_...,
    # with fs/namespace.c and fs/pnode.c excluded.
    L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

    for f in $L; do sed -i $f $SED_PROG; done

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27 13:05:09 -08:00
Jun Piao
964f14a0d3 ocfs2: clean up some dead code
clean up some unused functions and parameters.

Link: http://lkml.kernel.org/r/598A5E21.2080807@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:24 -07:00
Eric Ren
8818efaaac ocfs2: fix deadlock caused by recursive locking in xattr
Another deadlock path caused by recursive locking is reported.  This
kind of issue was introduced since commit 743b5f1434 ("ocfs2: take
inode lock in ocfs2_iop_set/get_acl()").  Two deadlock paths have been
fixed by commit b891fa5024 ("ocfs2: fix deadlock issue when taking
inode lock at vfs entry points").  Yes, we intend to fix this kind of
case in incremental way, because it's hard to find out all possible
paths at once.

This one can be reproduced like this.  On node1, cp a large file from
home directory to ocfs2 mountpoint.  While on node2, run
setfacl/getfacl.  Both nodes will hang up there.  The backtraces:

On node1:
  __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
  ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
  ocfs2_write_begin+0x43/0x1a0 [ocfs2]
  generic_perform_write+0xa9/0x180
  __generic_file_write_iter+0x1aa/0x1d0
  ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
  __vfs_write+0xc3/0x130
  vfs_write+0xb1/0x1a0
  SyS_write+0x46/0xa0

On node2:
  __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
  ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
  ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
  ocfs2_set_acl+0x22d/0x260 [ocfs2]
  ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
  set_posix_acl+0x75/0xb0
  posix_acl_xattr_set+0x49/0xa0
  __vfs_setxattr+0x69/0x80
  __vfs_setxattr_noperm+0x72/0x1a0
  vfs_setxattr+0xa7/0xb0
  setxattr+0x12d/0x190
  path_setxattr+0x9f/0xb0
  SyS_setxattr+0x14/0x20

Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which is
exported by commit 439a36b8ef ("ocfs2/dlmglue: prepare tracking logic
to avoid recursive cluster lock").

Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com
Fixes: 743b5f1434 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
Signed-off-by: Eric Ren <zren@suse.com>
Reported-by: Thomas Voegtle <tv@lio96.de>
Tested-by: Thomas Voegtle <tv@lio96.de>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-23 16:15:55 -07:00
Darrick J. Wong
84e40080bd ocfs2: convert inode refcount test to a helper
Replace the open-coded inode refcount flag test with a helper function
to reduce the potential for bugs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-12-10 12:39:45 -08:00
Deepa Dinamani
078cd8279e fs: Replace CURRENT_TIME with current_time() for inode timestamps
CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_time() instead.

CURRENT_TIME is also not y2038 safe.

This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. As part of the effort current_time() will be
extended to do range checks. Hence, it is necessary for all
file system timestamps to use current_time(). Also,
current_time() will be transitioned along with vfs to be
y2038 safe.

Note that whenever a single call to current_time() is used
to change timestamps in different inodes, it is because they
share the same time granularity.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-27 21:06:21 -04:00
Linus Torvalds
69c4289449 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  fat: fix error message for bogus number of directory entries
  fat: fix typo s/supeblock/superblock/
  ASoC: max9877: Remove unused function declaration
  dw2102: don't output spurious blank lines to the kernel log
  init: fix Kconfig text
  ARM: io: fix comment grammar
  ocfs: fix ocfs2_xattr_user_get() argument name
  scsi/qla2xxx: Remove erroneous unused macro qla82xx_get_temp_val1()
2016-07-28 14:22:25 -07:00
Andreas Gruenbacher
84d86e6d30 ocfs: fix ocfs2_xattr_user_get() argument name
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-06-21 13:22:56 +02:00
Al Viro
5930122683 switch xattr_handler->set() to passing dentry and inode separately
preparation for similar switch in ->setxattr() (see the next commit for
rationale).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-27 15:39:43 -04:00
Linus Torvalds
7f427d3a60 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull parallel filesystem directory handling update from Al Viro.

This is the main parallel directory work by Al that makes the vfs layer
able to do lookup and readdir in parallel within a single directory.
That's a big change, since this used to be all protected by the
directory inode mutex.

The inode mutex is replaced by an rwsem, and serialization of lookups of
a single name is done by a "in-progress" dentry marker.

The series begins with xattr cleanups, and then ends with switching
filesystems over to actually doing the readdir in parallel (switching to
the "iterate_shared()" that only takes the read lock).

A more detailed explanation of the process from Al Viro:
 "The xattr work starts with some acl fixes, then switches ->getxattr to
  passing inode and dentry separately.  This is the point where the
  things start to get tricky - that got merged into the very beginning
  of the -rc3-based #work.lookups, to allow untangling the
  security_d_instantiate() mess.  The xattr work itself proceeds to
  switch a lot of filesystems to generic_...xattr(); no complications
  there.

  After that initial xattr work, the series then does the following:

   - untangle security_d_instantiate()

   - convert a bunch of open-coded lookup_one_len_unlocked() to calls of
     that thing; one such place (in overlayfs) actually yields a trivial
     conflict with overlayfs fixes later in the cycle - overlayfs ended
     up switching to a variant of lookup_one_len_unlocked() sans the
     permission checks.  I would've dropped that commit (it gets
     overridden on merge from #ovl-fixes in #for-next; proper resolution
     is to use the variant in mainline fs/overlayfs/super.c), but I
     didn't want to rebase the damn thing - it was fairly late in the
     cycle...

   - some filesystems had managed to depend on lookup/lookup exclusion
     for *fs-internal* data structures in a way that would break if we
     relaxed the VFS exclusion.  Fixing hadn't been hard, fortunately.

   - core of that series - parallel lookup machinery, replacing
     ->i_mutex with rwsem, making lookup_slow() take it only shared.  At
     that point lookups happen in parallel; lookups on the same name
     wait for the in-progress one to be done with that dentry.

     Surprisingly little code, at that - almost all of it is in
     fs/dcache.c, with fs/namei.c changes limited to lookup_slow() -
     making it use the new primitive and actually switching to locking
     shared.

   - parallel readdir stuff - first of all, we provide the exclusion on
     per-struct file basis, same as we do for read() vs lseek() for
     regular files.  That takes care of most of the needed exclusion in
     readdir/readdir; however, these guys are trickier than lookups, so
     I went for switching them one-by-one.  To do that, a new method
     '->iterate_shared()' is added and filesystems are switched to it
     as they are either confirmed to be OK with shared lock on directory
     or fixed to be OK with that.  I hope to kill the original method
     come next cycle (almost all in-tree filesystems are switched
     already), but it's still not quite finished.

   - several filesystems get switched to parallel readdir.  The
     interesting part here is dealing with dcache preseeding by readdir;
     that needs minor adjustment to be safe with directory locked only
     shared.

     Most of the filesystems doing that got switched to in those
     commits.  Important exception: NFS.  Turns out that NFS folks, with
     their, er, insistence on VFS getting the fuck out of the way of the
     Smart Filesystem Code That Knows How And What To Lock(tm) have
     grown the locking of their own.  They had their own homegrown
     rwsem, with lookup/readdir/atomic_open being *writers* (sillyunlink
     is the reader there).  Of course, with VFS getting the fuck out of
     the way, as requested, the actual smarts of the smart filesystem
     code etc. had become exposed...

   - do_last/lookup_open/atomic_open cleanups.  As the result, open()
     without O_CREAT locks the directory only shared.  Including the
     ->atomic_open() case.  Backmerge from #for-linus in the middle of
     that - atomic_open() fix got brought in.

   - then comes NFS switch to saner (VFS-based ;-) locking, killing the
     homegrown "lookup and readdir are writers" kinda-sorta rwsem.  All
     exclusion for sillyunlink/lookup is done by the parallel lookups
     mechanism.  Exclusion between sillyunlink and rmdir is a real rwsem
     now - rmdir being the writer.

     Result: NFS lookups/readdirs/O_CREAT-less opens happen in parallel
     now.

   - the rest of the series consists of switching a lot of filesystems
     to parallel readdir; in a lot of cases ->llseek() gets simplified
     as well.  One backmerge in there (again, #for-linus - rockridge
     fix)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (74 commits)
  ext4: switch to ->iterate_shared()
  hfs: switch to ->iterate_shared()
  hfsplus: switch to ->iterate_shared()
  hostfs: switch to ->iterate_shared()
  hpfs: switch to ->iterate_shared()
  hpfs: handle allocation failures in hpfs_add_pos()
  gfs2: switch to ->iterate_shared()
  f2fs: switch to ->iterate_shared()
  afs: switch to ->iterate_shared()
  befs: switch to ->iterate_shared()
  befs: constify stuff a bit
  isofs: switch to ->iterate_shared()
  get_acorn_filename(): deobfuscate a bit
  btrfs: switch to ->iterate_shared()
  logfs: no need to lock directory in lseek
  switch ecryptfs to ->iterate_shared
  9p: switch to ->iterate_shared()
  fat: switch to ->iterate_shared()
  romfs, squashfs: switch to ->iterate_shared()
  more trivial ->iterate_shared conversions
  ...
2016-05-17 11:01:31 -07:00
Junxiao Bi
c25a1e0671 ocfs2: fix posix_acl_create deadlock
Commit 702e5bc68a ("ocfs2: use generic posix ACL infrastructure")
refactored code to use posix_acl_create.  The problem with this function
is that it is not mindful of the cluster wide inode lock making it
unsuitable for use with ocfs2 inode creation with ACLs.  For example,
when used in ocfs2_mknod, this function can cause deadlock as follows.
The parent dir inode lock is taken when calling posix_acl_create ->
get_acl -> ocfs2_iop_get_acl which takes the inode lock again.  This can
cause deadlock if there is a blocked remote lock request waiting for the
lock to be downconverted.  And same deadlock happened in ocfs2_reflink.
This fix is to revert back using ocfs2_init_acl.

Fixes: 702e5bc68a ("ocfs2: use generic posix ACL infrastructure")
Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12 15:52:50 -07:00
Al Viro
b296821a7c xattr_handler: pass dentry and inode as separate arguments of ->get()
... and do not assume they are already attached to each other

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-10 20:48:24 -04:00
Al Viro
5955102c99 wrappers for ->i_mutex access
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-22 18:04:28 -05:00
Andreas Gruenbacher
1046cb1195 ocfs2: Replace list xattr handler operations
The list operations of the ocfs2 xattr handlers were never called
anywhere.  Remove them and directly check in ocfs2_xattr_list_entry
which attributes should be skipped over instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: ocfs2-devel@oss.oracle.com
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-13 19:46:00 -05:00
Andreas Gruenbacher
98e9cb5711 vfs: Distinguish between full xattr names and proper prefixes
Add an additional "name" field to struct xattr_handler.  When the name
is set, the handler matches attributes with exactly that name.  When the
prefix is set instead, the handler matches attributes with the given
prefix and with a non-empty suffix.

This patch should avoid bugs like the one fixed in commit c361016a in
the future.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06 21:33:52 -05:00
Andreas Gruenbacher
d9a82a0403 xattr handlers: Pass handler to operations instead of flags
The xattr_handler operations are currently all passed a file system
specific flags value which the operations can use to disambiguate between
different handlers; some file systems use that to distinguish the xattr
namespace, for example.  In some oprations, it would be useful to also have
access to the handler prefix.  To allow that, pass a pointer to the handler
to operations instead of the flags value alone.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-13 20:34:32 -05:00
Joe Perches
7ecef14ab1 ocfs2: neaten do_error, ocfs2_error and ocfs2_abort
These uses sometimes do and sometimes don't have '\n' terminations.  Make
the uses consistently use '\n' terminations and remove the newline from
the functions.

Miscellanea:

o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Goldwyn Rodrigues
17a5b9ab32 ocfs2: acknowledge return value of ocfs2_error()
Caveat: This may return -EROFS for a read case, which seems wrong.  This
is happening even without this patch series though.  Should we convert
EROFS to EIO?

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Sanidhya Kashyap
0f5e7b41f9 ocfs2: trusted xattr missing CAP_SYS_ADMIN check
The trusted extended attributes are only visible to the process which
hvae CAP_SYS_ADMIN capability but the check is missing in ocfs2
xattr_handler trusted list.  The check is important because this will be
used for implementing mechanisms in the userspace for which other
ordinary processes should not have access to.

Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Taesoo kim <taesoo@gatech.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Joseph Qi
b519ea6d9a ocfs2: mark local functions as static
Some functions are only used locally, so mark them as static.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:40 -07:00
Linus Torvalds
9ec3a646fe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fourth vfs update from Al Viro:
 "d_inode() annotations from David Howells (sat in for-next since before
  the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  RCU pathwalk breakage when running into a symlink overmounting something
  fix I_DIO_WAKEUP definition
  direct-io: only inc/dec inode->i_dio_count for file systems
  fs/9p: fix readdir()
  VFS: assorted d_backing_inode() annotations
  VFS: fs/inode.c helpers: d_inode() annotations
  VFS: fs/cachefiles: d_backing_inode() annotations
  VFS: fs library helpers: d_inode() annotations
  VFS: assorted weird filesystems: d_inode() annotations
  VFS: normal filesystems (and lustre): d_inode() annotations
  VFS: security/: d_inode() annotations
  VFS: security/: d_backing_inode() annotations
  VFS: net/: d_inode() annotations
  VFS: net/unix: d_backing_inode() annotations
  VFS: kernel/: d_inode() annotations
  VFS: audit: d_backing_inode() annotations
  VFS: Fix up some ->d_inode accesses in the chelsio driver
  VFS: Cachefiles should perform fs modifications on the top layer only
  VFS: AF_UNIX sockets should call mknod on the top layer only
2015-04-26 17:22:07 -07:00
David Howells
2b0143b5c9 VFS: normal filesystems (and lustre): d_inode() annotations
that's the bulk of filesystem drivers dealing with inodes of their own

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-15 15:06:57 -04:00
Joseph Qi
023d4ea358 ocfs2: fix possible uninitialized variable access
In ocfs2_local_alloc_find_clear_bits and ocfs2_get_dentry, variable
numfound and set may be uninitialized and then used in tracepoint.  In
ocfs2_xattr_block_get and ocfs2_delete_xattr_in_bucket, variable block_off
and xv may be uninitialized and then used in the following logic due to
unchecked return value.

This patch fixes these possible issues.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:48:58 -07:00
Rickard Strandqvist
d6edc87af8 ocfs2: xattr: remove unused function
Remove ocfs2_xattr_bucket_get_val() that is not used anywhere.

This was partially found by using a static code analysis program called
cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10 14:30:29 -08:00
Jan Kara
2b693005b8 ocfs2: Fix xattr check in ocfs2_get_xattr_nolock()
ocfs2_get_xattr_nolock() checks whether inode has any extended attributes
(OCFS2_HAS_XATTR_FL).  If not, it just sets 'ret' to -ENODATA but
continues with checking inline and external attributes anyway (which is
pointless although it does not harm).  Just return immediately when we
know there are no extended attributes in the inode.

Coverity id: 1226906.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:03 -08:00
Wengang Wang
9c339255cb ocfs2: pass "new" parameter to ocfs2_init_xattr_bucket
This patch fixes the following crash:

  kernel BUG at fs/ocfs2/uptodate.c:530!
  Modules linked in: ocfs2(F) ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bridge xen_pciback xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd sunrpc 8021q garp stp llc bonding be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iTCO_wdt iTCO_vendor_support dcdbas coretemp freq_table mperf microcode pcspkr serio_raw bnx2 lpc_ich mfd_core i5k_amb i5000_edac edac_core e1000e sg shpchp ext4(F) jbd2(F) mbcache(F) dm_round_robin(F) sr_mod(F) cdrom(F) usb_storage(F) sd_mod(F) crc_t10dif(F) pata_acpi(F) ata_generic(F) ata_piix(F) mptsas(F) mptscsih(F) mptbase(F) scsi_transport_sas(F) radeon(F)
   ttm(F) drm_kms_helper(F) drm(F) hwmon(F) i2c_algo_bit(F) i2c_core(F) dm_multipath(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F)
  CPU 5
  Pid: 21303, comm: xattr-test Tainted: GF       W    3.8.13-30.el6uek.x86_64 #2 Dell Inc. PowerEdge 1950/0M788G
  RIP: ocfs2_set_new_buffer_uptodate+0x51/0x60 [ocfs2]
  Process xattr-test (pid: 21303, threadinfo ffff880017aca000, task ffff880016a2c480)
  Call Trace:
    ocfs2_init_xattr_bucket+0x8a/0x120 [ocfs2]
    ocfs2_cp_xattr_bucket+0xbb/0x1b0 [ocfs2]
    ocfs2_extend_xattr_bucket+0x20a/0x2f0 [ocfs2]
    ocfs2_add_new_xattr_bucket+0x23e/0x4b0 [ocfs2]
    ocfs2_xattr_set_entry_index_block+0x13c/0x3d0 [ocfs2]
    ocfs2_xattr_block_set+0xf9/0x220 [ocfs2]
    __ocfs2_xattr_set_handle+0x118/0x710 [ocfs2]
    ocfs2_xattr_set+0x691/0x880 [ocfs2]
    ocfs2_xattr_user_set+0x46/0x50 [ocfs2]
    generic_setxattr+0x96/0xa0
    __vfs_setxattr_noperm+0x7b/0x170
    vfs_setxattr+0xbc/0xc0
    setxattr+0xde/0x230
    sys_fsetxattr+0xc6/0xf0
    system_call_fastpath+0x16/0x1b
  Code: 41 80 0c 24 01 48 89 df e8 7d f0 ff ff 4c 89 e6 48 89 df e8 a2 fe ff ff 48 89 df e8 3a f0 ff ff 48 8b 1c 24 4c 8b 64 24 08 c9 c3 <0f> 0b eb fe 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 66 66
  RIP  ocfs2_set_new_buffer_uptodate+0x51/0x60 [ocfs2]

It hit the BUG_ON() in ocfs2_set_new_buffer_uptodate():

    void ocfs2_set_new_buffer_uptodate(struct ocfs2_caching_info *ci,
                                       struct buffer_head *bh)
    {
          /* This should definitely *not* exist in our cache */
          if (ocfs2_buffer_cached(ci, bh))
                  printk(KERN_ERR "bh->b_blocknr: %lu @ %p\n", bh->b_blocknr, bh);
          BUG_ON(ocfs2_buffer_cached(ci, bh));

          set_buffer_uptodate(bh);

          ocfs2_metadata_cache_io_lock(ci);
          ocfs2_set_buffer_uptodate(ci, bh);
          ocfs2_metadata_cache_io_unlock(ci);
    }

The problem here is:

We cached a block, but the buffer_head got reused.  When we are to pick
up this block again, a new buffer_head created with UPTODATE flag
cleared.  ocfs2_buffer_uptodate() returned false since no UPTODATE is
set on the buffer_head.  so we set this block to cache as a NEW block,
then it failed at asserting block is not in cache.

The fix is to add a new parameter indicating the bucket is a new
allocated or not to ocfs2_init_xattr_bucket().
ocfs2_init_xattr_bucket() assert block not cached accordingly.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:57 -07:00
Darrick J. Wong
6fdb702d62 ocfs2: call ocfs2_update_inode_fsync_trans when updating any inode
Ensure that ocfs2_update_inode_fsync_trans() is called any time we touch
an inode in a given transaction.  This is a follow-on to the previous
patch to reduce lock contention and deadlocking during an fsync
operation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Wengang <wen.gang.wang@oracle.com>
Cc: Greg Marsden <greg.marsden@oracle.com>
Cc: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:56 -07:00
Tariq Saeed
3ed2be719e ocfs2: allow for more than one data extent when creating xattr
Orabug: 18108070

ocfs2_xattr_extend_allocation() hits panic when creating xattr during
data extent alloc phase.  The problem occurs if due to local alloc
fragmentation, clusters are spread over multiple extents.  In this case
ocfs2_add_clusters_in_btree() finds no space to store more than one
extent record and therefore fails returning RESTART_META.  The situation
is anticipated for xattr update case but not xattr create case.  This
fix simply ports that code to create case.

Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:55 -07:00
Christoph Hellwig
702e5bc68a ocfs2: use generic posix ACL infrastructure
This contains some major refactoring for the create path so that
inodes are created with the right mode to start with instead of
fixing it up later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 23:58:21 -05:00
Rui Xiang
58796207cf ocfs2: add necessary check in case sb_getblk() fails
sb_getblk() may return an err, so add a check for bh.

[joseph.qi@huawei.com: also add a check after calling sb_getblk() in ocfs2_create_xattr_block()]
Signed-off-by: Rui Xiang <rui.xiang@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:00 +09:00
Rui Xiang
7391a294b8 ocfs2: return ENOMEM when sb_getblk() fails
The only reason for sb_getblk() failing is if it can't allocate the
buffer_head.  So return ENOMEM instead when it fails.

[joseph.qi@huawei.com: ocfs2_symlink_get_block() and ocfs2_read_blocks_sync() and ocfs2_read_blocks() need the same change]
Signed-off-by: Rui Xiang <rui.xiang@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:00 +09:00
Goldwyn Rodrigues
06f9da6e82 fs/ocfs2: remove unnecessary variable bits_wanted from ocfs2_calc_extend_credits
Code cleanup to remove unnecessary variable passed but never used
to ocfs2_calc_extend_credits.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:00 +09:00
Joseph Qi
6cae6d3189 ocfs2: fix possible double free in ocfs2_reflink_xattr_rec
In ocfs2_reflink_xattr_rec(), meta_ac and data_ac are allocated by calling
ocfs2_lock_reflink_xattr_rec_allocators().

Once an error occurs when allocating *data_ac, it frees *meta_ac which is
allocated before.  Here it mistakenly sets meta_ac to NULL but *meta_ac.
Then ocfs2_reflink_xattr_rec() will try to free meta_ac again which is
already invalid.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:56:49 -07:00
Younger Liu
6ea437a363 ocfs2: free meta_ac and data_ac when ocfs2_start_trans fails in ocfs2_xattr_set()
In ocfs2_xattr_set(), if ocfs2_start_trans failed, meta_ac and data_ac
should be free.  Otherwise, It would lead to a memory leak.

Signed-off-by: Younger Liu <younger.liu@huawei.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:56:47 -07:00
Joseph Qi
17caf9555e ocfs2: add the missing return value check of ocfs2_xattr_get_clusters
In ocfs2_xattr_value_attach_refcount(), if error occurs when calling
ocfs2_xattr_get_clusters(), it will go with unexpected behavior since
local variables p_cluster, num_clusters and ext_flags are declared without
initialization.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Acked-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:56:45 -07:00
Junxiao Bi
ef962df057 ocfs2: xattr: fix inlined xattr reflink
Inlined xattr shared free space of inode block with inlined data or data
extent record, so the size of the later two should be adjusted when
inlined xattr is enabled.  See ocfs2_xattr_ibody_init().  But this isn't
done well when reflink.  For inode with inlined data, its max inlined
data size is adjusted in ocfs2_duplicate_inline_data(), no problem.  But
for inode with data extent record, its record count isn't adjusted.  Fix
it, or data extent record and inlined xattr may overwrite each other,
then cause data corruption or xattr failure.

One panic caused by this bug in our test environment is the following:

  kernel BUG at fs/ocfs2/xattr.c:1435!
  invalid opcode: 0000 [#1] SMP
  Pid: 10871, comm: multi_reflink_t Not tainted 2.6.39-300.17.1.el5uek #1
  RIP: ocfs2_xa_offset_pointer+0x17/0x20 [ocfs2]
  RSP: e02b:ffff88007a587948  EFLAGS: 00010283
  RAX: 0000000000000000 RBX: 0000000000000010 RCX: 00000000000051e4
  RDX: ffff880057092060 RSI: 0000000000000f80 RDI: ffff88007a587a68
  RBP: ffff88007a587948 R08: 00000000000062f4 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000010
  R13: ffff88007a587a68 R14: 0000000000000001 R15: ffff88007a587c68
  FS:  00007fccff7f06e0(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
  CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00000000015cf000 CR3: 000000007aa76000 CR4: 0000000000000660
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process multi_reflink_t
  Call Trace:
    ocfs2_xa_reuse_entry+0x60/0x280 [ocfs2]
    ocfs2_xa_prepare_entry+0x17e/0x2a0 [ocfs2]
    ocfs2_xa_set+0xcc/0x250 [ocfs2]
    ocfs2_xattr_ibody_set+0x98/0x230 [ocfs2]
    __ocfs2_xattr_set_handle+0x4f/0x700 [ocfs2]
    ocfs2_xattr_set+0x6c6/0x890 [ocfs2]
    ocfs2_xattr_user_set+0x46/0x50 [ocfs2]
    generic_setxattr+0x70/0x90
    __vfs_setxattr_noperm+0x80/0x1a0
    vfs_setxattr+0xa9/0xb0
    setxattr+0xc3/0x120
    sys_fsetxattr+0xa8/0xd0
    system_call_fastpath+0x16/0x1b

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:24 -07:00
Junxiao Bi
b30f14c490 ocfs2: xattr: remove useless free space checking
Free space checking will be done in ocfs2_xattr_ibody_init().  So remove
here.

[akpm@linux-foundation.org: remove unused local]
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:24 -07:00
Jeff Liu
32918dd9f1 ocfs2: fix ocfs2_init_security_and_acl() to initialize acl correctly
We need to re-initialize the security for a new reflinked inode with its
parent dirs if it isn't specified to be preserved for ocfs2_reflink().
However, the code logic is broken at ocfs2_init_security_and_acl()
although ocfs2_init_security_get() succeed.  As a result,
ocfs2_acl_init() does not involked and therefore the default ACL of
parent dir was missing on the new inode.

Note this was introduced by 9d8f13ba3 ("security: new
security_inode_init_security API adds function callback")

To reproduce:

    set default ACL for the parent dir(ocfs2 in this case):
    $ setfacl -m default:user:jeff:rwx ../ocfs2/
    $ getfacl ../ocfs2/
    # file: ../ocfs2/
    # owner: jeff
    # group: jeff
    user::rwx
    group::r-x
    other::r-x
    default:user::rwx
    default:user:jeff:rwx
    default:group::r-x
    default😷:rwx
    default:other::r-x

    $ touch a
    $ getfacl a
    # file: a
    # owner: jeff
    # group: jeff
    user::rw-
    group::rw-
    other::r--

Before patching, create reflink file b from a, the user
default ACL entry(user:jeff:rwx)was missing:

    $ ./ocfs2_reflink a b
    $ getfacl b
    # file: b
    # owner: jeff
    # group: jeff
    user::rw-
    group::rw-
    other::r--

In this case, the end user can also observed an error message at syslog:

  (ocfs2_reflink,3229,2):ocfs2_init_security_and_acl:7193 ERROR: status = 0

After applying this patch, create reflink file c from a:

    $ ./ocfs2_reflink a c
    $ getfacl c
    # file: c
    # owner: jeff
    # group: jeff
    user::rw-
    user:jeff:rwx			#effective:rw-
    group::r-x			#effective:r--
    mask::rw-
    other::r--

Test program:
/* Usage: reflink <source> <dest> */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/ioctl.h>

static int
reflink_file(char const *src_name, char const *dst_name,
	     bool preserve_attrs)
{
	int fd;

#ifndef REFLINK_ATTR_NONE
#  define REFLINK_ATTR_NONE 0
#endif
#ifndef REFLINK_ATTR_PRESERVE
#  define REFLINK_ATTR_PRESERVE 1
#endif
#ifndef OCFS2_IOC_REFLINK
	struct reflink_arguments {
		uint64_t old_path;
		uint64_t new_path;
		uint64_t preserve;
	};

#  define OCFS2_IOC_REFLINK _IOW ('o', 4, struct reflink_arguments)
#endif
	struct reflink_arguments args = {
		.old_path = (unsigned long) src_name,
		.new_path = (unsigned long) dst_name,
		.preserve = preserve_attrs ? REFLINK_ATTR_PRESERVE :
					     REFLINK_ATTR_NONE,
	};

	fd = open(src_name, O_RDONLY);
	if (fd < 0) {
		fprintf(stderr, "Failed to open %s: %s\n",
			src_name, strerror(errno));
		return -1;
	}

	if (ioctl(fd, OCFS2_IOC_REFLINK, &args) < 0) {
		fprintf(stderr, "Failed to reflink %s to %s: %s\n",
			src_name, dst_name, strerror(errno));
		return -1;
	}
}

int
main(int argc, char *argv[])
{
	if (argc != 3) {
		fprintf(stdout, "Usage: %s source dest\n", argv[0]);
		return 1;
	}

	return reflink_file(argv[1], argv[2], 0);
}

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Tao Ma <boyu.mt@taobao.com>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:09 -08:00
Al Viro
67697cbdcc ocfs2: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:02 -05:00