  1. Aug 08, 2018
  2. Aug 07, 2018
  3. Aug 03, 2018
  4. Aug 02, 2018
    • AMDGPU: Fix missing declaration of queue ptr builtin · e3d81572
      Matt Arsenault authored
      llvm-svn: 338754
      e3d81572
    • Try to make builtin address space declarations not useless · c65f966d
      Matt Arsenault authored
      The way address space declarations for builtins currently work
      is nearly useless. The code assumes the address space used for
      a builtin is a confusingly named "target address space" from user
      code using __attribute__((address_space(N))) that matches
      the builtin declaration. There's no way to use this to declare
      a builtin that returns a language-specific address space.
      The terminology used is highly confusing since it has nothing
      to do with the address space selected by the target to use
      for a language address space.
      
      This feature is essentially unused as-is. AMDGPU and NVPTX
      are the only in-tree targets attempting to use this. The AMDGPU
      builtins certainly do not behave as intended (i.e. all of the
      builtins returning pointers can never compile because the numbered
      address space never matches the expected named address space).
      
      The NVPTX builtins are missing tests for some, and the others
      seem to rely on an implicit addrspacecast.
      
      Change the used address space for builtins based on a target
      hook to allow using a language address space for a builtin.
      This allows the same builtin declaration to be used for multiple
      languages with similarly purposed address spaces (e.g. the same
      AMDGPU builtin can be used in OpenCL and CUDA even though the
      constant address spaces are arbitrarily different).
      
      This breaks the possibility of using arbitrary numbered
      address spaces alongside the named address spaces for builtins.
      If this is an issue we probably need to introduce another builtin
      declaration character to distinguish language address spaces from
      so-called "target address spaces".
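
The target-hook idea described above can be sketched as a small standalone model. Everything in it is hypothetical: the `Language` and `LangAS` enums, the `builtinAddressSpace` function, and the numbers 1 and 4 used for the global and constant address spaces are assumptions for illustration, not Clang's actual API or AMDGPU's guaranteed numbering.

```cpp
// Hypothetical model, not Clang's real types: the source languages and the
// named (language-level) address spaces a builtin pointer could carry.
enum class Language { OpenCL, CUDA };
enum class LangAS { Default, OpenCLGlobal, OpenCLConstant, CUDADevice, CUDAConstant };

// Assumed numbers written in a builtin declaration (for this sketch only).
constexpr unsigned DeclGlobal = 1;
constexpr unsigned DeclConstant = 4;

// Sketch of a target hook: translate the number in a builtin declaration
// into the named address space of the language being compiled, so one
// declaration serves languages with similarly purposed address spaces.
LangAS builtinAddressSpace(Language lang, unsigned declAS) {
  switch (declAS) {
  case DeclConstant:
    return lang == Language::OpenCL ? LangAS::OpenCLConstant
                                    : LangAS::CUDAConstant;
  case DeclGlobal:
    return lang == Language::OpenCL ? LangAS::OpenCLGlobal
                                    : LangAS::CUDADevice;
  default:
    // Unknown numbers fall back to the default address space in this model.
    return LangAS::Default;
  }
}
```

In this model the same declaration number yields `LangAS::OpenCLConstant` for OpenCL and `LangAS::CUDAConstant` for CUDA. It also shows the trade-off the message mentions: a number no longer names an arbitrary user-visible numbered address space.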
      
      llvm-svn: 338707
      c65f966d
  5. Aug 01, 2018
  6. Jul 30, 2018
  7. Jul 13, 2018
  8. May 21, 2018
    • [AMDGPU] fixes for lds f32 builtins · 1b14a3ad
      Daniil Fukalov authored
      1. added restrictions to the memory scope, order and volatile parameters
      2. added custom processing for these builtins; this code is currently unused,
         but is needed to switch off the GCCBuiltin link to the builtins (an ongoing
         change to the llvm tree)
      3. renamed the builtins as requested
      
      Differential Revision: https://reviews.llvm.org/D43281
      
      llvm-svn: 332848
      1b14a3ad
  9. May 16, 2018
    • [OpenCL] make test independent of optimizer · cda77b30
      Sanjay Patel authored
      There shouldn't be any tests that run the entire optimizer here,
      but the last test in this file is definitely going to break with 
      a change in LLVM IR canonicalization. Change that part to check
      the unoptimized IR because that's the real intent of this file.
      
      llvm-svn: 332473
      cda77b30
  10. May 09, 2018
  11. May 01, 2018
  12. Apr 30, 2018
  13. Apr 27, 2018
    • [OpenCL] Add separate read_only and write_only pipe IR types · 4700faa2
      Sven van Haastregt authored
      SPIR-V encodes the read_only and write_only access qualifiers of pipes,
      so separate LLVM IR types are required to target SPIR-V.  Other backends
      may also find this useful.
      
      These new types are `opencl.pipe_ro_t` and `opencl.pipe_wo_t`, which
      replace `opencl.pipe_t`.
      
      This replaces __get_pipe_num_packets(...) and __get_pipe_max_packets(...),
      which took a read_only pipe, with separate versions for read_only and
      write_only pipes, namely:
      
       * __get_pipe_num_packets_ro(...)
       * __get_pipe_num_packets_wo(...)
       * __get_pipe_max_packets_ro(...)
       * __get_pipe_max_packets_wo(...)
      
      These separate versions exist to avoid needing a bitcast to one of the
      two qualified pipe types.
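
The naming scheme above can be sketched as follows. The `PipeAccess` enum and both helper functions are hypothetical names introduced for illustration. Only the IR type names and the `_ro`/`_wo` suffixes come from the commit message.

```cpp
#include <string>

// Hypothetical model of the two pipe access qualifiers.
enum class PipeAccess { ReadOnly, WriteOnly };

// LLVM IR type name for a pipe with the given access qualifier,
// using the type names stated in the commit message.
std::string pipeTypeName(PipeAccess access) {
  return access == PipeAccess::ReadOnly ? "opencl.pipe_ro_t"
                                        : "opencl.pipe_wo_t";
}

// Name of the query function to call: the base name plus a suffix chosen
// from the access qualifier, so the pipe argument already has the matching
// type and needs no bitcast.
std::string pipeQueryName(const std::string &base, PipeAccess access) {
  return base + (access == PipeAccess::ReadOnly ? "_ro" : "_wo");
}
```

For example, `pipeQueryName("__get_pipe_max_packets", PipeAccess::WriteOnly)` gives `__get_pipe_max_packets_wo`, whose argument would be of type `opencl.pipe_wo_t`.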
      
      Patch by Stuart Brady.
      
      Differential Revision: https://reviews.llvm.org/D46015
      
      llvm-svn: 331026
      4700faa2
  14. Apr 20, 2018
  15. Apr 06, 2018
    • Fix typos in clang · 2a8c18d9
      Alexander Kornienko authored
      Found via codespell -q 3 -I ../clang-whitelist.txt,
      where the whitelist consists of:
      
        archtype
        cas
        classs
        checkk
        compres
        definit
        frome
        iff
        inteval
        ith
        lod
        methode
        nd
        optin
        ot
        pres
        statics
        te
        thru
      
      Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
      files that have dubious fixes reverted.)
      
      Differential revision: https://reviews.llvm.org/D44188
      
      llvm-svn: 329399
      2a8c18d9
  16. Mar 27, 2018
  17. Mar 23, 2018
  18. Mar 15, 2018
  19. Mar 10, 2018
  20. Mar 07, 2018
    • CodeGen: Fix address space of indirect function argument · 06dd8114
      Yaxun Liu authored
      The indirect function argument is in the alloca address space in LLVM IR. However,
      during Clang codegen for C++, the address space of an indirect function argument
      should match its address space in the source code, i.e., the default address space,
      even for an indirect argument. This is because the destructor of the indirect argument
      may be called in the caller function, and the address of the indirect argument may be
      taken. In either case the indirect function argument is expected to be in the default
      address space, not the alloca address space.
      
      Therefore, the indirect function argument should be mapped to the temp var
      cast to the default address space. The caller will cast it to the alloca address space
      when passing it to the callee. In the callee, the argument is also cast to the
      default address space and used.
      
      CallArg is refactored to facilitate this fix.
      
      Differential Revision: https://reviews.llvm.org/D34367
      
      llvm-svn: 326946
      06dd8114
    • [OpenCL] Remove block invoke function from emitted block literal struct · cb35e9fa
      Yaxun Liu authored
      The OpenCL runtime tracks the invoke function emitted for
      any block expression. Due to restrictions on blocks in
      OpenCL (v2.0 s6.12.5), it is always possible to know the
      block invoke function when emitting a call of a block expression
      or of the __enqueue_kernel builtin functions. Since __enqueue_kernel
      already has an argument for the invoke function, it is redundant
      to have an invoke function member in the llvm block literal structure.
      
      This patch removes the invoke function from the llvm block literal
      structure. It also removes the bitcast of the block invoke function
      to the generic block literal type, which is useless for OpenCL.
      
      This will save some space for the kernel argument, and also
      eliminate some store instructions.
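
The space saving can be illustrated with two plain C++ structs. The field layout here (a size, an alignment, an optional invoke pointer, then two integer captures) is an assumption made for this sketch and is not taken from the patch. The struct and function names are hypothetical.

```cpp
#include <cstddef>

// Assumed layout before the patch: header fields, a pointer to the
// invoke function, then the captured variables (two ints in this sketch).
struct BlockLiteralWithInvoke {
  int size;
  int align;
  void (*invoke)(void *);
  int captureA;
  int captureB;
};

// Assumed layout after the patch: the same fields without the invoke pointer.
struct BlockLiteralWithoutInvoke {
  int size;
  int align;
  int captureA;
  int captureB;
};

// Bytes saved per block literal under these assumed layouts.
constexpr std::size_t bytesSaved() {
  return sizeof(BlockLiteralWithInvoke) - sizeof(BlockLiteralWithoutInvoke);
}

// Dropping the member saves at least the size of one function pointer.
static_assert(bytesSaved() >= sizeof(void (*)(void *)),
              "removing the invoke member must shrink the struct");
```

Besides the smaller struct, there is one fewer member to initialize. This is consistent with the message's note about eliminating store instructions.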
      
      Differential Revision: https://reviews.llvm.org/D43783
      
      llvm-svn: 326937
      cb35e9fa
  21. Feb 23, 2018
    • Bring r325915 back. · 922f2aa9
      Rafael Espindola authored
      The tests that failed on a windows host have been fixed.
      
      Original message:
      
      Start setting dso_local for COFF.
      
      With this there are still some GVs where we don't set dso_local
      because setGVProperties is never called. I intend to fix that in
      followup commits. This is just the bare minimum to teach
      shouldAssumeDSOLocal what it should do for COFF.
      
      llvm-svn: 325940
      922f2aa9
  22. Feb 22, 2018
    • [OpenCL] Add '-cl-uniform-work-group-size' compile option · 20f65928
      Alexey Sotkin authored
      Summary:
      The OpenCL 2.0 specification defines the '-cl-uniform-work-group-size' option,
      which requires that the global work-size be a multiple of the work-group
      size specified to clEnqueueNDRangeKernel and allows optimizations that
      are made possible by this restriction.
      
      This patch adds support for this option.
      
      To keep information about whether an OpenCL kernel has a uniform work
      group size or not, clang generates a 'uniform-work-group-size' function
      attribute for every kernel:
      - "uniform-work-group-size"="true" for OpenCL 1.2 and lower,
      - "uniform-work-group-size"="true" for OpenCL 2.0 and higher if the
       '-cl-uniform-work-group-size' option was specified,
      - "uniform-work-group-size"="false" for OpenCL 2.0 and higher if no
       '-cl-uniform-work-group-size' option was specified.
      
      If the function is not an OpenCL kernel, the 'uniform-work-group-size'
      attribute isn't generated.
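
The rules above reduce to a small decision function. This is a minimal sketch, not the patch's code: the function name, its parameters, and the encoding of the OpenCL version as an integer (120 for 1.2, 200 for 2.0) are assumptions for illustration.

```cpp
#include <string>

// Hypothetical helper returning the value of the
// "uniform-work-group-size" attribute, or an empty string when no
// attribute is generated.
//   isKernel    - whether the function is an OpenCL kernel
//   version     - OpenCL version, assumed encoded as 120 for 1.2, 200 for 2.0
//   uniformFlag - whether -cl-uniform-work-group-size was specified
std::string uniformWorkGroupSizeAttr(bool isKernel, unsigned version,
                                     bool uniformFlag) {
  if (!isKernel)
    return ""; // non-kernel functions get no attribute
  if (version < 200)
    return "true"; // OpenCL 1.2 and lower
  return uniformFlag ? "true" : "false"; // OpenCL 2.0 and higher
}
```

For an OpenCL 2.0 kernel compiled without the option, `uniformWorkGroupSizeAttr(true, 200, false)` yields `"false"`. That is the only case in which the attribute is false.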
      
      Patch by: krisb
      
      Reviewers: yaxunl, Anastasia, b-sumner
      
      Reviewed By: yaxunl, Anastasia
      
      Subscribers: nhaehnle, yaxunl, Anastasia, cfe-commits
      
      Differential Revision: https://reviews.llvm.org/D43570
      
      llvm-svn: 325771
      20f65928
  23. Feb 15, 2018
    • Clean up AMDGCN tests · f8ad59d9
      Yaxun Liu authored
      Differential Revision: https://reviews.llvm.org/D43340
      
      llvm-svn: 325279
      f8ad59d9
    • [OpenCL] Fix __enqueue_block for block with captures · fa13d015
      Yaxun Liu authored
      The following test case causes an issue with codegen of __enqueue_block:
      
        void (^block)(void) = ^{ callee(id, out); };
      
        enqueue_kernel(queue, 0, ndrange, block);
      
      Clang first does codegen for the block expression in the first line and deletes its block info.
      Clang then tries to do codegen for the same block expression again for the second line,
      and fails because the block info is gone.
      
      The fix is to do normal codegen for both lines. Introduce an API in the OpenCL runtime to
      record the llvm block invoke function and llvm block literal emitted for each AST block
      expression, and use the recorded information for generating the wrapper kernel.
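
The recording idea can be sketched as a lookup table keyed by the block expression. All names below (`BlockExpr`, `EmittedBlockInfo`, `BlockInfoRecorder`) are hypothetical stand-ins, and strings stand in for the emitted LLVM values. None of this is Clang's real interface.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an AST block expression node.
struct BlockExpr {
  int id;
};

// What gets recorded per block expression in this sketch.
struct EmittedBlockInfo {
  std::string invokeFunction; // name of the emitted block invoke function
  std::string blockLiteral;   // name of the emitted block literal value
};

// Keeps the emitted information alive after normal block codegen, so a
// later enqueue_kernel call on the same expression can look it up instead
// of generating the block a second time.
class BlockInfoRecorder {
public:
  void record(const BlockExpr *expr, EmittedBlockInfo info) {
    table_[expr] = std::move(info);
  }

  // Returns nullptr when nothing was recorded for this expression.
  const EmittedBlockInfo *lookup(const BlockExpr *expr) const {
    auto it = table_.find(expr);
    return it == table_.end() ? nullptr : &it->second;
  }

private:
  std::unordered_map<const BlockExpr *, EmittedBlockInfo> table_;
};
```

In this sketch, codegen of the first line would call `record` once. Codegen of the `enqueue_kernel` call would then use `lookup` to obtain the invoke function and literal for the wrapper kernel.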
      
      The EmitBlockLiteral APIs are cleaned up to minimize changes to the normal codegen
      of blocks.
      
      Another minor issue is that a cleanup AST expression is generated for a block
      with captures, which can be stripped by IgnoreImplicit.
      
      Differential Revision: https://reviews.llvm.org/D43240
      
      llvm-svn: 325264
      fa13d015
  24. Feb 13, 2018
  25. Feb 09, 2018
  26. Feb 08, 2018
  27. Feb 04, 2018