48 files changed, 0 insertions, 18717 deletions
diff --git a/vendor/github.com/klauspost/compress/zstd/README.md b/vendor/github.com/klauspost/compress/zstd/README.md
deleted file mode 100644
index 92e2347bb..000000000
--- a/vendor/github.com/klauspost/compress/zstd/README.md
+++ /dev/null
@@ -1,441 +0,0 @@
-# zstd 
-
-[Zstandard](https://facebook.github.io/zstd/) is a real-time compression algorithm, providing high compression ratios. 
-It offers a very wide range of compression / speed trade-off, while being backed by a very fast decoder.
-A high performance compression algorithm is implemented. For now focused on speed. 
-
-This package provides [compression](#Compressor) to and [decompression](#Decompressor) of Zstandard content. 
-
-This package is pure Go and without use of "unsafe". 
-
-The `zstd` package is provided as open source software using a Go standard license.
-
-Currently the package is heavily optimized for 64 bit processors and will be significantly slower on 32 bit processors.
-
-For seekable zstd streams, see [this excellent package](https://github.com/SaveTheRbtz/zstd-seekable-format-go).
-
-## Installation
-
-Install using `go get -u github.com/klauspost/compress`. The package is located in `github.com/klauspost/compress/zstd`.
-
-[![Go Reference](https://pkg.go.dev/badge/github.com/klauspost/compress/zstd.svg)](https://pkg.go.dev/github.com/klauspost/compress/zstd)
-
-## Compressor
-
-### Status: 
-
-STABLE - there may always be subtle bugs, a wide variety of content has been tested and the library is actively 
-used by several projects. This library is being [fuzz-tested](https://github.com/klauspost/compress-fuzz) for all updates.
-
-There may still be specific combinations of data types/size/settings that could lead to edge cases, 
-so as always, testing is recommended.  
-
-For now, a high speed (fastest) and medium-fast (default) compressor has been implemented. 
-
-* The "Fastest" compression ratio is roughly equivalent to zstd level 1. 
-* The "Default" compression ratio is roughly equivalent to zstd level 3 (default).
-* The "Better" compression ratio is roughly equivalent to zstd level 7.
-* The "Best" compression ratio is roughly equivalent to zstd level 11.
-
-In terms of speed, it is typically 2x as fast as the stdlib deflate/gzip in its fastest mode. 
-The compression ratio compared to stdlib is around level 3, but usually 3x as fast.
-
- 
-### Usage
-
-An Encoder can be used for either compressing a stream via the
-`io.WriteCloser` interface supported by the Encoder or as multiple independent
-tasks via the `EncodeAll` function.
-Smaller encodes are encouraged to use the EncodeAll function.
-Use `NewWriter` to create a new instance that can be used for both.
-
-To create a writer with default options, do like this:
-
-```Go
-// Compress input to output.
-func Compress(in io.Reader, out io.Writer) error {
-    enc, err := zstd.NewWriter(out)
-    if err != nil {
-        return err
-    }
-    _, err = io.Copy(enc, in)
-    if err != nil {
-        enc.Close()
-        return err
-    }
-    return enc.Close()
-}
-```
-
-Now you can encode by writing data to `enc`. The output will be finished writing when `Close()` is called.
-Even if your encode fails, you should still call `Close()` to release any resources that may be held up.  
-
-The above is fine for big encodes. However, whenever possible try to *reuse* the writer.
-
-To reuse the encoder, you can use the `Reset(io.Writer)` function to change to another output. 
-This will allow the encoder to reuse all resources and avoid wasteful allocations. 
-
-Currently stream encoding has 'light' concurrency, meaning up to 2 goroutines can be working on part 
-of a stream. This is independent of the `WithEncoderConcurrency(n)`, but that is likely to change 
-in the future. So if you want to limit concurrency for future updates, specify the concurrency
-you would like.
-
-If you would like stream encoding to be done without spawning async goroutines, use `WithEncoderConcurrency(1)`
-which will compress input as each block is completed, blocking on writes until each has completed.
-
-You can specify your desired compression level using `WithEncoderLevel()` option. Currently only pre-defined 
-compression settings can be specified.
-
-#### Future Compatibility Guarantees
-
-This will be an evolving project. When using this package it is important to note that both the compression efficiency and speed may change.
-
-The goal will be to keep the default efficiency at the default zstd (level 3). 
-However the encoding should never be assumed to remain the same, 
-and you should not use hashes of compressed output for similarity checks.
-
-The Encoder can be assumed to produce the same output from the exact same code version.
-However, the may be modes in the future that break this, 
-although they will not be enabled without an explicit option.   
-
-This encoder is not designed to (and will probably never) output the exact same bitstream as the reference encoder.
-
-Also note, that the cgo decompressor currently does not [report all errors on invalid input](https://github.com/DataDog/zstd/issues/59),
-[omits error checks](https://github.com/DataDog/zstd/issues/61), [ignores checksums](https://github.com/DataDog/zstd/issues/43) 
-and seems to ignore concatenated streams, even though [it is part of the spec](https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#frames).
-
-#### Blocks
-
-For compressing small blocks, the returned encoder has a function called `EncodeAll(src, dst []byte) []byte`.
-
-`EncodeAll` will encode all input in src and append it to dst.
-This function can be called concurrently. 
-Each call will only run on a same goroutine as the caller.
-
-Encoded blocks can be concatenated and the result will be the combined input stream.
-Data compressed with EncodeAll can be decoded with the Decoder, using either a stream or `DecodeAll`.
-
-Especially when encoding blocks you should take special care to reuse the encoder. 
-This will effectively make it run without allocations after a warmup period. 
-To make it run completely without allocations, supply a destination buffer with space for all content.   
-
-```Go
-import "github.com/klauspost/compress/zstd"
-
-// Create a writer that caches compressors.
-// For this operation type we supply a nil Reader.
-var encoder, _ = zstd.NewWriter(nil)
-
-// Compress a buffer. 
-// If you have a destination buffer, the allocation in the call can also be eliminated.
-func Compress(src []byte) []byte {
-    return encoder.EncodeAll(src, make([]byte, 0, len(src)))
-} 
-```
-
-You can control the maximum number of concurrent encodes using the `WithEncoderConcurrency(n)` 
-option when creating the writer.
-
-Using the Encoder for both a stream and individual blocks concurrently is safe. 
-
-### Performance
-
-I have collected some speed examples to compare speed and compression against other compressors.
-
-* `file` is the input file.
-* `out` is the compressor used. `zskp` is this package. `zstd` is the Datadog cgo library. `gzstd/gzkp` is gzip standard and this library.
-* `level` is the compression level used. For `zskp` level 1 is "fastest", level 2 is "default"; 3 is "better", 4 is "best".
-* `insize`/`outsize` is the input/output size.
-* `millis` is the number of milliseconds used for compression.
-* `mb/s` is megabytes (2^20 bytes) per second.
-
-```
-Silesia Corpus:
-http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip
-
-This package:
-file    out     level   insize      outsize     millis  mb/s
-silesia.tar zskp    1   211947520   73821326    634     318.47
-silesia.tar zskp    2   211947520   67655404    1508    133.96
-silesia.tar zskp    3   211947520   64746933    3000    67.37
-silesia.tar zskp    4   211947520   60073508    16926   11.94
-
-cgo zstd:
-silesia.tar zstd    1   211947520   73605392    543     371.56
-silesia.tar zstd    3   211947520   66793289    864     233.68
-silesia.tar zstd    6   211947520   62916450    1913    105.66
-silesia.tar zstd    9   211947520   60212393    5063    39.92
-
-gzip, stdlib/this package:
-silesia.tar gzstd   1   211947520   80007735    1498    134.87
-silesia.tar gzkp    1   211947520   80088272    1009    200.31
-
-GOB stream of binary data. Highly compressible.
-https://files.klauspost.com/compress/gob-stream.7z
-
-file        out     level   insize  outsize     millis  mb/s
-gob-stream  zskp    1   1911399616  233948096   3230    564.34
-gob-stream  zskp    2   1911399616  203997694   4997    364.73
-gob-stream  zskp    3   1911399616  173526523   13435   135.68
-gob-stream  zskp    4   1911399616  162195235   47559   38.33
-
-gob-stream  zstd    1   1911399616  249810424   2637    691.26
-gob-stream  zstd    3   1911399616  208192146   3490    522.31
-gob-stream  zstd    6   1911399616  193632038   6687    272.56
-gob-stream  zstd    9   1911399616  177620386   16175   112.70
-
-gob-stream  gzstd   1   1911399616  357382013   9046    201.49
-gob-stream  gzkp    1   1911399616  359136669   4885    373.08
-
-The test data for the Large Text Compression Benchmark is the first
-10^9 bytes of the English Wikipedia dump on Mar. 3, 2006.
-http://mattmahoney.net/dc/textdata.html
-
-file    out level   insize      outsize     millis  mb/s
-enwik9  zskp    1   1000000000  343833605   3687    258.64
-enwik9  zskp    2   1000000000  317001237   7672    124.29
-enwik9  zskp    3   1000000000  291915823   15923   59.89
-enwik9  zskp    4   1000000000  261710291   77697   12.27
-
-enwik9  zstd    1   1000000000  358072021   3110    306.65
-enwik9  zstd    3   1000000000  313734672   4784    199.35
-enwik9  zstd    6   1000000000  295138875   10290   92.68
-enwik9  zstd    9   1000000000  278348700   28549   33.40
-
-enwik9  gzstd   1   1000000000  382578136   8608    110.78
-enwik9  gzkp    1   1000000000  382781160   5628    169.45
-
-Highly compressible JSON file.
-https://files.klauspost.com/compress/github-june-2days-2019.json.zst
-
-file                        out level   insize      outsize     millis  mb/s
-github-june-2days-2019.json zskp    1   6273951764  697439532   9789    611.17
-github-june-2days-2019.json zskp    2   6273951764  610876538   18553   322.49
-github-june-2days-2019.json zskp    3   6273951764  517662858   44186   135.41
-github-june-2days-2019.json zskp    4   6273951764  464617114   165373  36.18
-
-github-june-2days-2019.json zstd    1   6273951764  766284037   8450    708.00
-github-june-2days-2019.json zstd    3   6273951764  661889476   10927   547.57
-github-june-2days-2019.json zstd    6   6273951764  642756859   22996   260.18
-github-june-2days-2019.json zstd    9   6273951764  601974523   52413   114.16
-
-github-june-2days-2019.json gzstd   1   6273951764  1164397768  26793   223.32
-github-june-2days-2019.json gzkp    1   6273951764  1120631856  17693   338.16
-
-VM Image, Linux mint with a few installed applications:
-https://files.klauspost.com/compress/rawstudio-mint14.7z
-
-file                    out level   insize      outsize     millis  mb/s
-rawstudio-mint14.tar    zskp    1   8558382592  3718400221  18206   448.29
-rawstudio-mint14.tar    zskp    2   8558382592  3326118337  37074   220.15
-rawstudio-mint14.tar    zskp    3   8558382592  3163842361  87306   93.49
-rawstudio-mint14.tar    zskp    4   8558382592  2970480650  783862  10.41
-
-rawstudio-mint14.tar    zstd    1   8558382592  3609250104  17136   476.27
-rawstudio-mint14.tar    zstd    3   8558382592  3341679997  29262   278.92
-rawstudio-mint14.tar    zstd    6   8558382592  3235846406  77904   104.77
-rawstudio-mint14.tar    zstd    9   8558382592  3160778861  140946  57.91
-
-rawstudio-mint14.tar    gzstd   1   8558382592  3926234992  51345   158.96
-rawstudio-mint14.tar    gzkp    1   8558382592  3960117298  36722   222.26
-
-CSV data:
-https://files.klauspost.com/compress/nyc-taxi-data-10M.csv.zst
-
-file                    out level   insize      outsize     millis  mb/s
-nyc-taxi-data-10M.csv   zskp    1   3325605752  641319332   9462    335.17
-nyc-taxi-data-10M.csv   zskp    2   3325605752  588976126   17570   180.50
-nyc-taxi-data-10M.csv   zskp    3   3325605752  529329260   32432   97.79
-nyc-taxi-data-10M.csv   zskp    4   3325605752  474949772   138025  22.98
-
-nyc-taxi-data-10M.csv   zstd    1   3325605752  687399637   8233    385.18
-nyc-taxi-data-10M.csv   zstd    3   3325605752  598514411   10065   315.07
-nyc-taxi-data-10M.csv   zstd    6   3325605752  570522953   20038   158.27
-nyc-taxi-data-10M.csv   zstd    9   3325605752  517554797   64565   49.12
-
-nyc-taxi-data-10M.csv   gzstd   1   3325605752  928654908   21270   149.11
-nyc-taxi-data-10M.csv   gzkp    1   3325605752  922273214   13929   227.68
-```
-
-## Decompressor
-
-Status: STABLE - there may still be subtle bugs, but a wide variety of content has been tested.
-
-This library is being continuously [fuzz-tested](https://github.com/klauspost/compress-fuzz),
-kindly supplied by [fuzzit.dev](https://fuzzit.dev/). 
-The main purpose of the fuzz testing is to ensure that it is not possible to crash the decoder, 
-or run it past its limits with ANY input provided.  
- 
-### Usage
-
-The package has been designed for two main usages, big streams of data and smaller in-memory buffers. 
-There are two main usages of the package for these. Both of them are accessed by creating a `Decoder`.
-
-For streaming use a simple setup could look like this:
-
-```Go
-import "github.com/klauspost/compress/zstd"
-
-func Decompress(in io.Reader, out io.Writer) error {
-    d, err := zstd.NewReader(in)
-    if err != nil {
-        return err
-    }
-    defer d.Close()
-    
-    // Copy content...
-    _, err = io.Copy(out, d)
-    return err
-}
-```
-
-It is important to use the "Close" function when you no longer need the Reader to stop running goroutines, 
-when running with default settings.
-Goroutines will exit once an error has been returned, including `io.EOF` at the end of a stream.
-
-Streams are decoded concurrently in 4 asynchronous stages to give the best possible throughput.
-However, if you prefer synchronous decompression, use `WithDecoderConcurrency(1)` which will decompress data 
-as it is being requested only.
-
-For decoding buffers, it could look something like this:
-
-```Go
-import "github.com/klauspost/compress/zstd"
-
-// Create a reader that caches decompressors.
-// For this operation type we supply a nil Reader.
-var decoder, _ = zstd.NewReader(nil, zstd.WithDecoderConcurrency(0))
-
-// Decompress a buffer. We don't supply a destination buffer,
-// so it will be allocated by the decoder.
-func Decompress(src []byte) ([]byte, error) {
-    return decoder.DecodeAll(src, nil)
-} 
-```
-
-Both of these cases should provide the functionality needed. 
-The decoder can be used for *concurrent* decompression of multiple buffers.
-By default 4 decompressors will be created. 
-
-It will only allow a certain number of concurrent operations to run. 
-To tweak that yourself use the `WithDecoderConcurrency(n)` option when creating the decoder.
-It is possible to use `WithDecoderConcurrency(0)` to create GOMAXPROCS decoders.
-
-### Dictionaries
-
-Data compressed with [dictionaries](https://github.com/facebook/zstd#the-case-for-small-data-compression) can be decompressed.
-
-Dictionaries are added individually to Decoders.
-Dictionaries are generated by the `zstd --train` command and contains an initial state for the decoder.
-To add a dictionary use the `WithDecoderDicts(dicts ...[]byte)` option with the dictionary data.
-Several dictionaries can be added at once.
-
-The dictionary will be used automatically for the data that specifies them.
-A re-used Decoder will still contain the dictionaries registered.
-
-When registering multiple dictionaries with the same ID, the last one will be used.
-
-It is possible to use dictionaries when compressing data.
-
-To enable a dictionary use `WithEncoderDict(dict []byte)`. Here only one dictionary will be used 
-and it will likely be used even if it doesn't improve compression. 
-
-The used dictionary must be used to decompress the content.
-
-For any real gains, the dictionary should be built with similar data. 
-If an unsuitable dictionary is used the output may be slightly larger than using no dictionary.
-Use the [zstd commandline tool](https://github.com/facebook/zstd/releases) to build a dictionary from sample data.
-For information see [zstd dictionary information](https://github.com/facebook/zstd#the-case-for-small-data-compression). 
-
-For now there is a fixed startup performance penalty for compressing content with dictionaries. 
-This will likely be improved over time. Just be aware to test performance when implementing.  
-
-### Allocation-less operation
-
-The decoder has been designed to operate without allocations after a warmup. 
-
-This means that you should *store* the decoder for best performance. 
-To re-use a stream decoder, use the `Reset(r io.Reader) error` to switch to another stream.
-A decoder can safely be re-used even if the previous stream failed.
-
-To release the resources, you must call the `Close()` function on a decoder.
-After this it can *no longer be reused*, but all running goroutines will be stopped.
-So you *must* use this if you will no longer need the Reader.
-
-For decompressing smaller buffers a single decoder can be used.
-When decoding buffers, you can supply a destination slice with length 0 and your expected capacity.
-In this case no unneeded allocations should be made. 
-
-### Concurrency
-
-The buffer decoder does everything on the same goroutine and does nothing concurrently.
-It can however decode several buffers concurrently. Use `WithDecoderConcurrency(n)` to limit that.
-
-The stream decoder will create goroutines that:
-
-1) Reads input and splits the input into blocks.
-2) Decompression of literals.
-3) Decompression of sequences.
-4) Reconstruction of output stream.
-
-So effectively this also means the decoder will "read ahead" and prepare data to always be available for output.
-
-The concurrency level will, for streams, determine how many blocks ahead the compression will start.
-
-Since "blocks" are quite dependent on the output of the previous block stream decoding will only have limited concurrency.
-
-In practice this means that concurrency is often limited to utilizing about 3 cores effectively.
-  
-### Benchmarks
-
-The first two are streaming decodes and the last are smaller inputs. 
-
-Running on AMD Ryzen 9 3950X 16-Core Processor. AMD64 assembly used.
-
-```
-BenchmarkDecoderSilesia-32    	                   5	 206878840 ns/op	1024.50 MB/s	   49808 B/op	      43 allocs/op
-BenchmarkDecoderEnwik9-32                          1	1271809000 ns/op	 786.28 MB/s	   72048 B/op	      52 allocs/op
-
-Concurrent blocks, performance:
-
-BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-32         	   67356	     17857 ns/op	10321.96 MB/s	        22.48 pct	     102 B/op	       0 allocs/op
-BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-32     	  266656	      4421 ns/op	26823.21 MB/s	        11.89 pct	      19 B/op	       0 allocs/op
-BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-32      	   20992	     56842 ns/op	8477.17 MB/s	        39.90 pct	     754 B/op	       0 allocs/op
-BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-32        	   27456	     43932 ns/op	9714.01 MB/s	        33.27 pct	     524 B/op	       0 allocs/op
-BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-32      	   78432	     15047 ns/op	8319.15 MB/s	        40.34 pct	      66 B/op	       0 allocs/op
-BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-32       	   65800	     18436 ns/op	8249.63 MB/s	        37.75 pct	      88 B/op	       0 allocs/op
-BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-32          	  102993	     11523 ns/op	35546.09 MB/s	         3.637 pct	     143 B/op	       0 allocs/op
-BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-32    	 1000000	      1070 ns/op	95720.98 MB/s	        80.53 pct	       3 B/op	       0 allocs/op
-BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-32    	  749802	      1752 ns/op	70272.35 MB/s	       100.0 pct	       5 B/op	       0 allocs/op
-BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-32          	   22640	     52934 ns/op	13263.37 MB/s	        26.25 pct	    1014 B/op	       0 allocs/op
-BenchmarkDecoder_DecodeAllParallel/html.zst-32              	  226412	      5232 ns/op	19572.27 MB/s	        14.49 pct	      20 B/op	       0 allocs/op
-BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-32     	  923041	      1276 ns/op	3194.71 MB/s	        31.26 pct	       0 B/op	       0 allocs/op
-```
-
-This reflects the performance around May 2022, but this may be out of date.
-
-## Zstd inside ZIP files
-
-It is possible to use zstandard to compress individual files inside zip archives.
-While this isn't widely supported it can be useful for internal files.
-
-To support the compression and decompression of these files you must register a compressor and decompressor.
-
-It is highly recommended registering the (de)compressors on individual zip Reader/Writer and NOT
-use the global registration functions. The main reason for this is that 2 registrations from 
-different packages will result in a panic.
-
-It is a good idea to only have a single compressor and decompressor, since they can be used for multiple zip
-files concurrently, and using a single instance will allow reusing some resources.
-
-See [this example](https://pkg.go.dev/github.com/klauspost/compress/zstd#example-ZipCompressor) for 
-how to compress and decompress files inside zip archives.
-
-# Contributions
-
-Contributions are always welcome. 
-For new features/fixes, remember to add tests and for performance enhancements include benchmarks.
-
-For general feedback and experience reports, feel free to open an issue or write me on [Twitter](https://twitter.com/sh0dan).
-
-This package includes the excellent [`github.com/cespare/xxhash`](https://github.com/cespare/xxhash) package Copyright (c) 2016 Caleb Spare.
diff --git a/vendor/github.com/klauspost/compress/zstd/bitreader.go b/vendor/github.com/klauspost/compress/zstd/bitreader.go
deleted file mode 100644
index 25ca98394..000000000
--- a/vendor/github.com/klauspost/compress/zstd/bitreader.go
+++ /dev/null
@@ -1,136 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"encoding/binary"
-	"errors"
-	"fmt"
-	"io"
-	"math/bits"
-)
-
-// bitReader reads a bitstream in reverse.
-// The last set bit indicates the start of the stream and is used
-// for aligning the input.
-type bitReader struct {
-	in       []byte
-	value    uint64 // Maybe use [16]byte, but shifting is awkward.
-	bitsRead uint8
-}
-
-// init initializes and resets the bit reader.
-func (b *bitReader) init(in []byte) error {
-	if len(in) < 1 {
-		return errors.New("corrupt stream: too short")
-	}
-	b.in = in
-	// The highest bit of the last byte indicates where to start
-	v := in[len(in)-1]
-	if v == 0 {
-		return errors.New("corrupt stream, did not find end of stream")
-	}
-	b.bitsRead = 64
-	b.value = 0
-	if len(in) >= 8 {
-		b.fillFastStart()
-	} else {
-		b.fill()
-		b.fill()
-	}
-	b.bitsRead += 8 - uint8(highBits(uint32(v)))
-	return nil
-}
-
-// getBits will return n bits. n can be 0.
-func (b *bitReader) getBits(n uint8) int {
-	if n == 0 /*|| b.bitsRead >= 64 */ {
-		return 0
-	}
-	return int(b.get32BitsFast(n))
-}
-
-// get32BitsFast requires that at least one bit is requested every time.
-// There are no checks if the buffer is filled.
-func (b *bitReader) get32BitsFast(n uint8) uint32 {
-	const regMask = 64 - 1
-	v := uint32((b.value << (b.bitsRead & regMask)) >> ((regMask + 1 - n) & regMask))
-	b.bitsRead += n
-	return v
-}
-
-// fillFast() will make sure at least 32 bits are available.
-// There must be at least 4 bytes available.
-func (b *bitReader) fillFast() {
-	if b.bitsRead < 32 {
-		return
-	}
-	v := b.in[len(b.in)-4:]
-	b.in = b.in[:len(b.in)-4]
-	low := (uint32(v[0])) | (uint32(v[1]) << 8) | (uint32(v[2]) << 16) | (uint32(v[3]) << 24)
-	b.value = (b.value << 32) | uint64(low)
-	b.bitsRead -= 32
-}
-
-// fillFastStart() assumes the bitreader is empty and there is at least 8 bytes to read.
-func (b *bitReader) fillFastStart() {
-	v := b.in[len(b.in)-8:]
-	b.in = b.in[:len(b.in)-8]
-	b.value = binary.LittleEndian.Uint64(v)
-	b.bitsRead = 0
-}
-
-// fill() will make sure at least 32 bits are available.
-func (b *bitReader) fill() {
-	if b.bitsRead < 32 {
-		return
-	}
-	if len(b.in) >= 4 {
-		v := b.in[len(b.in)-4:]
-		b.in = b.in[:len(b.in)-4]
-		low := (uint32(v[0])) | (uint32(v[1]) << 8) | (uint32(v[2]) << 16) | (uint32(v[3]) << 24)
-		b.value = (b.value << 32) | uint64(low)
-		b.bitsRead -= 32
-		return
-	}
-
-	b.bitsRead -= uint8(8 * len(b.in))
-	for len(b.in) > 0 {
-		b.value = (b.value << 8) | uint64(b.in[len(b.in)-1])
-		b.in = b.in[:len(b.in)-1]
-	}
-}
-
-// finished returns true if all bits have been read from the bit stream.
-func (b *bitReader) finished() bool {
-	return len(b.in) == 0 && b.bitsRead >= 64
-}
-
-// overread returns true if more bits have been requested than is on the stream.
-func (b *bitReader) overread() bool {
-	return b.bitsRead > 64
-}
-
-// remain returns the number of bits remaining.
-func (b *bitReader) remain() uint {
-	return 8*uint(len(b.in)) + 64 - uint(b.bitsRead)
-}
-
-// close the bitstream and returns an error if out-of-buffer reads occurred.
-func (b *bitReader) close() error {
-	// Release reference.
-	b.in = nil
-	if !b.finished() {
-		return fmt.Errorf("%d extra bits on block, should be 0", b.remain())
-	}
-	if b.bitsRead > 64 {
-		return io.ErrUnexpectedEOF
-	}
-	return nil
-}
-
-func highBits(val uint32) (n uint32) {
-	return uint32(bits.Len32(val) - 1)
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/bitwriter.go b/vendor/github.com/klauspost/compress/zstd/bitwriter.go
deleted file mode 100644
index 1952f175b..000000000
--- a/vendor/github.com/klauspost/compress/zstd/bitwriter.go
+++ /dev/null
@@ -1,112 +0,0 @@
-// Copyright 2018 Klaus Post. All rights reserved.
-// Use of this source code is governed by a BSD-style
-// license that can be found in the LICENSE file.
-// Based on work Copyright (c) 2013, Yann Collet, released under BSD License.
-
-package zstd
-
-// bitWriter will write bits.
-// First bit will be LSB of the first byte of output.
-type bitWriter struct {
-	bitContainer uint64
-	nBits        uint8
-	out          []byte
-}
-
-// bitMask16 is bitmasks. Has extra to avoid bounds check.
-var bitMask16 = [32]uint16{
-	0, 1, 3, 7, 0xF, 0x1F,
-	0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF,
-	0xFFF, 0x1FFF, 0x3FFF, 0x7FFF, 0xFFFF, 0xFFFF,
-	0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF,
-	0xFFFF, 0xFFFF} /* up to 16 bits */
-
-var bitMask32 = [32]uint32{
-	0, 1, 3, 7, 0xF, 0x1F, 0x3F, 0x7F, 0xFF,
-	0x1FF, 0x3FF, 0x7FF, 0xFFF, 0x1FFF, 0x3FFF, 0x7FFF, 0xFFFF,
-	0x1ffff, 0x3ffff, 0x7FFFF, 0xfFFFF, 0x1fFFFF, 0x3fFFFF, 0x7fFFFF, 0xffFFFF,
-	0x1ffFFFF, 0x3ffFFFF, 0x7ffFFFF, 0xfffFFFF, 0x1fffFFFF, 0x3fffFFFF, 0x7fffFFFF,
-} // up to 32 bits
-
-// addBits16NC will add up to 16 bits.
-// It will not check if there is space for them,
-// so the caller must ensure that it has flushed recently.
-func (b *bitWriter) addBits16NC(value uint16, bits uint8) {
-	b.bitContainer |= uint64(value&bitMask16[bits&31]) << (b.nBits & 63)
-	b.nBits += bits
-}
-
-// addBits32NC will add up to 31 bits.
-// It will not check if there is space for them,
-// so the caller must ensure that it has flushed recently.
-func (b *bitWriter) addBits32NC(value uint32, bits uint8) {
-	b.bitContainer |= uint64(value&bitMask32[bits&31]) << (b.nBits & 63)
-	b.nBits += bits
-}
-
-// addBits64NC will add up to 64 bits.
-// There must be space for 32 bits.
-func (b *bitWriter) addBits64NC(value uint64, bits uint8) {
-	if bits <= 31 {
-		b.addBits32Clean(uint32(value), bits)
-		return
-	}
-	b.addBits32Clean(uint32(value), 32)
-	b.flush32()
-	b.addBits32Clean(uint32(value>>32), bits-32)
-}
-
-// addBits32Clean will add up to 32 bits.
-// It will not check if there is space for them.
-// The input must not contain more bits than specified.
-func (b *bitWriter) addBits32Clean(value uint32, bits uint8) {
-	b.bitContainer |= uint64(value) << (b.nBits & 63)
-	b.nBits += bits
-}
-
-// addBits16Clean will add up to 16 bits. value may not contain more set bits than indicated.
-// It will not check if there is space for them, so the caller must ensure that it has flushed recently.
-func (b *bitWriter) addBits16Clean(value uint16, bits uint8) {
-	b.bitContainer |= uint64(value) << (b.nBits & 63)
-	b.nBits += bits
-}
-
-// flush32 will flush out, so there are at least 32 bits available for writing.
-func (b *bitWriter) flush32() {
-	if b.nBits < 32 {
-		return
-	}
-	b.out = append(b.out,
-		byte(b.bitContainer),
-		byte(b.bitContainer>>8),
-		byte(b.bitContainer>>16),
-		byte(b.bitContainer>>24))
-	b.nBits -= 32
-	b.bitContainer >>= 32
-}
-
-// flushAlign will flush remaining full bytes and align to next byte boundary.
-func (b *bitWriter) flushAlign() {
-	nbBytes := (b.nBits + 7) >> 3
-	for i := uint8(0); i < nbBytes; i++ {
-		b.out = append(b.out, byte(b.bitContainer>>(i*8)))
-	}
-	b.nBits = 0
-	b.bitContainer = 0
-}
-
-// close will write the alignment bit and write the final byte(s)
-// to the output.
-func (b *bitWriter) close() {
-	// End mark
-	b.addBits16Clean(1, 1)
-	// flush until next byte.
-	b.flushAlign()
-}
-
-// reset and continue writing by appending to out.
-func (b *bitWriter) reset(out []byte) {
-	b.bitContainer = 0
-	b.nBits = 0
-	b.out = out
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/blockdec.go b/vendor/github.com/klauspost/compress/zstd/blockdec.go
deleted file mode 100644
index 9c28840c3..000000000
--- a/vendor/github.com/klauspost/compress/zstd/blockdec.go
+++ /dev/null
@@ -1,731 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"bytes"
-	"encoding/binary"
-	"errors"
-	"fmt"
-	"hash/crc32"
-	"io"
-	"os"
-	"path/filepath"
-	"sync"
-
-	"github.com/klauspost/compress/huff0"
-	"github.com/klauspost/compress/zstd/internal/xxhash"
-)
-
-type blockType uint8
-
-//go:generate stringer -type=blockType,literalsBlockType,seqCompMode,tableIndex
-
-const (
-	blockTypeRaw blockType = iota
-	blockTypeRLE
-	blockTypeCompressed
-	blockTypeReserved
-)
-
-type literalsBlockType uint8
-
-const (
-	literalsBlockRaw literalsBlockType = iota
-	literalsBlockRLE
-	literalsBlockCompressed
-	literalsBlockTreeless
-)
-
-const (
-	// maxCompressedBlockSize is the biggest allowed compressed block size (128KB)
-	maxCompressedBlockSize = 128 << 10
-
-	compressedBlockOverAlloc    = 16
-	maxCompressedBlockSizeAlloc = 128<<10 + compressedBlockOverAlloc
-
-	// Maximum possible block size (all Raw+Uncompressed).
-	maxBlockSize = (1 << 21) - 1
-
-	maxMatchLen  = 131074
-	maxSequences = 0x7f00 + 0xffff
-
-	// We support slightly less than the reference decoder to be able to
-	// use ints on 32 bit archs.
-	maxOffsetBits = 30
-)
-
-var (
-	huffDecoderPool = sync.Pool{New: func() interface{} {
-		return &huff0.Scratch{}
-	}}
-
-	fseDecoderPool = sync.Pool{New: func() interface{} {
-		return &fseDecoder{}
-	}}
-)
-
-type blockDec struct {
-	// Raw source data of the block.
-	data        []byte
-	dataStorage []byte
-
-	// Destination of the decoded data.
-	dst []byte
-
-	// Buffer for literals data.
-	literalBuf []byte
-
-	// Window size of the block.
-	WindowSize uint64
-
-	err error
-
-	// Check against this crc, if hasCRC is true.
-	checkCRC uint32
-	hasCRC   bool
-
-	// Frame to use for singlethreaded decoding.
-	// Should not be used by the decoder itself since parent may be another frame.
-	localFrame *frameDec
-
-	sequence []seqVals
-
-	async struct {
-		newHist  *history
-		literals []byte
-		seqData  []byte
-		seqSize  int // Size of uncompressed sequences
-		fcs      uint64
-	}
-
-	// Block is RLE, this is the size.
-	RLESize uint32
-
-	Type blockType
-
-	// Is this the last block of a frame?
-	Last bool
-
-	// Use less memory
-	lowMem bool
-}
-
-func (b *blockDec) String() string {
-	if b == nil {
-		return "<nil>"
-	}
-	return fmt.Sprintf("Steam Size: %d, Type: %v, Last: %t, Window: %d", len(b.data), b.Type, b.Last, b.WindowSize)
-}
-
-func newBlockDec(lowMem bool) *blockDec {
-	b := blockDec{
-		lowMem: lowMem,
-	}
-	return &b
-}
-
-// reset will reset the block.
-// Input must be a start of a block and will be at the end of the block when returned.
-func (b *blockDec) reset(br byteBuffer, windowSize uint64) error {
-	b.WindowSize = windowSize
-	tmp, err := br.readSmall(3)
-	if err != nil {
-		println("Reading block header:", err)
-		return err
-	}
-	bh := uint32(tmp[0]) | (uint32(tmp[1]) << 8) | (uint32(tmp[2]) << 16)
-	b.Last = bh&1 != 0
-	b.Type = blockType((bh >> 1) & 3)
-	// find size.
-	cSize := int(bh >> 3)
-	maxSize := maxCompressedBlockSizeAlloc
-	switch b.Type {
-	case blockTypeReserved:
-		return ErrReservedBlockType
-	case blockTypeRLE:
-		if cSize > maxCompressedBlockSize || cSize > int(b.WindowSize) {
-			if debugDecoder {
-				printf("rle block too big: csize:%d block: %+v\n", uint64(cSize), b)
-			}
-			return ErrWindowSizeExceeded
-		}
-		b.RLESize = uint32(cSize)
-		if b.lowMem {
-			maxSize = cSize
-		}
-		cSize = 1
-	case blockTypeCompressed:
-		if debugDecoder {
-			println("Data size on stream:", cSize)
-		}
-		b.RLESize = 0
-		maxSize = maxCompressedBlockSizeAlloc
-		if windowSize < maxCompressedBlockSize && b.lowMem {
-			maxSize = int(windowSize) + compressedBlockOverAlloc
-		}
-		if cSize > maxCompressedBlockSize || uint64(cSize) > b.WindowSize {
-			if debugDecoder {
-				printf("compressed block too big: csize:%d block: %+v\n", uint64(cSize), b)
-			}
-			return ErrCompressedSizeTooBig
-		}
-		// Empty compressed blocks must at least be 2 bytes
-		// for Literals_Block_Type and one for Sequences_Section_Header.
-		if cSize < 2 {
-			return ErrBlockTooSmall
-		}
-	case blockTypeRaw:
-		if cSize > maxCompressedBlockSize || cSize > int(b.WindowSize) {
-			if debugDecoder {
-				printf("rle block too big: csize:%d block: %+v\n", uint64(cSize), b)
-			}
-			return ErrWindowSizeExceeded
-		}
-
-		b.RLESize = 0
-		// We do not need a destination for raw blocks.
-		maxSize = -1
-	default:
-		panic("Invalid block type")
-	}
-
-	// Read block data.
-	if _, ok := br.(*byteBuf); !ok && cap(b.dataStorage) < cSize {
-		// byteBuf doesn't need a destination buffer.
-		if b.lowMem || cSize > maxCompressedBlockSize {
-			b.dataStorage = make([]byte, 0, cSize+compressedBlockOverAlloc)
-		} else {
-			b.dataStorage = make([]byte, 0, maxCompressedBlockSizeAlloc)
-		}
-	}
-	b.data, err = br.readBig(cSize, b.dataStorage)
-	if err != nil {
-		if debugDecoder {
-			println("Reading block:", err, "(", cSize, ")", len(b.data))
-			printf("%T", br)
-		}
-		return err
-	}
-	if cap(b.dst) <= maxSize {
-		b.dst = make([]byte, 0, maxSize+1)
-	}
-	return nil
-}
-
-// sendEOF will make the decoder send EOF on this frame.
-func (b *blockDec) sendErr(err error) {
-	b.Last = true
-	b.Type = blockTypeReserved
-	b.err = err
-}
-
-// Close will release resources.
-// Closed blockDec cannot be reset.
-func (b *blockDec) Close() {
-}
-
-// decodeBuf
-func (b *blockDec) decodeBuf(hist *history) error {
-	switch b.Type {
-	case blockTypeRLE:
-		if cap(b.dst) < int(b.RLESize) {
-			if b.lowMem {
-				b.dst = make([]byte, b.RLESize)
-			} else {
-				b.dst = make([]byte, maxCompressedBlockSize)
-			}
-		}
-		b.dst = b.dst[:b.RLESize]
-		v := b.data[0]
-		for i := range b.dst {
-			b.dst[i] = v
-		}
-		hist.appendKeep(b.dst)
-		return nil
-	case blockTypeRaw:
-		hist.appendKeep(b.data)
-		return nil
-	case blockTypeCompressed:
-		saved := b.dst
-		// Append directly to history
-		if hist.ignoreBuffer == 0 {
-			b.dst = hist.b
-			hist.b = nil
-		} else {
-			b.dst = b.dst[:0]
-		}
-		err := b.decodeCompressed(hist)
-		if debugDecoder {
-			println("Decompressed to total", len(b.dst), "bytes, hash:", xxhash.Sum64(b.dst), "error:", err)
-		}
-		if hist.ignoreBuffer == 0 {
-			hist.b = b.dst
-			b.dst = saved
-		} else {
-			hist.appendKeep(b.dst)
-		}
-		return err
-	case blockTypeReserved:
-		// Used for returning errors.
-		return b.err
-	default:
-		panic("Invalid block type")
-	}
-}
-
-func (b *blockDec) decodeLiterals(in []byte, hist *history) (remain []byte, err error) {
-	// There must be at least one byte for Literals_Block_Type and one for Sequences_Section_Header
-	if len(in) < 2 {
-		return in, ErrBlockTooSmall
-	}
-
-	litType := literalsBlockType(in[0] & 3)
-	var litRegenSize int
-	var litCompSize int
-	sizeFormat := (in[0] >> 2) & 3
-	var fourStreams bool
-	var literals []byte
-	switch litType {
-	case literalsBlockRaw, literalsBlockRLE:
-		switch sizeFormat {
-		case 0, 2:
-			// Regenerated_Size uses 5 bits (0-31). Literals_Section_Header uses 1 byte.
-			litRegenSize = int(in[0] >> 3)
-			in = in[1:]
-		case 1:
-			// Regenerated_Size uses 12 bits (0-4095). Literals_Section_Header uses 2 bytes.
-			litRegenSize = int(in[0]>>4) + (int(in[1]) << 4)
-			in = in[2:]
-		case 3:
-			//  Regenerated_Size uses 20 bits (0-1048575). Literals_Section_Header uses 3 bytes.
-			if len(in) < 3 {
-				println("too small: litType:", litType, " sizeFormat", sizeFormat, len(in))
-				return in, ErrBlockTooSmall
-			}
-			litRegenSize = int(in[0]>>4) + (int(in[1]) << 4) + (int(in[2]) << 12)
-			in = in[3:]
-		}
-	case literalsBlockCompressed, literalsBlockTreeless:
-		switch sizeFormat {
-		case 0, 1:
-			// Both Regenerated_Size and Compressed_Size use 10 bits (0-1023).
-			if len(in) < 3 {
-				println("too small: litType:", litType, " sizeFormat", sizeFormat, len(in))
-				return in, ErrBlockTooSmall
-			}
-			n := uint64(in[0]>>4) + (uint64(in[1]) << 4) + (uint64(in[2]) << 12)
-			litRegenSize = int(n & 1023)
-			litCompSize = int(n >> 10)
-			fourStreams = sizeFormat == 1
-			in = in[3:]
-		case 2:
-			fourStreams = true
-			if len(in) < 4 {
-				println("too small: litType:", litType, " sizeFormat", sizeFormat, len(in))
-				return in, ErrBlockTooSmall
-			}
-			n := uint64(in[0]>>4) + (uint64(in[1]) << 4) + (uint64(in[2]) << 12) + (uint64(in[3]) << 20)
-			litRegenSize = int(n & 16383)
-			litCompSize = int(n >> 14)
-			in = in[4:]
-		case 3:
-			fourStreams = true
-			if len(in) < 5 {
-				println("too small: litType:", litType, " sizeFormat", sizeFormat, len(in))
-				return in, ErrBlockTooSmall
-			}
-			n := uint64(in[0]>>4) + (uint64(in[1]) << 4) + (uint64(in[2]) << 12) + (uint64(in[3]) << 20) + (uint64(in[4]) << 28)
-			litRegenSize = int(n & 262143)
-			litCompSize = int(n >> 18)
-			in = in[5:]
-		}
-	}
-	if debugDecoder {
-		println("literals type:", litType, "litRegenSize:", litRegenSize, "litCompSize:", litCompSize, "sizeFormat:", sizeFormat, "4X:", fourStreams)
-	}
-	if litRegenSize > int(b.WindowSize) || litRegenSize > maxCompressedBlockSize {
-		return in, ErrWindowSizeExceeded
-	}
-
-	switch litType {
-	case literalsBlockRaw:
-		if len(in) < litRegenSize {
-			println("too small: litType:", litType, " sizeFormat", sizeFormat, "remain:", len(in), "want:", litRegenSize)
-			return in, ErrBlockTooSmall
-		}
-		literals = in[:litRegenSize]
-		in = in[litRegenSize:]
-		//printf("Found %d uncompressed literals\n", litRegenSize)
-	case literalsBlockRLE:
-		if len(in) < 1 {
-			println("too small: litType:", litType, " sizeFormat", sizeFormat, "remain:", len(in), "want:", 1)
-			return in, ErrBlockTooSmall
-		}
-		if cap(b.literalBuf) < litRegenSize {
-			if b.lowMem {
-				b.literalBuf = make([]byte, litRegenSize, litRegenSize+compressedBlockOverAlloc)
-			} else {
-				b.literalBuf = make([]byte, litRegenSize, maxCompressedBlockSize+compressedBlockOverAlloc)
-			}
-		}
-		literals = b.literalBuf[:litRegenSize]
-		v := in[0]
-		for i := range literals {
-			literals[i] = v
-		}
-		in = in[1:]
-		if debugDecoder {
-			printf("Found %d RLE compressed literals\n", litRegenSize)
-		}
-	case literalsBlockTreeless:
-		if len(in) < litCompSize {
-			println("too small: litType:", litType, " sizeFormat", sizeFormat, "remain:", len(in), "want:", litCompSize)
-			return in, ErrBlockTooSmall
-		}
-		// Store compressed literals, so we defer decoding until we get history.
-		literals = in[:litCompSize]
-		in = in[litCompSize:]
-		if debugDecoder {
-			printf("Found %d compressed literals\n", litCompSize)
-		}
-		huff := hist.huffTree
-		if huff == nil {
-			return in, errors.New("literal block was treeless, but no history was defined")
-		}
-		// Ensure we have space to store it.
-		if cap(b.literalBuf) < litRegenSize {
-			if b.lowMem {
-				b.literalBuf = make([]byte, 0, litRegenSize+compressedBlockOverAlloc)
-			} else {
-				b.literalBuf = make([]byte, 0, maxCompressedBlockSize+compressedBlockOverAlloc)
-			}
-		}
-		var err error
-		// Use our out buffer.
-		huff.MaxDecodedSize = litRegenSize
-		if fourStreams {
-			literals, err = huff.Decoder().Decompress4X(b.literalBuf[:0:litRegenSize], literals)
-		} else {
-			literals, err = huff.Decoder().Decompress1X(b.literalBuf[:0:litRegenSize], literals)
-		}
-		// Make sure we don't leak our literals buffer
-		if err != nil {
-			println("decompressing literals:", err)
-			return in, err
-		}
-		if len(literals) != litRegenSize {
-			return in, fmt.Errorf("literal output size mismatch want %d, got %d", litRegenSize, len(literals))
-		}
-
-	case literalsBlockCompressed:
-		if len(in) < litCompSize {
-			println("too small: litType:", litType, " sizeFormat", sizeFormat, "remain:", len(in), "want:", litCompSize)
-			return in, ErrBlockTooSmall
-		}
-		literals = in[:litCompSize]
-		in = in[litCompSize:]
-		// Ensure we have space to store it.
-		if cap(b.literalBuf) < litRegenSize {
-			if b.lowMem {
-				b.literalBuf = make([]byte, 0, litRegenSize+compressedBlockOverAlloc)
-			} else {
-				b.literalBuf = make([]byte, 0, maxCompressedBlockSize+compressedBlockOverAlloc)
-			}
-		}
-		huff := hist.huffTree
-		if huff == nil || (hist.dict != nil && huff == hist.dict.litEnc) {
-			huff = huffDecoderPool.Get().(*huff0.Scratch)
-			if huff == nil {
-				huff = &huff0.Scratch{}
-			}
-		}
-		var err error
-		if debugDecoder {
-			println("huff table input:", len(literals), "CRC:", crc32.ChecksumIEEE(literals))
-		}
-		huff, literals, err = huff0.ReadTable(literals, huff)
-		if err != nil {
-			println("reading huffman table:", err)
-			return in, err
-		}
-		hist.huffTree = huff
-		huff.MaxDecodedSize = litRegenSize
-		// Use our out buffer.
-		if fourStreams {
-			literals, err = huff.Decoder().Decompress4X(b.literalBuf[:0:litRegenSize], literals)
-		} else {
-			literals, err = huff.Decoder().Decompress1X(b.literalBuf[:0:litRegenSize], literals)
-		}
-		if err != nil {
-			println("decoding compressed literals:", err)
-			return in, err
-		}
-		// Make sure we don't leak our literals buffer
-		if len(literals) != litRegenSize {
-			return in, fmt.Errorf("literal output size mismatch want %d, got %d", litRegenSize, len(literals))
-		}
-		// Re-cap to get extra size.
-		literals = b.literalBuf[:len(literals)]
-		if debugDecoder {
-			printf("Decompressed %d literals into %d bytes\n", litCompSize, litRegenSize)
-		}
-	}
-	hist.decoders.literals = literals
-	return in, nil
-}
-
-// decodeCompressed will start decompressing a block.
-func (b *blockDec) decodeCompressed(hist *history) error {
-	in := b.data
-	in, err := b.decodeLiterals(in, hist)
-	if err != nil {
-		return err
-	}
-	err = b.prepareSequences(in, hist)
-	if err != nil {
-		return err
-	}
-	if hist.decoders.nSeqs == 0 {
-		b.dst = append(b.dst, hist.decoders.literals...)
-		return nil
-	}
-	before := len(hist.decoders.out)
-	err = hist.decoders.decodeSync(hist.b[hist.ignoreBuffer:])
-	if err != nil {
-		return err
-	}
-	if hist.decoders.maxSyncLen > 0 {
-		hist.decoders.maxSyncLen += uint64(before)
-		hist.decoders.maxSyncLen -= uint64(len(hist.decoders.out))
-	}
-	b.dst = hist.decoders.out
-	hist.recentOffsets = hist.decoders.prevOffset
-	return nil
-}
-
-func (b *blockDec) prepareSequences(in []byte, hist *history) (err error) {
-	if debugDecoder {
-		printf("prepareSequences: %d byte(s) input\n", len(in))
-	}
-	// Decode Sequences
-	// https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#sequences-section
-	if len(in) < 1 {
-		return ErrBlockTooSmall
-	}
-	var nSeqs int
-	seqHeader := in[0]
-	switch {
-	case seqHeader < 128:
-		nSeqs = int(seqHeader)
-		in = in[1:]
-	case seqHeader < 255:
-		if len(in) < 2 {
-			return ErrBlockTooSmall
-		}
-		nSeqs = int(seqHeader-128)<<8 | int(in[1])
-		in = in[2:]
-	case seqHeader == 255:
-		if len(in) < 3 {
-			return ErrBlockTooSmall
-		}
-		nSeqs = 0x7f00 + int(in[1]) + (int(in[2]) << 8)
-		in = in[3:]
-	}
-	if nSeqs == 0 && len(in) != 0 {
-		// When no sequences, there should not be any more data...
-		if debugDecoder {
-			printf("prepareSequences: 0 sequences, but %d byte(s) left on stream\n", len(in))
-		}
-		return ErrUnexpectedBlockSize
-	}
-
-	var seqs = &hist.decoders
-	seqs.nSeqs = nSeqs
-	if nSeqs > 0 {
-		if len(in) < 1 {
-			return ErrBlockTooSmall
-		}
-		br := byteReader{b: in, off: 0}
-		compMode := br.Uint8()
-		br.advance(1)
-		if debugDecoder {
-			printf("Compression modes: 0b%b", compMode)
-		}
-		if compMode&3 != 0 {
-			return errors.New("corrupt block: reserved bits not zero")
-		}
-		for i := uint(0); i < 3; i++ {
-			mode := seqCompMode((compMode >> (6 - i*2)) & 3)
-			if debugDecoder {
-				println("Table", tableIndex(i), "is", mode)
-			}
-			var seq *sequenceDec
-			switch tableIndex(i) {
-			case tableLiteralLengths:
-				seq = &seqs.litLengths
-			case tableOffsets:
-				seq = &seqs.offsets
-			case tableMatchLengths:
-				seq = &seqs.matchLengths
-			default:
-				panic("unknown table")
-			}
-			switch mode {
-			case compModePredefined:
-				if seq.fse != nil && !seq.fse.preDefined {
-					fseDecoderPool.Put(seq.fse)
-				}
-				seq.fse = &fsePredef[i]
-			case compModeRLE:
-				if br.remain() < 1 {
-					return ErrBlockTooSmall
-				}
-				v := br.Uint8()
-				br.advance(1)
-				if seq.fse == nil || seq.fse.preDefined {
-					seq.fse = fseDecoderPool.Get().(*fseDecoder)
-				}
-				symb, err := decSymbolValue(v, symbolTableX[i])
-				if err != nil {
-					printf("RLE Transform table (%v) error: %v", tableIndex(i), err)
-					return err
-				}
-				seq.fse.setRLE(symb)
-				if debugDecoder {
-					printf("RLE set to 0x%x, code: %v", symb, v)
-				}
-			case compModeFSE:
-				if debugDecoder {
-					println("Reading table for", tableIndex(i))
-				}
-				if seq.fse == nil || seq.fse.preDefined {
-					seq.fse = fseDecoderPool.Get().(*fseDecoder)
-				}
-				err := seq.fse.readNCount(&br, uint16(maxTableSymbol[i]))
-				if err != nil {
-					println("Read table error:", err)
-					return err
-				}
-				err = seq.fse.transform(symbolTableX[i])
-				if err != nil {
-					println("Transform table error:", err)
-					return err
-				}
-				if debugDecoder {
-					println("Read table ok", "symbolLen:", seq.fse.symbolLen)
-				}
-			case compModeRepeat:
-				seq.repeat = true
-			}
-			if br.overread() {
-				return io.ErrUnexpectedEOF
-			}
-		}
-		in = br.unread()
-	}
-	if debugDecoder {
-		println("Literals:", len(seqs.literals), "hash:", xxhash.Sum64(seqs.literals), "and", seqs.nSeqs, "sequences.")
-	}
-
-	if nSeqs == 0 {
-		if len(b.sequence) > 0 {
-			b.sequence = b.sequence[:0]
-		}
-		return nil
-	}
-	br := seqs.br
-	if br == nil {
-		br = &bitReader{}
-	}
-	if err := br.init(in); err != nil {
-		return err
-	}
-
-	if err := seqs.initialize(br, hist, b.dst); err != nil {
-		println("initializing sequences:", err)
-		return err
-	}
-	// Extract blocks...
-	if false && hist.dict == nil {
-		fatalErr := func(err error) {
-			if err != nil {
-				panic(err)
-			}
-		}
-		fn := fmt.Sprintf("n-%d-lits-%d-prev-%d-%d-%d-win-%d.blk", hist.decoders.nSeqs, len(hist.decoders.literals), hist.recentOffsets[0], hist.recentOffsets[1], hist.recentOffsets[2], hist.windowSize)
-		var buf bytes.Buffer
-		fatalErr(binary.Write(&buf, binary.LittleEndian, hist.decoders.litLengths.fse))
-		fatalErr(binary.Write(&buf, binary.LittleEndian, hist.decoders.matchLengths.fse))
-		fatalErr(binary.Write(&buf, binary.LittleEndian, hist.decoders.offsets.fse))
-		buf.Write(in)
-		os.WriteFile(filepath.Join("testdata", "seqs", fn), buf.Bytes(), os.ModePerm)
-	}
-
-	return nil
-}
-
-func (b *blockDec) decodeSequences(hist *history) error {
-	if cap(b.sequence) < hist.decoders.nSeqs {
-		if b.lowMem {
-			b.sequence = make([]seqVals, 0, hist.decoders.nSeqs)
-		} else {
-			b.sequence = make([]seqVals, 0, 0x7F00+0xffff)
-		}
-	}
-	b.sequence = b.sequence[:hist.decoders.nSeqs]
-	if hist.decoders.nSeqs == 0 {
-		hist.decoders.seqSize = len(hist.decoders.literals)
-		return nil
-	}
-	hist.decoders.windowSize = hist.windowSize
-	hist.decoders.prevOffset = hist.recentOffsets
-
-	err := hist.decoders.decode(b.sequence)
-	hist.recentOffsets = hist.decoders.prevOffset
-	return err
-}
-
-func (b *blockDec) executeSequences(hist *history) error {
-	hbytes := hist.b
-	if len(hbytes) > hist.windowSize {
-		hbytes = hbytes[len(hbytes)-hist.windowSize:]
-		// We do not need history anymore.
-		if hist.dict != nil {
-			hist.dict.content = nil
-		}
-	}
-	hist.decoders.windowSize = hist.windowSize
-	hist.decoders.out = b.dst[:0]
-	err := hist.decoders.execute(b.sequence, hbytes)
-	if err != nil {
-		return err
-	}
-	return b.updateHistory(hist)
-}
-
-func (b *blockDec) updateHistory(hist *history) error {
-	if len(b.data) > maxCompressedBlockSize {
-		return fmt.Errorf("compressed block size too large (%d)", len(b.data))
-	}
-	// Set output and release references.
-	b.dst = hist.decoders.out
-	hist.recentOffsets = hist.decoders.prevOffset
-
-	if b.Last {
-		// if last block we don't care about history.
-		println("Last block, no history returned")
-		hist.b = hist.b[:0]
-		return nil
-	} else {
-		hist.append(b.dst)
-		if debugDecoder {
-			println("Finished block with ", len(b.sequence), "sequences. Added", len(b.dst), "to history, now length", len(hist.b))
-		}
-	}
-	hist.decoders.out, hist.decoders.literals = nil, nil
-
-	return nil
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/blockenc.go b/vendor/github.com/klauspost/compress/zstd/blockenc.go
deleted file mode 100644
index 32a7f401d..000000000
--- a/vendor/github.com/klauspost/compress/zstd/blockenc.go
+++ /dev/null
@@ -1,909 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"errors"
-	"fmt"
-	"math"
-	"math/bits"
-
-	"github.com/klauspost/compress/huff0"
-)
-
-type blockEnc struct {
-	size       int
-	literals   []byte
-	sequences  []seq
-	coders     seqCoders
-	litEnc     *huff0.Scratch
-	dictLitEnc *huff0.Scratch
-	wr         bitWriter
-
-	extraLits         int
-	output            []byte
-	recentOffsets     [3]uint32
-	prevRecentOffsets [3]uint32
-
-	last   bool
-	lowMem bool
-}
-
-// init should be used once the block has been created.
-// If called more than once, the effect is the same as calling reset.
-func (b *blockEnc) init() {
-	if b.lowMem {
-		// 1K literals
-		if cap(b.literals) < 1<<10 {
-			b.literals = make([]byte, 0, 1<<10)
-		}
-		const defSeqs = 20
-		if cap(b.sequences) < defSeqs {
-			b.sequences = make([]seq, 0, defSeqs)
-		}
-		// 1K
-		if cap(b.output) < 1<<10 {
-			b.output = make([]byte, 0, 1<<10)
-		}
-	} else {
-		if cap(b.literals) < maxCompressedBlockSize {
-			b.literals = make([]byte, 0, maxCompressedBlockSize)
-		}
-		const defSeqs = 2000
-		if cap(b.sequences) < defSeqs {
-			b.sequences = make([]seq, 0, defSeqs)
-		}
-		if cap(b.output) < maxCompressedBlockSize {
-			b.output = make([]byte, 0, maxCompressedBlockSize)
-		}
-	}
-
-	if b.coders.mlEnc == nil {
-		b.coders.mlEnc = &fseEncoder{}
-		b.coders.mlPrev = &fseEncoder{}
-		b.coders.ofEnc = &fseEncoder{}
-		b.coders.ofPrev = &fseEncoder{}
-		b.coders.llEnc = &fseEncoder{}
-		b.coders.llPrev = &fseEncoder{}
-	}
-	b.litEnc = &huff0.Scratch{WantLogLess: 4}
-	b.reset(nil)
-}
-
-// initNewEncode can be used to reset offsets and encoders to the initial state.
-func (b *blockEnc) initNewEncode() {
-	b.recentOffsets = [3]uint32{1, 4, 8}
-	b.litEnc.Reuse = huff0.ReusePolicyNone
-	b.coders.setPrev(nil, nil, nil)
-}
-
-// reset will reset the block for a new encode, but in the same stream,
-// meaning that state will be carried over, but the block content is reset.
-// If a previous block is provided, the recent offsets are carried over.
-func (b *blockEnc) reset(prev *blockEnc) {
-	b.extraLits = 0
-	b.literals = b.literals[:0]
-	b.size = 0
-	b.sequences = b.sequences[:0]
-	b.output = b.output[:0]
-	b.last = false
-	if prev != nil {
-		b.recentOffsets = prev.prevRecentOffsets
-	}
-	b.dictLitEnc = nil
-}
-
-// reset will reset the block for a new encode, but in the same stream,
-// meaning that state will be carried over, but the block content is reset.
-// If a previous block is provided, the recent offsets are carried over.
-func (b *blockEnc) swapEncoders(prev *blockEnc) {
-	b.coders.swap(&prev.coders)
-	b.litEnc, prev.litEnc = prev.litEnc, b.litEnc
-}
-
-// blockHeader contains the information for a block header.
-type blockHeader uint32
-
-// setLast sets the 'last' indicator on a block.
-func (h *blockHeader) setLast(b bool) {
-	if b {
-		*h = *h | 1
-	} else {
-		const mask = (1 << 24) - 2
-		*h = *h & mask
-	}
-}
-
-// setSize will store the compressed size of a block.
-func (h *blockHeader) setSize(v uint32) {
-	const mask = 7
-	*h = (*h)&mask | blockHeader(v<<3)
-}
-
-// setType sets the block type.
-func (h *blockHeader) setType(t blockType) {
-	const mask = 1 | (((1 << 24) - 1) ^ 7)
-	*h = (*h & mask) | blockHeader(t<<1)
-}
-
-// appendTo will append the block header to a slice.
-func (h blockHeader) appendTo(b []byte) []byte {
-	return append(b, uint8(h), uint8(h>>8), uint8(h>>16))
-}
-
-// String returns a string representation of the block.
-func (h blockHeader) String() string {
-	return fmt.Sprintf("Type: %d, Size: %d, Last:%t", (h>>1)&3, h>>3, h&1 == 1)
-}
-
-// literalsHeader contains literals header information.
-type literalsHeader uint64
-
-// setType can be used to set the type of literal block.
-func (h *literalsHeader) setType(t literalsBlockType) {
-	const mask = math.MaxUint64 - 3
-	*h = (*h & mask) | literalsHeader(t)
-}
-
-// setSize can be used to set a single size, for uncompressed and RLE content.
-func (h *literalsHeader) setSize(regenLen int) {
-	inBits := bits.Len32(uint32(regenLen))
-	// Only retain 2 bits
-	const mask = 3
-	lh := uint64(*h & mask)
-	switch {
-	case inBits < 5:
-		lh |= (uint64(regenLen) << 3) | (1 << 60)
-		if debugEncoder {
-			got := int(lh>>3) & 0xff
-			if got != regenLen {
-				panic(fmt.Sprint("litRegenSize = ", regenLen, "(want) != ", got, "(got)"))
-			}
-		}
-	case inBits < 12:
-		lh |= (1 << 2) | (uint64(regenLen) << 4) | (2 << 60)
-	case inBits < 20:
-		lh |= (3 << 2) | (uint64(regenLen) << 4) | (3 << 60)
-	default:
-		panic(fmt.Errorf("internal error: block too big (%d)", regenLen))
-	}
-	*h = literalsHeader(lh)
-}
-
-// setSizes will set the size of a compressed literals section and the input length.
-func (h *literalsHeader) setSizes(compLen, inLen int, single bool) {
-	compBits, inBits := bits.Len32(uint32(compLen)), bits.Len32(uint32(inLen))
-	// Only retain 2 bits
-	const mask = 3
-	lh := uint64(*h & mask)
-	switch {
-	case compBits <= 10 && inBits <= 10:
-		if !single {
-			lh |= 1 << 2
-		}
-		lh |= (uint64(inLen) << 4) | (uint64(compLen) << (10 + 4)) | (3 << 60)
-		if debugEncoder {
-			const mmask = (1 << 24) - 1
-			n := (lh >> 4) & mmask
-			if int(n&1023) != inLen {
-				panic(fmt.Sprint("regensize:", int(n&1023), "!=", inLen, inBits))
-			}
-			if int(n>>10) != compLen {
-				panic(fmt.Sprint("compsize:", int(n>>10), "!=", compLen, compBits))
-			}
-		}
-	case compBits <= 14 && inBits <= 14:
-		lh |= (2 << 2) | (uint64(inLen) << 4) | (uint64(compLen) << (14 + 4)) | (4 << 60)
-		if single {
-			panic("single stream used with more than 10 bits length.")
-		}
-	case compBits <= 18 && inBits <= 18:
-		lh |= (3 << 2) | (uint64(inLen) << 4) | (uint64(compLen) << (18 + 4)) | (5 << 60)
-		if single {
-			panic("single stream used with more than 10 bits length.")
-		}
-	default:
-		panic("internal error: block too big")
-	}
-	*h = literalsHeader(lh)
-}
-
-// appendTo will append the literals header to a byte slice.
-func (h literalsHeader) appendTo(b []byte) []byte {
-	size := uint8(h >> 60)
-	switch size {
-	case 1:
-		b = append(b, uint8(h))
-	case 2:
-		b = append(b, uint8(h), uint8(h>>8))
-	case 3:
-		b = append(b, uint8(h), uint8(h>>8), uint8(h>>16))
-	case 4:
-		b = append(b, uint8(h), uint8(h>>8), uint8(h>>16), uint8(h>>24))
-	case 5:
-		b = append(b, uint8(h), uint8(h>>8), uint8(h>>16), uint8(h>>24), uint8(h>>32))
-	default:
-		panic(fmt.Errorf("internal error: literalsHeader has invalid size (%d)", size))
-	}
-	return b
-}
-
-// size returns the output size with currently set values.
-func (h literalsHeader) size() int {
-	return int(h >> 60)
-}
-
-func (h literalsHeader) String() string {
-	return fmt.Sprintf("Type: %d, SizeFormat: %d, Size: 0x%d, Bytes:%d", literalsBlockType(h&3), (h>>2)&3, h&((1<<60)-1)>>4, h>>60)
-}
-
-// pushOffsets will push the recent offsets to the backup store.
-func (b *blockEnc) pushOffsets() {
-	b.prevRecentOffsets = b.recentOffsets
-}
-
-// pushOffsets will push the recent offsets to the backup store.
-func (b *blockEnc) popOffsets() {
-	b.recentOffsets = b.prevRecentOffsets
-}
-
-// matchOffset will adjust recent offsets and return the adjusted one,
-// if it matches a previous offset.
-func (b *blockEnc) matchOffset(offset, lits uint32) uint32 {
-	// Check if offset is one of the recent offsets.
-	// Adjusts the output offset accordingly.
-	// Gives a tiny bit of compression, typically around 1%.
-	if true {
-		if lits > 0 {
-			switch offset {
-			case b.recentOffsets[0]:
-				offset = 1
-			case b.recentOffsets[1]:
-				b.recentOffsets[1] = b.recentOffsets[0]
-				b.recentOffsets[0] = offset
-				offset = 2
-			case b.recentOffsets[2]:
-				b.recentOffsets[2] = b.recentOffsets[1]
-				b.recentOffsets[1] = b.recentOffsets[0]
-				b.recentOffsets[0] = offset
-				offset = 3
-			default:
-				b.recentOffsets[2] = b.recentOffsets[1]
-				b.recentOffsets[1] = b.recentOffsets[0]
-				b.recentOffsets[0] = offset
-				offset += 3
-			}
-		} else {
-			switch offset {
-			case b.recentOffsets[1]:
-				b.recentOffsets[1] = b.recentOffsets[0]
-				b.recentOffsets[0] = offset
-				offset = 1
-			case b.recentOffsets[2]:
-				b.recentOffsets[2] = b.recentOffsets[1]
-				b.recentOffsets[1] = b.recentOffsets[0]
-				b.recentOffsets[0] = offset
-				offset = 2
-			case b.recentOffsets[0] - 1:
-				b.recentOffsets[2] = b.recentOffsets[1]
-				b.recentOffsets[1] = b.recentOffsets[0]
-				b.recentOffsets[0] = offset
-				offset = 3
-			default:
-				b.recentOffsets[2] = b.recentOffsets[1]
-				b.recentOffsets[1] = b.recentOffsets[0]
-				b.recentOffsets[0] = offset
-				offset += 3
-			}
-		}
-	} else {
-		offset += 3
-	}
-	return offset
-}
-
-// encodeRaw can be used to set the output to a raw representation of supplied bytes.
-func (b *blockEnc) encodeRaw(a []byte) {
-	var bh blockHeader
-	bh.setLast(b.last)
-	bh.setSize(uint32(len(a)))
-	bh.setType(blockTypeRaw)
-	b.output = bh.appendTo(b.output[:0])
-	b.output = append(b.output, a...)
-	if debugEncoder {
-		println("Adding RAW block, length", len(a), "last:", b.last)
-	}
-}
-
-// encodeRaw can be used to set the output to a raw representation of supplied bytes.
-func (b *blockEnc) encodeRawTo(dst, src []byte) []byte {
-	var bh blockHeader
-	bh.setLast(b.last)
-	bh.setSize(uint32(len(src)))
-	bh.setType(blockTypeRaw)
-	dst = bh.appendTo(dst)
-	dst = append(dst, src...)
-	if debugEncoder {
-		println("Adding RAW block, length", len(src), "last:", b.last)
-	}
-	return dst
-}
-
-// encodeLits can be used if the block is only litLen.
-func (b *blockEnc) encodeLits(lits []byte, raw bool) error {
-	var bh blockHeader
-	bh.setLast(b.last)
-	bh.setSize(uint32(len(lits)))
-
-	// Don't compress extremely small blocks
-	if len(lits) < 8 || (len(lits) < 32 && b.dictLitEnc == nil) || raw {
-		if debugEncoder {
-			println("Adding RAW block, length", len(lits), "last:", b.last)
-		}
-		bh.setType(blockTypeRaw)
-		b.output = bh.appendTo(b.output)
-		b.output = append(b.output, lits...)
-		return nil
-	}
-
-	var (
-		out            []byte
-		reUsed, single bool
-		err            error
-	)
-	if b.dictLitEnc != nil {
-		b.litEnc.TransferCTable(b.dictLitEnc)
-		b.litEnc.Reuse = huff0.ReusePolicyAllow
-		b.dictLitEnc = nil
-	}
-	if len(lits) >= 1024 {
-		// Use 4 Streams.
-		out, reUsed, err = huff0.Compress4X(lits, b.litEnc)
-	} else if len(lits) > 16 {
-		// Use 1 stream
-		single = true
-		out, reUsed, err = huff0.Compress1X(lits, b.litEnc)
-	} else {
-		err = huff0.ErrIncompressible
-	}
-	if err == nil && len(out)+5 > len(lits) {
-		// If we are close, we may still be worse or equal to raw.
-		var lh literalsHeader
-		lh.setSizes(len(out), len(lits), single)
-		if len(out)+lh.size() >= len(lits) {
-			err = huff0.ErrIncompressible
-		}
-	}
-	switch err {
-	case huff0.ErrIncompressible:
-		if debugEncoder {
-			println("Adding RAW block, length", len(lits), "last:", b.last)
-		}
-		bh.setType(blockTypeRaw)
-		b.output = bh.appendTo(b.output)
-		b.output = append(b.output, lits...)
-		return nil
-	case huff0.ErrUseRLE:
-		if debugEncoder {
-			println("Adding RLE block, length", len(lits))
-		}
-		bh.setType(blockTypeRLE)
-		b.output = bh.appendTo(b.output)
-		b.output = append(b.output, lits[0])
-		return nil
-	case nil:
-	default:
-		return err
-	}
-	// Compressed...
-	// Now, allow reuse
-	b.litEnc.Reuse = huff0.ReusePolicyAllow
-	bh.setType(blockTypeCompressed)
-	var lh literalsHeader
-	if reUsed {
-		if debugEncoder {
-			println("Reused tree, compressed to", len(out))
-		}
-		lh.setType(literalsBlockTreeless)
-	} else {
-		if debugEncoder {
-			println("New tree, compressed to", len(out), "tree size:", len(b.litEnc.OutTable))
-		}
-		lh.setType(literalsBlockCompressed)
-	}
-	// Set sizes
-	lh.setSizes(len(out), len(lits), single)
-	bh.setSize(uint32(len(out) + lh.size() + 1))
-
-	// Write block headers.
-	b.output = bh.appendTo(b.output)
-	b.output = lh.appendTo(b.output)
-	// Add compressed data.
-	b.output = append(b.output, out...)
-	// No sequences.
-	b.output = append(b.output, 0)
-	return nil
-}
-
-// encodeRLE will encode an RLE block.
-func (b *blockEnc) encodeRLE(val byte, length uint32) {
-	var bh blockHeader
-	bh.setLast(b.last)
-	bh.setSize(length)
-	bh.setType(blockTypeRLE)
-	b.output = bh.appendTo(b.output)
-	b.output = append(b.output, val)
-}
-
-// fuzzFseEncoder can be used to fuzz the FSE encoder.
-func fuzzFseEncoder(data []byte) int {
-	if len(data) > maxSequences || len(data) < 2 {
-		return 0
-	}
-	enc := fseEncoder{}
-	hist := enc.Histogram()
-	maxSym := uint8(0)
-	for i, v := range data {
-		v = v & 63
-		data[i] = v
-		hist[v]++
-		if v > maxSym {
-			maxSym = v
-		}
-	}
-	if maxSym == 0 {
-		// All 0
-		return 0
-	}
-	maxCount := func(a []uint32) int {
-		var max uint32
-		for _, v := range a {
-			if v > max {
-				max = v
-			}
-		}
-		return int(max)
-	}
-	cnt := maxCount(hist[:maxSym])
-	if cnt == len(data) {
-		// RLE
-		return 0
-	}
-	enc.HistogramFinished(maxSym, cnt)
-	err := enc.normalizeCount(len(data))
-	if err != nil {
-		return 0
-	}
-	_, err = enc.writeCount(nil)
-	if err != nil {
-		panic(err)
-	}
-	return 1
-}
-
-// encode will encode the block and append the output in b.output.
-// Previous offset codes must be pushed if more blocks are expected.
-func (b *blockEnc) encode(org []byte, raw, rawAllLits bool) error {
-	if len(b.sequences) == 0 {
-		return b.encodeLits(b.literals, rawAllLits)
-	}
-	if len(b.sequences) == 1 && len(org) > 0 && len(b.literals) <= 1 {
-		// Check common RLE cases.
-		seq := b.sequences[0]
-		if seq.litLen == uint32(len(b.literals)) && seq.offset-3 == 1 {
-			// Offset == 1 and 0 or 1 literals.
-			b.encodeRLE(org[0], b.sequences[0].matchLen+zstdMinMatch+seq.litLen)
-			return nil
-		}
-	}
-
-	// We want some difference to at least account for the headers.
-	saved := b.size - len(b.literals) - (b.size >> 6)
-	if saved < 16 {
-		if org == nil {
-			return errIncompressible
-		}
-		b.popOffsets()
-		return b.encodeLits(org, rawAllLits)
-	}
-
-	var bh blockHeader
-	var lh literalsHeader
-	bh.setLast(b.last)
-	bh.setType(blockTypeCompressed)
-	// Store offset of the block header. Needed when we know the size.
-	bhOffset := len(b.output)
-	b.output = bh.appendTo(b.output)
-
-	var (
-		out            []byte
-		reUsed, single bool
-		err            error
-	)
-	if b.dictLitEnc != nil {
-		b.litEnc.TransferCTable(b.dictLitEnc)
-		b.litEnc.Reuse = huff0.ReusePolicyAllow
-		b.dictLitEnc = nil
-	}
-	if len(b.literals) >= 1024 && !raw {
-		// Use 4 Streams.
-		out, reUsed, err = huff0.Compress4X(b.literals, b.litEnc)
-	} else if len(b.literals) > 16 && !raw {
-		// Use 1 stream
-		single = true
-		out, reUsed, err = huff0.Compress1X(b.literals, b.litEnc)
-	} else {
-		err = huff0.ErrIncompressible
-	}
-
-	if err == nil && len(out)+5 > len(b.literals) {
-		// If we are close, we may still be worse or equal to raw.
-		var lh literalsHeader
-		lh.setSize(len(b.literals))
-		szRaw := lh.size()
-		lh.setSizes(len(out), len(b.literals), single)
-		szComp := lh.size()
-		if len(out)+szComp >= len(b.literals)+szRaw {
-			err = huff0.ErrIncompressible
-		}
-	}
-	switch err {
-	case huff0.ErrIncompressible:
-		lh.setType(literalsBlockRaw)
-		lh.setSize(len(b.literals))
-		b.output = lh.appendTo(b.output)
-		b.output = append(b.output, b.literals...)
-		if debugEncoder {
-			println("Adding literals RAW, length", len(b.literals))
-		}
-	case huff0.ErrUseRLE:
-		lh.setType(literalsBlockRLE)
-		lh.setSize(len(b.literals))
-		b.output = lh.appendTo(b.output)
-		b.output = append(b.output, b.literals[0])
-		if debugEncoder {
-			println("Adding literals RLE")
-		}
-	case nil:
-		// Compressed litLen...
-		if reUsed {
-			if debugEncoder {
-				println("reused tree")
-			}
-			lh.setType(literalsBlockTreeless)
-		} else {
-			if debugEncoder {
-				println("new tree, size:", len(b.litEnc.OutTable))
-			}
-			lh.setType(literalsBlockCompressed)
-			if debugEncoder {
-				_, _, err := huff0.ReadTable(out, nil)
-				if err != nil {
-					panic(err)
-				}
-			}
-		}
-		lh.setSizes(len(out), len(b.literals), single)
-		if debugEncoder {
-			printf("Compressed %d literals to %d bytes", len(b.literals), len(out))
-			println("Adding literal header:", lh)
-		}
-		b.output = lh.appendTo(b.output)
-		b.output = append(b.output, out...)
-		b.litEnc.Reuse = huff0.ReusePolicyAllow
-		if debugEncoder {
-			println("Adding literals compressed")
-		}
-	default:
-		if debugEncoder {
-			println("Adding literals ERROR:", err)
-		}
-		return err
-	}
-	// Sequence compression
-
-	// Write the number of sequences
-	switch {
-	case len(b.sequences) < 128:
-		b.output = append(b.output, uint8(len(b.sequences)))
-	case len(b.sequences) < 0x7f00: // TODO: this could be wrong
-		n := len(b.sequences)
-		b.output = append(b.output, 128+uint8(n>>8), uint8(n))
-	default:
-		n := len(b.sequences) - 0x7f00
-		b.output = append(b.output, 255, uint8(n), uint8(n>>8))
-	}
-	if debugEncoder {
-		println("Encoding", len(b.sequences), "sequences")
-	}
-	b.genCodes()
-	llEnc := b.coders.llEnc
-	ofEnc := b.coders.ofEnc
-	mlEnc := b.coders.mlEnc
-	err = llEnc.normalizeCount(len(b.sequences))
-	if err != nil {
-		return err
-	}
-	err = ofEnc.normalizeCount(len(b.sequences))
-	if err != nil {
-		return err
-	}
-	err = mlEnc.normalizeCount(len(b.sequences))
-	if err != nil {
-		return err
-	}
-
-	// Choose the best compression mode for each type.
-	// Will evaluate the new vs predefined and previous.
-	chooseComp := func(cur, prev, preDef *fseEncoder) (*fseEncoder, seqCompMode) {
-		// See if predefined/previous is better
-		hist := cur.count[:cur.symbolLen]
-		nSize := cur.approxSize(hist) + cur.maxHeaderSize()
-		predefSize := preDef.approxSize(hist)
-		prevSize := prev.approxSize(hist)
-
-		// Add a small penalty for new encoders.
-		// Don't bother with extremely small (<2 byte gains).
-		nSize = nSize + (nSize+2*8*16)>>4
-		switch {
-		case predefSize <= prevSize && predefSize <= nSize || forcePreDef:
-			if debugEncoder {
-				println("Using predefined", predefSize>>3, "<=", nSize>>3)
-			}
-			return preDef, compModePredefined
-		case prevSize <= nSize:
-			if debugEncoder {
-				println("Using previous", prevSize>>3, "<=", nSize>>3)
-			}
-			return prev, compModeRepeat
-		default:
-			if debugEncoder {
-				println("Using new, predef", predefSize>>3, ". previous:", prevSize>>3, ">", nSize>>3, "header max:", cur.maxHeaderSize()>>3, "bytes")
-				println("tl:", cur.actualTableLog, "symbolLen:", cur.symbolLen, "norm:", cur.norm[:cur.symbolLen], "hist", cur.count[:cur.symbolLen])
-			}
-			return cur, compModeFSE
-		}
-	}
-
-	// Write compression mode
-	var mode uint8
-	if llEnc.useRLE {
-		mode |= uint8(compModeRLE) << 6
-		llEnc.setRLE(b.sequences[0].llCode)
-		if debugEncoder {
-			println("llEnc.useRLE")
-		}
-	} else {
-		var m seqCompMode
-		llEnc, m = chooseComp(llEnc, b.coders.llPrev, &fsePredefEnc[tableLiteralLengths])
-		mode |= uint8(m) << 6
-	}
-	if ofEnc.useRLE {
-		mode |= uint8(compModeRLE) << 4
-		ofEnc.setRLE(b.sequences[0].ofCode)
-		if debugEncoder {
-			println("ofEnc.useRLE")
-		}
-	} else {
-		var m seqCompMode
-		ofEnc, m = chooseComp(ofEnc, b.coders.ofPrev, &fsePredefEnc[tableOffsets])
-		mode |= uint8(m) << 4
-	}
-
-	if mlEnc.useRLE {
-		mode |= uint8(compModeRLE) << 2
-		mlEnc.setRLE(b.sequences[0].mlCode)
-		if debugEncoder {
-			println("mlEnc.useRLE, code: ", b.sequences[0].mlCode, "value", b.sequences[0].matchLen)
-		}
-	} else {
-		var m seqCompMode
-		mlEnc, m = chooseComp(mlEnc, b.coders.mlPrev, &fsePredefEnc[tableMatchLengths])
-		mode |= uint8(m) << 2
-	}
-	b.output = append(b.output, mode)
-	if debugEncoder {
-		printf("Compression modes: 0b%b", mode)
-	}
-	b.output, err = llEnc.writeCount(b.output)
-	if err != nil {
-		return err
-	}
-	start := len(b.output)
-	b.output, err = ofEnc.writeCount(b.output)
-	if err != nil {
-		return err
-	}
-	if false {
-		println("block:", b.output[start:], "tablelog", ofEnc.actualTableLog, "maxcount:", ofEnc.maxCount)
-		fmt.Printf("selected TableLog: %d, Symbol length: %d\n", ofEnc.actualTableLog, ofEnc.symbolLen)
-		for i, v := range ofEnc.norm[:ofEnc.symbolLen] {
-			fmt.Printf("%3d: %5d -> %4d \n", i, ofEnc.count[i], v)
-		}
-	}
-	b.output, err = mlEnc.writeCount(b.output)
-	if err != nil {
-		return err
-	}
-
-	// Maybe in block?
-	wr := &b.wr
-	wr.reset(b.output)
-
-	var ll, of, ml cState
-
-	// Current sequence
-	seq := len(b.sequences) - 1
-	s := b.sequences[seq]
-	llEnc.setBits(llBitsTable[:])
-	mlEnc.setBits(mlBitsTable[:])
-	ofEnc.setBits(nil)
-
-	llTT, ofTT, mlTT := llEnc.ct.symbolTT[:256], ofEnc.ct.symbolTT[:256], mlEnc.ct.symbolTT[:256]
-
-	// We have 3 bounds checks here (and in the loop).
-	// Since we are iterating backwards it is kinda hard to avoid.
-	llB, ofB, mlB := llTT[s.llCode], ofTT[s.ofCode], mlTT[s.mlCode]
-	ll.init(wr, &llEnc.ct, llB)
-	of.init(wr, &ofEnc.ct, ofB)
-	wr.flush32()
-	ml.init(wr, &mlEnc.ct, mlB)
-
-	// Each of these lookups also generates a bounds check.
-	wr.addBits32NC(s.litLen, llB.outBits)
-	wr.addBits32NC(s.matchLen, mlB.outBits)
-	wr.flush32()
-	wr.addBits32NC(s.offset, ofB.outBits)
-	if debugSequences {
-		println("Encoded seq", seq, s, "codes:", s.llCode, s.mlCode, s.ofCode, "states:", ll.state, ml.state, of.state, "bits:", llB, mlB, ofB)
-	}
-	seq--
-	// Store sequences in reverse...
-	for seq >= 0 {
-		s = b.sequences[seq]
-
-		ofB := ofTT[s.ofCode]
-		wr.flush32() // tablelog max is below 8 for each, so it will fill max 24 bits.
-		//of.encode(ofB)
-		nbBitsOut := (uint32(of.state) + ofB.deltaNbBits) >> 16
-		dstState := int32(of.state>>(nbBitsOut&15)) + int32(ofB.deltaFindState)
-		wr.addBits16NC(of.state, uint8(nbBitsOut))
-		of.state = of.stateTable[dstState]
-
-		// Accumulate extra bits.
-		outBits := ofB.outBits & 31
-		extraBits := uint64(s.offset & bitMask32[outBits])
-		extraBitsN := outBits
-
-		mlB := mlTT[s.mlCode]
-		//ml.encode(mlB)
-		nbBitsOut = (uint32(ml.state) + mlB.deltaNbBits) >> 16
-		dstState = int32(ml.state>>(nbBitsOut&15)) + int32(mlB.deltaFindState)
-		wr.addBits16NC(ml.state, uint8(nbBitsOut))
-		ml.state = ml.stateTable[dstState]
-
-		outBits = mlB.outBits & 31
-		extraBits = extraBits<<outBits | uint64(s.matchLen&bitMask32[outBits])
-		extraBitsN += outBits
-
-		llB := llTT[s.llCode]
-		//ll.encode(llB)
-		nbBitsOut = (uint32(ll.state) + llB.deltaNbBits) >> 16
-		dstState = int32(ll.state>>(nbBitsOut&15)) + int32(llB.deltaFindState)
-		wr.addBits16NC(ll.state, uint8(nbBitsOut))
-		ll.state = ll.stateTable[dstState]
-
-		outBits = llB.outBits & 31
-		extraBits = extraBits<<outBits | uint64(s.litLen&bitMask32[outBits])
-		extraBitsN += outBits
-
-		wr.flush32()
-		wr.addBits64NC(extraBits, extraBitsN)
-
-		if debugSequences {
-			println("Encoded seq", seq, s)
-		}
-
-		seq--
-	}
-	ml.flush(mlEnc.actualTableLog)
-	of.flush(ofEnc.actualTableLog)
-	ll.flush(llEnc.actualTableLog)
-	wr.close()
-	b.output = wr.out
-
-	// Maybe even add a bigger margin.
-	if len(b.output)-3-bhOffset >= b.size {
-		// Discard and encode as raw block.
-		b.output = b.encodeRawTo(b.output[:bhOffset], org)
-		b.popOffsets()
-		b.litEnc.Reuse = huff0.ReusePolicyNone
-		return nil
-	}
-
-	// Size is output minus block header.
-	bh.setSize(uint32(len(b.output)-bhOffset) - 3)
-	if debugEncoder {
-		println("Rewriting block header", bh)
-	}
-	_ = bh.appendTo(b.output[bhOffset:bhOffset])
-	b.coders.setPrev(llEnc, mlEnc, ofEnc)
-	return nil
-}
-
-var errIncompressible = errors.New("incompressible")
-
-func (b *blockEnc) genCodes() {
-	if len(b.sequences) == 0 {
-		// nothing to do
-		return
-	}
-	if len(b.sequences) > math.MaxUint16 {
-		panic("can only encode up to 64K sequences")
-	}
-	// No bounds checks after here:
-	llH := b.coders.llEnc.Histogram()
-	ofH := b.coders.ofEnc.Histogram()
-	mlH := b.coders.mlEnc.Histogram()
-	for i := range llH {
-		llH[i] = 0
-	}
-	for i := range ofH {
-		ofH[i] = 0
-	}
-	for i := range mlH {
-		mlH[i] = 0
-	}
-
-	var llMax, ofMax, mlMax uint8
-	for i := range b.sequences {
-		seq := &b.sequences[i]
-		v := llCode(seq.litLen)
-		seq.llCode = v
-		llH[v]++
-		if v > llMax {
-			llMax = v
-		}
-
-		v = ofCode(seq.offset)
-		seq.ofCode = v
-		ofH[v]++
-		if v > ofMax {
-			ofMax = v
-		}
-
-		v = mlCode(seq.matchLen)
-		seq.mlCode = v
-		mlH[v]++
-		if v > mlMax {
-			mlMax = v
-			if debugAsserts && mlMax > maxMatchLengthSymbol {
-				panic(fmt.Errorf("mlMax > maxMatchLengthSymbol (%d), matchlen: %d", mlMax, seq.matchLen))
-			}
-		}
-	}
-	maxCount := func(a []uint32) int {
-		var max uint32
-		for _, v := range a {
-			if v > max {
-				max = v
-			}
-		}
-		return int(max)
-	}
-	if debugAsserts && mlMax > maxMatchLengthSymbol {
-		panic(fmt.Errorf("mlMax > maxMatchLengthSymbol (%d)", mlMax))
-	}
-	if debugAsserts && ofMax > maxOffsetBits {
-		panic(fmt.Errorf("ofMax > maxOffsetBits (%d)", ofMax))
-	}
-	if debugAsserts && llMax > maxLiteralLengthSymbol {
-		panic(fmt.Errorf("llMax > maxLiteralLengthSymbol (%d)", llMax))
-	}
-
-	b.coders.mlEnc.HistogramFinished(mlMax, maxCount(mlH[:mlMax+1]))
-	b.coders.ofEnc.HistogramFinished(ofMax, maxCount(ofH[:ofMax+1]))
-	b.coders.llEnc.HistogramFinished(llMax, maxCount(llH[:llMax+1]))
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/blocktype_string.go b/vendor/github.com/klauspost/compress/zstd/blocktype_string.go
deleted file mode 100644
index 01a01e486..000000000
--- a/vendor/github.com/klauspost/compress/zstd/blocktype_string.go
+++ /dev/null
@@ -1,85 +0,0 @@
-// Code generated by "stringer -type=blockType,literalsBlockType,seqCompMode,tableIndex"; DO NOT EDIT.
-
-package zstd
-
-import "strconv"
-
-func _() {
-	// An "invalid array index" compiler error signifies that the constant values have changed.
-	// Re-run the stringer command to generate them again.
-	var x [1]struct{}
-	_ = x[blockTypeRaw-0]
-	_ = x[blockTypeRLE-1]
-	_ = x[blockTypeCompressed-2]
-	_ = x[blockTypeReserved-3]
-}
-
-const _blockType_name = "blockTypeRawblockTypeRLEblockTypeCompressedblockTypeReserved"
-
-var _blockType_index = [...]uint8{0, 12, 24, 43, 60}
-
-func (i blockType) String() string {
-	if i >= blockType(len(_blockType_index)-1) {
-		return "blockType(" + strconv.FormatInt(int64(i), 10) + ")"
-	}
-	return _blockType_name[_blockType_index[i]:_blockType_index[i+1]]
-}
-func _() {
-	// An "invalid array index" compiler error signifies that the constant values have changed.
-	// Re-run the stringer command to generate them again.
-	var x [1]struct{}
-	_ = x[literalsBlockRaw-0]
-	_ = x[literalsBlockRLE-1]
-	_ = x[literalsBlockCompressed-2]
-	_ = x[literalsBlockTreeless-3]
-}
-
-const _literalsBlockType_name = "literalsBlockRawliteralsBlockRLEliteralsBlockCompressedliteralsBlockTreeless"
-
-var _literalsBlockType_index = [...]uint8{0, 16, 32, 55, 76}
-
-func (i literalsBlockType) String() string {
-	if i >= literalsBlockType(len(_literalsBlockType_index)-1) {
-		return "literalsBlockType(" + strconv.FormatInt(int64(i), 10) + ")"
-	}
-	return _literalsBlockType_name[_literalsBlockType_index[i]:_literalsBlockType_index[i+1]]
-}
-func _() {
-	// An "invalid array index" compiler error signifies that the constant values have changed.
-	// Re-run the stringer command to generate them again.
-	var x [1]struct{}
-	_ = x[compModePredefined-0]
-	_ = x[compModeRLE-1]
-	_ = x[compModeFSE-2]
-	_ = x[compModeRepeat-3]
-}
-
-const _seqCompMode_name = "compModePredefinedcompModeRLEcompModeFSEcompModeRepeat"
-
-var _seqCompMode_index = [...]uint8{0, 18, 29, 40, 54}
-
-func (i seqCompMode) String() string {
-	if i >= seqCompMode(len(_seqCompMode_index)-1) {
-		return "seqCompMode(" + strconv.FormatInt(int64(i), 10) + ")"
-	}
-	return _seqCompMode_name[_seqCompMode_index[i]:_seqCompMode_index[i+1]]
-}
-func _() {
-	// An "invalid array index" compiler error signifies that the constant values have changed.
-	// Re-run the stringer command to generate them again.
-	var x [1]struct{}
-	_ = x[tableLiteralLengths-0]
-	_ = x[tableOffsets-1]
-	_ = x[tableMatchLengths-2]
-}
-
-const _tableIndex_name = "tableLiteralLengthstableOffsetstableMatchLengths"
-
-var _tableIndex_index = [...]uint8{0, 19, 31, 48}
-
-func (i tableIndex) String() string {
-	if i >= tableIndex(len(_tableIndex_index)-1) {
-		return "tableIndex(" + strconv.FormatInt(int64(i), 10) + ")"
-	}
-	return _tableIndex_name[_tableIndex_index[i]:_tableIndex_index[i+1]]
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/bytebuf.go b/vendor/github.com/klauspost/compress/zstd/bytebuf.go
deleted file mode 100644
index 55a388553..000000000
--- a/vendor/github.com/klauspost/compress/zstd/bytebuf.go
+++ /dev/null
@@ -1,131 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"fmt"
-	"io"
-)
-
-type byteBuffer interface {
-	// Read up to 8 bytes.
-	// Returns io.ErrUnexpectedEOF if this cannot be satisfied.
-	readSmall(n int) ([]byte, error)
-
-	// Read >8 bytes.
-	// MAY use the destination slice.
-	readBig(n int, dst []byte) ([]byte, error)
-
-	// Read a single byte.
-	readByte() (byte, error)
-
-	// Skip n bytes.
-	skipN(n int64) error
-}
-
-// in-memory buffer
-type byteBuf []byte
-
-func (b *byteBuf) readSmall(n int) ([]byte, error) {
-	if debugAsserts && n > 8 {
-		panic(fmt.Errorf("small read > 8 (%d). use readBig", n))
-	}
-	bb := *b
-	if len(bb) < n {
-		return nil, io.ErrUnexpectedEOF
-	}
-	r := bb[:n]
-	*b = bb[n:]
-	return r, nil
-}
-
-func (b *byteBuf) readBig(n int, dst []byte) ([]byte, error) {
-	bb := *b
-	if len(bb) < n {
-		return nil, io.ErrUnexpectedEOF
-	}
-	r := bb[:n]
-	*b = bb[n:]
-	return r, nil
-}
-
-func (b *byteBuf) readByte() (byte, error) {
-	bb := *b
-	if len(bb) < 1 {
-		return 0, io.ErrUnexpectedEOF
-	}
-	r := bb[0]
-	*b = bb[1:]
-	return r, nil
-}
-
-func (b *byteBuf) skipN(n int64) error {
-	bb := *b
-	if n < 0 {
-		return fmt.Errorf("negative skip (%d) requested", n)
-	}
-	if int64(len(bb)) < n {
-		return io.ErrUnexpectedEOF
-	}
-	*b = bb[n:]
-	return nil
-}
-
-// wrapper around a reader.
-type readerWrapper struct {
-	r   io.Reader
-	tmp [8]byte
-}
-
-func (r *readerWrapper) readSmall(n int) ([]byte, error) {
-	if debugAsserts && n > 8 {
-		panic(fmt.Errorf("small read > 8 (%d). use readBig", n))
-	}
-	n2, err := io.ReadFull(r.r, r.tmp[:n])
-	// We only really care about the actual bytes read.
-	if err != nil {
-		if err == io.EOF {
-			return nil, io.ErrUnexpectedEOF
-		}
-		if debugDecoder {
-			println("readSmall: got", n2, "want", n, "err", err)
-		}
-		return nil, err
-	}
-	return r.tmp[:n], nil
-}
-
-func (r *readerWrapper) readBig(n int, dst []byte) ([]byte, error) {
-	if cap(dst) < n {
-		dst = make([]byte, n)
-	}
-	n2, err := io.ReadFull(r.r, dst[:n])
-	if err == io.EOF && n > 0 {
-		err = io.ErrUnexpectedEOF
-	}
-	return dst[:n2], err
-}
-
-func (r *readerWrapper) readByte() (byte, error) {
-	n2, err := io.ReadFull(r.r, r.tmp[:1])
-	if err != nil {
-		if err == io.EOF {
-			err = io.ErrUnexpectedEOF
-		}
-		return 0, err
-	}
-	if n2 != 1 {
-		return 0, io.ErrUnexpectedEOF
-	}
-	return r.tmp[0], nil
-}
-
-func (r *readerWrapper) skipN(n int64) error {
-	n2, err := io.CopyN(io.Discard, r.r, n)
-	if n2 != n {
-		err = io.ErrUnexpectedEOF
-	}
-	return err
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/bytereader.go b/vendor/github.com/klauspost/compress/zstd/bytereader.go
deleted file mode 100644
index 0e59a242d..000000000
--- a/vendor/github.com/klauspost/compress/zstd/bytereader.go
+++ /dev/null
@@ -1,82 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-// byteReader provides a byte reader that reads
-// little endian values from a byte stream.
-// The input stream is manually advanced.
-// The reader performs no bounds checks.
-type byteReader struct {
-	b   []byte
-	off int
-}
-
-// advance the stream b n bytes.
-func (b *byteReader) advance(n uint) {
-	b.off += int(n)
-}
-
-// overread returns whether we have advanced too far.
-func (b *byteReader) overread() bool {
-	return b.off > len(b.b)
-}
-
-// Int32 returns a little endian int32 starting at current offset.
-func (b byteReader) Int32() int32 {
-	b2 := b.b[b.off:]
-	b2 = b2[:4]
-	v3 := int32(b2[3])
-	v2 := int32(b2[2])
-	v1 := int32(b2[1])
-	v0 := int32(b2[0])
-	return v0 | (v1 << 8) | (v2 << 16) | (v3 << 24)
-}
-
-// Uint8 returns the next byte
-func (b *byteReader) Uint8() uint8 {
-	v := b.b[b.off]
-	return v
-}
-
-// Uint32 returns a little endian uint32 starting at current offset.
-func (b byteReader) Uint32() uint32 {
-	if r := b.remain(); r < 4 {
-		// Very rare
-		v := uint32(0)
-		for i := 1; i <= r; i++ {
-			v = (v << 8) | uint32(b.b[len(b.b)-i])
-		}
-		return v
-	}
-	b2 := b.b[b.off:]
-	b2 = b2[:4]
-	v3 := uint32(b2[3])
-	v2 := uint32(b2[2])
-	v1 := uint32(b2[1])
-	v0 := uint32(b2[0])
-	return v0 | (v1 << 8) | (v2 << 16) | (v3 << 24)
-}
-
-// Uint32NC returns a little endian uint32 starting at current offset.
-// The caller must be sure if there are at least 4 bytes left.
-func (b byteReader) Uint32NC() uint32 {
-	b2 := b.b[b.off:]
-	b2 = b2[:4]
-	v3 := uint32(b2[3])
-	v2 := uint32(b2[2])
-	v1 := uint32(b2[1])
-	v0 := uint32(b2[0])
-	return v0 | (v1 << 8) | (v2 << 16) | (v3 << 24)
-}
-
-// unread returns the unread portion of the input.
-func (b byteReader) unread() []byte {
-	return b.b[b.off:]
-}
-
-// remain will return the number of bytes remaining.
-func (b byteReader) remain() int {
-	return len(b.b) - b.off
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/decodeheader.go b/vendor/github.com/klauspost/compress/zstd/decodeheader.go
deleted file mode 100644
index 6a5a2988b..000000000
--- a/vendor/github.com/klauspost/compress/zstd/decodeheader.go
+++ /dev/null
@@ -1,261 +0,0 @@
-// Copyright 2020+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-
-package zstd
-
-import (
-	"encoding/binary"
-	"errors"
-	"io"
-)
-
-// HeaderMaxSize is the maximum size of a Frame and Block Header.
-// If less is sent to Header.Decode it *may* still contain enough information.
-const HeaderMaxSize = 14 + 3
-
-// Header contains information about the first frame and block within that.
-type Header struct {
-	// SingleSegment specifies whether the data is to be decompressed into a
-	// single contiguous memory segment.
-	// It implies that WindowSize is invalid and that FrameContentSize is valid.
-	SingleSegment bool
-
-	// WindowSize is the window of data to keep while decoding.
-	// Will only be set if SingleSegment is false.
-	WindowSize uint64
-
-	// Dictionary ID.
-	// If 0, no dictionary.
-	DictionaryID uint32
-
-	// HasFCS specifies whether FrameContentSize has a valid value.
-	HasFCS bool
-
-	// FrameContentSize is the expected uncompressed size of the entire frame.
-	FrameContentSize uint64
-
-	// Skippable will be true if the frame is meant to be skipped.
-	// This implies that FirstBlock.OK is false.
-	Skippable bool
-
-	// SkippableID is the user-specific ID for the skippable frame.
-	// Valid values are between 0 to 15, inclusive.
-	SkippableID int
-
-	// SkippableSize is the length of the user data to skip following
-	// the header.
-	SkippableSize uint32
-
-	// HeaderSize is the raw size of the frame header.
-	//
-	// For normal frames, it includes the size of the magic number and
-	// the size of the header (per section 3.1.1.1).
-	// It does not include the size for any data blocks (section 3.1.1.2) nor
-	// the size for the trailing content checksum.
-	//
-	// For skippable frames, this counts the size of the magic number
-	// along with the size of the size field of the payload.
-	// It does not include the size of the skippable payload itself.
-	// The total frame size is the HeaderSize plus the SkippableSize.
-	HeaderSize int
-
-	// First block information.
-	FirstBlock struct {
-		// OK will be set if first block could be decoded.
-		OK bool
-
-		// Is this the last block of a frame?
-		Last bool
-
-		// Is the data compressed?
-		// If true CompressedSize will be populated.
-		// Unfortunately DecompressedSize cannot be determined
-		// without decoding the blocks.
-		Compressed bool
-
-		// DecompressedSize is the expected decompressed size of the block.
-		// Will be 0 if it cannot be determined.
-		DecompressedSize int
-
-		// CompressedSize of the data in the block.
-		// Does not include the block header.
-		// Will be equal to DecompressedSize if not Compressed.
-		CompressedSize int
-	}
-
-	// If set there is a checksum present for the block content.
-	// The checksum field at the end is always 4 bytes long.
-	HasCheckSum bool
-}
-
-// Decode the header from the beginning of the stream.
-// This will decode the frame header and the first block header if enough bytes are provided.
-// It is recommended to provide at least HeaderMaxSize bytes.
-// If the frame header cannot be read an error will be returned.
-// If there isn't enough input, io.ErrUnexpectedEOF is returned.
-// The FirstBlock.OK will indicate if enough information was available to decode the first block header.
-func (h *Header) Decode(in []byte) error {
-	_, err := h.DecodeAndStrip(in)
-	return err
-}
-
-// DecodeAndStrip will decode the header from the beginning of the stream
-// and on success return the remaining bytes.
-// This will decode the frame header and the first block header if enough bytes are provided.
-// It is recommended to provide at least HeaderMaxSize bytes.
-// If the frame header cannot be read an error will be returned.
-// If there isn't enough input, io.ErrUnexpectedEOF is returned.
-// The FirstBlock.OK will indicate if enough information was available to decode the first block header.
-func (h *Header) DecodeAndStrip(in []byte) (remain []byte, err error) {
-	*h = Header{}
-	if len(in) < 4 {
-		return nil, io.ErrUnexpectedEOF
-	}
-	h.HeaderSize += 4
-	b, in := in[:4], in[4:]
-	if string(b) != frameMagic {
-		if string(b[1:4]) != skippableFrameMagic || b[0]&0xf0 != 0x50 {
-			return nil, ErrMagicMismatch
-		}
-		if len(in) < 4 {
-			return nil, io.ErrUnexpectedEOF
-		}
-		h.HeaderSize += 4
-		h.Skippable = true
-		h.SkippableID = int(b[0] & 0xf)
-		h.SkippableSize = binary.LittleEndian.Uint32(in)
-		return in[4:], nil
-	}
-
-	// Read Window_Descriptor
-	// https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#window_descriptor
-	if len(in) < 1 {
-		return nil, io.ErrUnexpectedEOF
-	}
-	fhd, in := in[0], in[1:]
-	h.HeaderSize++
-	h.SingleSegment = fhd&(1<<5) != 0
-	h.HasCheckSum = fhd&(1<<2) != 0
-	if fhd&(1<<3) != 0 {
-		return nil, errors.New("reserved bit set on frame header")
-	}
-
-	if !h.SingleSegment {
-		if len(in) < 1 {
-			return nil, io.ErrUnexpectedEOF
-		}
-		var wd byte
-		wd, in = in[0], in[1:]
-		h.HeaderSize++
-		windowLog := 10 + (wd >> 3)
-		windowBase := uint64(1) << windowLog
-		windowAdd := (windowBase / 8) * uint64(wd&0x7)
-		h.WindowSize = windowBase + windowAdd
-	}
-
-	// Read Dictionary_ID
-	// https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#dictionary_id
-	if size := fhd & 3; size != 0 {
-		if size == 3 {
-			size = 4
-		}
-		if len(in) < int(size) {
-			return nil, io.ErrUnexpectedEOF
-		}
-		b, in = in[:size], in[size:]
-		h.HeaderSize += int(size)
-		switch len(b) {
-		case 1:
-			h.DictionaryID = uint32(b[0])
-		case 2:
-			h.DictionaryID = uint32(b[0]) | (uint32(b[1]) << 8)
-		case 4:
-			h.DictionaryID = uint32(b[0]) | (uint32(b[1]) << 8) | (uint32(b[2]) << 16) | (uint32(b[3]) << 24)
-		}
-	}
-
-	// Read Frame_Content_Size
-	// https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#frame_content_size
-	var fcsSize int
-	v := fhd >> 6
-	switch v {
-	case 0:
-		if h.SingleSegment {
-			fcsSize = 1
-		}
-	default:
-		fcsSize = 1 << v
-	}
-
-	if fcsSize > 0 {
-		h.HasFCS = true
-		if len(in) < fcsSize {
-			return nil, io.ErrUnexpectedEOF
-		}
-		b, in = in[:fcsSize], in[fcsSize:]
-		h.HeaderSize += int(fcsSize)
-		switch len(b) {
-		case 1:
-			h.FrameContentSize = uint64(b[0])
-		case 2:
-			// When FCS_Field_Size is 2, the offset of 256 is added.
-			h.FrameContentSize = uint64(b[0]) | (uint64(b[1]) << 8) + 256
-		case 4:
-			h.FrameContentSize = uint64(b[0]) | (uint64(b[1]) << 8) | (uint64(b[2]) << 16) | (uint64(b[3]) << 24)
-		case 8:
-			d1 := uint32(b[0]) | (uint32(b[1]) << 8) | (uint32(b[2]) << 16) | (uint32(b[3]) << 24)
-			d2 := uint32(b[4]) | (uint32(b[5]) << 8) | (uint32(b[6]) << 16) | (uint32(b[7]) << 24)
-			h.FrameContentSize = uint64(d1) | (uint64(d2) << 32)
-		}
-	}
-
-	// Frame Header done, we will not fail from now on.
-	if len(in) < 3 {
-		return in, nil
-	}
-	tmp := in[:3]
-	bh := uint32(tmp[0]) | (uint32(tmp[1]) << 8) | (uint32(tmp[2]) << 16)
-	h.FirstBlock.Last = bh&1 != 0
-	blockType := blockType((bh >> 1) & 3)
-	// find size.
-	cSize := int(bh >> 3)
-	switch blockType {
-	case blockTypeReserved:
-		return in, nil
-	case blockTypeRLE:
-		h.FirstBlock.Compressed = true
-		h.FirstBlock.DecompressedSize = cSize
-		h.FirstBlock.CompressedSize = 1
-	case blockTypeCompressed:
-		h.FirstBlock.Compressed = true
-		h.FirstBlock.CompressedSize = cSize
-	case blockTypeRaw:
-		h.FirstBlock.DecompressedSize = cSize
-		h.FirstBlock.CompressedSize = cSize
-	default:
-		panic("Invalid block type")
-	}
-
-	h.FirstBlock.OK = true
-	return in, nil
-}
-
-// AppendTo will append the encoded header to the dst slice.
-// There is no error checking performed on the header values.
-func (h *Header) AppendTo(dst []byte) ([]byte, error) {
-	if h.Skippable {
-		magic := [4]byte{0x50, 0x2a, 0x4d, 0x18}
-		magic[0] |= byte(h.SkippableID & 0xf)
-		dst = append(dst, magic[:]...)
-		f := h.SkippableSize
-		return append(dst, uint8(f), uint8(f>>8), uint8(f>>16), uint8(f>>24)), nil
-	}
-	f := frameHeader{
-		ContentSize:   h.FrameContentSize,
-		WindowSize:    uint32(h.WindowSize),
-		SingleSegment: h.SingleSegment,
-		Checksum:      h.HasCheckSum,
-		DictID:        h.DictionaryID,
-	}
-	return f.appendTo(dst), nil
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/decoder.go b/vendor/github.com/klauspost/compress/zstd/decoder.go
deleted file mode 100644
index bbca17234..000000000
--- a/vendor/github.com/klauspost/compress/zstd/decoder.go
+++ /dev/null
@@ -1,948 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"context"
-	"encoding/binary"
-	"io"
-	"sync"
-
-	"github.com/klauspost/compress/zstd/internal/xxhash"
-)
-
-// Decoder provides decoding of zstandard streams.
-// The decoder has been designed to operate without allocations after a warmup.
-// This means that you should store the decoder for best performance.
-// To re-use a stream decoder, use the Reset(r io.Reader) error to switch to another stream.
-// A decoder can safely be re-used even if the previous stream failed.
-// To release the resources, you must call the Close() function on a decoder.
-type Decoder struct {
-	o decoderOptions
-
-	// Unreferenced decoders, ready for use.
-	decoders chan *blockDec
-
-	// Current read position used for Reader functionality.
-	current decoderState
-
-	// sync stream decoding
-	syncStream struct {
-		decodedFrame uint64
-		br           readerWrapper
-		enabled      bool
-		inFrame      bool
-		dstBuf       []byte
-	}
-
-	frame *frameDec
-
-	// Custom dictionaries.
-	dicts map[uint32]*dict
-
-	// streamWg is the waitgroup for all streams
-	streamWg sync.WaitGroup
-}
-
-// decoderState is used for maintaining state when the decoder
-// is used for streaming.
-type decoderState struct {
-	// current block being written to stream.
-	decodeOutput
-
-	// output in order to be written to stream.
-	output chan decodeOutput
-
-	// cancel remaining output.
-	cancel context.CancelFunc
-
-	// crc of current frame
-	crc *xxhash.Digest
-
-	flushed bool
-}
-
-var (
-	// Check the interfaces we want to support.
-	_ = io.WriterTo(&Decoder{})
-	_ = io.Reader(&Decoder{})
-)
-
-// NewReader creates a new decoder.
-// A nil Reader can be provided in which case Reset can be used to start a decode.
-//
-// A Decoder can be used in two modes:
-//
-// 1) As a stream, or
-// 2) For stateless decoding using DecodeAll.
-//
-// Only a single stream can be decoded concurrently, but the same decoder
-// can run multiple concurrent stateless decodes. It is even possible to
-// use stateless decodes while a stream is being decoded.
-//
-// The Reset function can be used to initiate a new stream, which will considerably
-// reduce the allocations normally caused by NewReader.
-func NewReader(r io.Reader, opts ...DOption) (*Decoder, error) {
-	initPredefined()
-	var d Decoder
-	d.o.setDefault()
-	for _, o := range opts {
-		err := o(&d.o)
-		if err != nil {
-			return nil, err
-		}
-	}
-	d.current.crc = xxhash.New()
-	d.current.flushed = true
-
-	if r == nil {
-		d.current.err = ErrDecoderNilInput
-	}
-
-	// Transfer option dicts.
-	d.dicts = make(map[uint32]*dict, len(d.o.dicts))
-	for _, dc := range d.o.dicts {
-		d.dicts[dc.id] = dc
-	}
-	d.o.dicts = nil
-
-	// Create decoders
-	d.decoders = make(chan *blockDec, d.o.concurrent)
-	for i := 0; i < d.o.concurrent; i++ {
-		dec := newBlockDec(d.o.lowMem)
-		dec.localFrame = newFrameDec(d.o)
-		d.decoders <- dec
-	}
-
-	if r == nil {
-		return &d, nil
-	}
-	return &d, d.Reset(r)
-}
-
-// Read bytes from the decompressed stream into p.
-// Returns the number of bytes written and any error that occurred.
-// When the stream is done, io.EOF will be returned.
-func (d *Decoder) Read(p []byte) (int, error) {
-	var n int
-	for {
-		if len(d.current.b) > 0 {
-			filled := copy(p, d.current.b)
-			p = p[filled:]
-			d.current.b = d.current.b[filled:]
-			n += filled
-		}
-		if len(p) == 0 {
-			break
-		}
-		if len(d.current.b) == 0 {
-			// We have an error and no more data
-			if d.current.err != nil {
-				break
-			}
-			if !d.nextBlock(n == 0) {
-				return n, d.current.err
-			}
-		}
-	}
-	if len(d.current.b) > 0 {
-		if debugDecoder {
-			println("returning", n, "still bytes left:", len(d.current.b))
-		}
-		// Only return error at end of block
-		return n, nil
-	}
-	if d.current.err != nil {
-		d.drainOutput()
-	}
-	if debugDecoder {
-		println("returning", n, d.current.err, len(d.decoders))
-	}
-	return n, d.current.err
-}
-
-// Reset will reset the decoder the supplied stream after the current has finished processing.
-// Note that this functionality cannot be used after Close has been called.
-// Reset can be called with a nil reader to release references to the previous reader.
-// After being called with a nil reader, no other operations than Reset or DecodeAll or Close
-// should be used.
-func (d *Decoder) Reset(r io.Reader) error {
-	if d.current.err == ErrDecoderClosed {
-		return d.current.err
-	}
-
-	d.drainOutput()
-
-	d.syncStream.br.r = nil
-	if r == nil {
-		d.current.err = ErrDecoderNilInput
-		if len(d.current.b) > 0 {
-			d.current.b = d.current.b[:0]
-		}
-		d.current.flushed = true
-		return nil
-	}
-
-	// If bytes buffer and < 5MB, do sync decoding anyway.
-	if bb, ok := r.(byter); ok && bb.Len() < d.o.decodeBufsBelow && !d.o.limitToCap {
-		bb2 := bb
-		if debugDecoder {
-			println("*bytes.Buffer detected, doing sync decode, len:", bb.Len())
-		}
-		b := bb2.Bytes()
-		var dst []byte
-		if cap(d.syncStream.dstBuf) > 0 {
-			dst = d.syncStream.dstBuf[:0]
-		}
-
-		dst, err := d.DecodeAll(b, dst)
-		if err == nil {
-			err = io.EOF
-		}
-		// Save output buffer
-		d.syncStream.dstBuf = dst
-		d.current.b = dst
-		d.current.err = err
-		d.current.flushed = true
-		if debugDecoder {
-			println("sync decode to", len(dst), "bytes, err:", err)
-		}
-		return nil
-	}
-	// Remove current block.
-	d.stashDecoder()
-	d.current.decodeOutput = decodeOutput{}
-	d.current.err = nil
-	d.current.flushed = false
-	d.current.d = nil
-	d.syncStream.dstBuf = nil
-
-	// Ensure no-one else is still running...
-	d.streamWg.Wait()
-	if d.frame == nil {
-		d.frame = newFrameDec(d.o)
-	}
-
-	if d.o.concurrent == 1 {
-		return d.startSyncDecoder(r)
-	}
-
-	d.current.output = make(chan decodeOutput, d.o.concurrent)
-	ctx, cancel := context.WithCancel(context.Background())
-	d.current.cancel = cancel
-	d.streamWg.Add(1)
-	go d.startStreamDecoder(ctx, r, d.current.output)
-
-	return nil
-}
-
-// drainOutput will drain the output until errEndOfStream is sent.
-func (d *Decoder) drainOutput() {
-	if d.current.cancel != nil {
-		if debugDecoder {
-			println("cancelling current")
-		}
-		d.current.cancel()
-		d.current.cancel = nil
-	}
-	if d.current.d != nil {
-		if debugDecoder {
-			printf("re-adding current decoder %p, decoders: %d", d.current.d, len(d.decoders))
-		}
-		d.decoders <- d.current.d
-		d.current.d = nil
-		d.current.b = nil
-	}
-	if d.current.output == nil || d.current.flushed {
-		println("current already flushed")
-		return
-	}
-	for v := range d.current.output {
-		if v.d != nil {
-			if debugDecoder {
-				printf("re-adding decoder %p", v.d)
-			}
-			d.decoders <- v.d
-		}
-	}
-	d.current.output = nil
-	d.current.flushed = true
-}
-
-// WriteTo writes data to w until there's no more data to write or when an error occurs.
-// The return value n is the number of bytes written.
-// Any error encountered during the write is also returned.
-func (d *Decoder) WriteTo(w io.Writer) (int64, error) {
-	var n int64
-	for {
-		if len(d.current.b) > 0 {
-			n2, err2 := w.Write(d.current.b)
-			n += int64(n2)
-			if err2 != nil && (d.current.err == nil || d.current.err == io.EOF) {
-				d.current.err = err2
-			} else if n2 != len(d.current.b) {
-				d.current.err = io.ErrShortWrite
-			}
-		}
-		if d.current.err != nil {
-			break
-		}
-		d.nextBlock(true)
-	}
-	err := d.current.err
-	if err != nil {
-		d.drainOutput()
-	}
-	if err == io.EOF {
-		err = nil
-	}
-	return n, err
-}
-
-// DecodeAll allows stateless decoding of a blob of bytes.
-// Output will be appended to dst, so if the destination size is known
-// you can pre-allocate the destination slice to avoid allocations.
-// DecodeAll can be used concurrently.
-// The Decoder concurrency limits will be respected.
-func (d *Decoder) DecodeAll(input, dst []byte) ([]byte, error) {
-	if d.decoders == nil {
-		return dst, ErrDecoderClosed
-	}
-
-	// Grab a block decoder and frame decoder.
-	block := <-d.decoders
-	frame := block.localFrame
-	initialSize := len(dst)
-	defer func() {
-		if debugDecoder {
-			printf("re-adding decoder: %p", block)
-		}
-		frame.rawInput = nil
-		frame.bBuf = nil
-		if frame.history.decoders.br != nil {
-			frame.history.decoders.br.in = nil
-		}
-		d.decoders <- block
-	}()
-	frame.bBuf = input
-
-	for {
-		frame.history.reset()
-		err := frame.reset(&frame.bBuf)
-		if err != nil {
-			if err == io.EOF {
-				if debugDecoder {
-					println("frame reset return EOF")
-				}
-				return dst, nil
-			}
-			return dst, err
-		}
-		if err = d.setDict(frame); err != nil {
-			return nil, err
-		}
-		if frame.WindowSize > d.o.maxWindowSize {
-			if debugDecoder {
-				println("window size exceeded:", frame.WindowSize, ">", d.o.maxWindowSize)
-			}
-			return dst, ErrWindowSizeExceeded
-		}
-		if frame.FrameContentSize != fcsUnknown {
-			if frame.FrameContentSize > d.o.maxDecodedSize-uint64(len(dst)-initialSize) {
-				if debugDecoder {
-					println("decoder size exceeded; fcs:", frame.FrameContentSize, "> mcs:", d.o.maxDecodedSize-uint64(len(dst)-initialSize), "len:", len(dst))
-				}
-				return dst, ErrDecoderSizeExceeded
-			}
-			if d.o.limitToCap && frame.FrameContentSize > uint64(cap(dst)-len(dst)) {
-				if debugDecoder {
-					println("decoder size exceeded; fcs:", frame.FrameContentSize, "> (cap-len)", cap(dst)-len(dst))
-				}
-				return dst, ErrDecoderSizeExceeded
-			}
-			if cap(dst)-len(dst) < int(frame.FrameContentSize) {
-				dst2 := make([]byte, len(dst), len(dst)+int(frame.FrameContentSize)+compressedBlockOverAlloc)
-				copy(dst2, dst)
-				dst = dst2
-			}
-		}
-
-		if cap(dst) == 0 && !d.o.limitToCap {
-			// Allocate len(input) * 2 by default if nothing is provided
-			// and we didn't get frame content size.
-			size := len(input) * 2
-			// Cap to 1 MB.
-			if size > 1<<20 {
-				size = 1 << 20
-			}
-			if uint64(size) > d.o.maxDecodedSize {
-				size = int(d.o.maxDecodedSize)
-			}
-			dst = make([]byte, 0, size)
-		}
-
-		dst, err = frame.runDecoder(dst, block)
-		if err != nil {
-			return dst, err
-		}
-		if uint64(len(dst)-initialSize) > d.o.maxDecodedSize {
-			return dst, ErrDecoderSizeExceeded
-		}
-		if len(frame.bBuf) == 0 {
-			if debugDecoder {
-				println("frame dbuf empty")
-			}
-			break
-		}
-	}
-	return dst, nil
-}
-
-// nextBlock returns the next block.
-// If an error occurs d.err will be set.
-// Optionally the function can block for new output.
-// If non-blocking mode is used the returned boolean will be false
-// if no data was available without blocking.
-func (d *Decoder) nextBlock(blocking bool) (ok bool) {
-	if d.current.err != nil {
-		// Keep error state.
-		return false
-	}
-	d.current.b = d.current.b[:0]
-
-	// SYNC:
-	if d.syncStream.enabled {
-		if !blocking {
-			return false
-		}
-		ok = d.nextBlockSync()
-		if !ok {
-			d.stashDecoder()
-		}
-		return ok
-	}
-
-	//ASYNC:
-	d.stashDecoder()
-	if blocking {
-		d.current.decodeOutput, ok = <-d.current.output
-	} else {
-		select {
-		case d.current.decodeOutput, ok = <-d.current.output:
-		default:
-			return false
-		}
-	}
-	if !ok {
-		// This should not happen, so signal error state...
-		d.current.err = io.ErrUnexpectedEOF
-		return false
-	}
-	next := d.current.decodeOutput
-	if next.d != nil && next.d.async.newHist != nil {
-		d.current.crc.Reset()
-	}
-	if debugDecoder {
-		var tmp [4]byte
-		binary.LittleEndian.PutUint32(tmp[:], uint32(xxhash.Sum64(next.b)))
-		println("got", len(d.current.b), "bytes, error:", d.current.err, "data crc:", tmp)
-	}
-
-	if d.o.ignoreChecksum {
-		return true
-	}
-
-	if len(next.b) > 0 {
-		d.current.crc.Write(next.b)
-	}
-	if next.err == nil && next.d != nil && next.d.hasCRC {
-		got := uint32(d.current.crc.Sum64())
-		if got != next.d.checkCRC {
-			if debugDecoder {
-				printf("CRC Check Failed: %08x (got) != %08x (on stream)\n", got, next.d.checkCRC)
-			}
-			d.current.err = ErrCRCMismatch
-		} else {
-			if debugDecoder {
-				printf("CRC ok %08x\n", got)
-			}
-		}
-	}
-
-	return true
-}
-
-func (d *Decoder) nextBlockSync() (ok bool) {
-	if d.current.d == nil {
-		d.current.d = <-d.decoders
-	}
-	for len(d.current.b) == 0 {
-		if !d.syncStream.inFrame {
-			d.frame.history.reset()
-			d.current.err = d.frame.reset(&d.syncStream.br)
-			if d.current.err == nil {
-				d.current.err = d.setDict(d.frame)
-			}
-			if d.current.err != nil {
-				return false
-			}
-			if d.frame.WindowSize > d.o.maxDecodedSize || d.frame.WindowSize > d.o.maxWindowSize {
-				d.current.err = ErrDecoderSizeExceeded
-				return false
-			}
-
-			d.syncStream.decodedFrame = 0
-			d.syncStream.inFrame = true
-		}
-		d.current.err = d.frame.next(d.current.d)
-		if d.current.err != nil {
-			return false
-		}
-		d.frame.history.ensureBlock()
-		if debugDecoder {
-			println("History trimmed:", len(d.frame.history.b), "decoded already:", d.syncStream.decodedFrame)
-		}
-		histBefore := len(d.frame.history.b)
-		d.current.err = d.current.d.decodeBuf(&d.frame.history)
-
-		if d.current.err != nil {
-			println("error after:", d.current.err)
-			return false
-		}
-		d.current.b = d.frame.history.b[histBefore:]
-		if debugDecoder {
-			println("history after:", len(d.frame.history.b))
-		}
-
-		// Check frame size (before CRC)
-		d.syncStream.decodedFrame += uint64(len(d.current.b))
-		if d.syncStream.decodedFrame > d.frame.FrameContentSize {
-			if debugDecoder {
-				printf("DecodedFrame (%d) > FrameContentSize (%d)\n", d.syncStream.decodedFrame, d.frame.FrameContentSize)
-			}
-			d.current.err = ErrFrameSizeExceeded
-			return false
-		}
-
-		// Check FCS
-		if d.current.d.Last && d.frame.FrameContentSize != fcsUnknown && d.syncStream.decodedFrame != d.frame.FrameContentSize {
-			if debugDecoder {
-				printf("DecodedFrame (%d) != FrameContentSize (%d)\n", d.syncStream.decodedFrame, d.frame.FrameContentSize)
-			}
-			d.current.err = ErrFrameSizeMismatch
-			return false
-		}
-
-		// Update/Check CRC
-		if d.frame.HasCheckSum {
-			if !d.o.ignoreChecksum {
-				d.frame.crc.Write(d.current.b)
-			}
-			if d.current.d.Last {
-				if !d.o.ignoreChecksum {
-					d.current.err = d.frame.checkCRC()
-				} else {
-					d.current.err = d.frame.consumeCRC()
-				}
-				if d.current.err != nil {
-					println("CRC error:", d.current.err)
-					return false
-				}
-			}
-		}
-		d.syncStream.inFrame = !d.current.d.Last
-	}
-	return true
-}
-
-func (d *Decoder) stashDecoder() {
-	if d.current.d != nil {
-		if debugDecoder {
-			printf("re-adding current decoder %p", d.current.d)
-		}
-		d.decoders <- d.current.d
-		d.current.d = nil
-	}
-}
-
-// Close will release all resources.
-// It is NOT possible to reuse the decoder after this.
-func (d *Decoder) Close() {
-	if d.current.err == ErrDecoderClosed {
-		return
-	}
-	d.drainOutput()
-	if d.current.cancel != nil {
-		d.current.cancel()
-		d.streamWg.Wait()
-		d.current.cancel = nil
-	}
-	if d.decoders != nil {
-		close(d.decoders)
-		for dec := range d.decoders {
-			dec.Close()
-		}
-		d.decoders = nil
-	}
-	if d.current.d != nil {
-		d.current.d.Close()
-		d.current.d = nil
-	}
-	d.current.err = ErrDecoderClosed
-}
-
-// IOReadCloser returns the decoder as an io.ReadCloser for convenience.
-// Any changes to the decoder will be reflected, so the returned ReadCloser
-// can be reused along with the decoder.
-// io.WriterTo is also supported by the returned ReadCloser.
-func (d *Decoder) IOReadCloser() io.ReadCloser {
-	return closeWrapper{d: d}
-}
-
-// closeWrapper wraps a function call as a closer.
-type closeWrapper struct {
-	d *Decoder
-}
-
-// WriteTo forwards WriteTo calls to the decoder.
-func (c closeWrapper) WriteTo(w io.Writer) (n int64, err error) {
-	return c.d.WriteTo(w)
-}
-
-// Read forwards read calls to the decoder.
-func (c closeWrapper) Read(p []byte) (n int, err error) {
-	return c.d.Read(p)
-}
-
-// Close closes the decoder.
-func (c closeWrapper) Close() error {
-	c.d.Close()
-	return nil
-}
-
-type decodeOutput struct {
-	d   *blockDec
-	b   []byte
-	err error
-}
-
-func (d *Decoder) startSyncDecoder(r io.Reader) error {
-	d.frame.history.reset()
-	d.syncStream.br = readerWrapper{r: r}
-	d.syncStream.inFrame = false
-	d.syncStream.enabled = true
-	d.syncStream.decodedFrame = 0
-	return nil
-}
-
-// Create Decoder:
-// ASYNC:
-// Spawn 3 go routines.
-// 0: Read frames and decode block literals.
-// 1: Decode sequences.
-// 2: Execute sequences, send to output.
-func (d *Decoder) startStreamDecoder(ctx context.Context, r io.Reader, output chan decodeOutput) {
-	defer d.streamWg.Done()
-	br := readerWrapper{r: r}
-
-	var seqDecode = make(chan *blockDec, d.o.concurrent)
-	var seqExecute = make(chan *blockDec, d.o.concurrent)
-
-	// Async 1: Decode sequences...
-	go func() {
-		var hist history
-		var hasErr bool
-
-		for block := range seqDecode {
-			if hasErr {
-				if block != nil {
-					seqExecute <- block
-				}
-				continue
-			}
-			if block.async.newHist != nil {
-				if debugDecoder {
-					println("Async 1: new history, recent:", block.async.newHist.recentOffsets)
-				}
-				hist.reset()
-				hist.decoders = block.async.newHist.decoders
-				hist.recentOffsets = block.async.newHist.recentOffsets
-				hist.windowSize = block.async.newHist.windowSize
-				if block.async.newHist.dict != nil {
-					hist.setDict(block.async.newHist.dict)
-				}
-			}
-			if block.err != nil || block.Type != blockTypeCompressed {
-				hasErr = block.err != nil
-				seqExecute <- block
-				continue
-			}
-
-			hist.decoders.literals = block.async.literals
-			block.err = block.prepareSequences(block.async.seqData, &hist)
-			if debugDecoder && block.err != nil {
-				println("prepareSequences returned:", block.err)
-			}
-			hasErr = block.err != nil
-			if block.err == nil {
-				block.err = block.decodeSequences(&hist)
-				if debugDecoder && block.err != nil {
-					println("decodeSequences returned:", block.err)
-				}
-				hasErr = block.err != nil
-				//				block.async.sequence = hist.decoders.seq[:hist.decoders.nSeqs]
-				block.async.seqSize = hist.decoders.seqSize
-			}
-			seqExecute <- block
-		}
-		close(seqExecute)
-		hist.reset()
-	}()
-
-	var wg sync.WaitGroup
-	wg.Add(1)
-
-	// Async 3: Execute sequences...
-	frameHistCache := d.frame.history.b
-	go func() {
-		var hist history
-		var decodedFrame uint64
-		var fcs uint64
-		var hasErr bool
-		for block := range seqExecute {
-			out := decodeOutput{err: block.err, d: block}
-			if block.err != nil || hasErr {
-				hasErr = true
-				output <- out
-				continue
-			}
-			if block.async.newHist != nil {
-				if debugDecoder {
-					println("Async 2: new history")
-				}
-				hist.reset()
-				hist.windowSize = block.async.newHist.windowSize
-				hist.allocFrameBuffer = block.async.newHist.allocFrameBuffer
-				if block.async.newHist.dict != nil {
-					hist.setDict(block.async.newHist.dict)
-				}
-
-				if cap(hist.b) < hist.allocFrameBuffer {
-					if cap(frameHistCache) >= hist.allocFrameBuffer {
-						hist.b = frameHistCache
-					} else {
-						hist.b = make([]byte, 0, hist.allocFrameBuffer)
-						println("Alloc history sized", hist.allocFrameBuffer)
-					}
-				}
-				hist.b = hist.b[:0]
-				fcs = block.async.fcs
-				decodedFrame = 0
-			}
-			do := decodeOutput{err: block.err, d: block}
-			switch block.Type {
-			case blockTypeRLE:
-				if debugDecoder {
-					println("add rle block length:", block.RLESize)
-				}
-
-				if cap(block.dst) < int(block.RLESize) {
-					if block.lowMem {
-						block.dst = make([]byte, block.RLESize)
-					} else {
-						block.dst = make([]byte, maxCompressedBlockSize)
-					}
-				}
-				block.dst = block.dst[:block.RLESize]
-				v := block.data[0]
-				for i := range block.dst {
-					block.dst[i] = v
-				}
-				hist.append(block.dst)
-				do.b = block.dst
-			case blockTypeRaw:
-				if debugDecoder {
-					println("add raw block length:", len(block.data))
-				}
-				hist.append(block.data)
-				do.b = block.data
-			case blockTypeCompressed:
-				if debugDecoder {
-					println("execute with history length:", len(hist.b), "window:", hist.windowSize)
-				}
-				hist.decoders.seqSize = block.async.seqSize
-				hist.decoders.literals = block.async.literals
-				do.err = block.executeSequences(&hist)
-				hasErr = do.err != nil
-				if debugDecoder && hasErr {
-					println("executeSequences returned:", do.err)
-				}
-				do.b = block.dst
-			}
-			if !hasErr {
-				decodedFrame += uint64(len(do.b))
-				if decodedFrame > fcs {
-					println("fcs exceeded", block.Last, fcs, decodedFrame)
-					do.err = ErrFrameSizeExceeded
-					hasErr = true
-				} else if block.Last && fcs != fcsUnknown && decodedFrame != fcs {
-					do.err = ErrFrameSizeMismatch
-					hasErr = true
-				} else {
-					if debugDecoder {
-						println("fcs ok", block.Last, fcs, decodedFrame)
-					}
-				}
-			}
-			output <- do
-		}
-		close(output)
-		frameHistCache = hist.b
-		wg.Done()
-		if debugDecoder {
-			println("decoder goroutines finished")
-		}
-		hist.reset()
-	}()
-
-	var hist history
-decodeStream:
-	for {
-		var hasErr bool
-		hist.reset()
-		decodeBlock := func(block *blockDec) {
-			if hasErr {
-				if block != nil {
-					seqDecode <- block
-				}
-				return
-			}
-			if block.err != nil || block.Type != blockTypeCompressed {
-				hasErr = block.err != nil
-				seqDecode <- block
-				return
-			}
-
-			remain, err := block.decodeLiterals(block.data, &hist)
-			block.err = err
-			hasErr = block.err != nil
-			if err == nil {
-				block.async.literals = hist.decoders.literals
-				block.async.seqData = remain
-			} else if debugDecoder {
-				println("decodeLiterals error:", err)
-			}
-			seqDecode <- block
-		}
-		frame := d.frame
-		if debugDecoder {
-			println("New frame...")
-		}
-		var historySent bool
-		frame.history.reset()
-		err := frame.reset(&br)
-		if debugDecoder && err != nil {
-			println("Frame decoder returned", err)
-		}
-		if err == nil {
-			err = d.setDict(frame)
-		}
-		if err == nil && d.frame.WindowSize > d.o.maxWindowSize {
-			if debugDecoder {
-				println("decoder size exceeded, fws:", d.frame.WindowSize, "> mws:", d.o.maxWindowSize)
-			}
-
-			err = ErrDecoderSizeExceeded
-		}
-		if err != nil {
-			select {
-			case <-ctx.Done():
-			case dec := <-d.decoders:
-				dec.sendErr(err)
-				decodeBlock(dec)
-			}
-			break decodeStream
-		}
-
-		// Go through all blocks of the frame.
-		for {
-			var dec *blockDec
-			select {
-			case <-ctx.Done():
-				break decodeStream
-			case dec = <-d.decoders:
-				// Once we have a decoder, we MUST return it.
-			}
-			err := frame.next(dec)
-			if !historySent {
-				h := frame.history
-				if debugDecoder {
-					println("Alloc History:", h.allocFrameBuffer)
-				}
-				hist.reset()
-				if h.dict != nil {
-					hist.setDict(h.dict)
-				}
-				dec.async.newHist = &h
-				dec.async.fcs = frame.FrameContentSize
-				historySent = true
-			} else {
-				dec.async.newHist = nil
-			}
-			if debugDecoder && err != nil {
-				println("next block returned error:", err)
-			}
-			dec.err = err
-			dec.hasCRC = false
-			if dec.Last && frame.HasCheckSum && err == nil {
-				crc, err := frame.rawInput.readSmall(4)
-				if len(crc) < 4 {
-					if err == nil {
-						err = io.ErrUnexpectedEOF
-
-					}
-					println("CRC missing?", err)
-					dec.err = err
-				} else {
-					dec.checkCRC = binary.LittleEndian.Uint32(crc)
-					dec.hasCRC = true
-					if debugDecoder {
-						printf("found crc to check: %08x\n", dec.checkCRC)
-					}
-				}
-			}
-			err = dec.err
-			last := dec.Last
-			decodeBlock(dec)
-			if err != nil {
-				break decodeStream
-			}
-			if last {
-				break
-			}
-		}
-	}
-	close(seqDecode)
-	wg.Wait()
-	hist.reset()
-	d.frame.history.b = frameHistCache
-}
-
-func (d *Decoder) setDict(frame *frameDec) (err error) {
-	dict, ok := d.dicts[frame.DictionaryID]
-	if ok {
-		if debugDecoder {
-			println("setting dict", frame.DictionaryID)
-		}
-		frame.history.setDict(dict)
-	} else if frame.DictionaryID != 0 {
-		// A zero or missing dictionary id is ambiguous:
-		// either dictionary zero, or no dictionary. In particular,
-		// zstd --patch-from uses this id for the source file,
-		// so only return an error if the dictionary id is not zero.
-		err = ErrUnknownDictionary
-	}
-	return err
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/decoder_options.go b/vendor/github.com/klauspost/compress/zstd/decoder_options.go
deleted file mode 100644
index 774c5f00f..000000000
--- a/vendor/github.com/klauspost/compress/zstd/decoder_options.go
+++ /dev/null
@@ -1,169 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"errors"
-	"fmt"
-	"math/bits"
-	"runtime"
-)
-
-// DOption is an option for creating a decoder.
-type DOption func(*decoderOptions) error
-
-// options retains accumulated state of multiple options.
-type decoderOptions struct {
-	lowMem          bool
-	concurrent      int
-	maxDecodedSize  uint64
-	maxWindowSize   uint64
-	dicts           []*dict
-	ignoreChecksum  bool
-	limitToCap      bool
-	decodeBufsBelow int
-}
-
-func (o *decoderOptions) setDefault() {
-	*o = decoderOptions{
-		// use less ram: true for now, but may change.
-		lowMem:          true,
-		concurrent:      runtime.GOMAXPROCS(0),
-		maxWindowSize:   MaxWindowSize,
-		decodeBufsBelow: 128 << 10,
-	}
-	if o.concurrent > 4 {
-		o.concurrent = 4
-	}
-	o.maxDecodedSize = 64 << 30
-}
-
-// WithDecoderLowmem will set whether to use a lower amount of memory,
-// but possibly have to allocate more while running.
-func WithDecoderLowmem(b bool) DOption {
-	return func(o *decoderOptions) error { o.lowMem = b; return nil }
-}
-
-// WithDecoderConcurrency sets the number of created decoders.
-// When decoding block with DecodeAll, this will limit the number
-// of possible concurrently running decodes.
-// When decoding streams, this will limit the number of
-// inflight blocks.
-// When decoding streams and setting maximum to 1,
-// no async decoding will be done.
-// When a value of 0 is provided GOMAXPROCS will be used.
-// By default this will be set to 4 or GOMAXPROCS, whatever is lower.
-func WithDecoderConcurrency(n int) DOption {
-	return func(o *decoderOptions) error {
-		if n < 0 {
-			return errors.New("concurrency must be at least 1")
-		}
-		if n == 0 {
-			o.concurrent = runtime.GOMAXPROCS(0)
-		} else {
-			o.concurrent = n
-		}
-		return nil
-	}
-}
-
-// WithDecoderMaxMemory allows to set a maximum decoded size for in-memory
-// non-streaming operations or maximum window size for streaming operations.
-// This can be used to control memory usage of potentially hostile content.
-// Maximum is 1 << 63 bytes. Default is 64GiB.
-func WithDecoderMaxMemory(n uint64) DOption {
-	return func(o *decoderOptions) error {
-		if n == 0 {
-			return errors.New("WithDecoderMaxMemory must be at least 1")
-		}
-		if n > 1<<63 {
-			return errors.New("WithDecoderMaxmemory must be less than 1 << 63")
-		}
-		o.maxDecodedSize = n
-		return nil
-	}
-}
-
-// WithDecoderDicts allows to register one or more dictionaries for the decoder.
-//
-// Each slice in dict must be in the [dictionary format] produced by
-// "zstd --train" from the Zstandard reference implementation.
-//
-// If several dictionaries with the same ID are provided, the last one will be used.
-//
-// [dictionary format]: https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#dictionary-format
-func WithDecoderDicts(dicts ...[]byte) DOption {
-	return func(o *decoderOptions) error {
-		for _, b := range dicts {
-			d, err := loadDict(b)
-			if err != nil {
-				return err
-			}
-			o.dicts = append(o.dicts, d)
-		}
-		return nil
-	}
-}
-
-// WithDecoderDictRaw registers a dictionary that may be used by the decoder.
-// The slice content can be arbitrary data.
-func WithDecoderDictRaw(id uint32, content []byte) DOption {
-	return func(o *decoderOptions) error {
-		if bits.UintSize > 32 && uint(len(content)) > dictMaxLength {
-			return fmt.Errorf("dictionary of size %d > 2GiB too large", len(content))
-		}
-		o.dicts = append(o.dicts, &dict{id: id, content: content, offsets: [3]int{1, 4, 8}})
-		return nil
-	}
-}
-
-// WithDecoderMaxWindow allows to set a maximum window size for decodes.
-// This allows rejecting packets that will cause big memory usage.
-// The Decoder will likely allocate more memory based on the WithDecoderLowmem setting.
-// If WithDecoderMaxMemory is set to a lower value, that will be used.
-// Default is 512MB, Maximum is ~3.75 TB as per zstandard spec.
-func WithDecoderMaxWindow(size uint64) DOption {
-	return func(o *decoderOptions) error {
-		if size < MinWindowSize {
-			return errors.New("WithMaxWindowSize must be at least 1KB, 1024 bytes")
-		}
-		if size > (1<<41)+7*(1<<38) {
-			return errors.New("WithMaxWindowSize must be less than (1<<41) + 7*(1<<38) ~ 3.75TB")
-		}
-		o.maxWindowSize = size
-		return nil
-	}
-}
-
-// WithDecodeAllCapLimit will limit DecodeAll to decoding cap(dst)-len(dst) bytes,
-// or any size set in WithDecoderMaxMemory.
-// This can be used to limit decoding to a specific maximum output size.
-// Disabled by default.
-func WithDecodeAllCapLimit(b bool) DOption {
-	return func(o *decoderOptions) error {
-		o.limitToCap = b
-		return nil
-	}
-}
-
-// WithDecodeBuffersBelow will fully decode readers that have a
-// `Bytes() []byte` and `Len() int` interface similar to bytes.Buffer.
-// This typically uses less allocations but will have the full decompressed object in memory.
-// Note that DecodeAllCapLimit will disable this, as well as giving a size of 0 or less.
-// Default is 128KiB.
-func WithDecodeBuffersBelow(size int) DOption {
-	return func(o *decoderOptions) error {
-		o.decodeBufsBelow = size
-		return nil
-	}
-}
-
-// IgnoreChecksum allows to forcibly ignore checksum checking.
-func IgnoreChecksum(b bool) DOption {
-	return func(o *decoderOptions) error {
-		o.ignoreChecksum = b
-		return nil
-	}
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/dict.go b/vendor/github.com/klauspost/compress/zstd/dict.go
deleted file mode 100644
index b7b83164b..000000000
--- a/vendor/github.com/klauspost/compress/zstd/dict.go
+++ /dev/null
@@ -1,565 +0,0 @@
-package zstd
-
-import (
-	"bytes"
-	"encoding/binary"
-	"errors"
-	"fmt"
-	"io"
-	"math"
-	"sort"
-
-	"github.com/klauspost/compress/huff0"
-)
-
-type dict struct {
-	id uint32
-
-	litEnc              *huff0.Scratch
-	llDec, ofDec, mlDec sequenceDec
-	offsets             [3]int
-	content             []byte
-}
-
-const dictMagic = "\x37\xa4\x30\xec"
-
-// Maximum dictionary size for the reference implementation (1.5.3) is 2 GiB.
-const dictMaxLength = 1 << 31
-
-// ID returns the dictionary id or 0 if d is nil.
-func (d *dict) ID() uint32 {
-	if d == nil {
-		return 0
-	}
-	return d.id
-}
-
-// ContentSize returns the dictionary content size or 0 if d is nil.
-func (d *dict) ContentSize() int {
-	if d == nil {
-		return 0
-	}
-	return len(d.content)
-}
-
-// Content returns the dictionary content.
-func (d *dict) Content() []byte {
-	if d == nil {
-		return nil
-	}
-	return d.content
-}
-
-// Offsets returns the initial offsets.
-func (d *dict) Offsets() [3]int {
-	if d == nil {
-		return [3]int{}
-	}
-	return d.offsets
-}
-
-// LitEncoder returns the literal encoder.
-func (d *dict) LitEncoder() *huff0.Scratch {
-	if d == nil {
-		return nil
-	}
-	return d.litEnc
-}
-
-// Load a dictionary as described in
-// https://github.com/facebook/zstd/blob/master/doc/zstd_compression_format.md#dictionary-format
-func loadDict(b []byte) (*dict, error) {
-	// Check static field size.
-	if len(b) <= 8+(3*4) {
-		return nil, io.ErrUnexpectedEOF
-	}
-	d := dict{
-		llDec: sequenceDec{fse: &fseDecoder{}},
-		ofDec: sequenceDec{fse: &fseDecoder{}},
-		mlDec: sequenceDec{fse: &fseDecoder{}},
-	}
-	if string(b[:4]) != dictMagic {
-		return nil, ErrMagicMismatch
-	}
-	d.id = binary.LittleEndian.Uint32(b[4:8])
-	if d.id == 0 {
-		return nil, errors.New("dictionaries cannot have ID 0")
-	}
-
-	// Read literal table
-	var err error
-	d.litEnc, b, err = huff0.ReadTable(b[8:], nil)
-	if err != nil {
-		return nil, fmt.Errorf("loading literal table: %w", err)
-	}
-	d.litEnc.Reuse = huff0.ReusePolicyMust
-
-	br := byteReader{
-		b:   b,
-		off: 0,
-	}
-	readDec := func(i tableIndex, dec *fseDecoder) error {
-		if err := dec.readNCount(&br, uint16(maxTableSymbol[i])); err != nil {
-			return err
-		}
-		if br.overread() {
-			return io.ErrUnexpectedEOF
-		}
-		err = dec.transform(symbolTableX[i])
-		if err != nil {
-			println("Transform table error:", err)
-			return err
-		}
-		if debugDecoder || debugEncoder {
-			println("Read table ok", "symbolLen:", dec.symbolLen)
-		}
-		// Set decoders as predefined so they aren't reused.
-		dec.preDefined = true
-		return nil
-	}
-
-	if err := readDec(tableOffsets, d.ofDec.fse); err != nil {
-		return nil, err
-	}
-	if err := readDec(tableMatchLengths, d.mlDec.fse); err != nil {
-		return nil, err
-	}
-	if err := readDec(tableLiteralLengths, d.llDec.fse); err != nil {
-		return nil, err
-	}
-	if br.remain() < 12 {
-		return nil, io.ErrUnexpectedEOF
-	}
-
-	d.offsets[0] = int(br.Uint32())
-	br.advance(4)
-	d.offsets[1] = int(br.Uint32())
-	br.advance(4)
-	d.offsets[2] = int(br.Uint32())
-	br.advance(4)
-	if d.offsets[0] <= 0 || d.offsets[1] <= 0 || d.offsets[2] <= 0 {
-		return nil, errors.New("invalid offset in dictionary")
-	}
-	d.content = make([]byte, br.remain())
-	copy(d.content, br.unread())
-	if d.offsets[0] > len(d.content) || d.offsets[1] > len(d.content) || d.offsets[2] > len(d.content) {
-		return nil, fmt.Errorf("initial offset bigger than dictionary content size %d, offsets: %v", len(d.content), d.offsets)
-	}
-
-	return &d, nil
-}
-
-// InspectDictionary loads a zstd dictionary and provides functions to inspect the content.
-func InspectDictionary(b []byte) (interface {
-	ID() uint32
-	ContentSize() int
-	Content() []byte
-	Offsets() [3]int
-	LitEncoder() *huff0.Scratch
-}, error) {
-	initPredefined()
-	d, err := loadDict(b)
-	return d, err
-}
-
-type BuildDictOptions struct {
-	// Dictionary ID.
-	ID uint32
-
-	// Content to use to create dictionary tables.
-	Contents [][]byte
-
-	// History to use for all blocks.
-	History []byte
-
-	// Offsets to use.
-	Offsets [3]int
-
-	// CompatV155 will make the dictionary compatible with Zstd v1.5.5 and earlier.
-	// See https://github.com/facebook/zstd/issues/3724
-	CompatV155 bool
-
-	// Use the specified encoder level.
-	// The dictionary will be built using the specified encoder level,
-	// which will reflect speed and make the dictionary tailored for that level.
-	// If not set SpeedBestCompression will be used.
-	Level EncoderLevel
-
-	// DebugOut will write stats and other details here if set.
-	DebugOut io.Writer
-}
-
-func BuildDict(o BuildDictOptions) ([]byte, error) {
-	initPredefined()
-	hist := o.History
-	contents := o.Contents
-	debug := o.DebugOut != nil
-	println := func(args ...interface{}) {
-		if o.DebugOut != nil {
-			fmt.Fprintln(o.DebugOut, args...)
-		}
-	}
-	printf := func(s string, args ...interface{}) {
-		if o.DebugOut != nil {
-			fmt.Fprintf(o.DebugOut, s, args...)
-		}
-	}
-	print := func(args ...interface{}) {
-		if o.DebugOut != nil {
-			fmt.Fprint(o.DebugOut, args...)
-		}
-	}
-
-	if int64(len(hist)) > dictMaxLength {
-		return nil, fmt.Errorf("dictionary of size %d > %d", len(hist), int64(dictMaxLength))
-	}
-	if len(hist) < 8 {
-		return nil, fmt.Errorf("dictionary of size %d < %d", len(hist), 8)
-	}
-	if len(contents) == 0 {
-		return nil, errors.New("no content provided")
-	}
-	d := dict{
-		id:      o.ID,
-		litEnc:  nil,
-		llDec:   sequenceDec{},
-		ofDec:   sequenceDec{},
-		mlDec:   sequenceDec{},
-		offsets: o.Offsets,
-		content: hist,
-	}
-	block := blockEnc{lowMem: false}
-	block.init()
-	enc := encoder(&bestFastEncoder{fastBase: fastBase{maxMatchOff: int32(maxMatchLen), bufferReset: math.MaxInt32 - int32(maxMatchLen*2), lowMem: false}})
-	if o.Level != 0 {
-		eOpts := encoderOptions{
-			level:      o.Level,
-			blockSize:  maxMatchLen,
-			windowSize: maxMatchLen,
-			dict:       &d,
-			lowMem:     false,
-		}
-		enc = eOpts.encoder()
-	} else {
-		o.Level = SpeedBestCompression
-	}
-	var (
-		remain [256]int
-		ll     [256]int
-		ml     [256]int
-		of     [256]int
-	)
-	addValues := func(dst *[256]int, src []byte) {
-		for _, v := range src {
-			dst[v]++
-		}
-	}
-	addHist := func(dst *[256]int, src *[256]uint32) {
-		for i, v := range src {
-			dst[i] += int(v)
-		}
-	}
-	seqs := 0
-	nUsed := 0
-	litTotal := 0
-	newOffsets := make(map[uint32]int, 1000)
-	for _, b := range contents {
-		block.reset(nil)
-		if len(b) < 8 {
-			continue
-		}
-		nUsed++
-		enc.Reset(&d, true)
-		enc.Encode(&block, b)
-		addValues(&remain, block.literals)
-		litTotal += len(block.literals)
-		if len(block.sequences) == 0 {
-			continue
-		}
-		seqs += len(block.sequences)
-		block.genCodes()
-		addHist(&ll, block.coders.llEnc.Histogram())
-		addHist(&ml, block.coders.mlEnc.Histogram())
-		addHist(&of, block.coders.ofEnc.Histogram())
-		for i, seq := range block.sequences {
-			if i > 3 {
-				break
-			}
-			offset := seq.offset
-			if offset == 0 {
-				continue
-			}
-			if int(offset) >= len(o.History) {
-				continue
-			}
-			if offset > 3 {
-				newOffsets[offset-3]++
-			} else {
-				newOffsets[uint32(o.Offsets[offset-1])]++
-			}
-		}
-	}
-	// Find most used offsets.
-	var sortedOffsets []uint32
-	for k := range newOffsets {
-		sortedOffsets = append(sortedOffsets, k)
-	}
-	sort.Slice(sortedOffsets, func(i, j int) bool {
-		a, b := sortedOffsets[i], sortedOffsets[j]
-		if a == b {
-			// Prefer the longer offset
-			return sortedOffsets[i] > sortedOffsets[j]
-		}
-		return newOffsets[sortedOffsets[i]] > newOffsets[sortedOffsets[j]]
-	})
-	if len(sortedOffsets) > 3 {
-		if debug {
-			print("Offsets:")
-			for i, v := range sortedOffsets {
-				if i > 20 {
-					break
-				}
-				printf("[%d: %d],", v, newOffsets[v])
-			}
-			println("")
-		}
-
-		sortedOffsets = sortedOffsets[:3]
-	}
-	for i, v := range sortedOffsets {
-		o.Offsets[i] = int(v)
-	}
-	if debug {
-		println("New repeat offsets", o.Offsets)
-	}
-
-	if nUsed == 0 || seqs == 0 {
-		return nil, fmt.Errorf("%d blocks, %d sequences found", nUsed, seqs)
-	}
-	if debug {
-		println("Sequences:", seqs, "Blocks:", nUsed, "Literals:", litTotal)
-	}
-	if seqs/nUsed < 512 {
-		// Use 512 as minimum.
-		nUsed = seqs / 512
-		if nUsed == 0 {
-			nUsed = 1
-		}
-	}
-	copyHist := func(dst *fseEncoder, src *[256]int) ([]byte, error) {
-		hist := dst.Histogram()
-		var maxSym uint8
-		var maxCount int
-		var fakeLength int
-		for i, v := range src {
-			if v > 0 {
-				v = v / nUsed
-				if v == 0 {
-					v = 1
-				}
-			}
-			if v > maxCount {
-				maxCount = v
-			}
-			if v != 0 {
-				maxSym = uint8(i)
-			}
-			fakeLength += v
-			hist[i] = uint32(v)
-		}
-
-		// Ensure we aren't trying to represent RLE.
-		if maxCount == fakeLength {
-			for i := range hist {
-				if uint8(i) == maxSym {
-					fakeLength++
-					maxSym++
-					hist[i+1] = 1
-					if maxSym > 1 {
-						break
-					}
-				}
-				if hist[0] == 0 {
-					fakeLength++
-					hist[i] = 1
-					if maxSym > 1 {
-						break
-					}
-				}
-			}
-		}
-
-		dst.HistogramFinished(maxSym, maxCount)
-		dst.reUsed = false
-		dst.useRLE = false
-		err := dst.normalizeCount(fakeLength)
-		if err != nil {
-			return nil, err
-		}
-		if debug {
-			println("RAW:", dst.count[:maxSym+1], "NORM:", dst.norm[:maxSym+1], "LEN:", fakeLength)
-		}
-		return dst.writeCount(nil)
-	}
-	if debug {
-		print("Literal lengths: ")
-	}
-	llTable, err := copyHist(block.coders.llEnc, &ll)
-	if err != nil {
-		return nil, err
-	}
-	if debug {
-		print("Match lengths: ")
-	}
-	mlTable, err := copyHist(block.coders.mlEnc, &ml)
-	if err != nil {
-		return nil, err
-	}
-	if debug {
-		print("Offsets: ")
-	}
-	ofTable, err := copyHist(block.coders.ofEnc, &of)
-	if err != nil {
-		return nil, err
-	}
-
-	// Literal table
-	avgSize := litTotal
-	if avgSize > huff0.BlockSizeMax/2 {
-		avgSize = huff0.BlockSizeMax / 2
-	}
-	huffBuff := make([]byte, 0, avgSize)
-	// Target size
-	div := litTotal / avgSize
-	if div < 1 {
-		div = 1
-	}
-	if debug {
-		println("Huffman weights:")
-	}
-	for i, n := range remain[:] {
-		if n > 0 {
-			n = n / div
-			// Allow all entries to be represented.
-			if n == 0 {
-				n = 1
-			}
-			huffBuff = append(huffBuff, bytes.Repeat([]byte{byte(i)}, n)...)
-			if debug {
-				printf("[%d: %d], ", i, n)
-			}
-		}
-	}
-	if o.CompatV155 && remain[255]/div == 0 {
-		huffBuff = append(huffBuff, 255)
-	}
-	scratch := &huff0.Scratch{TableLog: 11}
-	for tries := 0; tries < 255; tries++ {
-		scratch = &huff0.Scratch{TableLog: 11}
-		_, _, err = huff0.Compress1X(huffBuff, scratch)
-		if err == nil {
-			break
-		}
-		if debug {
-			printf("Try %d: Huffman error: %v\n", tries+1, err)
-		}
-		huffBuff = huffBuff[:0]
-		if tries == 250 {
-			if debug {
-				println("Huffman: Bailing out with predefined table")
-			}
-
-			// Bail out.... Just generate something
-			huffBuff = append(huffBuff, bytes.Repeat([]byte{255}, 10000)...)
-			for i := 0; i < 128; i++ {
-				huffBuff = append(huffBuff, byte(i))
-			}
-			continue
-		}
-		if errors.Is(err, huff0.ErrIncompressible) {
-			// Try truncating least common.
-			for i, n := range remain[:] {
-				if n > 0 {
-					n = n / (div * (i + 1))
-					if n > 0 {
-						huffBuff = append(huffBuff, bytes.Repeat([]byte{byte(i)}, n)...)
-					}
-				}
-			}
-			if o.CompatV155 && len(huffBuff) > 0 && huffBuff[len(huffBuff)-1] != 255 {
-				huffBuff = append(huffBuff, 255)
-			}
-			if len(huffBuff) == 0 {
-				huffBuff = append(huffBuff, 0, 255)
-			}
-		}
-		if errors.Is(err, huff0.ErrUseRLE) {
-			for i, n := range remain[:] {
-				n = n / (div * (i + 1))
-				// Allow all entries to be represented.
-				if n == 0 {
-					n = 1
-				}
-				huffBuff = append(huffBuff, bytes.Repeat([]byte{byte(i)}, n)...)
-			}
-		}
-	}
-
-	var out bytes.Buffer
-	out.Write([]byte(dictMagic))
-	out.Write(binary.LittleEndian.AppendUint32(nil, o.ID))
-	out.Write(scratch.OutTable)
-	if debug {
-		println("huff table:", len(scratch.OutTable), "bytes")
-		println("of table:", len(ofTable), "bytes")
-		println("ml table:", len(mlTable), "bytes")
-		println("ll table:", len(llTable), "bytes")
-	}
-	out.Write(ofTable)
-	out.Write(mlTable)
-	out.Write(llTable)
-	out.Write(binary.LittleEndian.AppendUint32(nil, uint32(o.Offsets[0])))
-	out.Write(binary.LittleEndian.AppendUint32(nil, uint32(o.Offsets[1])))
-	out.Write(binary.LittleEndian.AppendUint32(nil, uint32(o.Offsets[2])))
-	out.Write(hist)
-	if debug {
-		_, err := loadDict(out.Bytes())
-		if err != nil {
-			panic(err)
-		}
-		i, err := InspectDictionary(out.Bytes())
-		if err != nil {
-			panic(err)
-		}
-		println("ID:", i.ID())
-		println("Content size:", i.ContentSize())
-		println("Encoder:", i.LitEncoder() != nil)
-		println("Offsets:", i.Offsets())
-		var totalSize int
-		for _, b := range contents {
-			totalSize += len(b)
-		}
-
-		encWith := func(opts ...EOption) int {
-			enc, err := NewWriter(nil, opts...)
-			if err != nil {
-				panic(err)
-			}
-			defer enc.Close()
-			var dst []byte
-			var totalSize int
-			for _, b := range contents {
-				dst = enc.EncodeAll(b, dst[:0])
-				totalSize += len(dst)
-			}
-			return totalSize
-		}
-		plain := encWith(WithEncoderLevel(o.Level))
-		withDict := encWith(WithEncoderLevel(o.Level), WithEncoderDict(out.Bytes()))
-		println("Input size:", totalSize)
-		println("Plain Compressed:", plain)
-		println("Dict Compressed:", withDict)
-		println("Saved:", plain-withDict, (plain-withDict)/len(contents), "bytes per input (rounded down)")
-	}
-	return out.Bytes(), nil
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/enc_base.go b/vendor/github.com/klauspost/compress/zstd/enc_base.go
deleted file mode 100644
index 5ca46038a..000000000
--- a/vendor/github.com/klauspost/compress/zstd/enc_base.go
+++ /dev/null
@@ -1,173 +0,0 @@
-package zstd
-
-import (
-	"fmt"
-	"math/bits"
-
-	"github.com/klauspost/compress/zstd/internal/xxhash"
-)
-
-const (
-	dictShardBits = 6
-)
-
-type fastBase struct {
-	// cur is the offset at the start of hist
-	cur int32
-	// maximum offset. Should be at least 2x block size.
-	maxMatchOff int32
-	bufferReset int32
-	hist        []byte
-	crc         *xxhash.Digest
-	tmp         [8]byte
-	blk         *blockEnc
-	lastDictID  uint32
-	lowMem      bool
-}
-
-// CRC returns the underlying CRC writer.
-func (e *fastBase) CRC() *xxhash.Digest {
-	return e.crc
-}
-
-// AppendCRC will append the CRC to the destination slice and return it.
-func (e *fastBase) AppendCRC(dst []byte) []byte {
-	crc := e.crc.Sum(e.tmp[:0])
-	dst = append(dst, crc[7], crc[6], crc[5], crc[4])
-	return dst
-}
-
-// WindowSize returns the window size of the encoder,
-// or a window size small enough to contain the input size, if > 0.
-func (e *fastBase) WindowSize(size int64) int32 {
-	if size > 0 && size < int64(e.maxMatchOff) {
-		b := int32(1) << uint(bits.Len(uint(size)))
-		// Keep minimum window.
-		if b < 1024 {
-			b = 1024
-		}
-		return b
-	}
-	return e.maxMatchOff
-}
-
-// Block returns the current block.
-func (e *fastBase) Block() *blockEnc {
-	return e.blk
-}
-
-func (e *fastBase) addBlock(src []byte) int32 {
-	if debugAsserts && e.cur > e.bufferReset {
-		panic(fmt.Sprintf("ecur (%d) > buffer reset (%d)", e.cur, e.bufferReset))
-	}
-	// check if we have space already
-	if len(e.hist)+len(src) > cap(e.hist) {
-		if cap(e.hist) == 0 {
-			e.ensureHist(len(src))
-		} else {
-			if cap(e.hist) < int(e.maxMatchOff+maxCompressedBlockSize) {
-				panic(fmt.Errorf("unexpected buffer cap %d, want at least %d with window %d", cap(e.hist), e.maxMatchOff+maxCompressedBlockSize, e.maxMatchOff))
-			}
-			// Move down
-			offset := int32(len(e.hist)) - e.maxMatchOff
-			copy(e.hist[0:e.maxMatchOff], e.hist[offset:])
-			e.cur += offset
-			e.hist = e.hist[:e.maxMatchOff]
-		}
-	}
-	s := int32(len(e.hist))
-	e.hist = append(e.hist, src...)
-	return s
-}
-
-// ensureHist will ensure that history can keep at least this many bytes.
-func (e *fastBase) ensureHist(n int) {
-	if cap(e.hist) >= n {
-		return
-	}
-	l := e.maxMatchOff
-	if (e.lowMem && e.maxMatchOff > maxCompressedBlockSize) || e.maxMatchOff <= maxCompressedBlockSize {
-		l += maxCompressedBlockSize
-	} else {
-		l += e.maxMatchOff
-	}
-	// Make it at least 1MB.
-	if l < 1<<20 && !e.lowMem {
-		l = 1 << 20
-	}
-	// Make it at least the requested size.
-	if l < int32(n) {
-		l = int32(n)
-	}
-	e.hist = make([]byte, 0, l)
-}
-
-// useBlock will replace the block with the provided one,
-// but transfer recent offsets from the previous.
-func (e *fastBase) UseBlock(enc *blockEnc) {
-	enc.reset(e.blk)
-	e.blk = enc
-}
-
-func (e *fastBase) matchlen(s, t int32, src []byte) int32 {
-	if debugAsserts {
-		if s < 0 {
-			err := fmt.Sprintf("s (%d) < 0", s)
-			panic(err)
-		}
-		if t < 0 {
-			err := fmt.Sprintf("s (%d) < 0", s)
-			panic(err)
-		}
-		if s-t > e.maxMatchOff {
-			err := fmt.Sprintf("s (%d) - t (%d) > maxMatchOff (%d)", s, t, e.maxMatchOff)
-			panic(err)
-		}
-		if len(src)-int(s) > maxCompressedBlockSize {
-			panic(fmt.Sprintf("len(src)-s (%d) > maxCompressedBlockSize (%d)", len(src)-int(s), maxCompressedBlockSize))
-		}
-	}
-	return int32(matchLen(src[s:], src[t:]))
-}
-
-// Reset the encoding table.
-func (e *fastBase) resetBase(d *dict, singleBlock bool) {
-	if e.blk == nil {
-		e.blk = &blockEnc{lowMem: e.lowMem}
-		e.blk.init()
-	} else {
-		e.blk.reset(nil)
-	}
-	e.blk.initNewEncode()
-	if e.crc == nil {
-		e.crc = xxhash.New()
-	} else {
-		e.crc.Reset()
-	}
-	e.blk.dictLitEnc = nil
-	if d != nil {
-		low := e.lowMem
-		if singleBlock {
-			e.lowMem = true
-		}
-		e.ensureHist(d.ContentSize() + maxCompressedBlockSize)
-		e.lowMem = low
-	}
-
-	// We offset current position so everything will be out of reach.
-	// If above reset line, history will be purged.
-	if e.cur < e.bufferReset {
-		e.cur += e.maxMatchOff + int32(len(e.hist))
-	}
-	e.hist = e.hist[:0]
-	if d != nil {
-		// Set offsets (currently not used)
-		for i, off := range d.offsets {
-			e.blk.recentOffsets[i] = uint32(off)
-			e.blk.prevRecentOffsets[i] = e.blk.recentOffsets[i]
-		}
-		// Transfer litenc.
-		e.blk.dictLitEnc = d.litEnc
-		e.hist = append(e.hist, d.content...)
-	}
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/enc_best.go b/vendor/github.com/klauspost/compress/zstd/enc_best.go
deleted file mode 100644
index 4613724e9..000000000
--- a/vendor/github.com/klauspost/compress/zstd/enc_best.go
+++ /dev/null
@@ -1,560 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"bytes"
-	"fmt"
-
-	"github.com/klauspost/compress"
-)
-
-const (
-	bestLongTableBits = 22                     // Bits used in the long match table
-	bestLongTableSize = 1 << bestLongTableBits // Size of the table
-	bestLongLen       = 8                      // Bytes used for table hash
-
-	// Note: Increasing the short table bits or making the hash shorter
-	// can actually lead to compression degradation since it will 'steal' more from the
-	// long match table and match offsets are quite big.
-	// This greatly depends on the type of input.
-	bestShortTableBits = 18                      // Bits used in the short match table
-	bestShortTableSize = 1 << bestShortTableBits // Size of the table
-	bestShortLen       = 4                       // Bytes used for table hash
-
-)
-
-type match struct {
-	offset int32
-	s      int32
-	length int32
-	rep    int32
-	est    int32
-}
-
-const highScore = maxMatchLen * 8
-
-// estBits will estimate output bits from predefined tables.
-func (m *match) estBits(bitsPerByte int32) {
-	mlc := mlCode(uint32(m.length - zstdMinMatch))
-	var ofc uint8
-	if m.rep < 0 {
-		ofc = ofCode(uint32(m.s-m.offset) + 3)
-	} else {
-		ofc = ofCode(uint32(m.rep) & 3)
-	}
-	// Cost, excluding
-	ofTT, mlTT := fsePredefEnc[tableOffsets].ct.symbolTT[ofc], fsePredefEnc[tableMatchLengths].ct.symbolTT[mlc]
-
-	// Add cost of match encoding...
-	m.est = int32(ofTT.outBits + mlTT.outBits)
-	m.est += int32(ofTT.deltaNbBits>>16 + mlTT.deltaNbBits>>16)
-	// Subtract savings compared to literal encoding...
-	m.est -= (m.length * bitsPerByte) >> 10
-	if m.est > 0 {
-		// Unlikely gain..
-		m.length = 0
-		m.est = highScore
-	}
-}
-
-// bestFastEncoder uses 2 tables, one for short matches (5 bytes) and one for long matches.
-// The long match table contains the previous entry with the same hash,
-// effectively making it a "chain" of length 2.
-// When we find a long match we choose between the two values and select the longest.
-// When we find a short match, after checking the long, we check if we can find a long at n+1
-// and that it is longer (lazy matching).
-type bestFastEncoder struct {
-	fastBase
-	table         [bestShortTableSize]prevEntry
-	longTable     [bestLongTableSize]prevEntry
-	dictTable     []prevEntry
-	dictLongTable []prevEntry
-}
-
-// Encode improves compression...
-func (e *bestFastEncoder) Encode(blk *blockEnc, src []byte) {
-	const (
-		// Input margin is the number of bytes we read (8)
-		// and the maximum we will read ahead (2)
-		inputMargin            = 8 + 4
-		minNonLiteralBlockSize = 16
-	)
-
-	// Protect against e.cur wraparound.
-	for e.cur >= e.bufferReset-int32(len(e.hist)) {
-		if len(e.hist) == 0 {
-			e.table = [bestShortTableSize]prevEntry{}
-			e.longTable = [bestLongTableSize]prevEntry{}
-			e.cur = e.maxMatchOff
-			break
-		}
-		// Shift down everything in the table that isn't already too far away.
-		minOff := e.cur + int32(len(e.hist)) - e.maxMatchOff
-		for i := range e.table[:] {
-			v := e.table[i].offset
-			v2 := e.table[i].prev
-			if v < minOff {
-				v = 0
-				v2 = 0
-			} else {
-				v = v - e.cur + e.maxMatchOff
-				if v2 < minOff {
-					v2 = 0
-				} else {
-					v2 = v2 - e.cur + e.maxMatchOff
-				}
-			}
-			e.table[i] = prevEntry{
-				offset: v,
-				prev:   v2,
-			}
-		}
-		for i := range e.longTable[:] {
-			v := e.longTable[i].offset
-			v2 := e.longTable[i].prev
-			if v < minOff {
-				v = 0
-				v2 = 0
-			} else {
-				v = v - e.cur + e.maxMatchOff
-				if v2 < minOff {
-					v2 = 0
-				} else {
-					v2 = v2 - e.cur + e.maxMatchOff
-				}
-			}
-			e.longTable[i] = prevEntry{
-				offset: v,
-				prev:   v2,
-			}
-		}
-		e.cur = e.maxMatchOff
-		break
-	}
-
-	// Add block to history
-	s := e.addBlock(src)
-	blk.size = len(src)
-
-	// Check RLE first
-	if len(src) > zstdMinMatch {
-		ml := matchLen(src[1:], src)
-		if ml == len(src)-1 {
-			blk.literals = append(blk.literals, src[0])
-			blk.sequences = append(blk.sequences, seq{litLen: 1, matchLen: uint32(len(src)-1) - zstdMinMatch, offset: 1 + 3})
-			return
-		}
-	}
-
-	if len(src) < minNonLiteralBlockSize {
-		blk.extraLits = len(src)
-		blk.literals = blk.literals[:len(src)]
-		copy(blk.literals, src)
-		return
-	}
-
-	// Use this to estimate literal cost.
-	// Scaled by 10 bits.
-	bitsPerByte := int32((compress.ShannonEntropyBits(src) * 1024) / len(src))
-	// Huffman can never go < 1 bit/byte
-	if bitsPerByte < 1024 {
-		bitsPerByte = 1024
-	}
-
-	// Override src
-	src = e.hist
-	sLimit := int32(len(src)) - inputMargin
-	const kSearchStrength = 10
-
-	// nextEmit is where in src the next emitLiteral should start from.
-	nextEmit := s
-
-	// Relative offsets
-	offset1 := int32(blk.recentOffsets[0])
-	offset2 := int32(blk.recentOffsets[1])
-	offset3 := int32(blk.recentOffsets[2])
-
-	addLiterals := func(s *seq, until int32) {
-		if until == nextEmit {
-			return
-		}
-		blk.literals = append(blk.literals, src[nextEmit:until]...)
-		s.litLen = uint32(until - nextEmit)
-	}
-
-	if debugEncoder {
-		println("recent offsets:", blk.recentOffsets)
-	}
-
-encodeLoop:
-	for {
-		// We allow the encoder to optionally turn off repeat offsets across blocks
-		canRepeat := len(blk.sequences) > 2
-
-		if debugAsserts && canRepeat && offset1 == 0 {
-			panic("offset0 was 0")
-		}
-
-		const goodEnough = 250
-
-		cv := load6432(src, s)
-
-		nextHashL := hashLen(cv, bestLongTableBits, bestLongLen)
-		nextHashS := hashLen(cv, bestShortTableBits, bestShortLen)
-		candidateL := e.longTable[nextHashL]
-		candidateS := e.table[nextHashS]
-
-		// Set m to a match at offset if it looks like that will improve compression.
-		improve := func(m *match, offset int32, s int32, first uint32, rep int32) {
-			delta := s - offset
-			if delta >= e.maxMatchOff || delta <= 0 || load3232(src, offset) != first {
-				return
-			}
-			// Try to quick reject if we already have a long match.
-			if m.length > 16 {
-				left := len(src) - int(m.s+m.length)
-				// If we are too close to the end, keep as is.
-				if left <= 0 {
-					return
-				}
-				checkLen := m.length - (s - m.s) - 8
-				if left > 2 && checkLen > 4 {
-					// Check 4 bytes, 4 bytes from the end of the current match.
-					a := load3232(src, offset+checkLen)
-					b := load3232(src, s+checkLen)
-					if a != b {
-						return
-					}
-				}
-			}
-			l := 4 + e.matchlen(s+4, offset+4, src)
-			if m.rep <= 0 {
-				// Extend candidate match backwards as far as possible.
-				// Do not extend repeats as we can assume they are optimal
-				// and offsets change if s == nextEmit.
-				tMin := s - e.maxMatchOff
-				if tMin < 0 {
-					tMin = 0
-				}
-				for offset > tMin && s > nextEmit && src[offset-1] == src[s-1] && l < maxMatchLength {
-					s--
-					offset--
-					l++
-				}
-			}
-			if debugAsserts {
-				if offset >= s {
-					panic(fmt.Sprintf("offset: %d - s:%d - rep: %d - cur :%d - max: %d", offset, s, rep, e.cur, e.maxMatchOff))
-				}
-				if !bytes.Equal(src[s:s+l], src[offset:offset+l]) {
-					panic(fmt.Sprintf("second match mismatch: %v != %v, first: %08x", src[s:s+4], src[offset:offset+4], first))
-				}
-			}
-			cand := match{offset: offset, s: s, length: l, rep: rep}
-			cand.estBits(bitsPerByte)
-			if m.est >= highScore || cand.est-m.est+(cand.s-m.s)*bitsPerByte>>10 < 0 {
-				*m = cand
-			}
-		}
-
-		best := match{s: s, est: highScore}
-		improve(&best, candidateL.offset-e.cur, s, uint32(cv), -1)
-		improve(&best, candidateL.prev-e.cur, s, uint32(cv), -1)
-		improve(&best, candidateS.offset-e.cur, s, uint32(cv), -1)
-		improve(&best, candidateS.prev-e.cur, s, uint32(cv), -1)
-
-		if canRepeat && best.length < goodEnough {
-			if s == nextEmit {
-				// Check repeats straight after a match.
-				improve(&best, s-offset2, s, uint32(cv), 1|4)
-				improve(&best, s-offset3, s, uint32(cv), 2|4)
-				if offset1 > 1 {
-					improve(&best, s-(offset1-1), s, uint32(cv), 3|4)
-				}
-			}
-
-			// If either no match or a non-repeat match, check at + 1
-			if best.rep <= 0 {
-				cv32 := uint32(cv >> 8)
-				spp := s + 1
-				improve(&best, spp-offset1, spp, cv32, 1)
-				improve(&best, spp-offset2, spp, cv32, 2)
-				improve(&best, spp-offset3, spp, cv32, 3)
-				if best.rep < 0 {
-					cv32 = uint32(cv >> 24)
-					spp += 2
-					improve(&best, spp-offset1, spp, cv32, 1)
-					improve(&best, spp-offset2, spp, cv32, 2)
-					improve(&best, spp-offset3, spp, cv32, 3)
-				}
-			}
-		}
-		// Load next and check...
-		e.longTable[nextHashL] = prevEntry{offset: s + e.cur, prev: candidateL.offset}
-		e.table[nextHashS] = prevEntry{offset: s + e.cur, prev: candidateS.offset}
-		index0 := s + 1
-
-		// Look far ahead, unless we have a really long match already...
-		if best.length < goodEnough {
-			// No match found, move forward on input, no need to check forward...
-			if best.length < 4 {
-				s += 1 + (s-nextEmit)>>(kSearchStrength-1)
-				if s >= sLimit {
-					break encodeLoop
-				}
-				continue
-			}
-
-			candidateS = e.table[hashLen(cv>>8, bestShortTableBits, bestShortLen)]
-			cv = load6432(src, s+1)
-			cv2 := load6432(src, s+2)
-			candidateL = e.longTable[hashLen(cv, bestLongTableBits, bestLongLen)]
-			candidateL2 := e.longTable[hashLen(cv2, bestLongTableBits, bestLongLen)]
-
-			// Short at s+1
-			improve(&best, candidateS.offset-e.cur, s+1, uint32(cv), -1)
-			// Long at s+1, s+2
-			improve(&best, candidateL.offset-e.cur, s+1, uint32(cv), -1)
-			improve(&best, candidateL.prev-e.cur, s+1, uint32(cv), -1)
-			improve(&best, candidateL2.offset-e.cur, s+2, uint32(cv2), -1)
-			improve(&best, candidateL2.prev-e.cur, s+2, uint32(cv2), -1)
-			if false {
-				// Short at s+3.
-				// Too often worse...
-				improve(&best, e.table[hashLen(cv2>>8, bestShortTableBits, bestShortLen)].offset-e.cur, s+3, uint32(cv2>>8), -1)
-			}
-
-			// Start check at a fixed offset to allow for a few mismatches.
-			// For this compression level 2 yields the best results.
-			// We cannot do this if we have already indexed this position.
-			const skipBeginning = 2
-			if best.s > s-skipBeginning {
-				// See if we can find a better match by checking where the current best ends.
-				// Use that offset to see if we can find a better full match.
-				if sAt := best.s + best.length; sAt < sLimit {
-					nextHashL := hashLen(load6432(src, sAt), bestLongTableBits, bestLongLen)
-					candidateEnd := e.longTable[nextHashL]
-
-					if off := candidateEnd.offset - e.cur - best.length + skipBeginning; off >= 0 {
-						improve(&best, off, best.s+skipBeginning, load3232(src, best.s+skipBeginning), -1)
-						if off := candidateEnd.prev - e.cur - best.length + skipBeginning; off >= 0 {
-							improve(&best, off, best.s+skipBeginning, load3232(src, best.s+skipBeginning), -1)
-						}
-					}
-				}
-			}
-		}
-
-		if debugAsserts {
-			if best.offset >= best.s {
-				panic(fmt.Sprintf("best.offset > s: %d >= %d", best.offset, best.s))
-			}
-			if best.s < nextEmit {
-				panic(fmt.Sprintf("s %d < nextEmit %d", best.s, nextEmit))
-			}
-			if best.offset < s-e.maxMatchOff {
-				panic(fmt.Sprintf("best.offset < s-e.maxMatchOff: %d < %d", best.offset, s-e.maxMatchOff))
-			}
-			if !bytes.Equal(src[best.s:best.s+best.length], src[best.offset:best.offset+best.length]) {
-				panic(fmt.Sprintf("match mismatch: %v != %v", src[best.s:best.s+best.length], src[best.offset:best.offset+best.length]))
-			}
-		}
-
-		// We have a match, we can store the forward value
-		s = best.s
-		if best.rep > 0 {
-			var seq seq
-			seq.matchLen = uint32(best.length - zstdMinMatch)
-			addLiterals(&seq, best.s)
-
-			// Repeat. If bit 4 is set, this is a non-lit repeat.
-			seq.offset = uint32(best.rep & 3)
-			if debugSequences {
-				println("repeat sequence", seq, "next s:", best.s, "off:", best.s-best.offset)
-			}
-			blk.sequences = append(blk.sequences, seq)
-
-			// Index old s + 1 -> s - 1
-			s = best.s + best.length
-			nextEmit = s
-
-			// Index skipped...
-			end := s
-			if s > sLimit+4 {
-				end = sLimit + 4
-			}
-			off := index0 + e.cur
-			for index0 < end {
-				cv0 := load6432(src, index0)
-				h0 := hashLen(cv0, bestLongTableBits, bestLongLen)
-				h1 := hashLen(cv0, bestShortTableBits, bestShortLen)
-				e.longTable[h0] = prevEntry{offset: off, prev: e.longTable[h0].offset}
-				e.table[h1] = prevEntry{offset: off, prev: e.table[h1].offset}
-				off++
-				index0++
-			}
-
-			switch best.rep {
-			case 2, 4 | 1:
-				offset1, offset2 = offset2, offset1
-			case 3, 4 | 2:
-				offset1, offset2, offset3 = offset3, offset1, offset2
-			case 4 | 3:
-				offset1, offset2, offset3 = offset1-1, offset1, offset2
-			}
-			if s >= sLimit {
-				if debugEncoder {
-					println("repeat ended", s, best.length)
-				}
-				break encodeLoop
-			}
-			continue
-		}
-
-		// A 4-byte match has been found. Update recent offsets.
-		// We'll later see if more than 4 bytes.
-		t := best.offset
-		offset1, offset2, offset3 = s-t, offset1, offset2
-
-		if debugAsserts && s <= t {
-			panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-		}
-
-		if debugAsserts && int(offset1) > len(src) {
-			panic("invalid offset")
-		}
-
-		// Write our sequence
-		var seq seq
-		l := best.length
-		seq.litLen = uint32(s - nextEmit)
-		seq.matchLen = uint32(l - zstdMinMatch)
-		if seq.litLen > 0 {
-			blk.literals = append(blk.literals, src[nextEmit:s]...)
-		}
-		seq.offset = uint32(s-t) + 3
-		s += l
-		if debugSequences {
-			println("sequence", seq, "next s:", s)
-		}
-		blk.sequences = append(blk.sequences, seq)
-		nextEmit = s
-
-		// Index old s + 1 -> s - 1 or sLimit
-		end := s
-		if s > sLimit-4 {
-			end = sLimit - 4
-		}
-
-		off := index0 + e.cur
-		for index0 < end {
-			cv0 := load6432(src, index0)
-			h0 := hashLen(cv0, bestLongTableBits, bestLongLen)
-			h1 := hashLen(cv0, bestShortTableBits, bestShortLen)
-			e.longTable[h0] = prevEntry{offset: off, prev: e.longTable[h0].offset}
-			e.table[h1] = prevEntry{offset: off, prev: e.table[h1].offset}
-			index0++
-			off++
-		}
-		if s >= sLimit {
-			break encodeLoop
-		}
-	}
-
-	if int(nextEmit) < len(src) {
-		blk.literals = append(blk.literals, src[nextEmit:]...)
-		blk.extraLits = len(src) - int(nextEmit)
-	}
-	blk.recentOffsets[0] = uint32(offset1)
-	blk.recentOffsets[1] = uint32(offset2)
-	blk.recentOffsets[2] = uint32(offset3)
-	if debugEncoder {
-		println("returning, recent offsets:", blk.recentOffsets, "extra literals:", blk.extraLits)
-	}
-}
-
-// EncodeNoHist will encode a block with no history and no following blocks.
-// Most notable difference is that src will not be copied for history and
-// we do not need to check for max match length.
-func (e *bestFastEncoder) EncodeNoHist(blk *blockEnc, src []byte) {
-	e.ensureHist(len(src))
-	e.Encode(blk, src)
-}
-
-// Reset will reset and set a dictionary if not nil
-func (e *bestFastEncoder) Reset(d *dict, singleBlock bool) {
-	e.resetBase(d, singleBlock)
-	if d == nil {
-		return
-	}
-	// Init or copy dict table
-	if len(e.dictTable) != len(e.table) || d.id != e.lastDictID {
-		if len(e.dictTable) != len(e.table) {
-			e.dictTable = make([]prevEntry, len(e.table))
-		}
-		end := int32(len(d.content)) - 8 + e.maxMatchOff
-		for i := e.maxMatchOff; i < end; i += 4 {
-			const hashLog = bestShortTableBits
-
-			cv := load6432(d.content, i-e.maxMatchOff)
-			nextHash := hashLen(cv, hashLog, bestShortLen)      // 0 -> 4
-			nextHash1 := hashLen(cv>>8, hashLog, bestShortLen)  // 1 -> 5
-			nextHash2 := hashLen(cv>>16, hashLog, bestShortLen) // 2 -> 6
-			nextHash3 := hashLen(cv>>24, hashLog, bestShortLen) // 3 -> 7
-			e.dictTable[nextHash] = prevEntry{
-				prev:   e.dictTable[nextHash].offset,
-				offset: i,
-			}
-			e.dictTable[nextHash1] = prevEntry{
-				prev:   e.dictTable[nextHash1].offset,
-				offset: i + 1,
-			}
-			e.dictTable[nextHash2] = prevEntry{
-				prev:   e.dictTable[nextHash2].offset,
-				offset: i + 2,
-			}
-			e.dictTable[nextHash3] = prevEntry{
-				prev:   e.dictTable[nextHash3].offset,
-				offset: i + 3,
-			}
-		}
-		e.lastDictID = d.id
-	}
-
-	// Init or copy dict table
-	if len(e.dictLongTable) != len(e.longTable) || d.id != e.lastDictID {
-		if len(e.dictLongTable) != len(e.longTable) {
-			e.dictLongTable = make([]prevEntry, len(e.longTable))
-		}
-		if len(d.content) >= 8 {
-			cv := load6432(d.content, 0)
-			h := hashLen(cv, bestLongTableBits, bestLongLen)
-			e.dictLongTable[h] = prevEntry{
-				offset: e.maxMatchOff,
-				prev:   e.dictLongTable[h].offset,
-			}
-
-			end := int32(len(d.content)) - 8 + e.maxMatchOff
-			off := 8 // First to read
-			for i := e.maxMatchOff + 1; i < end; i++ {
-				cv = cv>>8 | (uint64(d.content[off]) << 56)
-				h := hashLen(cv, bestLongTableBits, bestLongLen)
-				e.dictLongTable[h] = prevEntry{
-					offset: i,
-					prev:   e.dictLongTable[h].offset,
-				}
-				off++
-			}
-		}
-		e.lastDictID = d.id
-	}
-	// Reset table to initial state
-	copy(e.longTable[:], e.dictLongTable)
-
-	e.cur = e.maxMatchOff
-	// Reset table to initial state
-	copy(e.table[:], e.dictTable)
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/enc_better.go b/vendor/github.com/klauspost/compress/zstd/enc_better.go
deleted file mode 100644
index 84a79fde7..000000000
--- a/vendor/github.com/klauspost/compress/zstd/enc_better.go
+++ /dev/null
@@ -1,1252 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import "fmt"
-
-const (
-	betterLongTableBits = 19                       // Bits used in the long match table
-	betterLongTableSize = 1 << betterLongTableBits // Size of the table
-	betterLongLen       = 8                        // Bytes used for table hash
-
-	// Note: Increasing the short table bits or making the hash shorter
-	// can actually lead to compression degradation since it will 'steal' more from the
-	// long match table and match offsets are quite big.
-	// This greatly depends on the type of input.
-	betterShortTableBits = 13                        // Bits used in the short match table
-	betterShortTableSize = 1 << betterShortTableBits // Size of the table
-	betterShortLen       = 5                         // Bytes used for table hash
-
-	betterLongTableShardCnt  = 1 << (betterLongTableBits - dictShardBits)    // Number of shards in the table
-	betterLongTableShardSize = betterLongTableSize / betterLongTableShardCnt // Size of an individual shard
-
-	betterShortTableShardCnt  = 1 << (betterShortTableBits - dictShardBits)     // Number of shards in the table
-	betterShortTableShardSize = betterShortTableSize / betterShortTableShardCnt // Size of an individual shard
-)
-
-type prevEntry struct {
-	offset int32
-	prev   int32
-}
-
-// betterFastEncoder uses 2 tables, one for short matches (5 bytes) and one for long matches.
-// The long match table contains the previous entry with the same hash,
-// effectively making it a "chain" of length 2.
-// When we find a long match we choose between the two values and select the longest.
-// When we find a short match, after checking the long, we check if we can find a long at n+1
-// and that it is longer (lazy matching).
-type betterFastEncoder struct {
-	fastBase
-	table     [betterShortTableSize]tableEntry
-	longTable [betterLongTableSize]prevEntry
-}
-
-type betterFastEncoderDict struct {
-	betterFastEncoder
-	dictTable            []tableEntry
-	dictLongTable        []prevEntry
-	shortTableShardDirty [betterShortTableShardCnt]bool
-	longTableShardDirty  [betterLongTableShardCnt]bool
-	allDirty             bool
-}
-
-// Encode improves compression...
-func (e *betterFastEncoder) Encode(blk *blockEnc, src []byte) {
-	const (
-		// Input margin is the number of bytes we read (8)
-		// and the maximum we will read ahead (2)
-		inputMargin            = 8 + 2
-		minNonLiteralBlockSize = 16
-	)
-
-	// Protect against e.cur wraparound.
-	for e.cur >= e.bufferReset-int32(len(e.hist)) {
-		if len(e.hist) == 0 {
-			e.table = [betterShortTableSize]tableEntry{}
-			e.longTable = [betterLongTableSize]prevEntry{}
-			e.cur = e.maxMatchOff
-			break
-		}
-		// Shift down everything in the table that isn't already too far away.
-		minOff := e.cur + int32(len(e.hist)) - e.maxMatchOff
-		for i := range e.table[:] {
-			v := e.table[i].offset
-			if v < minOff {
-				v = 0
-			} else {
-				v = v - e.cur + e.maxMatchOff
-			}
-			e.table[i].offset = v
-		}
-		for i := range e.longTable[:] {
-			v := e.longTable[i].offset
-			v2 := e.longTable[i].prev
-			if v < minOff {
-				v = 0
-				v2 = 0
-			} else {
-				v = v - e.cur + e.maxMatchOff
-				if v2 < minOff {
-					v2 = 0
-				} else {
-					v2 = v2 - e.cur + e.maxMatchOff
-				}
-			}
-			e.longTable[i] = prevEntry{
-				offset: v,
-				prev:   v2,
-			}
-		}
-		e.cur = e.maxMatchOff
-		break
-	}
-	// Add block to history
-	s := e.addBlock(src)
-	blk.size = len(src)
-
-	// Check RLE first
-	if len(src) > zstdMinMatch {
-		ml := matchLen(src[1:], src)
-		if ml == len(src)-1 {
-			blk.literals = append(blk.literals, src[0])
-			blk.sequences = append(blk.sequences, seq{litLen: 1, matchLen: uint32(len(src)-1) - zstdMinMatch, offset: 1 + 3})
-			return
-		}
-	}
-
-	if len(src) < minNonLiteralBlockSize {
-		blk.extraLits = len(src)
-		blk.literals = blk.literals[:len(src)]
-		copy(blk.literals, src)
-		return
-	}
-
-	// Override src
-	src = e.hist
-	sLimit := int32(len(src)) - inputMargin
-	// stepSize is the number of bytes to skip on every main loop iteration.
-	// It should be >= 1.
-	const stepSize = 1
-
-	const kSearchStrength = 9
-
-	// nextEmit is where in src the next emitLiteral should start from.
-	nextEmit := s
-	cv := load6432(src, s)
-
-	// Relative offsets
-	offset1 := int32(blk.recentOffsets[0])
-	offset2 := int32(blk.recentOffsets[1])
-
-	addLiterals := func(s *seq, until int32) {
-		if until == nextEmit {
-			return
-		}
-		blk.literals = append(blk.literals, src[nextEmit:until]...)
-		s.litLen = uint32(until - nextEmit)
-	}
-	if debugEncoder {
-		println("recent offsets:", blk.recentOffsets)
-	}
-
-encodeLoop:
-	for {
-		var t int32
-		// We allow the encoder to optionally turn off repeat offsets across blocks
-		canRepeat := len(blk.sequences) > 2
-		var matched, index0 int32
-
-		for {
-			if debugAsserts && canRepeat && offset1 == 0 {
-				panic("offset0 was 0")
-			}
-
-			nextHashL := hashLen(cv, betterLongTableBits, betterLongLen)
-			nextHashS := hashLen(cv, betterShortTableBits, betterShortLen)
-			candidateL := e.longTable[nextHashL]
-			candidateS := e.table[nextHashS]
-
-			const repOff = 1
-			repIndex := s - offset1 + repOff
-			off := s + e.cur
-			e.longTable[nextHashL] = prevEntry{offset: off, prev: candidateL.offset}
-			e.table[nextHashS] = tableEntry{offset: off, val: uint32(cv)}
-			index0 = s + 1
-
-			if canRepeat {
-				if repIndex >= 0 && load3232(src, repIndex) == uint32(cv>>(repOff*8)) {
-					// Consider history as well.
-					var seq seq
-					length := 4 + e.matchlen(s+4+repOff, repIndex+4, src)
-
-					seq.matchLen = uint32(length - zstdMinMatch)
-
-					// We might be able to match backwards.
-					// Extend as long as we can.
-					start := s + repOff
-					// We end the search early, so we don't risk 0 literals
-					// and have to do special offset treatment.
-					startLimit := nextEmit + 1
-
-					tMin := s - e.maxMatchOff
-					if tMin < 0 {
-						tMin = 0
-					}
-					for repIndex > tMin && start > startLimit && src[repIndex-1] == src[start-1] && seq.matchLen < maxMatchLength-zstdMinMatch-1 {
-						repIndex--
-						start--
-						seq.matchLen++
-					}
-					addLiterals(&seq, start)
-
-					// rep 0
-					seq.offset = 1
-					if debugSequences {
-						println("repeat sequence", seq, "next s:", s)
-					}
-					blk.sequences = append(blk.sequences, seq)
-
-					// Index match start+1 (long) -> s - 1
-					index0 := s + repOff
-					s += length + repOff
-
-					nextEmit = s
-					if s >= sLimit {
-						if debugEncoder {
-							println("repeat ended", s, length)
-
-						}
-						break encodeLoop
-					}
-					// Index skipped...
-					for index0 < s-1 {
-						cv0 := load6432(src, index0)
-						cv1 := cv0 >> 8
-						h0 := hashLen(cv0, betterLongTableBits, betterLongLen)
-						off := index0 + e.cur
-						e.longTable[h0] = prevEntry{offset: off, prev: e.longTable[h0].offset}
-						e.table[hashLen(cv1, betterShortTableBits, betterShortLen)] = tableEntry{offset: off + 1, val: uint32(cv1)}
-						index0 += 2
-					}
-					cv = load6432(src, s)
-					continue
-				}
-				const repOff2 = 1
-
-				// We deviate from the reference encoder and also check offset 2.
-				// Still slower and not much better, so disabled.
-				// repIndex = s - offset2 + repOff2
-				if false && repIndex >= 0 && load6432(src, repIndex) == load6432(src, s+repOff) {
-					// Consider history as well.
-					var seq seq
-					length := 8 + e.matchlen(s+8+repOff2, repIndex+8, src)
-
-					seq.matchLen = uint32(length - zstdMinMatch)
-
-					// We might be able to match backwards.
-					// Extend as long as we can.
-					start := s + repOff2
-					// We end the search early, so we don't risk 0 literals
-					// and have to do special offset treatment.
-					startLimit := nextEmit + 1
-
-					tMin := s - e.maxMatchOff
-					if tMin < 0 {
-						tMin = 0
-					}
-					for repIndex > tMin && start > startLimit && src[repIndex-1] == src[start-1] && seq.matchLen < maxMatchLength-zstdMinMatch-1 {
-						repIndex--
-						start--
-						seq.matchLen++
-					}
-					addLiterals(&seq, start)
-
-					// rep 2
-					seq.offset = 2
-					if debugSequences {
-						println("repeat sequence 2", seq, "next s:", s)
-					}
-					blk.sequences = append(blk.sequences, seq)
-
-					s += length + repOff2
-					nextEmit = s
-					if s >= sLimit {
-						if debugEncoder {
-							println("repeat ended", s, length)
-
-						}
-						break encodeLoop
-					}
-
-					// Index skipped...
-					for index0 < s-1 {
-						cv0 := load6432(src, index0)
-						cv1 := cv0 >> 8
-						h0 := hashLen(cv0, betterLongTableBits, betterLongLen)
-						off := index0 + e.cur
-						e.longTable[h0] = prevEntry{offset: off, prev: e.longTable[h0].offset}
-						e.table[hashLen(cv1, betterShortTableBits, betterShortLen)] = tableEntry{offset: off + 1, val: uint32(cv1)}
-						index0 += 2
-					}
-					cv = load6432(src, s)
-					// Swap offsets
-					offset1, offset2 = offset2, offset1
-					continue
-				}
-			}
-			// Find the offsets of our two matches.
-			coffsetL := candidateL.offset - e.cur
-			coffsetLP := candidateL.prev - e.cur
-
-			// Check if we have a long match.
-			if s-coffsetL < e.maxMatchOff && cv == load6432(src, coffsetL) {
-				// Found a long match, at least 8 bytes.
-				matched = e.matchlen(s+8, coffsetL+8, src) + 8
-				t = coffsetL
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				if debugMatches {
-					println("long match")
-				}
-
-				if s-coffsetLP < e.maxMatchOff && cv == load6432(src, coffsetLP) {
-					// Found a long match, at least 8 bytes.
-					prevMatch := e.matchlen(s+8, coffsetLP+8, src) + 8
-					if prevMatch > matched {
-						matched = prevMatch
-						t = coffsetLP
-					}
-					if debugAsserts && s <= t {
-						panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-					}
-					if debugAsserts && s-t > e.maxMatchOff {
-						panic("s - t >e.maxMatchOff")
-					}
-					if debugMatches {
-						println("long match")
-					}
-				}
-				break
-			}
-
-			// Check if we have a long match on prev.
-			if s-coffsetLP < e.maxMatchOff && cv == load6432(src, coffsetLP) {
-				// Found a long match, at least 8 bytes.
-				matched = e.matchlen(s+8, coffsetLP+8, src) + 8
-				t = coffsetLP
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				if debugMatches {
-					println("long match")
-				}
-				break
-			}
-
-			coffsetS := candidateS.offset - e.cur
-
-			// Check if we have a short match.
-			if s-coffsetS < e.maxMatchOff && uint32(cv) == candidateS.val {
-				// found a regular match
-				matched = e.matchlen(s+4, coffsetS+4, src) + 4
-
-				// See if we can find a long match at s+1
-				const checkAt = 1
-				cv := load6432(src, s+checkAt)
-				nextHashL = hashLen(cv, betterLongTableBits, betterLongLen)
-				candidateL = e.longTable[nextHashL]
-				coffsetL = candidateL.offset - e.cur
-
-				// We can store it, since we have at least a 4 byte match.
-				e.longTable[nextHashL] = prevEntry{offset: s + checkAt + e.cur, prev: candidateL.offset}
-				if s-coffsetL < e.maxMatchOff && cv == load6432(src, coffsetL) {
-					// Found a long match, at least 8 bytes.
-					matchedNext := e.matchlen(s+8+checkAt, coffsetL+8, src) + 8
-					if matchedNext > matched {
-						t = coffsetL
-						s += checkAt
-						matched = matchedNext
-						if debugMatches {
-							println("long match (after short)")
-						}
-						break
-					}
-				}
-
-				// Check prev long...
-				coffsetL = candidateL.prev - e.cur
-				if s-coffsetL < e.maxMatchOff && cv == load6432(src, coffsetL) {
-					// Found a long match, at least 8 bytes.
-					matchedNext := e.matchlen(s+8+checkAt, coffsetL+8, src) + 8
-					if matchedNext > matched {
-						t = coffsetL
-						s += checkAt
-						matched = matchedNext
-						if debugMatches {
-							println("prev long match (after short)")
-						}
-						break
-					}
-				}
-				t = coffsetS
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				if debugAsserts && t < 0 {
-					panic("t<0")
-				}
-				if debugMatches {
-					println("short match")
-				}
-				break
-			}
-
-			// No match found, move forward in input.
-			s += stepSize + ((s - nextEmit) >> (kSearchStrength - 1))
-			if s >= sLimit {
-				break encodeLoop
-			}
-			cv = load6432(src, s)
-		}
-
-		// Try to find a better match by searching for a long match at the end of the current best match
-		if s+matched < sLimit {
-			// Allow some bytes at the beginning to mismatch.
-			// Sweet spot is around 3 bytes, but depends on input.
-			// The skipped bytes are tested in Extend backwards,
-			// and still picked up as part of the match if they do.
-			const skipBeginning = 3
-
-			nextHashL := hashLen(load6432(src, s+matched), betterLongTableBits, betterLongLen)
-			s2 := s + skipBeginning
-			cv := load3232(src, s2)
-			candidateL := e.longTable[nextHashL]
-			coffsetL := candidateL.offset - e.cur - matched + skipBeginning
-			if coffsetL >= 0 && coffsetL < s2 && s2-coffsetL < e.maxMatchOff && cv == load3232(src, coffsetL) {
-				// Found a long match, at least 4 bytes.
-				matchedNext := e.matchlen(s2+4, coffsetL+4, src) + 4
-				if matchedNext > matched {
-					t = coffsetL
-					s = s2
-					matched = matchedNext
-					if debugMatches {
-						println("long match at end-of-match")
-					}
-				}
-			}
-
-			// Check prev long...
-			if true {
-				coffsetL = candidateL.prev - e.cur - matched + skipBeginning
-				if coffsetL >= 0 && coffsetL < s2 && s2-coffsetL < e.maxMatchOff && cv == load3232(src, coffsetL) {
-					// Found a long match, at least 4 bytes.
-					matchedNext := e.matchlen(s2+4, coffsetL+4, src) + 4
-					if matchedNext > matched {
-						t = coffsetL
-						s = s2
-						matched = matchedNext
-						if debugMatches {
-							println("prev long match at end-of-match")
-						}
-					}
-				}
-			}
-		}
-		// A match has been found. Update recent offsets.
-		offset2 = offset1
-		offset1 = s - t
-
-		if debugAsserts && s <= t {
-			panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-		}
-
-		if debugAsserts && canRepeat && int(offset1) > len(src) {
-			panic("invalid offset")
-		}
-
-		// Extend the n-byte match as long as possible.
-		l := matched
-
-		// Extend backwards
-		tMin := s - e.maxMatchOff
-		if tMin < 0 {
-			tMin = 0
-		}
-		for t > tMin && s > nextEmit && src[t-1] == src[s-1] && l < maxMatchLength {
-			s--
-			t--
-			l++
-		}
-
-		// Write our sequence
-		var seq seq
-		seq.litLen = uint32(s - nextEmit)
-		seq.matchLen = uint32(l - zstdMinMatch)
-		if seq.litLen > 0 {
-			blk.literals = append(blk.literals, src[nextEmit:s]...)
-		}
-		seq.offset = uint32(s-t) + 3
-		s += l
-		if debugSequences {
-			println("sequence", seq, "next s:", s)
-		}
-		blk.sequences = append(blk.sequences, seq)
-		nextEmit = s
-		if s >= sLimit {
-			break encodeLoop
-		}
-
-		// Index match start+1 (long) -> s - 1
-		off := index0 + e.cur
-		for index0 < s-1 {
-			cv0 := load6432(src, index0)
-			cv1 := cv0 >> 8
-			h0 := hashLen(cv0, betterLongTableBits, betterLongLen)
-			e.longTable[h0] = prevEntry{offset: off, prev: e.longTable[h0].offset}
-			e.table[hashLen(cv1, betterShortTableBits, betterShortLen)] = tableEntry{offset: off + 1, val: uint32(cv1)}
-			index0 += 2
-			off += 2
-		}
-
-		cv = load6432(src, s)
-		if !canRepeat {
-			continue
-		}
-
-		// Check offset 2
-		for {
-			o2 := s - offset2
-			if load3232(src, o2) != uint32(cv) {
-				// Do regular search
-				break
-			}
-
-			// Store this, since we have it.
-			nextHashL := hashLen(cv, betterLongTableBits, betterLongLen)
-			nextHashS := hashLen(cv, betterShortTableBits, betterShortLen)
-
-			// We have at least 4 byte match.
-			// No need to check backwards. We come straight from a match
-			l := 4 + e.matchlen(s+4, o2+4, src)
-
-			e.longTable[nextHashL] = prevEntry{offset: s + e.cur, prev: e.longTable[nextHashL].offset}
-			e.table[nextHashS] = tableEntry{offset: s + e.cur, val: uint32(cv)}
-			seq.matchLen = uint32(l) - zstdMinMatch
-			seq.litLen = 0
-
-			// Since litlen is always 0, this is offset 1.
-			seq.offset = 1
-			s += l
-			nextEmit = s
-			if debugSequences {
-				println("sequence", seq, "next s:", s)
-			}
-			blk.sequences = append(blk.sequences, seq)
-
-			// Swap offset 1 and 2.
-			offset1, offset2 = offset2, offset1
-			if s >= sLimit {
-				// Finished
-				break encodeLoop
-			}
-			cv = load6432(src, s)
-		}
-	}
-
-	if int(nextEmit) < len(src) {
-		blk.literals = append(blk.literals, src[nextEmit:]...)
-		blk.extraLits = len(src) - int(nextEmit)
-	}
-	blk.recentOffsets[0] = uint32(offset1)
-	blk.recentOffsets[1] = uint32(offset2)
-	if debugEncoder {
-		println("returning, recent offsets:", blk.recentOffsets, "extra literals:", blk.extraLits)
-	}
-}
-
-// EncodeNoHist will encode a block with no history and no following blocks.
-// Most notable difference is that src will not be copied for history and
-// we do not need to check for max match length.
-func (e *betterFastEncoder) EncodeNoHist(blk *blockEnc, src []byte) {
-	e.ensureHist(len(src))
-	e.Encode(blk, src)
-}
-
-// Encode improves compression...
-func (e *betterFastEncoderDict) Encode(blk *blockEnc, src []byte) {
-	const (
-		// Input margin is the number of bytes we read (8)
-		// and the maximum we will read ahead (2)
-		inputMargin            = 8 + 2
-		minNonLiteralBlockSize = 16
-	)
-
-	// Protect against e.cur wraparound.
-	for e.cur >= e.bufferReset-int32(len(e.hist)) {
-		if len(e.hist) == 0 {
-			for i := range e.table[:] {
-				e.table[i] = tableEntry{}
-			}
-			for i := range e.longTable[:] {
-				e.longTable[i] = prevEntry{}
-			}
-			e.cur = e.maxMatchOff
-			e.allDirty = true
-			break
-		}
-		// Shift down everything in the table that isn't already too far away.
-		minOff := e.cur + int32(len(e.hist)) - e.maxMatchOff
-		for i := range e.table[:] {
-			v := e.table[i].offset
-			if v < minOff {
-				v = 0
-			} else {
-				v = v - e.cur + e.maxMatchOff
-			}
-			e.table[i].offset = v
-		}
-		for i := range e.longTable[:] {
-			v := e.longTable[i].offset
-			v2 := e.longTable[i].prev
-			if v < minOff {
-				v = 0
-				v2 = 0
-			} else {
-				v = v - e.cur + e.maxMatchOff
-				if v2 < minOff {
-					v2 = 0
-				} else {
-					v2 = v2 - e.cur + e.maxMatchOff
-				}
-			}
-			e.longTable[i] = prevEntry{
-				offset: v,
-				prev:   v2,
-			}
-		}
-		e.allDirty = true
-		e.cur = e.maxMatchOff
-		break
-	}
-
-	s := e.addBlock(src)
-	blk.size = len(src)
-	if len(src) < minNonLiteralBlockSize {
-		blk.extraLits = len(src)
-		blk.literals = blk.literals[:len(src)]
-		copy(blk.literals, src)
-		return
-	}
-
-	// Override src
-	src = e.hist
-	sLimit := int32(len(src)) - inputMargin
-	// stepSize is the number of bytes to skip on every main loop iteration.
-	// It should be >= 1.
-	const stepSize = 1
-
-	const kSearchStrength = 9
-
-	// nextEmit is where in src the next emitLiteral should start from.
-	nextEmit := s
-	cv := load6432(src, s)
-
-	// Relative offsets
-	offset1 := int32(blk.recentOffsets[0])
-	offset2 := int32(blk.recentOffsets[1])
-
-	addLiterals := func(s *seq, until int32) {
-		if until == nextEmit {
-			return
-		}
-		blk.literals = append(blk.literals, src[nextEmit:until]...)
-		s.litLen = uint32(until - nextEmit)
-	}
-	if debugEncoder {
-		println("recent offsets:", blk.recentOffsets)
-	}
-
-encodeLoop:
-	for {
-		var t int32
-		// We allow the encoder to optionally turn off repeat offsets across blocks
-		canRepeat := len(blk.sequences) > 2
-		var matched, index0 int32
-
-		for {
-			if debugAsserts && canRepeat && offset1 == 0 {
-				panic("offset0 was 0")
-			}
-
-			nextHashL := hashLen(cv, betterLongTableBits, betterLongLen)
-			nextHashS := hashLen(cv, betterShortTableBits, betterShortLen)
-			candidateL := e.longTable[nextHashL]
-			candidateS := e.table[nextHashS]
-
-			const repOff = 1
-			repIndex := s - offset1 + repOff
-			off := s + e.cur
-			e.longTable[nextHashL] = prevEntry{offset: off, prev: candidateL.offset}
-			e.markLongShardDirty(nextHashL)
-			e.table[nextHashS] = tableEntry{offset: off, val: uint32(cv)}
-			e.markShortShardDirty(nextHashS)
-			index0 = s + 1
-
-			if canRepeat {
-				if repIndex >= 0 && load3232(src, repIndex) == uint32(cv>>(repOff*8)) {
-					// Consider history as well.
-					var seq seq
-					length := 4 + e.matchlen(s+4+repOff, repIndex+4, src)
-
-					seq.matchLen = uint32(length - zstdMinMatch)
-
-					// We might be able to match backwards.
-					// Extend as long as we can.
-					start := s + repOff
-					// We end the search early, so we don't risk 0 literals
-					// and have to do special offset treatment.
-					startLimit := nextEmit + 1
-
-					tMin := s - e.maxMatchOff
-					if tMin < 0 {
-						tMin = 0
-					}
-					for repIndex > tMin && start > startLimit && src[repIndex-1] == src[start-1] && seq.matchLen < maxMatchLength-zstdMinMatch-1 {
-						repIndex--
-						start--
-						seq.matchLen++
-					}
-					addLiterals(&seq, start)
-
-					// rep 0
-					seq.offset = 1
-					if debugSequences {
-						println("repeat sequence", seq, "next s:", s)
-					}
-					blk.sequences = append(blk.sequences, seq)
-
-					// Index match start+1 (long) -> s - 1
-					s += length + repOff
-
-					nextEmit = s
-					if s >= sLimit {
-						if debugEncoder {
-							println("repeat ended", s, length)
-
-						}
-						break encodeLoop
-					}
-					// Index skipped...
-					for index0 < s-1 {
-						cv0 := load6432(src, index0)
-						cv1 := cv0 >> 8
-						h0 := hashLen(cv0, betterLongTableBits, betterLongLen)
-						off := index0 + e.cur
-						e.longTable[h0] = prevEntry{offset: off, prev: e.longTable[h0].offset}
-						e.markLongShardDirty(h0)
-						h1 := hashLen(cv1, betterShortTableBits, betterShortLen)
-						e.table[h1] = tableEntry{offset: off + 1, val: uint32(cv1)}
-						e.markShortShardDirty(h1)
-						index0 += 2
-					}
-					cv = load6432(src, s)
-					continue
-				}
-				const repOff2 = 1
-
-				// We deviate from the reference encoder and also check offset 2.
-				// Still slower and not much better, so disabled.
-				// repIndex = s - offset2 + repOff2
-				if false && repIndex >= 0 && load6432(src, repIndex) == load6432(src, s+repOff) {
-					// Consider history as well.
-					var seq seq
-					length := 8 + e.matchlen(s+8+repOff2, repIndex+8, src)
-
-					seq.matchLen = uint32(length - zstdMinMatch)
-
-					// We might be able to match backwards.
-					// Extend as long as we can.
-					start := s + repOff2
-					// We end the search early, so we don't risk 0 literals
-					// and have to do special offset treatment.
-					startLimit := nextEmit + 1
-
-					tMin := s - e.maxMatchOff
-					if tMin < 0 {
-						tMin = 0
-					}
-					for repIndex > tMin && start > startLimit && src[repIndex-1] == src[start-1] && seq.matchLen < maxMatchLength-zstdMinMatch-1 {
-						repIndex--
-						start--
-						seq.matchLen++
-					}
-					addLiterals(&seq, start)
-
-					// rep 2
-					seq.offset = 2
-					if debugSequences {
-						println("repeat sequence 2", seq, "next s:", s)
-					}
-					blk.sequences = append(blk.sequences, seq)
-
-					s += length + repOff2
-					nextEmit = s
-					if s >= sLimit {
-						if debugEncoder {
-							println("repeat ended", s, length)
-
-						}
-						break encodeLoop
-					}
-
-					// Index skipped...
-					for index0 < s-1 {
-						cv0 := load6432(src, index0)
-						cv1 := cv0 >> 8
-						h0 := hashLen(cv0, betterLongTableBits, betterLongLen)
-						off := index0 + e.cur
-						e.longTable[h0] = prevEntry{offset: off, prev: e.longTable[h0].offset}
-						e.markLongShardDirty(h0)
-						h1 := hashLen(cv1, betterShortTableBits, betterShortLen)
-						e.table[h1] = tableEntry{offset: off + 1, val: uint32(cv1)}
-						e.markShortShardDirty(h1)
-						index0 += 2
-					}
-					cv = load6432(src, s)
-					// Swap offsets
-					offset1, offset2 = offset2, offset1
-					continue
-				}
-			}
-			// Find the offsets of our two matches.
-			coffsetL := candidateL.offset - e.cur
-			coffsetLP := candidateL.prev - e.cur
-
-			// Check if we have a long match.
-			if s-coffsetL < e.maxMatchOff && cv == load6432(src, coffsetL) {
-				// Found a long match, at least 8 bytes.
-				matched = e.matchlen(s+8, coffsetL+8, src) + 8
-				t = coffsetL
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				if debugMatches {
-					println("long match")
-				}
-
-				if s-coffsetLP < e.maxMatchOff && cv == load6432(src, coffsetLP) {
-					// Found a long match, at least 8 bytes.
-					prevMatch := e.matchlen(s+8, coffsetLP+8, src) + 8
-					if prevMatch > matched {
-						matched = prevMatch
-						t = coffsetLP
-					}
-					if debugAsserts && s <= t {
-						panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-					}
-					if debugAsserts && s-t > e.maxMatchOff {
-						panic("s - t >e.maxMatchOff")
-					}
-					if debugMatches {
-						println("long match")
-					}
-				}
-				break
-			}
-
-			// Check if we have a long match on prev.
-			if s-coffsetLP < e.maxMatchOff && cv == load6432(src, coffsetLP) {
-				// Found a long match, at least 8 bytes.
-				matched = e.matchlen(s+8, coffsetLP+8, src) + 8
-				t = coffsetLP
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				if debugMatches {
-					println("long match")
-				}
-				break
-			}
-
-			coffsetS := candidateS.offset - e.cur
-
-			// Check if we have a short match.
-			if s-coffsetS < e.maxMatchOff && uint32(cv) == candidateS.val {
-				// found a regular match
-				matched = e.matchlen(s+4, coffsetS+4, src) + 4
-
-				// See if we can find a long match at s+1
-				const checkAt = 1
-				cv := load6432(src, s+checkAt)
-				nextHashL = hashLen(cv, betterLongTableBits, betterLongLen)
-				candidateL = e.longTable[nextHashL]
-				coffsetL = candidateL.offset - e.cur
-
-				// We can store it, since we have at least a 4 byte match.
-				e.longTable[nextHashL] = prevEntry{offset: s + checkAt + e.cur, prev: candidateL.offset}
-				e.markLongShardDirty(nextHashL)
-				if s-coffsetL < e.maxMatchOff && cv == load6432(src, coffsetL) {
-					// Found a long match, at least 8 bytes.
-					matchedNext := e.matchlen(s+8+checkAt, coffsetL+8, src) + 8
-					if matchedNext > matched {
-						t = coffsetL
-						s += checkAt
-						matched = matchedNext
-						if debugMatches {
-							println("long match (after short)")
-						}
-						break
-					}
-				}
-
-				// Check prev long...
-				coffsetL = candidateL.prev - e.cur
-				if s-coffsetL < e.maxMatchOff && cv == load6432(src, coffsetL) {
-					// Found a long match, at least 8 bytes.
-					matchedNext := e.matchlen(s+8+checkAt, coffsetL+8, src) + 8
-					if matchedNext > matched {
-						t = coffsetL
-						s += checkAt
-						matched = matchedNext
-						if debugMatches {
-							println("prev long match (after short)")
-						}
-						break
-					}
-				}
-				t = coffsetS
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				if debugAsserts && t < 0 {
-					panic("t<0")
-				}
-				if debugMatches {
-					println("short match")
-				}
-				break
-			}
-
-			// No match found, move forward in input.
-			s += stepSize + ((s - nextEmit) >> (kSearchStrength - 1))
-			if s >= sLimit {
-				break encodeLoop
-			}
-			cv = load6432(src, s)
-		}
-		// Try to find a better match by searching for a long match at the end of the current best match
-		if s+matched < sLimit {
-			nextHashL := hashLen(load6432(src, s+matched), betterLongTableBits, betterLongLen)
-			cv := load3232(src, s)
-			candidateL := e.longTable[nextHashL]
-			coffsetL := candidateL.offset - e.cur - matched
-			if coffsetL >= 0 && coffsetL < s && s-coffsetL < e.maxMatchOff && cv == load3232(src, coffsetL) {
-				// Found a long match, at least 4 bytes.
-				matchedNext := e.matchlen(s+4, coffsetL+4, src) + 4
-				if matchedNext > matched {
-					t = coffsetL
-					matched = matchedNext
-					if debugMatches {
-						println("long match at end-of-match")
-					}
-				}
-			}
-
-			// Check prev long...
-			if true {
-				coffsetL = candidateL.prev - e.cur - matched
-				if coffsetL >= 0 && coffsetL < s && s-coffsetL < e.maxMatchOff && cv == load3232(src, coffsetL) {
-					// Found a long match, at least 4 bytes.
-					matchedNext := e.matchlen(s+4, coffsetL+4, src) + 4
-					if matchedNext > matched {
-						t = coffsetL
-						matched = matchedNext
-						if debugMatches {
-							println("prev long match at end-of-match")
-						}
-					}
-				}
-			}
-		}
-		// A match has been found. Update recent offsets.
-		offset2 = offset1
-		offset1 = s - t
-
-		if debugAsserts && s <= t {
-			panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-		}
-
-		if debugAsserts && canRepeat && int(offset1) > len(src) {
-			panic("invalid offset")
-		}
-
-		// Extend the n-byte match as long as possible.
-		l := matched
-
-		// Extend backwards
-		tMin := s - e.maxMatchOff
-		if tMin < 0 {
-			tMin = 0
-		}
-		for t > tMin && s > nextEmit && src[t-1] == src[s-1] && l < maxMatchLength {
-			s--
-			t--
-			l++
-		}
-
-		// Write our sequence
-		var seq seq
-		seq.litLen = uint32(s - nextEmit)
-		seq.matchLen = uint32(l - zstdMinMatch)
-		if seq.litLen > 0 {
-			blk.literals = append(blk.literals, src[nextEmit:s]...)
-		}
-		seq.offset = uint32(s-t) + 3
-		s += l
-		if debugSequences {
-			println("sequence", seq, "next s:", s)
-		}
-		blk.sequences = append(blk.sequences, seq)
-		nextEmit = s
-		if s >= sLimit {
-			break encodeLoop
-		}
-
-		// Index match start+1 (long) -> s - 1
-		off := index0 + e.cur
-		for index0 < s-1 {
-			cv0 := load6432(src, index0)
-			cv1 := cv0 >> 8
-			h0 := hashLen(cv0, betterLongTableBits, betterLongLen)
-			e.longTable[h0] = prevEntry{offset: off, prev: e.longTable[h0].offset}
-			e.markLongShardDirty(h0)
-			h1 := hashLen(cv1, betterShortTableBits, betterShortLen)
-			e.table[h1] = tableEntry{offset: off + 1, val: uint32(cv1)}
-			e.markShortShardDirty(h1)
-			index0 += 2
-			off += 2
-		}
-
-		cv = load6432(src, s)
-		if !canRepeat {
-			continue
-		}
-
-		// Check offset 2
-		for {
-			o2 := s - offset2
-			if load3232(src, o2) != uint32(cv) {
-				// Do regular search
-				break
-			}
-
-			// Store this, since we have it.
-			nextHashL := hashLen(cv, betterLongTableBits, betterLongLen)
-			nextHashS := hashLen(cv, betterShortTableBits, betterShortLen)
-
-			// We have at least 4 byte match.
-			// No need to check backwards. We come straight from a match
-			l := 4 + e.matchlen(s+4, o2+4, src)
-
-			e.longTable[nextHashL] = prevEntry{offset: s + e.cur, prev: e.longTable[nextHashL].offset}
-			e.markLongShardDirty(nextHashL)
-			e.table[nextHashS] = tableEntry{offset: s + e.cur, val: uint32(cv)}
-			e.markShortShardDirty(nextHashS)
-			seq.matchLen = uint32(l) - zstdMinMatch
-			seq.litLen = 0
-
-			// Since litlen is always 0, this is offset 1.
-			seq.offset = 1
-			s += l
-			nextEmit = s
-			if debugSequences {
-				println("sequence", seq, "next s:", s)
-			}
-			blk.sequences = append(blk.sequences, seq)
-
-			// Swap offset 1 and 2.
-			offset1, offset2 = offset2, offset1
-			if s >= sLimit {
-				// Finished
-				break encodeLoop
-			}
-			cv = load6432(src, s)
-		}
-	}
-
-	if int(nextEmit) < len(src) {
-		blk.literals = append(blk.literals, src[nextEmit:]...)
-		blk.extraLits = len(src) - int(nextEmit)
-	}
-	blk.recentOffsets[0] = uint32(offset1)
-	blk.recentOffsets[1] = uint32(offset2)
-	if debugEncoder {
-		println("returning, recent offsets:", blk.recentOffsets, "extra literals:", blk.extraLits)
-	}
-}
-
-// ResetDict will reset and set a dictionary if not nil
-func (e *betterFastEncoder) Reset(d *dict, singleBlock bool) {
-	e.resetBase(d, singleBlock)
-	if d != nil {
-		panic("betterFastEncoder: Reset with dict")
-	}
-}
-
-// ResetDict will reset and set a dictionary if not nil
-func (e *betterFastEncoderDict) Reset(d *dict, singleBlock bool) {
-	e.resetBase(d, singleBlock)
-	if d == nil {
-		return
-	}
-	// Init or copy dict table
-	if len(e.dictTable) != len(e.table) || d.id != e.lastDictID {
-		if len(e.dictTable) != len(e.table) {
-			e.dictTable = make([]tableEntry, len(e.table))
-		}
-		end := int32(len(d.content)) - 8 + e.maxMatchOff
-		for i := e.maxMatchOff; i < end; i += 4 {
-			const hashLog = betterShortTableBits
-
-			cv := load6432(d.content, i-e.maxMatchOff)
-			nextHash := hashLen(cv, hashLog, betterShortLen)      // 0 -> 4
-			nextHash1 := hashLen(cv>>8, hashLog, betterShortLen)  // 1 -> 5
-			nextHash2 := hashLen(cv>>16, hashLog, betterShortLen) // 2 -> 6
-			nextHash3 := hashLen(cv>>24, hashLog, betterShortLen) // 3 -> 7
-			e.dictTable[nextHash] = tableEntry{
-				val:    uint32(cv),
-				offset: i,
-			}
-			e.dictTable[nextHash1] = tableEntry{
-				val:    uint32(cv >> 8),
-				offset: i + 1,
-			}
-			e.dictTable[nextHash2] = tableEntry{
-				val:    uint32(cv >> 16),
-				offset: i + 2,
-			}
-			e.dictTable[nextHash3] = tableEntry{
-				val:    uint32(cv >> 24),
-				offset: i + 3,
-			}
-		}
-		e.lastDictID = d.id
-		e.allDirty = true
-	}
-
-	// Init or copy dict table
-	if len(e.dictLongTable) != len(e.longTable) || d.id != e.lastDictID {
-		if len(e.dictLongTable) != len(e.longTable) {
-			e.dictLongTable = make([]prevEntry, len(e.longTable))
-		}
-		if len(d.content) >= 8 {
-			cv := load6432(d.content, 0)
-			h := hashLen(cv, betterLongTableBits, betterLongLen)
-			e.dictLongTable[h] = prevEntry{
-				offset: e.maxMatchOff,
-				prev:   e.dictLongTable[h].offset,
-			}
-
-			end := int32(len(d.content)) - 8 + e.maxMatchOff
-			off := 8 // First to read
-			for i := e.maxMatchOff + 1; i < end; i++ {
-				cv = cv>>8 | (uint64(d.content[off]) << 56)
-				h := hashLen(cv, betterLongTableBits, betterLongLen)
-				e.dictLongTable[h] = prevEntry{
-					offset: i,
-					prev:   e.dictLongTable[h].offset,
-				}
-				off++
-			}
-		}
-		e.lastDictID = d.id
-		e.allDirty = true
-	}
-
-	// Reset table to initial state
-	{
-		dirtyShardCnt := 0
-		if !e.allDirty {
-			for i := range e.shortTableShardDirty {
-				if e.shortTableShardDirty[i] {
-					dirtyShardCnt++
-				}
-			}
-		}
-		const shardCnt = betterShortTableShardCnt
-		const shardSize = betterShortTableShardSize
-		if e.allDirty || dirtyShardCnt > shardCnt*4/6 {
-			copy(e.table[:], e.dictTable)
-			for i := range e.shortTableShardDirty {
-				e.shortTableShardDirty[i] = false
-			}
-		} else {
-			for i := range e.shortTableShardDirty {
-				if !e.shortTableShardDirty[i] {
-					continue
-				}
-
-				copy(e.table[i*shardSize:(i+1)*shardSize], e.dictTable[i*shardSize:(i+1)*shardSize])
-				e.shortTableShardDirty[i] = false
-			}
-		}
-	}
-	{
-		dirtyShardCnt := 0
-		if !e.allDirty {
-			for i := range e.shortTableShardDirty {
-				if e.shortTableShardDirty[i] {
-					dirtyShardCnt++
-				}
-			}
-		}
-		const shardCnt = betterLongTableShardCnt
-		const shardSize = betterLongTableShardSize
-		if e.allDirty || dirtyShardCnt > shardCnt*4/6 {
-			copy(e.longTable[:], e.dictLongTable)
-			for i := range e.longTableShardDirty {
-				e.longTableShardDirty[i] = false
-			}
-		} else {
-			for i := range e.longTableShardDirty {
-				if !e.longTableShardDirty[i] {
-					continue
-				}
-
-				copy(e.longTable[i*shardSize:(i+1)*shardSize], e.dictLongTable[i*shardSize:(i+1)*shardSize])
-				e.longTableShardDirty[i] = false
-			}
-		}
-	}
-	e.cur = e.maxMatchOff
-	e.allDirty = false
-}
-
-func (e *betterFastEncoderDict) markLongShardDirty(entryNum uint32) {
-	e.longTableShardDirty[entryNum/betterLongTableShardSize] = true
-}
-
-func (e *betterFastEncoderDict) markShortShardDirty(entryNum uint32) {
-	e.shortTableShardDirty[entryNum/betterShortTableShardSize] = true
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/enc_dfast.go b/vendor/github.com/klauspost/compress/zstd/enc_dfast.go
deleted file mode 100644
index d36be7bd8..000000000
--- a/vendor/github.com/klauspost/compress/zstd/enc_dfast.go
+++ /dev/null
@@ -1,1123 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import "fmt"
-
-const (
-	dFastLongTableBits = 17                      // Bits used in the long match table
-	dFastLongTableSize = 1 << dFastLongTableBits // Size of the table
-	dFastLongTableMask = dFastLongTableSize - 1  // Mask for table indices. Redundant, but can eliminate bounds checks.
-	dFastLongLen       = 8                       // Bytes used for table hash
-
-	dLongTableShardCnt  = 1 << (dFastLongTableBits - dictShardBits) // Number of shards in the table
-	dLongTableShardSize = dFastLongTableSize / tableShardCnt        // Size of an individual shard
-
-	dFastShortTableBits = tableBits                // Bits used in the short match table
-	dFastShortTableSize = 1 << dFastShortTableBits // Size of the table
-	dFastShortTableMask = dFastShortTableSize - 1  // Mask for table indices. Redundant, but can eliminate bounds checks.
-	dFastShortLen       = 5                        // Bytes used for table hash
-
-)
-
-type doubleFastEncoder struct {
-	fastEncoder
-	longTable [dFastLongTableSize]tableEntry
-}
-
-type doubleFastEncoderDict struct {
-	fastEncoderDict
-	longTable           [dFastLongTableSize]tableEntry
-	dictLongTable       []tableEntry
-	longTableShardDirty [dLongTableShardCnt]bool
-}
-
-// Encode mimmics functionality in zstd_dfast.c
-func (e *doubleFastEncoder) Encode(blk *blockEnc, src []byte) {
-	const (
-		// Input margin is the number of bytes we read (8)
-		// and the maximum we will read ahead (2)
-		inputMargin            = 8 + 2
-		minNonLiteralBlockSize = 16
-	)
-
-	// Protect against e.cur wraparound.
-	for e.cur >= e.bufferReset-int32(len(e.hist)) {
-		if len(e.hist) == 0 {
-			e.table = [dFastShortTableSize]tableEntry{}
-			e.longTable = [dFastLongTableSize]tableEntry{}
-			e.cur = e.maxMatchOff
-			break
-		}
-		// Shift down everything in the table that isn't already too far away.
-		minOff := e.cur + int32(len(e.hist)) - e.maxMatchOff
-		for i := range e.table[:] {
-			v := e.table[i].offset
-			if v < minOff {
-				v = 0
-			} else {
-				v = v - e.cur + e.maxMatchOff
-			}
-			e.table[i].offset = v
-		}
-		for i := range e.longTable[:] {
-			v := e.longTable[i].offset
-			if v < minOff {
-				v = 0
-			} else {
-				v = v - e.cur + e.maxMatchOff
-			}
-			e.longTable[i].offset = v
-		}
-		e.cur = e.maxMatchOff
-		break
-	}
-
-	s := e.addBlock(src)
-	blk.size = len(src)
-	if len(src) < minNonLiteralBlockSize {
-		blk.extraLits = len(src)
-		blk.literals = blk.literals[:len(src)]
-		copy(blk.literals, src)
-		return
-	}
-
-	// Override src
-	src = e.hist
-	sLimit := int32(len(src)) - inputMargin
-	// stepSize is the number of bytes to skip on every main loop iteration.
-	// It should be >= 1.
-	const stepSize = 1
-
-	const kSearchStrength = 8
-
-	// nextEmit is where in src the next emitLiteral should start from.
-	nextEmit := s
-	cv := load6432(src, s)
-
-	// Relative offsets
-	offset1 := int32(blk.recentOffsets[0])
-	offset2 := int32(blk.recentOffsets[1])
-
-	addLiterals := func(s *seq, until int32) {
-		if until == nextEmit {
-			return
-		}
-		blk.literals = append(blk.literals, src[nextEmit:until]...)
-		s.litLen = uint32(until - nextEmit)
-	}
-	if debugEncoder {
-		println("recent offsets:", blk.recentOffsets)
-	}
-
-encodeLoop:
-	for {
-		var t int32
-		// We allow the encoder to optionally turn off repeat offsets across blocks
-		canRepeat := len(blk.sequences) > 2
-
-		for {
-			if debugAsserts && canRepeat && offset1 == 0 {
-				panic("offset0 was 0")
-			}
-
-			nextHashL := hashLen(cv, dFastLongTableBits, dFastLongLen)
-			nextHashS := hashLen(cv, dFastShortTableBits, dFastShortLen)
-			candidateL := e.longTable[nextHashL]
-			candidateS := e.table[nextHashS]
-
-			const repOff = 1
-			repIndex := s - offset1 + repOff
-			entry := tableEntry{offset: s + e.cur, val: uint32(cv)}
-			e.longTable[nextHashL] = entry
-			e.table[nextHashS] = entry
-
-			if canRepeat {
-				if repIndex >= 0 && load3232(src, repIndex) == uint32(cv>>(repOff*8)) {
-					// Consider history as well.
-					var seq seq
-					length := 4 + e.matchlen(s+4+repOff, repIndex+4, src)
-
-					seq.matchLen = uint32(length - zstdMinMatch)
-
-					// We might be able to match backwards.
-					// Extend as long as we can.
-					start := s + repOff
-					// We end the search early, so we don't risk 0 literals
-					// and have to do special offset treatment.
-					startLimit := nextEmit + 1
-
-					tMin := s - e.maxMatchOff
-					if tMin < 0 {
-						tMin = 0
-					}
-					for repIndex > tMin && start > startLimit && src[repIndex-1] == src[start-1] && seq.matchLen < maxMatchLength-zstdMinMatch-1 {
-						repIndex--
-						start--
-						seq.matchLen++
-					}
-					addLiterals(&seq, start)
-
-					// rep 0
-					seq.offset = 1
-					if debugSequences {
-						println("repeat sequence", seq, "next s:", s)
-					}
-					blk.sequences = append(blk.sequences, seq)
-					s += length + repOff
-					nextEmit = s
-					if s >= sLimit {
-						if debugEncoder {
-							println("repeat ended", s, length)
-
-						}
-						break encodeLoop
-					}
-					cv = load6432(src, s)
-					continue
-				}
-			}
-			// Find the offsets of our two matches.
-			coffsetL := s - (candidateL.offset - e.cur)
-			coffsetS := s - (candidateS.offset - e.cur)
-
-			// Check if we have a long match.
-			if coffsetL < e.maxMatchOff && uint32(cv) == candidateL.val {
-				// Found a long match, likely at least 8 bytes.
-				// Reference encoder checks all 8 bytes, we only check 4,
-				// but the likelihood of both the first 4 bytes and the hash matching should be enough.
-				t = candidateL.offset - e.cur
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				if debugMatches {
-					println("long match")
-				}
-				break
-			}
-
-			// Check if we have a short match.
-			if coffsetS < e.maxMatchOff && uint32(cv) == candidateS.val {
-				// found a regular match
-				// See if we can find a long match at s+1
-				const checkAt = 1
-				cv := load6432(src, s+checkAt)
-				nextHashL = hashLen(cv, dFastLongTableBits, dFastLongLen)
-				candidateL = e.longTable[nextHashL]
-				coffsetL = s - (candidateL.offset - e.cur) + checkAt
-
-				// We can store it, since we have at least a 4 byte match.
-				e.longTable[nextHashL] = tableEntry{offset: s + checkAt + e.cur, val: uint32(cv)}
-				if coffsetL < e.maxMatchOff && uint32(cv) == candidateL.val {
-					// Found a long match, likely at least 8 bytes.
-					// Reference encoder checks all 8 bytes, we only check 4,
-					// but the likelihood of both the first 4 bytes and the hash matching should be enough.
-					t = candidateL.offset - e.cur
-					s += checkAt
-					if debugMatches {
-						println("long match (after short)")
-					}
-					break
-				}
-
-				t = candidateS.offset - e.cur
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				if debugAsserts && t < 0 {
-					panic("t<0")
-				}
-				if debugMatches {
-					println("short match")
-				}
-				break
-			}
-
-			// No match found, move forward in input.
-			s += stepSize + ((s - nextEmit) >> (kSearchStrength - 1))
-			if s >= sLimit {
-				break encodeLoop
-			}
-			cv = load6432(src, s)
-		}
-
-		// A 4-byte match has been found. Update recent offsets.
-		// We'll later see if more than 4 bytes.
-		offset2 = offset1
-		offset1 = s - t
-
-		if debugAsserts && s <= t {
-			panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-		}
-
-		if debugAsserts && canRepeat && int(offset1) > len(src) {
-			panic("invalid offset")
-		}
-
-		// Extend the 4-byte match as long as possible.
-		l := e.matchlen(s+4, t+4, src) + 4
-
-		// Extend backwards
-		tMin := s - e.maxMatchOff
-		if tMin < 0 {
-			tMin = 0
-		}
-		for t > tMin && s > nextEmit && src[t-1] == src[s-1] && l < maxMatchLength {
-			s--
-			t--
-			l++
-		}
-
-		// Write our sequence
-		var seq seq
-		seq.litLen = uint32(s - nextEmit)
-		seq.matchLen = uint32(l - zstdMinMatch)
-		if seq.litLen > 0 {
-			blk.literals = append(blk.literals, src[nextEmit:s]...)
-		}
-		seq.offset = uint32(s-t) + 3
-		s += l
-		if debugSequences {
-			println("sequence", seq, "next s:", s)
-		}
-		blk.sequences = append(blk.sequences, seq)
-		nextEmit = s
-		if s >= sLimit {
-			break encodeLoop
-		}
-
-		// Index match start+1 (long) and start+2 (short)
-		index0 := s - l + 1
-		// Index match end-2 (long) and end-1 (short)
-		index1 := s - 2
-
-		cv0 := load6432(src, index0)
-		cv1 := load6432(src, index1)
-		te0 := tableEntry{offset: index0 + e.cur, val: uint32(cv0)}
-		te1 := tableEntry{offset: index1 + e.cur, val: uint32(cv1)}
-		e.longTable[hashLen(cv0, dFastLongTableBits, dFastLongLen)] = te0
-		e.longTable[hashLen(cv1, dFastLongTableBits, dFastLongLen)] = te1
-		cv0 >>= 8
-		cv1 >>= 8
-		te0.offset++
-		te1.offset++
-		te0.val = uint32(cv0)
-		te1.val = uint32(cv1)
-		e.table[hashLen(cv0, dFastShortTableBits, dFastShortLen)] = te0
-		e.table[hashLen(cv1, dFastShortTableBits, dFastShortLen)] = te1
-
-		cv = load6432(src, s)
-
-		if !canRepeat {
-			continue
-		}
-
-		// Check offset 2
-		for {
-			o2 := s - offset2
-			if load3232(src, o2) != uint32(cv) {
-				// Do regular search
-				break
-			}
-
-			// Store this, since we have it.
-			nextHashS := hashLen(cv, dFastShortTableBits, dFastShortLen)
-			nextHashL := hashLen(cv, dFastLongTableBits, dFastLongLen)
-
-			// We have at least 4 byte match.
-			// No need to check backwards. We come straight from a match
-			l := 4 + e.matchlen(s+4, o2+4, src)
-
-			entry := tableEntry{offset: s + e.cur, val: uint32(cv)}
-			e.longTable[nextHashL] = entry
-			e.table[nextHashS] = entry
-			seq.matchLen = uint32(l) - zstdMinMatch
-			seq.litLen = 0
-
-			// Since litlen is always 0, this is offset 1.
-			seq.offset = 1
-			s += l
-			nextEmit = s
-			if debugSequences {
-				println("sequence", seq, "next s:", s)
-			}
-			blk.sequences = append(blk.sequences, seq)
-
-			// Swap offset 1 and 2.
-			offset1, offset2 = offset2, offset1
-			if s >= sLimit {
-				// Finished
-				break encodeLoop
-			}
-			cv = load6432(src, s)
-		}
-	}
-
-	if int(nextEmit) < len(src) {
-		blk.literals = append(blk.literals, src[nextEmit:]...)
-		blk.extraLits = len(src) - int(nextEmit)
-	}
-	blk.recentOffsets[0] = uint32(offset1)
-	blk.recentOffsets[1] = uint32(offset2)
-	if debugEncoder {
-		println("returning, recent offsets:", blk.recentOffsets, "extra literals:", blk.extraLits)
-	}
-}
-
-// EncodeNoHist will encode a block with no history and no following blocks.
-// Most notable difference is that src will not be copied for history and
-// we do not need to check for max match length.
-func (e *doubleFastEncoder) EncodeNoHist(blk *blockEnc, src []byte) {
-	const (
-		// Input margin is the number of bytes we read (8)
-		// and the maximum we will read ahead (2)
-		inputMargin            = 8 + 2
-		minNonLiteralBlockSize = 16
-	)
-
-	// Protect against e.cur wraparound.
-	if e.cur >= e.bufferReset {
-		for i := range e.table[:] {
-			e.table[i] = tableEntry{}
-		}
-		for i := range e.longTable[:] {
-			e.longTable[i] = tableEntry{}
-		}
-		e.cur = e.maxMatchOff
-	}
-
-	s := int32(0)
-	blk.size = len(src)
-	if len(src) < minNonLiteralBlockSize {
-		blk.extraLits = len(src)
-		blk.literals = blk.literals[:len(src)]
-		copy(blk.literals, src)
-		return
-	}
-
-	// Override src
-	sLimit := int32(len(src)) - inputMargin
-	// stepSize is the number of bytes to skip on every main loop iteration.
-	// It should be >= 1.
-	const stepSize = 1
-
-	const kSearchStrength = 8
-
-	// nextEmit is where in src the next emitLiteral should start from.
-	nextEmit := s
-	cv := load6432(src, s)
-
-	// Relative offsets
-	offset1 := int32(blk.recentOffsets[0])
-	offset2 := int32(blk.recentOffsets[1])
-
-	addLiterals := func(s *seq, until int32) {
-		if until == nextEmit {
-			return
-		}
-		blk.literals = append(blk.literals, src[nextEmit:until]...)
-		s.litLen = uint32(until - nextEmit)
-	}
-	if debugEncoder {
-		println("recent offsets:", blk.recentOffsets)
-	}
-
-encodeLoop:
-	for {
-		var t int32
-		for {
-
-			nextHashL := hashLen(cv, dFastLongTableBits, dFastLongLen)
-			nextHashS := hashLen(cv, dFastShortTableBits, dFastShortLen)
-			candidateL := e.longTable[nextHashL]
-			candidateS := e.table[nextHashS]
-
-			const repOff = 1
-			repIndex := s - offset1 + repOff
-			entry := tableEntry{offset: s + e.cur, val: uint32(cv)}
-			e.longTable[nextHashL] = entry
-			e.table[nextHashS] = entry
-
-			if len(blk.sequences) > 2 {
-				if load3232(src, repIndex) == uint32(cv>>(repOff*8)) {
-					// Consider history as well.
-					var seq seq
-					//length := 4 + e.matchlen(s+4+repOff, repIndex+4, src)
-					length := 4 + int32(matchLen(src[s+4+repOff:], src[repIndex+4:]))
-
-					seq.matchLen = uint32(length - zstdMinMatch)
-
-					// We might be able to match backwards.
-					// Extend as long as we can.
-					start := s + repOff
-					// We end the search early, so we don't risk 0 literals
-					// and have to do special offset treatment.
-					startLimit := nextEmit + 1
-
-					tMin := s - e.maxMatchOff
-					if tMin < 0 {
-						tMin = 0
-					}
-					for repIndex > tMin && start > startLimit && src[repIndex-1] == src[start-1] {
-						repIndex--
-						start--
-						seq.matchLen++
-					}
-					addLiterals(&seq, start)
-
-					// rep 0
-					seq.offset = 1
-					if debugSequences {
-						println("repeat sequence", seq, "next s:", s)
-					}
-					blk.sequences = append(blk.sequences, seq)
-					s += length + repOff
-					nextEmit = s
-					if s >= sLimit {
-						if debugEncoder {
-							println("repeat ended", s, length)
-
-						}
-						break encodeLoop
-					}
-					cv = load6432(src, s)
-					continue
-				}
-			}
-			// Find the offsets of our two matches.
-			coffsetL := s - (candidateL.offset - e.cur)
-			coffsetS := s - (candidateS.offset - e.cur)
-
-			// Check if we have a long match.
-			if coffsetL < e.maxMatchOff && uint32(cv) == candidateL.val {
-				// Found a long match, likely at least 8 bytes.
-				// Reference encoder checks all 8 bytes, we only check 4,
-				// but the likelihood of both the first 4 bytes and the hash matching should be enough.
-				t = candidateL.offset - e.cur
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d). cur: %d", s, t, e.cur))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				if debugMatches {
-					println("long match")
-				}
-				break
-			}
-
-			// Check if we have a short match.
-			if coffsetS < e.maxMatchOff && uint32(cv) == candidateS.val {
-				// found a regular match
-				// See if we can find a long match at s+1
-				const checkAt = 1
-				cv := load6432(src, s+checkAt)
-				nextHashL = hashLen(cv, dFastLongTableBits, dFastLongLen)
-				candidateL = e.longTable[nextHashL]
-				coffsetL = s - (candidateL.offset - e.cur) + checkAt
-
-				// We can store it, since we have at least a 4 byte match.
-				e.longTable[nextHashL] = tableEntry{offset: s + checkAt + e.cur, val: uint32(cv)}
-				if coffsetL < e.maxMatchOff && uint32(cv) == candidateL.val {
-					// Found a long match, likely at least 8 bytes.
-					// Reference encoder checks all 8 bytes, we only check 4,
-					// but the likelihood of both the first 4 bytes and the hash matching should be enough.
-					t = candidateL.offset - e.cur
-					s += checkAt
-					if debugMatches {
-						println("long match (after short)")
-					}
-					break
-				}
-
-				t = candidateS.offset - e.cur
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				if debugAsserts && t < 0 {
-					panic("t<0")
-				}
-				if debugMatches {
-					println("short match")
-				}
-				break
-			}
-
-			// No match found, move forward in input.
-			s += stepSize + ((s - nextEmit) >> (kSearchStrength - 1))
-			if s >= sLimit {
-				break encodeLoop
-			}
-			cv = load6432(src, s)
-		}
-
-		// A 4-byte match has been found. Update recent offsets.
-		// We'll later see if more than 4 bytes.
-		offset2 = offset1
-		offset1 = s - t
-
-		if debugAsserts && s <= t {
-			panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-		}
-
-		// Extend the 4-byte match as long as possible.
-		//l := e.matchlen(s+4, t+4, src) + 4
-		l := int32(matchLen(src[s+4:], src[t+4:])) + 4
-
-		// Extend backwards
-		tMin := s - e.maxMatchOff
-		if tMin < 0 {
-			tMin = 0
-		}
-		for t > tMin && s > nextEmit && src[t-1] == src[s-1] {
-			s--
-			t--
-			l++
-		}
-
-		// Write our sequence
-		var seq seq
-		seq.litLen = uint32(s - nextEmit)
-		seq.matchLen = uint32(l - zstdMinMatch)
-		if seq.litLen > 0 {
-			blk.literals = append(blk.literals, src[nextEmit:s]...)
-		}
-		seq.offset = uint32(s-t) + 3
-		s += l
-		if debugSequences {
-			println("sequence", seq, "next s:", s)
-		}
-		blk.sequences = append(blk.sequences, seq)
-		nextEmit = s
-		if s >= sLimit {
-			break encodeLoop
-		}
-
-		// Index match start+1 (long) and start+2 (short)
-		index0 := s - l + 1
-		// Index match end-2 (long) and end-1 (short)
-		index1 := s - 2
-
-		cv0 := load6432(src, index0)
-		cv1 := load6432(src, index1)
-		te0 := tableEntry{offset: index0 + e.cur, val: uint32(cv0)}
-		te1 := tableEntry{offset: index1 + e.cur, val: uint32(cv1)}
-		e.longTable[hashLen(cv0, dFastLongTableBits, dFastLongLen)] = te0
-		e.longTable[hashLen(cv1, dFastLongTableBits, dFastLongLen)] = te1
-		cv0 >>= 8
-		cv1 >>= 8
-		te0.offset++
-		te1.offset++
-		te0.val = uint32(cv0)
-		te1.val = uint32(cv1)
-		e.table[hashLen(cv0, dFastShortTableBits, dFastShortLen)] = te0
-		e.table[hashLen(cv1, dFastShortTableBits, dFastShortLen)] = te1
-
-		cv = load6432(src, s)
-
-		if len(blk.sequences) <= 2 {
-			continue
-		}
-
-		// Check offset 2
-		for {
-			o2 := s - offset2
-			if load3232(src, o2) != uint32(cv) {
-				// Do regular search
-				break
-			}
-
-			// Store this, since we have it.
-			nextHashS := hashLen(cv1>>8, dFastShortTableBits, dFastShortLen)
-			nextHashL := hashLen(cv, dFastLongTableBits, dFastLongLen)
-
-			// We have at least 4 byte match.
-			// No need to check backwards. We come straight from a match
-			//l := 4 + e.matchlen(s+4, o2+4, src)
-			l := 4 + int32(matchLen(src[s+4:], src[o2+4:]))
-
-			entry := tableEntry{offset: s + e.cur, val: uint32(cv)}
-			e.longTable[nextHashL] = entry
-			e.table[nextHashS] = entry
-			seq.matchLen = uint32(l) - zstdMinMatch
-			seq.litLen = 0
-
-			// Since litlen is always 0, this is offset 1.
-			seq.offset = 1
-			s += l
-			nextEmit = s
-			if debugSequences {
-				println("sequence", seq, "next s:", s)
-			}
-			blk.sequences = append(blk.sequences, seq)
-
-			// Swap offset 1 and 2.
-			offset1, offset2 = offset2, offset1
-			if s >= sLimit {
-				// Finished
-				break encodeLoop
-			}
-			cv = load6432(src, s)
-		}
-	}
-
-	if int(nextEmit) < len(src) {
-		blk.literals = append(blk.literals, src[nextEmit:]...)
-		blk.extraLits = len(src) - int(nextEmit)
-	}
-	if debugEncoder {
-		println("returning, recent offsets:", blk.recentOffsets, "extra literals:", blk.extraLits)
-	}
-
-	// We do not store history, so we must offset e.cur to avoid false matches for next user.
-	if e.cur < e.bufferReset {
-		e.cur += int32(len(src))
-	}
-}
-
-// Encode will encode the content, with a dictionary if initialized for it.
-func (e *doubleFastEncoderDict) Encode(blk *blockEnc, src []byte) {
-	const (
-		// Input margin is the number of bytes we read (8)
-		// and the maximum we will read ahead (2)
-		inputMargin            = 8 + 2
-		minNonLiteralBlockSize = 16
-	)
-
-	// Protect against e.cur wraparound.
-	for e.cur >= e.bufferReset-int32(len(e.hist)) {
-		if len(e.hist) == 0 {
-			for i := range e.table[:] {
-				e.table[i] = tableEntry{}
-			}
-			for i := range e.longTable[:] {
-				e.longTable[i] = tableEntry{}
-			}
-			e.markAllShardsDirty()
-			e.cur = e.maxMatchOff
-			break
-		}
-		// Shift down everything in the table that isn't already too far away.
-		minOff := e.cur + int32(len(e.hist)) - e.maxMatchOff
-		for i := range e.table[:] {
-			v := e.table[i].offset
-			if v < minOff {
-				v = 0
-			} else {
-				v = v - e.cur + e.maxMatchOff
-			}
-			e.table[i].offset = v
-		}
-		for i := range e.longTable[:] {
-			v := e.longTable[i].offset
-			if v < minOff {
-				v = 0
-			} else {
-				v = v - e.cur + e.maxMatchOff
-			}
-			e.longTable[i].offset = v
-		}
-		e.markAllShardsDirty()
-		e.cur = e.maxMatchOff
-		break
-	}
-
-	s := e.addBlock(src)
-	blk.size = len(src)
-	if len(src) < minNonLiteralBlockSize {
-		blk.extraLits = len(src)
-		blk.literals = blk.literals[:len(src)]
-		copy(blk.literals, src)
-		return
-	}
-
-	// Override src
-	src = e.hist
-	sLimit := int32(len(src)) - inputMargin
-	// stepSize is the number of bytes to skip on every main loop iteration.
-	// It should be >= 1.
-	const stepSize = 1
-
-	const kSearchStrength = 8
-
-	// nextEmit is where in src the next emitLiteral should start from.
-	nextEmit := s
-	cv := load6432(src, s)
-
-	// Relative offsets
-	offset1 := int32(blk.recentOffsets[0])
-	offset2 := int32(blk.recentOffsets[1])
-
-	addLiterals := func(s *seq, until int32) {
-		if until == nextEmit {
-			return
-		}
-		blk.literals = append(blk.literals, src[nextEmit:until]...)
-		s.litLen = uint32(until - nextEmit)
-	}
-	if debugEncoder {
-		println("recent offsets:", blk.recentOffsets)
-	}
-
-encodeLoop:
-	for {
-		var t int32
-		// We allow the encoder to optionally turn off repeat offsets across blocks
-		canRepeat := len(blk.sequences) > 2
-
-		for {
-			if debugAsserts && canRepeat && offset1 == 0 {
-				panic("offset0 was 0")
-			}
-
-			nextHashL := hashLen(cv, dFastLongTableBits, dFastLongLen)
-			nextHashS := hashLen(cv, dFastShortTableBits, dFastShortLen)
-			candidateL := e.longTable[nextHashL]
-			candidateS := e.table[nextHashS]
-
-			const repOff = 1
-			repIndex := s - offset1 + repOff
-			entry := tableEntry{offset: s + e.cur, val: uint32(cv)}
-			e.longTable[nextHashL] = entry
-			e.markLongShardDirty(nextHashL)
-			e.table[nextHashS] = entry
-			e.markShardDirty(nextHashS)
-
-			if canRepeat {
-				if repIndex >= 0 && load3232(src, repIndex) == uint32(cv>>(repOff*8)) {
-					// Consider history as well.
-					var seq seq
-					length := 4 + e.matchlen(s+4+repOff, repIndex+4, src)
-
-					seq.matchLen = uint32(length - zstdMinMatch)
-
-					// We might be able to match backwards.
-					// Extend as long as we can.
-					start := s + repOff
-					// We end the search early, so we don't risk 0 literals
-					// and have to do special offset treatment.
-					startLimit := nextEmit + 1
-
-					tMin := s - e.maxMatchOff
-					if tMin < 0 {
-						tMin = 0
-					}
-					for repIndex > tMin && start > startLimit && src[repIndex-1] == src[start-1] && seq.matchLen < maxMatchLength-zstdMinMatch-1 {
-						repIndex--
-						start--
-						seq.matchLen++
-					}
-					addLiterals(&seq, start)
-
-					// rep 0
-					seq.offset = 1
-					if debugSequences {
-						println("repeat sequence", seq, "next s:", s)
-					}
-					blk.sequences = append(blk.sequences, seq)
-					s += length + repOff
-					nextEmit = s
-					if s >= sLimit {
-						if debugEncoder {
-							println("repeat ended", s, length)
-
-						}
-						break encodeLoop
-					}
-					cv = load6432(src, s)
-					continue
-				}
-			}
-			// Find the offsets of our two matches.
-			coffsetL := s - (candidateL.offset - e.cur)
-			coffsetS := s - (candidateS.offset - e.cur)
-
-			// Check if we have a long match.
-			if coffsetL < e.maxMatchOff && uint32(cv) == candidateL.val {
-				// Found a long match, likely at least 8 bytes.
-				// Reference encoder checks all 8 bytes, we only check 4,
-				// but the likelihood of both the first 4 bytes and the hash matching should be enough.
-				t = candidateL.offset - e.cur
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				if debugMatches {
-					println("long match")
-				}
-				break
-			}
-
-			// Check if we have a short match.
-			if coffsetS < e.maxMatchOff && uint32(cv) == candidateS.val {
-				// found a regular match
-				// See if we can find a long match at s+1
-				const checkAt = 1
-				cv := load6432(src, s+checkAt)
-				nextHashL = hashLen(cv, dFastLongTableBits, dFastLongLen)
-				candidateL = e.longTable[nextHashL]
-				coffsetL = s - (candidateL.offset - e.cur) + checkAt
-
-				// We can store it, since we have at least a 4 byte match.
-				e.longTable[nextHashL] = tableEntry{offset: s + checkAt + e.cur, val: uint32(cv)}
-				e.markLongShardDirty(nextHashL)
-				if coffsetL < e.maxMatchOff && uint32(cv) == candidateL.val {
-					// Found a long match, likely at least 8 bytes.
-					// Reference encoder checks all 8 bytes, we only check 4,
-					// but the likelihood of both the first 4 bytes and the hash matching should be enough.
-					t = candidateL.offset - e.cur
-					s += checkAt
-					if debugMatches {
-						println("long match (after short)")
-					}
-					break
-				}
-
-				t = candidateS.offset - e.cur
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				if debugAsserts && t < 0 {
-					panic("t<0")
-				}
-				if debugMatches {
-					println("short match")
-				}
-				break
-			}
-
-			// No match found, move forward in input.
-			s += stepSize + ((s - nextEmit) >> (kSearchStrength - 1))
-			if s >= sLimit {
-				break encodeLoop
-			}
-			cv = load6432(src, s)
-		}
-
-		// A 4-byte match has been found. Update recent offsets.
-		// We'll later see if more than 4 bytes.
-		offset2 = offset1
-		offset1 = s - t
-
-		if debugAsserts && s <= t {
-			panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-		}
-
-		if debugAsserts && canRepeat && int(offset1) > len(src) {
-			panic("invalid offset")
-		}
-
-		// Extend the 4-byte match as long as possible.
-		l := e.matchlen(s+4, t+4, src) + 4
-
-		// Extend backwards
-		tMin := s - e.maxMatchOff
-		if tMin < 0 {
-			tMin = 0
-		}
-		for t > tMin && s > nextEmit && src[t-1] == src[s-1] && l < maxMatchLength {
-			s--
-			t--
-			l++
-		}
-
-		// Write our sequence
-		var seq seq
-		seq.litLen = uint32(s - nextEmit)
-		seq.matchLen = uint32(l - zstdMinMatch)
-		if seq.litLen > 0 {
-			blk.literals = append(blk.literals, src[nextEmit:s]...)
-		}
-		seq.offset = uint32(s-t) + 3
-		s += l
-		if debugSequences {
-			println("sequence", seq, "next s:", s)
-		}
-		blk.sequences = append(blk.sequences, seq)
-		nextEmit = s
-		if s >= sLimit {
-			break encodeLoop
-		}
-
-		// Index match start+1 (long) and start+2 (short)
-		index0 := s - l + 1
-		// Index match end-2 (long) and end-1 (short)
-		index1 := s - 2
-
-		cv0 := load6432(src, index0)
-		cv1 := load6432(src, index1)
-		te0 := tableEntry{offset: index0 + e.cur, val: uint32(cv0)}
-		te1 := tableEntry{offset: index1 + e.cur, val: uint32(cv1)}
-		longHash1 := hashLen(cv0, dFastLongTableBits, dFastLongLen)
-		longHash2 := hashLen(cv1, dFastLongTableBits, dFastLongLen)
-		e.longTable[longHash1] = te0
-		e.longTable[longHash2] = te1
-		e.markLongShardDirty(longHash1)
-		e.markLongShardDirty(longHash2)
-		cv0 >>= 8
-		cv1 >>= 8
-		te0.offset++
-		te1.offset++
-		te0.val = uint32(cv0)
-		te1.val = uint32(cv1)
-		hashVal1 := hashLen(cv0, dFastShortTableBits, dFastShortLen)
-		hashVal2 := hashLen(cv1, dFastShortTableBits, dFastShortLen)
-		e.table[hashVal1] = te0
-		e.markShardDirty(hashVal1)
-		e.table[hashVal2] = te1
-		e.markShardDirty(hashVal2)
-
-		cv = load6432(src, s)
-
-		if !canRepeat {
-			continue
-		}
-
-		// Check offset 2
-		for {
-			o2 := s - offset2
-			if load3232(src, o2) != uint32(cv) {
-				// Do regular search
-				break
-			}
-
-			// Store this, since we have it.
-			nextHashL := hashLen(cv, dFastLongTableBits, dFastLongLen)
-			nextHashS := hashLen(cv, dFastShortTableBits, dFastShortLen)
-
-			// We have at least 4 byte match.
-			// No need to check backwards. We come straight from a match
-			l := 4 + e.matchlen(s+4, o2+4, src)
-
-			entry := tableEntry{offset: s + e.cur, val: uint32(cv)}
-			e.longTable[nextHashL] = entry
-			e.markLongShardDirty(nextHashL)
-			e.table[nextHashS] = entry
-			e.markShardDirty(nextHashS)
-			seq.matchLen = uint32(l) - zstdMinMatch
-			seq.litLen = 0
-
-			// Since litlen is always 0, this is offset 1.
-			seq.offset = 1
-			s += l
-			nextEmit = s
-			if debugSequences {
-				println("sequence", seq, "next s:", s)
-			}
-			blk.sequences = append(blk.sequences, seq)
-
-			// Swap offset 1 and 2.
-			offset1, offset2 = offset2, offset1
-			if s >= sLimit {
-				// Finished
-				break encodeLoop
-			}
-			cv = load6432(src, s)
-		}
-	}
-
-	if int(nextEmit) < len(src) {
-		blk.literals = append(blk.literals, src[nextEmit:]...)
-		blk.extraLits = len(src) - int(nextEmit)
-	}
-	blk.recentOffsets[0] = uint32(offset1)
-	blk.recentOffsets[1] = uint32(offset2)
-	if debugEncoder {
-		println("returning, recent offsets:", blk.recentOffsets, "extra literals:", blk.extraLits)
-	}
-	// If we encoded more than 64K mark all dirty.
-	if len(src) > 64<<10 {
-		e.markAllShardsDirty()
-	}
-}
-
-// ResetDict will reset and set a dictionary if not nil
-func (e *doubleFastEncoder) Reset(d *dict, singleBlock bool) {
-	e.fastEncoder.Reset(d, singleBlock)
-	if d != nil {
-		panic("doubleFastEncoder: Reset with dict not supported")
-	}
-}
-
-// ResetDict will reset and set a dictionary if not nil
-func (e *doubleFastEncoderDict) Reset(d *dict, singleBlock bool) {
-	allDirty := e.allDirty
-	e.fastEncoderDict.Reset(d, singleBlock)
-	if d == nil {
-		return
-	}
-
-	// Init or copy dict table
-	if len(e.dictLongTable) != len(e.longTable) || d.id != e.lastDictID {
-		if len(e.dictLongTable) != len(e.longTable) {
-			e.dictLongTable = make([]tableEntry, len(e.longTable))
-		}
-		if len(d.content) >= 8 {
-			cv := load6432(d.content, 0)
-			e.dictLongTable[hashLen(cv, dFastLongTableBits, dFastLongLen)] = tableEntry{
-				val:    uint32(cv),
-				offset: e.maxMatchOff,
-			}
-			end := int32(len(d.content)) - 8 + e.maxMatchOff
-			for i := e.maxMatchOff + 1; i < end; i++ {
-				cv = cv>>8 | (uint64(d.content[i-e.maxMatchOff+7]) << 56)
-				e.dictLongTable[hashLen(cv, dFastLongTableBits, dFastLongLen)] = tableEntry{
-					val:    uint32(cv),
-					offset: i,
-				}
-			}
-		}
-		e.lastDictID = d.id
-		allDirty = true
-	}
-	// Reset table to initial state
-	e.cur = e.maxMatchOff
-
-	dirtyShardCnt := 0
-	if !allDirty {
-		for i := range e.longTableShardDirty {
-			if e.longTableShardDirty[i] {
-				dirtyShardCnt++
-			}
-		}
-	}
-
-	if allDirty || dirtyShardCnt > dLongTableShardCnt/2 {
-		//copy(e.longTable[:], e.dictLongTable)
-		e.longTable = *(*[dFastLongTableSize]tableEntry)(e.dictLongTable)
-		for i := range e.longTableShardDirty {
-			e.longTableShardDirty[i] = false
-		}
-		return
-	}
-	for i := range e.longTableShardDirty {
-		if !e.longTableShardDirty[i] {
-			continue
-		}
-
-		// copy(e.longTable[i*dLongTableShardSize:(i+1)*dLongTableShardSize], e.dictLongTable[i*dLongTableShardSize:(i+1)*dLongTableShardSize])
-		*(*[dLongTableShardSize]tableEntry)(e.longTable[i*dLongTableShardSize:]) = *(*[dLongTableShardSize]tableEntry)(e.dictLongTable[i*dLongTableShardSize:])
-
-		e.longTableShardDirty[i] = false
-	}
-}
-
-func (e *doubleFastEncoderDict) markLongShardDirty(entryNum uint32) {
-	e.longTableShardDirty[entryNum/dLongTableShardSize] = true
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/enc_fast.go b/vendor/github.com/klauspost/compress/zstd/enc_fast.go
deleted file mode 100644
index f45a3da7d..000000000
--- a/vendor/github.com/klauspost/compress/zstd/enc_fast.go
+++ /dev/null
@@ -1,891 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"fmt"
-)
-
-const (
-	tableBits        = 15                               // Bits used in the table
-	tableSize        = 1 << tableBits                   // Size of the table
-	tableShardCnt    = 1 << (tableBits - dictShardBits) // Number of shards in the table
-	tableShardSize   = tableSize / tableShardCnt        // Size of an individual shard
-	tableFastHashLen = 6
-	tableMask        = tableSize - 1 // Mask for table indices. Redundant, but can eliminate bounds checks.
-	maxMatchLength   = 131074
-)
-
-type tableEntry struct {
-	val    uint32
-	offset int32
-}
-
-type fastEncoder struct {
-	fastBase
-	table [tableSize]tableEntry
-}
-
-type fastEncoderDict struct {
-	fastEncoder
-	dictTable       []tableEntry
-	tableShardDirty [tableShardCnt]bool
-	allDirty        bool
-}
-
-// Encode mimmics functionality in zstd_fast.c
-func (e *fastEncoder) Encode(blk *blockEnc, src []byte) {
-	const (
-		inputMargin            = 8
-		minNonLiteralBlockSize = 1 + 1 + inputMargin
-	)
-
-	// Protect against e.cur wraparound.
-	for e.cur >= e.bufferReset-int32(len(e.hist)) {
-		if len(e.hist) == 0 {
-			for i := range e.table[:] {
-				e.table[i] = tableEntry{}
-			}
-			e.cur = e.maxMatchOff
-			break
-		}
-		// Shift down everything in the table that isn't already too far away.
-		minOff := e.cur + int32(len(e.hist)) - e.maxMatchOff
-		for i := range e.table[:] {
-			v := e.table[i].offset
-			if v < minOff {
-				v = 0
-			} else {
-				v = v - e.cur + e.maxMatchOff
-			}
-			e.table[i].offset = v
-		}
-		e.cur = e.maxMatchOff
-		break
-	}
-
-	s := e.addBlock(src)
-	blk.size = len(src)
-	if len(src) < minNonLiteralBlockSize {
-		blk.extraLits = len(src)
-		blk.literals = blk.literals[:len(src)]
-		copy(blk.literals, src)
-		return
-	}
-
-	// Override src
-	src = e.hist
-	sLimit := int32(len(src)) - inputMargin
-	// stepSize is the number of bytes to skip on every main loop iteration.
-	// It should be >= 2.
-	const stepSize = 2
-
-	// TEMPLATE
-	const hashLog = tableBits
-	// seems global, but would be nice to tweak.
-	const kSearchStrength = 6
-
-	// nextEmit is where in src the next emitLiteral should start from.
-	nextEmit := s
-	cv := load6432(src, s)
-
-	// Relative offsets
-	offset1 := int32(blk.recentOffsets[0])
-	offset2 := int32(blk.recentOffsets[1])
-
-	addLiterals := func(s *seq, until int32) {
-		if until == nextEmit {
-			return
-		}
-		blk.literals = append(blk.literals, src[nextEmit:until]...)
-		s.litLen = uint32(until - nextEmit)
-	}
-	if debugEncoder {
-		println("recent offsets:", blk.recentOffsets)
-	}
-
-encodeLoop:
-	for {
-		// t will contain the match offset when we find one.
-		// When existing the search loop, we have already checked 4 bytes.
-		var t int32
-
-		// We will not use repeat offsets across blocks.
-		// By not using them for the first 3 matches
-		canRepeat := len(blk.sequences) > 2
-
-		for {
-			if debugAsserts && canRepeat && offset1 == 0 {
-				panic("offset0 was 0")
-			}
-
-			nextHash := hashLen(cv, hashLog, tableFastHashLen)
-			nextHash2 := hashLen(cv>>8, hashLog, tableFastHashLen)
-			candidate := e.table[nextHash]
-			candidate2 := e.table[nextHash2]
-			repIndex := s - offset1 + 2
-
-			e.table[nextHash] = tableEntry{offset: s + e.cur, val: uint32(cv)}
-			e.table[nextHash2] = tableEntry{offset: s + e.cur + 1, val: uint32(cv >> 8)}
-
-			if canRepeat && repIndex >= 0 && load3232(src, repIndex) == uint32(cv>>16) {
-				// Consider history as well.
-				var seq seq
-				length := 4 + e.matchlen(s+6, repIndex+4, src)
-				seq.matchLen = uint32(length - zstdMinMatch)
-
-				// We might be able to match backwards.
-				// Extend as long as we can.
-				start := s + 2
-				// We end the search early, so we don't risk 0 literals
-				// and have to do special offset treatment.
-				startLimit := nextEmit + 1
-
-				sMin := s - e.maxMatchOff
-				if sMin < 0 {
-					sMin = 0
-				}
-				for repIndex > sMin && start > startLimit && src[repIndex-1] == src[start-1] && seq.matchLen < maxMatchLength-zstdMinMatch {
-					repIndex--
-					start--
-					seq.matchLen++
-				}
-				addLiterals(&seq, start)
-
-				// rep 0
-				seq.offset = 1
-				if debugSequences {
-					println("repeat sequence", seq, "next s:", s)
-				}
-				blk.sequences = append(blk.sequences, seq)
-				s += length + 2
-				nextEmit = s
-				if s >= sLimit {
-					if debugEncoder {
-						println("repeat ended", s, length)
-
-					}
-					break encodeLoop
-				}
-				cv = load6432(src, s)
-				continue
-			}
-			coffset0 := s - (candidate.offset - e.cur)
-			coffset1 := s - (candidate2.offset - e.cur) + 1
-			if coffset0 < e.maxMatchOff && uint32(cv) == candidate.val {
-				// found a regular match
-				t = candidate.offset - e.cur
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				break
-			}
-
-			if coffset1 < e.maxMatchOff && uint32(cv>>8) == candidate2.val {
-				// found a regular match
-				t = candidate2.offset - e.cur
-				s++
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				if debugAsserts && t < 0 {
-					panic("t<0")
-				}
-				break
-			}
-			s += stepSize + ((s - nextEmit) >> (kSearchStrength - 1))
-			if s >= sLimit {
-				break encodeLoop
-			}
-			cv = load6432(src, s)
-		}
-		// A 4-byte match has been found. We'll later see if more than 4 bytes.
-		offset2 = offset1
-		offset1 = s - t
-
-		if debugAsserts && s <= t {
-			panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-		}
-
-		if debugAsserts && canRepeat && int(offset1) > len(src) {
-			panic("invalid offset")
-		}
-
-		// Extend the 4-byte match as long as possible.
-		l := e.matchlen(s+4, t+4, src) + 4
-
-		// Extend backwards
-		tMin := s - e.maxMatchOff
-		if tMin < 0 {
-			tMin = 0
-		}
-		for t > tMin && s > nextEmit && src[t-1] == src[s-1] && l < maxMatchLength {
-			s--
-			t--
-			l++
-		}
-
-		// Write our sequence.
-		var seq seq
-		seq.litLen = uint32(s - nextEmit)
-		seq.matchLen = uint32(l - zstdMinMatch)
-		if seq.litLen > 0 {
-			blk.literals = append(blk.literals, src[nextEmit:s]...)
-		}
-		// Don't use repeat offsets
-		seq.offset = uint32(s-t) + 3
-		s += l
-		if debugSequences {
-			println("sequence", seq, "next s:", s)
-		}
-		blk.sequences = append(blk.sequences, seq)
-		nextEmit = s
-		if s >= sLimit {
-			break encodeLoop
-		}
-		cv = load6432(src, s)
-
-		// Check offset 2
-		if o2 := s - offset2; canRepeat && load3232(src, o2) == uint32(cv) {
-			// We have at least 4 byte match.
-			// No need to check backwards. We come straight from a match
-			l := 4 + e.matchlen(s+4, o2+4, src)
-
-			// Store this, since we have it.
-			nextHash := hashLen(cv, hashLog, tableFastHashLen)
-			e.table[nextHash] = tableEntry{offset: s + e.cur, val: uint32(cv)}
-			seq.matchLen = uint32(l) - zstdMinMatch
-			seq.litLen = 0
-			// Since litlen is always 0, this is offset 1.
-			seq.offset = 1
-			s += l
-			nextEmit = s
-			if debugSequences {
-				println("sequence", seq, "next s:", s)
-			}
-			blk.sequences = append(blk.sequences, seq)
-
-			// Swap offset 1 and 2.
-			offset1, offset2 = offset2, offset1
-			if s >= sLimit {
-				break encodeLoop
-			}
-			// Prepare next loop.
-			cv = load6432(src, s)
-		}
-	}
-
-	if int(nextEmit) < len(src) {
-		blk.literals = append(blk.literals, src[nextEmit:]...)
-		blk.extraLits = len(src) - int(nextEmit)
-	}
-	blk.recentOffsets[0] = uint32(offset1)
-	blk.recentOffsets[1] = uint32(offset2)
-	if debugEncoder {
-		println("returning, recent offsets:", blk.recentOffsets, "extra literals:", blk.extraLits)
-	}
-}
-
-// EncodeNoHist will encode a block with no history and no following blocks.
-// Most notable difference is that src will not be copied for history and
-// we do not need to check for max match length.
-func (e *fastEncoder) EncodeNoHist(blk *blockEnc, src []byte) {
-	const (
-		inputMargin            = 8
-		minNonLiteralBlockSize = 1 + 1 + inputMargin
-	)
-	if debugEncoder {
-		if len(src) > maxCompressedBlockSize {
-			panic("src too big")
-		}
-	}
-
-	// Protect against e.cur wraparound.
-	if e.cur >= e.bufferReset {
-		for i := range e.table[:] {
-			e.table[i] = tableEntry{}
-		}
-		e.cur = e.maxMatchOff
-	}
-
-	s := int32(0)
-	blk.size = len(src)
-	if len(src) < minNonLiteralBlockSize {
-		blk.extraLits = len(src)
-		blk.literals = blk.literals[:len(src)]
-		copy(blk.literals, src)
-		return
-	}
-
-	sLimit := int32(len(src)) - inputMargin
-	// stepSize is the number of bytes to skip on every main loop iteration.
-	// It should be >= 2.
-	const stepSize = 2
-
-	// TEMPLATE
-	const hashLog = tableBits
-	// seems global, but would be nice to tweak.
-	const kSearchStrength = 6
-
-	// nextEmit is where in src the next emitLiteral should start from.
-	nextEmit := s
-	cv := load6432(src, s)
-
-	// Relative offsets
-	offset1 := int32(blk.recentOffsets[0])
-	offset2 := int32(blk.recentOffsets[1])
-
-	addLiterals := func(s *seq, until int32) {
-		if until == nextEmit {
-			return
-		}
-		blk.literals = append(blk.literals, src[nextEmit:until]...)
-		s.litLen = uint32(until - nextEmit)
-	}
-	if debugEncoder {
-		println("recent offsets:", blk.recentOffsets)
-	}
-
-encodeLoop:
-	for {
-		// t will contain the match offset when we find one.
-		// When existing the search loop, we have already checked 4 bytes.
-		var t int32
-
-		// We will not use repeat offsets across blocks.
-		// By not using them for the first 3 matches
-
-		for {
-			nextHash := hashLen(cv, hashLog, tableFastHashLen)
-			nextHash2 := hashLen(cv>>8, hashLog, tableFastHashLen)
-			candidate := e.table[nextHash]
-			candidate2 := e.table[nextHash2]
-			repIndex := s - offset1 + 2
-
-			e.table[nextHash] = tableEntry{offset: s + e.cur, val: uint32(cv)}
-			e.table[nextHash2] = tableEntry{offset: s + e.cur + 1, val: uint32(cv >> 8)}
-
-			if len(blk.sequences) > 2 && load3232(src, repIndex) == uint32(cv>>16) {
-				// Consider history as well.
-				var seq seq
-				length := 4 + e.matchlen(s+6, repIndex+4, src)
-
-				seq.matchLen = uint32(length - zstdMinMatch)
-
-				// We might be able to match backwards.
-				// Extend as long as we can.
-				start := s + 2
-				// We end the search early, so we don't risk 0 literals
-				// and have to do special offset treatment.
-				startLimit := nextEmit + 1
-
-				sMin := s - e.maxMatchOff
-				if sMin < 0 {
-					sMin = 0
-				}
-				for repIndex > sMin && start > startLimit && src[repIndex-1] == src[start-1] {
-					repIndex--
-					start--
-					seq.matchLen++
-				}
-				addLiterals(&seq, start)
-
-				// rep 0
-				seq.offset = 1
-				if debugSequences {
-					println("repeat sequence", seq, "next s:", s)
-				}
-				blk.sequences = append(blk.sequences, seq)
-				s += length + 2
-				nextEmit = s
-				if s >= sLimit {
-					if debugEncoder {
-						println("repeat ended", s, length)
-
-					}
-					break encodeLoop
-				}
-				cv = load6432(src, s)
-				continue
-			}
-			coffset0 := s - (candidate.offset - e.cur)
-			coffset1 := s - (candidate2.offset - e.cur) + 1
-			if coffset0 < e.maxMatchOff && uint32(cv) == candidate.val {
-				// found a regular match
-				t = candidate.offset - e.cur
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				if debugAsserts && t < 0 {
-					panic(fmt.Sprintf("t (%d) < 0, candidate.offset: %d, e.cur: %d, coffset0: %d, e.maxMatchOff: %d", t, candidate.offset, e.cur, coffset0, e.maxMatchOff))
-				}
-				break
-			}
-
-			if coffset1 < e.maxMatchOff && uint32(cv>>8) == candidate2.val {
-				// found a regular match
-				t = candidate2.offset - e.cur
-				s++
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				if debugAsserts && t < 0 {
-					panic("t<0")
-				}
-				break
-			}
-			s += stepSize + ((s - nextEmit) >> (kSearchStrength - 1))
-			if s >= sLimit {
-				break encodeLoop
-			}
-			cv = load6432(src, s)
-		}
-		// A 4-byte match has been found. We'll later see if more than 4 bytes.
-		offset2 = offset1
-		offset1 = s - t
-
-		if debugAsserts && s <= t {
-			panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-		}
-
-		if debugAsserts && t < 0 {
-			panic(fmt.Sprintf("t (%d) < 0 ", t))
-		}
-		// Extend the 4-byte match as long as possible.
-		l := e.matchlen(s+4, t+4, src) + 4
-
-		// Extend backwards
-		tMin := s - e.maxMatchOff
-		if tMin < 0 {
-			tMin = 0
-		}
-		for t > tMin && s > nextEmit && src[t-1] == src[s-1] {
-			s--
-			t--
-			l++
-		}
-
-		// Write our sequence.
-		var seq seq
-		seq.litLen = uint32(s - nextEmit)
-		seq.matchLen = uint32(l - zstdMinMatch)
-		if seq.litLen > 0 {
-			blk.literals = append(blk.literals, src[nextEmit:s]...)
-		}
-		// Don't use repeat offsets
-		seq.offset = uint32(s-t) + 3
-		s += l
-		if debugSequences {
-			println("sequence", seq, "next s:", s)
-		}
-		blk.sequences = append(blk.sequences, seq)
-		nextEmit = s
-		if s >= sLimit {
-			break encodeLoop
-		}
-		cv = load6432(src, s)
-
-		// Check offset 2
-		if o2 := s - offset2; len(blk.sequences) > 2 && load3232(src, o2) == uint32(cv) {
-			// We have at least 4 byte match.
-			// No need to check backwards. We come straight from a match
-			l := 4 + e.matchlen(s+4, o2+4, src)
-
-			// Store this, since we have it.
-			nextHash := hashLen(cv, hashLog, tableFastHashLen)
-			e.table[nextHash] = tableEntry{offset: s + e.cur, val: uint32(cv)}
-			seq.matchLen = uint32(l) - zstdMinMatch
-			seq.litLen = 0
-			// Since litlen is always 0, this is offset 1.
-			seq.offset = 1
-			s += l
-			nextEmit = s
-			if debugSequences {
-				println("sequence", seq, "next s:", s)
-			}
-			blk.sequences = append(blk.sequences, seq)
-
-			// Swap offset 1 and 2.
-			offset1, offset2 = offset2, offset1
-			if s >= sLimit {
-				break encodeLoop
-			}
-			// Prepare next loop.
-			cv = load6432(src, s)
-		}
-	}
-
-	if int(nextEmit) < len(src) {
-		blk.literals = append(blk.literals, src[nextEmit:]...)
-		blk.extraLits = len(src) - int(nextEmit)
-	}
-	if debugEncoder {
-		println("returning, recent offsets:", blk.recentOffsets, "extra literals:", blk.extraLits)
-	}
-	// We do not store history, so we must offset e.cur to avoid false matches for next user.
-	if e.cur < e.bufferReset {
-		e.cur += int32(len(src))
-	}
-}
-
-// Encode will encode the content, with a dictionary if initialized for it.
-func (e *fastEncoderDict) Encode(blk *blockEnc, src []byte) {
-	const (
-		inputMargin            = 8
-		minNonLiteralBlockSize = 1 + 1 + inputMargin
-	)
-	if e.allDirty || len(src) > 32<<10 {
-		e.fastEncoder.Encode(blk, src)
-		e.allDirty = true
-		return
-	}
-	// Protect against e.cur wraparound.
-	for e.cur >= e.bufferReset-int32(len(e.hist)) {
-		if len(e.hist) == 0 {
-			e.table = [tableSize]tableEntry{}
-			e.cur = e.maxMatchOff
-			break
-		}
-		// Shift down everything in the table that isn't already too far away.
-		minOff := e.cur + int32(len(e.hist)) - e.maxMatchOff
-		for i := range e.table[:] {
-			v := e.table[i].offset
-			if v < minOff {
-				v = 0
-			} else {
-				v = v - e.cur + e.maxMatchOff
-			}
-			e.table[i].offset = v
-		}
-		e.cur = e.maxMatchOff
-		break
-	}
-
-	s := e.addBlock(src)
-	blk.size = len(src)
-	if len(src) < minNonLiteralBlockSize {
-		blk.extraLits = len(src)
-		blk.literals = blk.literals[:len(src)]
-		copy(blk.literals, src)
-		return
-	}
-
-	// Override src
-	src = e.hist
-	sLimit := int32(len(src)) - inputMargin
-	// stepSize is the number of bytes to skip on every main loop iteration.
-	// It should be >= 2.
-	const stepSize = 2
-
-	// TEMPLATE
-	const hashLog = tableBits
-	// seems global, but would be nice to tweak.
-	const kSearchStrength = 7
-
-	// nextEmit is where in src the next emitLiteral should start from.
-	nextEmit := s
-	cv := load6432(src, s)
-
-	// Relative offsets
-	offset1 := int32(blk.recentOffsets[0])
-	offset2 := int32(blk.recentOffsets[1])
-
-	addLiterals := func(s *seq, until int32) {
-		if until == nextEmit {
-			return
-		}
-		blk.literals = append(blk.literals, src[nextEmit:until]...)
-		s.litLen = uint32(until - nextEmit)
-	}
-	if debugEncoder {
-		println("recent offsets:", blk.recentOffsets)
-	}
-
-encodeLoop:
-	for {
-		// t will contain the match offset when we find one.
-		// When existing the search loop, we have already checked 4 bytes.
-		var t int32
-
-		// We will not use repeat offsets across blocks.
-		// By not using them for the first 3 matches
-		canRepeat := len(blk.sequences) > 2
-
-		for {
-			if debugAsserts && canRepeat && offset1 == 0 {
-				panic("offset0 was 0")
-			}
-
-			nextHash := hashLen(cv, hashLog, tableFastHashLen)
-			nextHash2 := hashLen(cv>>8, hashLog, tableFastHashLen)
-			candidate := e.table[nextHash]
-			candidate2 := e.table[nextHash2]
-			repIndex := s - offset1 + 2
-
-			e.table[nextHash] = tableEntry{offset: s + e.cur, val: uint32(cv)}
-			e.markShardDirty(nextHash)
-			e.table[nextHash2] = tableEntry{offset: s + e.cur + 1, val: uint32(cv >> 8)}
-			e.markShardDirty(nextHash2)
-
-			if canRepeat && repIndex >= 0 && load3232(src, repIndex) == uint32(cv>>16) {
-				// Consider history as well.
-				var seq seq
-				length := 4 + e.matchlen(s+6, repIndex+4, src)
-
-				seq.matchLen = uint32(length - zstdMinMatch)
-
-				// We might be able to match backwards.
-				// Extend as long as we can.
-				start := s + 2
-				// We end the search early, so we don't risk 0 literals
-				// and have to do special offset treatment.
-				startLimit := nextEmit + 1
-
-				sMin := s - e.maxMatchOff
-				if sMin < 0 {
-					sMin = 0
-				}
-				for repIndex > sMin && start > startLimit && src[repIndex-1] == src[start-1] && seq.matchLen < maxMatchLength-zstdMinMatch {
-					repIndex--
-					start--
-					seq.matchLen++
-				}
-				addLiterals(&seq, start)
-
-				// rep 0
-				seq.offset = 1
-				if debugSequences {
-					println("repeat sequence", seq, "next s:", s)
-				}
-				blk.sequences = append(blk.sequences, seq)
-				s += length + 2
-				nextEmit = s
-				if s >= sLimit {
-					if debugEncoder {
-						println("repeat ended", s, length)
-
-					}
-					break encodeLoop
-				}
-				cv = load6432(src, s)
-				continue
-			}
-			coffset0 := s - (candidate.offset - e.cur)
-			coffset1 := s - (candidate2.offset - e.cur) + 1
-			if coffset0 < e.maxMatchOff && uint32(cv) == candidate.val {
-				// found a regular match
-				t = candidate.offset - e.cur
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				break
-			}
-
-			if coffset1 < e.maxMatchOff && uint32(cv>>8) == candidate2.val {
-				// found a regular match
-				t = candidate2.offset - e.cur
-				s++
-				if debugAsserts && s <= t {
-					panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-				}
-				if debugAsserts && s-t > e.maxMatchOff {
-					panic("s - t >e.maxMatchOff")
-				}
-				if debugAsserts && t < 0 {
-					panic("t<0")
-				}
-				break
-			}
-			s += stepSize + ((s - nextEmit) >> (kSearchStrength - 1))
-			if s >= sLimit {
-				break encodeLoop
-			}
-			cv = load6432(src, s)
-		}
-		// A 4-byte match has been found. We'll later see if more than 4 bytes.
-		offset2 = offset1
-		offset1 = s - t
-
-		if debugAsserts && s <= t {
-			panic(fmt.Sprintf("s (%d) <= t (%d)", s, t))
-		}
-
-		if debugAsserts && canRepeat && int(offset1) > len(src) {
-			panic("invalid offset")
-		}
-
-		// Extend the 4-byte match as long as possible.
-		l := e.matchlen(s+4, t+4, src) + 4
-
-		// Extend backwards
-		tMin := s - e.maxMatchOff
-		if tMin < 0 {
-			tMin = 0
-		}
-		for t > tMin && s > nextEmit && src[t-1] == src[s-1] && l < maxMatchLength {
-			s--
-			t--
-			l++
-		}
-
-		// Write our sequence.
-		var seq seq
-		seq.litLen = uint32(s - nextEmit)
-		seq.matchLen = uint32(l - zstdMinMatch)
-		if seq.litLen > 0 {
-			blk.literals = append(blk.literals, src[nextEmit:s]...)
-		}
-		// Don't use repeat offsets
-		seq.offset = uint32(s-t) + 3
-		s += l
-		if debugSequences {
-			println("sequence", seq, "next s:", s)
-		}
-		blk.sequences = append(blk.sequences, seq)
-		nextEmit = s
-		if s >= sLimit {
-			break encodeLoop
-		}
-		cv = load6432(src, s)
-
-		// Check offset 2
-		if o2 := s - offset2; canRepeat && load3232(src, o2) == uint32(cv) {
-			// We have at least 4 byte match.
-			// No need to check backwards. We come straight from a match
-			l := 4 + e.matchlen(s+4, o2+4, src)
-
-			// Store this, since we have it.
-			nextHash := hashLen(cv, hashLog, tableFastHashLen)
-			e.table[nextHash] = tableEntry{offset: s + e.cur, val: uint32(cv)}
-			e.markShardDirty(nextHash)
-			seq.matchLen = uint32(l) - zstdMinMatch
-			seq.litLen = 0
-			// Since litlen is always 0, this is offset 1.
-			seq.offset = 1
-			s += l
-			nextEmit = s
-			if debugSequences {
-				println("sequence", seq, "next s:", s)
-			}
-			blk.sequences = append(blk.sequences, seq)
-
-			// Swap offset 1 and 2.
-			offset1, offset2 = offset2, offset1
-			if s >= sLimit {
-				break encodeLoop
-			}
-			// Prepare next loop.
-			cv = load6432(src, s)
-		}
-	}
-
-	if int(nextEmit) < len(src) {
-		blk.literals = append(blk.literals, src[nextEmit:]...)
-		blk.extraLits = len(src) - int(nextEmit)
-	}
-	blk.recentOffsets[0] = uint32(offset1)
-	blk.recentOffsets[1] = uint32(offset2)
-	if debugEncoder {
-		println("returning, recent offsets:", blk.recentOffsets, "extra literals:", blk.extraLits)
-	}
-}
-
-// ResetDict will reset and set a dictionary if not nil
-func (e *fastEncoder) Reset(d *dict, singleBlock bool) {
-	e.resetBase(d, singleBlock)
-	if d != nil {
-		panic("fastEncoder: Reset with dict")
-	}
-}
-
-// ResetDict will reset and set a dictionary if not nil
-func (e *fastEncoderDict) Reset(d *dict, singleBlock bool) {
-	e.resetBase(d, singleBlock)
-	if d == nil {
-		return
-	}
-
-	// Init or copy dict table
-	if len(e.dictTable) != len(e.table) || d.id != e.lastDictID {
-		if len(e.dictTable) != len(e.table) {
-			e.dictTable = make([]tableEntry, len(e.table))
-		}
-		if true {
-			end := e.maxMatchOff + int32(len(d.content)) - 8
-			for i := e.maxMatchOff; i < end; i += 2 {
-				const hashLog = tableBits
-
-				cv := load6432(d.content, i-e.maxMatchOff)
-				nextHash := hashLen(cv, hashLog, tableFastHashLen)     // 0 -> 6
-				nextHash1 := hashLen(cv>>8, hashLog, tableFastHashLen) // 1 -> 7
-				e.dictTable[nextHash] = tableEntry{
-					val:    uint32(cv),
-					offset: i,
-				}
-				e.dictTable[nextHash1] = tableEntry{
-					val:    uint32(cv >> 8),
-					offset: i + 1,
-				}
-			}
-		}
-		e.lastDictID = d.id
-		e.allDirty = true
-	}
-
-	e.cur = e.maxMatchOff
-	dirtyShardCnt := 0
-	if !e.allDirty {
-		for i := range e.tableShardDirty {
-			if e.tableShardDirty[i] {
-				dirtyShardCnt++
-			}
-		}
-	}
-
-	const shardCnt = tableShardCnt
-	const shardSize = tableShardSize
-	if e.allDirty || dirtyShardCnt > shardCnt*4/6 {
-		//copy(e.table[:], e.dictTable)
-		e.table = *(*[tableSize]tableEntry)(e.dictTable)
-		for i := range e.tableShardDirty {
-			e.tableShardDirty[i] = false
-		}
-		e.allDirty = false
-		return
-	}
-	for i := range e.tableShardDirty {
-		if !e.tableShardDirty[i] {
-			continue
-		}
-
-		//copy(e.table[i*shardSize:(i+1)*shardSize], e.dictTable[i*shardSize:(i+1)*shardSize])
-		*(*[shardSize]tableEntry)(e.table[i*shardSize:]) = *(*[shardSize]tableEntry)(e.dictTable[i*shardSize:])
-		e.tableShardDirty[i] = false
-	}
-	e.allDirty = false
-}
-
-func (e *fastEncoderDict) markAllShardsDirty() {
-	e.allDirty = true
-}
-
-func (e *fastEncoderDict) markShardDirty(entryNum uint32) {
-	e.tableShardDirty[entryNum/tableShardSize] = true
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/encoder.go b/vendor/github.com/klauspost/compress/zstd/encoder.go
deleted file mode 100644
index 8f8223cd3..000000000
--- a/vendor/github.com/klauspost/compress/zstd/encoder.go
+++ /dev/null
@@ -1,642 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"crypto/rand"
-	"errors"
-	"fmt"
-	"io"
-	"math"
-	rdebug "runtime/debug"
-	"sync"
-
-	"github.com/klauspost/compress/zstd/internal/xxhash"
-)
-
-// Encoder provides encoding to Zstandard.
-// An Encoder can be used for either compressing a stream via the
-// io.WriteCloser interface supported by the Encoder or as multiple independent
-// tasks via the EncodeAll function.
-// Smaller encodes are encouraged to use the EncodeAll function.
-// Use NewWriter to create a new instance.
-type Encoder struct {
-	o        encoderOptions
-	encoders chan encoder
-	state    encoderState
-	init     sync.Once
-}
-
-type encoder interface {
-	Encode(blk *blockEnc, src []byte)
-	EncodeNoHist(blk *blockEnc, src []byte)
-	Block() *blockEnc
-	CRC() *xxhash.Digest
-	AppendCRC([]byte) []byte
-	WindowSize(size int64) int32
-	UseBlock(*blockEnc)
-	Reset(d *dict, singleBlock bool)
-}
-
-type encoderState struct {
-	w                io.Writer
-	filling          []byte
-	current          []byte
-	previous         []byte
-	encoder          encoder
-	writing          *blockEnc
-	err              error
-	writeErr         error
-	nWritten         int64
-	nInput           int64
-	frameContentSize int64
-	headerWritten    bool
-	eofWritten       bool
-	fullFrameWritten bool
-
-	// This waitgroup indicates an encode is running.
-	wg sync.WaitGroup
-	// This waitgroup indicates we have a block encoding/writing.
-	wWg sync.WaitGroup
-}
-
-// NewWriter will create a new Zstandard encoder.
-// If the encoder will be used for encoding blocks a nil writer can be used.
-func NewWriter(w io.Writer, opts ...EOption) (*Encoder, error) {
-	initPredefined()
-	var e Encoder
-	e.o.setDefault()
-	for _, o := range opts {
-		err := o(&e.o)
-		if err != nil {
-			return nil, err
-		}
-	}
-	if w != nil {
-		e.Reset(w)
-	}
-	return &e, nil
-}
-
-func (e *Encoder) initialize() {
-	if e.o.concurrent == 0 {
-		e.o.setDefault()
-	}
-	e.encoders = make(chan encoder, e.o.concurrent)
-	for i := 0; i < e.o.concurrent; i++ {
-		enc := e.o.encoder()
-		e.encoders <- enc
-	}
-}
-
-// Reset will re-initialize the writer and new writes will encode to the supplied writer
-// as a new, independent stream.
-func (e *Encoder) Reset(w io.Writer) {
-	s := &e.state
-	s.wg.Wait()
-	s.wWg.Wait()
-	if cap(s.filling) == 0 {
-		s.filling = make([]byte, 0, e.o.blockSize)
-	}
-	if e.o.concurrent > 1 {
-		if cap(s.current) == 0 {
-			s.current = make([]byte, 0, e.o.blockSize)
-		}
-		if cap(s.previous) == 0 {
-			s.previous = make([]byte, 0, e.o.blockSize)
-		}
-		s.current = s.current[:0]
-		s.previous = s.previous[:0]
-		if s.writing == nil {
-			s.writing = &blockEnc{lowMem: e.o.lowMem}
-			s.writing.init()
-		}
-		s.writing.initNewEncode()
-	}
-	if s.encoder == nil {
-		s.encoder = e.o.encoder()
-	}
-	s.filling = s.filling[:0]
-	s.encoder.Reset(e.o.dict, false)
-	s.headerWritten = false
-	s.eofWritten = false
-	s.fullFrameWritten = false
-	s.w = w
-	s.err = nil
-	s.nWritten = 0
-	s.nInput = 0
-	s.writeErr = nil
-	s.frameContentSize = 0
-}
-
-// ResetContentSize will reset and set a content size for the next stream.
-// If the bytes written does not match the size given an error will be returned
-// when calling Close().
-// This is removed when Reset is called.
-// Sizes <= 0 results in no content size set.
-func (e *Encoder) ResetContentSize(w io.Writer, size int64) {
-	e.Reset(w)
-	if size >= 0 {
-		e.state.frameContentSize = size
-	}
-}
-
-// Write data to the encoder.
-// Input data will be buffered and as the buffer fills up
-// content will be compressed and written to the output.
-// When done writing, use Close to flush the remaining output
-// and write CRC if requested.
-func (e *Encoder) Write(p []byte) (n int, err error) {
-	s := &e.state
-	if s.eofWritten {
-		return 0, ErrEncoderClosed
-	}
-	for len(p) > 0 {
-		if len(p)+len(s.filling) < e.o.blockSize {
-			if e.o.crc {
-				_, _ = s.encoder.CRC().Write(p)
-			}
-			s.filling = append(s.filling, p...)
-			return n + len(p), nil
-		}
-		add := p
-		if len(p)+len(s.filling) > e.o.blockSize {
-			add = add[:e.o.blockSize-len(s.filling)]
-		}
-		if e.o.crc {
-			_, _ = s.encoder.CRC().Write(add)
-		}
-		s.filling = append(s.filling, add...)
-		p = p[len(add):]
-		n += len(add)
-		if len(s.filling) < e.o.blockSize {
-			return n, nil
-		}
-		err := e.nextBlock(false)
-		if err != nil {
-			return n, err
-		}
-		if debugAsserts && len(s.filling) > 0 {
-			panic(len(s.filling))
-		}
-	}
-	return n, nil
-}
-
-// nextBlock will synchronize and start compressing input in e.state.filling.
-// If an error has occurred during encoding it will be returned.
-func (e *Encoder) nextBlock(final bool) error {
-	s := &e.state
-	// Wait for current block.
-	s.wg.Wait()
-	if s.err != nil {
-		return s.err
-	}
-	if len(s.filling) > e.o.blockSize {
-		return fmt.Errorf("block > maxStoreBlockSize")
-	}
-	if !s.headerWritten {
-		// If we have a single block encode, do a sync compression.
-		if final && len(s.filling) == 0 && !e.o.fullZero {
-			s.headerWritten = true
-			s.fullFrameWritten = true
-			s.eofWritten = true
-			return nil
-		}
-		if final && len(s.filling) > 0 {
-			s.current = e.encodeAll(s.encoder, s.filling, s.current[:0])
-			var n2 int
-			n2, s.err = s.w.Write(s.current)
-			if s.err != nil {
-				return s.err
-			}
-			s.nWritten += int64(n2)
-			s.nInput += int64(len(s.filling))
-			s.current = s.current[:0]
-			s.filling = s.filling[:0]
-			s.headerWritten = true
-			s.fullFrameWritten = true
-			s.eofWritten = true
-			return nil
-		}
-
-		var tmp [maxHeaderSize]byte
-		fh := frameHeader{
-			ContentSize:   uint64(s.frameContentSize),
-			WindowSize:    uint32(s.encoder.WindowSize(s.frameContentSize)),
-			SingleSegment: false,
-			Checksum:      e.o.crc,
-			DictID:        e.o.dict.ID(),
-		}
-
-		dst := fh.appendTo(tmp[:0])
-		s.headerWritten = true
-		s.wWg.Wait()
-		var n2 int
-		n2, s.err = s.w.Write(dst)
-		if s.err != nil {
-			return s.err
-		}
-		s.nWritten += int64(n2)
-	}
-	if s.eofWritten {
-		// Ensure we only write it once.
-		final = false
-	}
-
-	if len(s.filling) == 0 {
-		// Final block, but no data.
-		if final {
-			enc := s.encoder
-			blk := enc.Block()
-			blk.reset(nil)
-			blk.last = true
-			blk.encodeRaw(nil)
-			s.wWg.Wait()
-			_, s.err = s.w.Write(blk.output)
-			s.nWritten += int64(len(blk.output))
-			s.eofWritten = true
-		}
-		return s.err
-	}
-
-	// SYNC:
-	if e.o.concurrent == 1 {
-		src := s.filling
-		s.nInput += int64(len(s.filling))
-		if debugEncoder {
-			println("Adding sync block,", len(src), "bytes, final:", final)
-		}
-		enc := s.encoder
-		blk := enc.Block()
-		blk.reset(nil)
-		enc.Encode(blk, src)
-		blk.last = final
-		if final {
-			s.eofWritten = true
-		}
-
-		s.err = blk.encode(src, e.o.noEntropy, !e.o.allLitEntropy)
-		if s.err != nil {
-			return s.err
-		}
-		_, s.err = s.w.Write(blk.output)
-		s.nWritten += int64(len(blk.output))
-		s.filling = s.filling[:0]
-		return s.err
-	}
-
-	// Move blocks forward.
-	s.filling, s.current, s.previous = s.previous[:0], s.filling, s.current
-	s.nInput += int64(len(s.current))
-	s.wg.Add(1)
-	if final {
-		s.eofWritten = true
-	}
-	go func(src []byte) {
-		if debugEncoder {
-			println("Adding block,", len(src), "bytes, final:", final)
-		}
-		defer func() {
-			if r := recover(); r != nil {
-				s.err = fmt.Errorf("panic while encoding: %v", r)
-				rdebug.PrintStack()
-			}
-			s.wg.Done()
-		}()
-		enc := s.encoder
-		blk := enc.Block()
-		enc.Encode(blk, src)
-		blk.last = final
-		// Wait for pending writes.
-		s.wWg.Wait()
-		if s.writeErr != nil {
-			s.err = s.writeErr
-			return
-		}
-		// Transfer encoders from previous write block.
-		blk.swapEncoders(s.writing)
-		// Transfer recent offsets to next.
-		enc.UseBlock(s.writing)
-		s.writing = blk
-		s.wWg.Add(1)
-		go func() {
-			defer func() {
-				if r := recover(); r != nil {
-					s.writeErr = fmt.Errorf("panic while encoding/writing: %v", r)
-					rdebug.PrintStack()
-				}
-				s.wWg.Done()
-			}()
-			s.writeErr = blk.encode(src, e.o.noEntropy, !e.o.allLitEntropy)
-			if s.writeErr != nil {
-				return
-			}
-			_, s.writeErr = s.w.Write(blk.output)
-			s.nWritten += int64(len(blk.output))
-		}()
-	}(s.current)
-	return nil
-}
-
-// ReadFrom reads data from r until EOF or error.
-// The return value n is the number of bytes read.
-// Any error except io.EOF encountered during the read is also returned.
-//
-// The Copy function uses ReaderFrom if available.
-func (e *Encoder) ReadFrom(r io.Reader) (n int64, err error) {
-	if debugEncoder {
-		println("Using ReadFrom")
-	}
-
-	// Flush any current writes.
-	if len(e.state.filling) > 0 {
-		if err := e.nextBlock(false); err != nil {
-			return 0, err
-		}
-	}
-	e.state.filling = e.state.filling[:e.o.blockSize]
-	src := e.state.filling
-	for {
-		n2, err := r.Read(src)
-		if e.o.crc {
-			_, _ = e.state.encoder.CRC().Write(src[:n2])
-		}
-		// src is now the unfilled part...
-		src = src[n2:]
-		n += int64(n2)
-		switch err {
-		case io.EOF:
-			e.state.filling = e.state.filling[:len(e.state.filling)-len(src)]
-			if debugEncoder {
-				println("ReadFrom: got EOF final block:", len(e.state.filling))
-			}
-			return n, nil
-		case nil:
-		default:
-			if debugEncoder {
-				println("ReadFrom: got error:", err)
-			}
-			e.state.err = err
-			return n, err
-		}
-		if len(src) > 0 {
-			if debugEncoder {
-				println("ReadFrom: got space left in source:", len(src))
-			}
-			continue
-		}
-		err = e.nextBlock(false)
-		if err != nil {
-			return n, err
-		}
-		e.state.filling = e.state.filling[:e.o.blockSize]
-		src = e.state.filling
-	}
-}
-
-// Flush will send the currently written data to output
-// and block until everything has been written.
-// This should only be used on rare occasions where pushing the currently queued data is critical.
-func (e *Encoder) Flush() error {
-	s := &e.state
-	if len(s.filling) > 0 {
-		err := e.nextBlock(false)
-		if err != nil {
-			// Ignore Flush after Close.
-			if errors.Is(s.err, ErrEncoderClosed) {
-				return nil
-			}
-			return err
-		}
-	}
-	s.wg.Wait()
-	s.wWg.Wait()
-	if s.err != nil {
-		// Ignore Flush after Close.
-		if errors.Is(s.err, ErrEncoderClosed) {
-			return nil
-		}
-		return s.err
-	}
-	return s.writeErr
-}
-
-// Close will flush the final output and close the stream.
-// The function will block until everything has been written.
-// The Encoder can still be re-used after calling this.
-func (e *Encoder) Close() error {
-	s := &e.state
-	if s.encoder == nil {
-		return nil
-	}
-	err := e.nextBlock(true)
-	if err != nil {
-		if errors.Is(s.err, ErrEncoderClosed) {
-			return nil
-		}
-		return err
-	}
-	if s.frameContentSize > 0 {
-		if s.nInput != s.frameContentSize {
-			return fmt.Errorf("frame content size %d given, but %d bytes was written", s.frameContentSize, s.nInput)
-		}
-	}
-	if e.state.fullFrameWritten {
-		return s.err
-	}
-	s.wg.Wait()
-	s.wWg.Wait()
-
-	if s.err != nil {
-		return s.err
-	}
-	if s.writeErr != nil {
-		return s.writeErr
-	}
-
-	// Write CRC
-	if e.o.crc && s.err == nil {
-		// heap alloc.
-		var tmp [4]byte
-		_, s.err = s.w.Write(s.encoder.AppendCRC(tmp[:0]))
-		s.nWritten += 4
-	}
-
-	// Add padding with content from crypto/rand.Reader
-	if s.err == nil && e.o.pad > 0 {
-		add := calcSkippableFrame(s.nWritten, int64(e.o.pad))
-		frame, err := skippableFrame(s.filling[:0], add, rand.Reader)
-		if err != nil {
-			return err
-		}
-		_, s.err = s.w.Write(frame)
-	}
-	if s.err == nil {
-		s.err = ErrEncoderClosed
-		return nil
-	}
-
-	return s.err
-}
-
-// EncodeAll will encode all input in src and append it to dst.
-// This function can be called concurrently, but each call will only run on a single goroutine.
-// If empty input is given, nothing is returned, unless WithZeroFrames is specified.
-// Encoded blocks can be concatenated and the result will be the combined input stream.
-// Data compressed with EncodeAll can be decoded with the Decoder,
-// using either a stream or DecodeAll.
-func (e *Encoder) EncodeAll(src, dst []byte) []byte {
-	e.init.Do(e.initialize)
-	enc := <-e.encoders
-	defer func() {
-		e.encoders <- enc
-	}()
-	return e.encodeAll(enc, src, dst)
-}
-
-func (e *Encoder) encodeAll(enc encoder, src, dst []byte) []byte {
-	if len(src) == 0 {
-		if e.o.fullZero {
-			// Add frame header.
-			fh := frameHeader{
-				ContentSize:   0,
-				WindowSize:    MinWindowSize,
-				SingleSegment: true,
-				// Adding a checksum would be a waste of space.
-				Checksum: false,
-				DictID:   0,
-			}
-			dst = fh.appendTo(dst)
-
-			// Write raw block as last one only.
-			var blk blockHeader
-			blk.setSize(0)
-			blk.setType(blockTypeRaw)
-			blk.setLast(true)
-			dst = blk.appendTo(dst)
-		}
-		return dst
-	}
-
-	// Use single segments when above minimum window and below window size.
-	single := len(src) <= e.o.windowSize && len(src) > MinWindowSize
-	if e.o.single != nil {
-		single = *e.o.single
-	}
-	fh := frameHeader{
-		ContentSize:   uint64(len(src)),
-		WindowSize:    uint32(enc.WindowSize(int64(len(src)))),
-		SingleSegment: single,
-		Checksum:      e.o.crc,
-		DictID:        e.o.dict.ID(),
-	}
-
-	// If less than 1MB, allocate a buffer up front.
-	if len(dst) == 0 && cap(dst) == 0 && len(src) < 1<<20 && !e.o.lowMem {
-		dst = make([]byte, 0, len(src))
-	}
-	dst = fh.appendTo(dst)
-
-	// If we can do everything in one block, prefer that.
-	if len(src) <= e.o.blockSize {
-		enc.Reset(e.o.dict, true)
-		// Slightly faster with no history and everything in one block.
-		if e.o.crc {
-			_, _ = enc.CRC().Write(src)
-		}
-		blk := enc.Block()
-		blk.last = true
-		if e.o.dict == nil {
-			enc.EncodeNoHist(blk, src)
-		} else {
-			enc.Encode(blk, src)
-		}
-
-		// If we got the exact same number of literals as input,
-		// assume the literals cannot be compressed.
-		oldout := blk.output
-		// Output directly to dst
-		blk.output = dst
-
-		err := blk.encode(src, e.o.noEntropy, !e.o.allLitEntropy)
-		if err != nil {
-			panic(err)
-		}
-		dst = blk.output
-		blk.output = oldout
-	} else {
-		enc.Reset(e.o.dict, false)
-		blk := enc.Block()
-		for len(src) > 0 {
-			todo := src
-			if len(todo) > e.o.blockSize {
-				todo = todo[:e.o.blockSize]
-			}
-			src = src[len(todo):]
-			if e.o.crc {
-				_, _ = enc.CRC().Write(todo)
-			}
-			blk.pushOffsets()
-			enc.Encode(blk, todo)
-			if len(src) == 0 {
-				blk.last = true
-			}
-			err := blk.encode(todo, e.o.noEntropy, !e.o.allLitEntropy)
-			if err != nil {
-				panic(err)
-			}
-			dst = append(dst, blk.output...)
-			blk.reset(nil)
-		}
-	}
-	if e.o.crc {
-		dst = enc.AppendCRC(dst)
-	}
-	// Add padding with content from crypto/rand.Reader
-	if e.o.pad > 0 {
-		add := calcSkippableFrame(int64(len(dst)), int64(e.o.pad))
-		var err error
-		dst, err = skippableFrame(dst, add, rand.Reader)
-		if err != nil {
-			panic(err)
-		}
-	}
-	return dst
-}
-
-// MaxEncodedSize returns the expected maximum
-// size of an encoded block or stream.
-func (e *Encoder) MaxEncodedSize(size int) int {
-	frameHeader := 4 + 2 // magic + frame header & window descriptor
-	if e.o.dict != nil {
-		frameHeader += 4
-	}
-	// Frame content size:
-	if size < 256 {
-		frameHeader++
-	} else if size < 65536+256 {
-		frameHeader += 2
-	} else if size < math.MaxInt32 {
-		frameHeader += 4
-	} else {
-		frameHeader += 8
-	}
-	// Final crc
-	if e.o.crc {
-		frameHeader += 4
-	}
-
-	// Max overhead is 3 bytes/block.
-	// There cannot be 0 blocks.
-	blocks := (size + e.o.blockSize) / e.o.blockSize
-
-	// Combine, add padding.
-	maxSz := frameHeader + 3*blocks + size
-	if e.o.pad > 1 {
-		maxSz += calcSkippableFrame(int64(maxSz), int64(e.o.pad))
-	}
-	return maxSz
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/encoder_options.go b/vendor/github.com/klauspost/compress/zstd/encoder_options.go
deleted file mode 100644
index 20671dcb9..000000000
--- a/vendor/github.com/klauspost/compress/zstd/encoder_options.go
+++ /dev/null
@@ -1,339 +0,0 @@
-package zstd
-
-import (
-	"errors"
-	"fmt"
-	"math"
-	"math/bits"
-	"runtime"
-	"strings"
-)
-
-// EOption is an option for creating a encoder.
-type EOption func(*encoderOptions) error
-
-// options retains accumulated state of multiple options.
-type encoderOptions struct {
-	concurrent      int
-	level           EncoderLevel
-	single          *bool
-	pad             int
-	blockSize       int
-	windowSize      int
-	crc             bool
-	fullZero        bool
-	noEntropy       bool
-	allLitEntropy   bool
-	customWindow    bool
-	customALEntropy bool
-	customBlockSize bool
-	lowMem          bool
-	dict            *dict
-}
-
-func (o *encoderOptions) setDefault() {
-	*o = encoderOptions{
-		concurrent:    runtime.GOMAXPROCS(0),
-		crc:           true,
-		single:        nil,
-		blockSize:     maxCompressedBlockSize,
-		windowSize:    8 << 20,
-		level:         SpeedDefault,
-		allLitEntropy: false,
-		lowMem:        false,
-	}
-}
-
-// encoder returns an encoder with the selected options.
-func (o encoderOptions) encoder() encoder {
-	switch o.level {
-	case SpeedFastest:
-		if o.dict != nil {
-			return &fastEncoderDict{fastEncoder: fastEncoder{fastBase: fastBase{maxMatchOff: int32(o.windowSize), bufferReset: math.MaxInt32 - int32(o.windowSize*2), lowMem: o.lowMem}}}
-		}
-		return &fastEncoder{fastBase: fastBase{maxMatchOff: int32(o.windowSize), bufferReset: math.MaxInt32 - int32(o.windowSize*2), lowMem: o.lowMem}}
-
-	case SpeedDefault:
-		if o.dict != nil {
-			return &doubleFastEncoderDict{fastEncoderDict: fastEncoderDict{fastEncoder: fastEncoder{fastBase: fastBase{maxMatchOff: int32(o.windowSize), bufferReset: math.MaxInt32 - int32(o.windowSize*2), lowMem: o.lowMem}}}}
-		}
-		return &doubleFastEncoder{fastEncoder: fastEncoder{fastBase: fastBase{maxMatchOff: int32(o.windowSize), bufferReset: math.MaxInt32 - int32(o.windowSize*2), lowMem: o.lowMem}}}
-	case SpeedBetterCompression:
-		if o.dict != nil {
-			return &betterFastEncoderDict{betterFastEncoder: betterFastEncoder{fastBase: fastBase{maxMatchOff: int32(o.windowSize), bufferReset: math.MaxInt32 - int32(o.windowSize*2), lowMem: o.lowMem}}}
-		}
-		return &betterFastEncoder{fastBase: fastBase{maxMatchOff: int32(o.windowSize), bufferReset: math.MaxInt32 - int32(o.windowSize*2), lowMem: o.lowMem}}
-	case SpeedBestCompression:
-		return &bestFastEncoder{fastBase: fastBase{maxMatchOff: int32(o.windowSize), bufferReset: math.MaxInt32 - int32(o.windowSize*2), lowMem: o.lowMem}}
-	}
-	panic("unknown compression level")
-}
-
-// WithEncoderCRC will add CRC value to output.
-// Output will be 4 bytes larger.
-func WithEncoderCRC(b bool) EOption {
-	return func(o *encoderOptions) error { o.crc = b; return nil }
-}
-
-// WithEncoderConcurrency will set the concurrency,
-// meaning the maximum number of encoders to run concurrently.
-// The value supplied must be at least 1.
-// For streams, setting a value of 1 will disable async compression.
-// By default this will be set to GOMAXPROCS.
-func WithEncoderConcurrency(n int) EOption {
-	return func(o *encoderOptions) error {
-		if n <= 0 {
-			return fmt.Errorf("concurrency must be at least 1")
-		}
-		o.concurrent = n
-		return nil
-	}
-}
-
-// WithWindowSize will set the maximum allowed back-reference distance.
-// The value must be a power of two between MinWindowSize and MaxWindowSize.
-// A larger value will enable better compression but allocate more memory and,
-// for above-default values, take considerably longer.
-// The default value is determined by the compression level and max 8MB.
-func WithWindowSize(n int) EOption {
-	return func(o *encoderOptions) error {
-		switch {
-		case n < MinWindowSize:
-			return fmt.Errorf("window size must be at least %d", MinWindowSize)
-		case n > MaxWindowSize:
-			return fmt.Errorf("window size must be at most %d", MaxWindowSize)
-		case (n & (n - 1)) != 0:
-			return errors.New("window size must be a power of 2")
-		}
-
-		o.windowSize = n
-		o.customWindow = true
-		if o.blockSize > o.windowSize {
-			o.blockSize = o.windowSize
-			o.customBlockSize = true
-		}
-		return nil
-	}
-}
-
-// WithEncoderPadding will add padding to all output so the size will be a multiple of n.
-// This can be used to obfuscate the exact output size or make blocks of a certain size.
-// The contents will be a skippable frame, so it will be invisible by the decoder.
-// n must be > 0 and <= 1GB, 1<<30 bytes.
-// The padded area will be filled with data from crypto/rand.Reader.
-// If `EncodeAll` is used with data already in the destination, the total size will be multiple of this.
-func WithEncoderPadding(n int) EOption {
-	return func(o *encoderOptions) error {
-		if n <= 0 {
-			return fmt.Errorf("padding must be at least 1")
-		}
-		// No need to waste our time.
-		if n == 1 {
-			n = 0
-		}
-		if n > 1<<30 {
-			return fmt.Errorf("padding must less than 1GB (1<<30 bytes) ")
-		}
-		o.pad = n
-		return nil
-	}
-}
-
-// EncoderLevel predefines encoder compression levels.
-// Only use the constants made available, since the actual mapping
-// of these values are very likely to change and your compression could change
-// unpredictably when upgrading the library.
-type EncoderLevel int
-
-const (
-	speedNotSet EncoderLevel = iota
-
-	// SpeedFastest will choose the fastest reasonable compression.
-	// This is roughly equivalent to the fastest Zstandard mode.
-	SpeedFastest
-
-	// SpeedDefault is the default "pretty fast" compression option.
-	// This is roughly equivalent to the default Zstandard mode (level 3).
-	SpeedDefault
-
-	// SpeedBetterCompression will yield better compression than the default.
-	// Currently it is about zstd level 7-8 with ~ 2x-3x the default CPU usage.
-	// By using this, notice that CPU usage may go up in the future.
-	SpeedBetterCompression
-
-	// SpeedBestCompression will choose the best available compression option.
-	// This will offer the best compression no matter the CPU cost.
-	SpeedBestCompression
-
-	// speedLast should be kept as the last actual compression option.
-	// The is not for external usage, but is used to keep track of the valid options.
-	speedLast
-)
-
-// EncoderLevelFromString will convert a string representation of an encoding level back
-// to a compression level. The compare is not case sensitive.
-// If the string wasn't recognized, (false, SpeedDefault) will be returned.
-func EncoderLevelFromString(s string) (bool, EncoderLevel) {
-	for l := speedNotSet + 1; l < speedLast; l++ {
-		if strings.EqualFold(s, l.String()) {
-			return true, l
-		}
-	}
-	return false, SpeedDefault
-}
-
-// EncoderLevelFromZstd will return an encoder level that closest matches the compression
-// ratio of a specific zstd compression level.
-// Many input values will provide the same compression level.
-func EncoderLevelFromZstd(level int) EncoderLevel {
-	switch {
-	case level < 3:
-		return SpeedFastest
-	case level >= 3 && level < 6:
-		return SpeedDefault
-	case level >= 6 && level < 10:
-		return SpeedBetterCompression
-	default:
-		return SpeedBestCompression
-	}
-}
-
-// String provides a string representation of the compression level.
-func (e EncoderLevel) String() string {
-	switch e {
-	case SpeedFastest:
-		return "fastest"
-	case SpeedDefault:
-		return "default"
-	case SpeedBetterCompression:
-		return "better"
-	case SpeedBestCompression:
-		return "best"
-	default:
-		return "invalid"
-	}
-}
-
-// WithEncoderLevel specifies a predefined compression level.
-func WithEncoderLevel(l EncoderLevel) EOption {
-	return func(o *encoderOptions) error {
-		switch {
-		case l <= speedNotSet || l >= speedLast:
-			return fmt.Errorf("unknown encoder level")
-		}
-		o.level = l
-		if !o.customWindow {
-			switch o.level {
-			case SpeedFastest:
-				o.windowSize = 4 << 20
-				if !o.customBlockSize {
-					o.blockSize = 1 << 16
-				}
-			case SpeedDefault:
-				o.windowSize = 8 << 20
-			case SpeedBetterCompression:
-				o.windowSize = 8 << 20
-			case SpeedBestCompression:
-				o.windowSize = 8 << 20
-			}
-		}
-		if !o.customALEntropy {
-			o.allLitEntropy = l > SpeedDefault
-		}
-
-		return nil
-	}
-}
-
-// WithZeroFrames will encode 0 length input as full frames.
-// This can be needed for compatibility with zstandard usage,
-// but is not needed for this package.
-func WithZeroFrames(b bool) EOption {
-	return func(o *encoderOptions) error {
-		o.fullZero = b
-		return nil
-	}
-}
-
-// WithAllLitEntropyCompression will apply entropy compression if no matches are found.
-// Disabling this will skip incompressible data faster, but in cases with no matches but
-// skewed character distribution compression is lost.
-// Default value depends on the compression level selected.
-func WithAllLitEntropyCompression(b bool) EOption {
-	return func(o *encoderOptions) error {
-		o.customALEntropy = true
-		o.allLitEntropy = b
-		return nil
-	}
-}
-
-// WithNoEntropyCompression will always skip entropy compression of literals.
-// This can be useful if content has matches, but unlikely to benefit from entropy
-// compression. Usually the slight speed improvement is not worth enabling this.
-func WithNoEntropyCompression(b bool) EOption {
-	return func(o *encoderOptions) error {
-		o.noEntropy = b
-		return nil
-	}
-}
-
-// WithSingleSegment will set the "single segment" flag when EncodeAll is used.
-// If this flag is set, data must be regenerated within a single continuous memory segment.
-// In this case, Window_Descriptor byte is skipped, but Frame_Content_Size is necessarily present.
-// As a consequence, the decoder must allocate a memory segment of size equal or larger than size of your content.
-// In order to preserve the decoder from unreasonable memory requirements,
-// a decoder is allowed to reject a compressed frame which requests a memory size beyond decoder's authorized range.
-// For broader compatibility, decoders are recommended to support memory sizes of at least 8 MB.
-// This is only a recommendation, each decoder is free to support higher or lower limits, depending on local limitations.
-// If this is not specified, block encodes will automatically choose this based on the input size and the window size.
-// This setting has no effect on streamed encodes.
-func WithSingleSegment(b bool) EOption {
-	return func(o *encoderOptions) error {
-		o.single = &b
-		return nil
-	}
-}
-
-// WithLowerEncoderMem will trade in some memory cases trade less memory usage for
-// slower encoding speed.
-// This will not change the window size which is the primary function for reducing
-// memory usage. See WithWindowSize.
-func WithLowerEncoderMem(b bool) EOption {
-	return func(o *encoderOptions) error {
-		o.lowMem = b
-		return nil
-	}
-}
-
-// WithEncoderDict allows to register a dictionary that will be used for the encode.
-//
-// The slice dict must be in the [dictionary format] produced by
-// "zstd --train" from the Zstandard reference implementation.
-//
-// The encoder *may* choose to use no dictionary instead for certain payloads.
-//
-// [dictionary format]: https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#dictionary-format
-func WithEncoderDict(dict []byte) EOption {
-	return func(o *encoderOptions) error {
-		d, err := loadDict(dict)
-		if err != nil {
-			return err
-		}
-		o.dict = d
-		return nil
-	}
-}
-
-// WithEncoderDictRaw registers a dictionary that may be used by the encoder.
-//
-// The slice content may contain arbitrary data. It will be used as an initial
-// history.
-func WithEncoderDictRaw(id uint32, content []byte) EOption {
-	return func(o *encoderOptions) error {
-		if bits.UintSize > 32 && uint(len(content)) > dictMaxLength {
-			return fmt.Errorf("dictionary of size %d > 2GiB too large", len(content))
-		}
-		o.dict = &dict{id: id, content: content, offsets: [3]int{1, 4, 8}}
-		return nil
-	}
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/framedec.go b/vendor/github.com/klauspost/compress/zstd/framedec.go
deleted file mode 100644
index e47af66e7..000000000
--- a/vendor/github.com/klauspost/compress/zstd/framedec.go
+++ /dev/null
@@ -1,415 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"encoding/binary"
-	"encoding/hex"
-	"errors"
-	"io"
-
-	"github.com/klauspost/compress/zstd/internal/xxhash"
-)
-
-type frameDec struct {
-	o   decoderOptions
-	crc *xxhash.Digest
-
-	WindowSize uint64
-
-	// Frame history passed between blocks
-	history history
-
-	rawInput byteBuffer
-
-	// Byte buffer that can be reused for small input blocks.
-	bBuf byteBuf
-
-	FrameContentSize uint64
-
-	DictionaryID  uint32
-	HasCheckSum   bool
-	SingleSegment bool
-}
-
-const (
-	// MinWindowSize is the minimum Window Size, which is 1 KB.
-	MinWindowSize = 1 << 10
-
-	// MaxWindowSize is the maximum encoder window size
-	// and the default decoder maximum window size.
-	MaxWindowSize = 1 << 29
-)
-
-const (
-	frameMagic          = "\x28\xb5\x2f\xfd"
-	skippableFrameMagic = "\x2a\x4d\x18"
-)
-
-func newFrameDec(o decoderOptions) *frameDec {
-	if o.maxWindowSize > o.maxDecodedSize {
-		o.maxWindowSize = o.maxDecodedSize
-	}
-	d := frameDec{
-		o: o,
-	}
-	return &d
-}
-
-// reset will read the frame header and prepare for block decoding.
-// If nothing can be read from the input, io.EOF will be returned.
-// Any other error indicated that the stream contained data, but
-// there was a problem.
-func (d *frameDec) reset(br byteBuffer) error {
-	d.HasCheckSum = false
-	d.WindowSize = 0
-	var signature [4]byte
-	for {
-		var err error
-		// Check if we can read more...
-		b, err := br.readSmall(1)
-		switch err {
-		case io.EOF, io.ErrUnexpectedEOF:
-			return io.EOF
-		case nil:
-			signature[0] = b[0]
-		default:
-			return err
-		}
-		// Read the rest, don't allow io.ErrUnexpectedEOF
-		b, err = br.readSmall(3)
-		switch err {
-		case io.EOF:
-			return io.EOF
-		case nil:
-			copy(signature[1:], b)
-		default:
-			return err
-		}
-
-		if string(signature[1:4]) != skippableFrameMagic || signature[0]&0xf0 != 0x50 {
-			if debugDecoder {
-				println("Not skippable", hex.EncodeToString(signature[:]), hex.EncodeToString([]byte(skippableFrameMagic)))
-			}
-			// Break if not skippable frame.
-			break
-		}
-		// Read size to skip
-		b, err = br.readSmall(4)
-		if err != nil {
-			if debugDecoder {
-				println("Reading Frame Size", err)
-			}
-			return err
-		}
-		n := uint32(b[0]) | (uint32(b[1]) << 8) | (uint32(b[2]) << 16) | (uint32(b[3]) << 24)
-		println("Skipping frame with", n, "bytes.")
-		err = br.skipN(int64(n))
-		if err != nil {
-			if debugDecoder {
-				println("Reading discarded frame", err)
-			}
-			return err
-		}
-	}
-	if string(signature[:]) != frameMagic {
-		if debugDecoder {
-			println("Got magic numbers: ", signature, "want:", []byte(frameMagic))
-		}
-		return ErrMagicMismatch
-	}
-
-	// Read Frame_Header_Descriptor
-	fhd, err := br.readByte()
-	if err != nil {
-		if debugDecoder {
-			println("Reading Frame_Header_Descriptor", err)
-		}
-		return err
-	}
-	d.SingleSegment = fhd&(1<<5) != 0
-
-	if fhd&(1<<3) != 0 {
-		return errors.New("reserved bit set on frame header")
-	}
-
-	// Read Window_Descriptor
-	// https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#window_descriptor
-	d.WindowSize = 0
-	if !d.SingleSegment {
-		wd, err := br.readByte()
-		if err != nil {
-			if debugDecoder {
-				println("Reading Window_Descriptor", err)
-			}
-			return err
-		}
-		if debugDecoder {
-			printf("raw: %x, mantissa: %d, exponent: %d\n", wd, wd&7, wd>>3)
-		}
-		windowLog := 10 + (wd >> 3)
-		windowBase := uint64(1) << windowLog
-		windowAdd := (windowBase / 8) * uint64(wd&0x7)
-		d.WindowSize = windowBase + windowAdd
-	}
-
-	// Read Dictionary_ID
-	// https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#dictionary_id
-	d.DictionaryID = 0
-	if size := fhd & 3; size != 0 {
-		if size == 3 {
-			size = 4
-		}
-
-		b, err := br.readSmall(int(size))
-		if err != nil {
-			println("Reading Dictionary_ID", err)
-			return err
-		}
-		var id uint32
-		switch len(b) {
-		case 1:
-			id = uint32(b[0])
-		case 2:
-			id = uint32(b[0]) | (uint32(b[1]) << 8)
-		case 4:
-			id = uint32(b[0]) | (uint32(b[1]) << 8) | (uint32(b[2]) << 16) | (uint32(b[3]) << 24)
-		}
-		if debugDecoder {
-			println("Dict size", size, "ID:", id)
-		}
-		d.DictionaryID = id
-	}
-
-	// Read Frame_Content_Size
-	// https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#frame_content_size
-	var fcsSize int
-	v := fhd >> 6
-	switch v {
-	case 0:
-		if d.SingleSegment {
-			fcsSize = 1
-		}
-	default:
-		fcsSize = 1 << v
-	}
-	d.FrameContentSize = fcsUnknown
-	if fcsSize > 0 {
-		b, err := br.readSmall(fcsSize)
-		if err != nil {
-			println("Reading Frame content", err)
-			return err
-		}
-		switch len(b) {
-		case 1:
-			d.FrameContentSize = uint64(b[0])
-		case 2:
-			// When FCS_Field_Size is 2, the offset of 256 is added.
-			d.FrameContentSize = uint64(b[0]) | (uint64(b[1]) << 8) + 256
-		case 4:
-			d.FrameContentSize = uint64(b[0]) | (uint64(b[1]) << 8) | (uint64(b[2]) << 16) | (uint64(b[3]) << 24)
-		case 8:
-			d1 := uint32(b[0]) | (uint32(b[1]) << 8) | (uint32(b[2]) << 16) | (uint32(b[3]) << 24)
-			d2 := uint32(b[4]) | (uint32(b[5]) << 8) | (uint32(b[6]) << 16) | (uint32(b[7]) << 24)
-			d.FrameContentSize = uint64(d1) | (uint64(d2) << 32)
-		}
-		if debugDecoder {
-			println("Read FCS:", d.FrameContentSize)
-		}
-	}
-
-	// Move this to shared.
-	d.HasCheckSum = fhd&(1<<2) != 0
-	if d.HasCheckSum {
-		if d.crc == nil {
-			d.crc = xxhash.New()
-		}
-		d.crc.Reset()
-	}
-
-	if d.WindowSize > d.o.maxWindowSize {
-		if debugDecoder {
-			printf("window size %d > max %d\n", d.WindowSize, d.o.maxWindowSize)
-		}
-		return ErrWindowSizeExceeded
-	}
-
-	if d.WindowSize == 0 && d.SingleSegment {
-		// We may not need window in this case.
-		d.WindowSize = d.FrameContentSize
-		if d.WindowSize < MinWindowSize {
-			d.WindowSize = MinWindowSize
-		}
-		if d.WindowSize > d.o.maxDecodedSize {
-			if debugDecoder {
-				printf("window size %d > max %d\n", d.WindowSize, d.o.maxWindowSize)
-			}
-			return ErrDecoderSizeExceeded
-		}
-	}
-
-	// The minimum Window_Size is 1 KB.
-	if d.WindowSize < MinWindowSize {
-		if debugDecoder {
-			println("got window size: ", d.WindowSize)
-		}
-		return ErrWindowSizeTooSmall
-	}
-	d.history.windowSize = int(d.WindowSize)
-	if !d.o.lowMem || d.history.windowSize < maxBlockSize {
-		// Alloc 2x window size if not low-mem, or window size below 2MB.
-		d.history.allocFrameBuffer = d.history.windowSize * 2
-	} else {
-		if d.o.lowMem {
-			// Alloc with 1MB extra.
-			d.history.allocFrameBuffer = d.history.windowSize + maxBlockSize/2
-		} else {
-			// Alloc with 2MB extra.
-			d.history.allocFrameBuffer = d.history.windowSize + maxBlockSize
-		}
-	}
-
-	if debugDecoder {
-		println("Frame: Dict:", d.DictionaryID, "FrameContentSize:", d.FrameContentSize, "singleseg:", d.SingleSegment, "window:", d.WindowSize, "crc:", d.HasCheckSum)
-	}
-
-	// history contains input - maybe we do something
-	d.rawInput = br
-	return nil
-}
-
-// next will start decoding the next block from stream.
-func (d *frameDec) next(block *blockDec) error {
-	if debugDecoder {
-		println("decoding new block")
-	}
-	err := block.reset(d.rawInput, d.WindowSize)
-	if err != nil {
-		println("block error:", err)
-		// Signal the frame decoder we have a problem.
-		block.sendErr(err)
-		return err
-	}
-	return nil
-}
-
-// checkCRC will check the checksum, assuming the frame has one.
-// Will return ErrCRCMismatch if crc check failed, otherwise nil.
-func (d *frameDec) checkCRC() error {
-	// We can overwrite upper tmp now
-	buf, err := d.rawInput.readSmall(4)
-	if err != nil {
-		println("CRC missing?", err)
-		return err
-	}
-
-	want := binary.LittleEndian.Uint32(buf[:4])
-	got := uint32(d.crc.Sum64())
-
-	if got != want {
-		if debugDecoder {
-			printf("CRC check failed: got %08x, want %08x\n", got, want)
-		}
-		return ErrCRCMismatch
-	}
-	if debugDecoder {
-		printf("CRC ok %08x\n", got)
-	}
-	return nil
-}
-
-// consumeCRC skips over the checksum, assuming the frame has one.
-func (d *frameDec) consumeCRC() error {
-	_, err := d.rawInput.readSmall(4)
-	if err != nil {
-		println("CRC missing?", err)
-	}
-	return err
-}
-
-// runDecoder will run the decoder for the remainder of the frame.
-func (d *frameDec) runDecoder(dst []byte, dec *blockDec) ([]byte, error) {
-	saved := d.history.b
-
-	// We use the history for output to avoid copying it.
-	d.history.b = dst
-	d.history.ignoreBuffer = len(dst)
-	// Store input length, so we only check new data.
-	crcStart := len(dst)
-	d.history.decoders.maxSyncLen = 0
-	if d.o.limitToCap {
-		d.history.decoders.maxSyncLen = uint64(cap(dst) - len(dst))
-	}
-	if d.FrameContentSize != fcsUnknown {
-		if !d.o.limitToCap || d.FrameContentSize+uint64(len(dst)) < d.history.decoders.maxSyncLen {
-			d.history.decoders.maxSyncLen = d.FrameContentSize + uint64(len(dst))
-		}
-		if d.history.decoders.maxSyncLen > d.o.maxDecodedSize {
-			if debugDecoder {
-				println("maxSyncLen:", d.history.decoders.maxSyncLen, "> maxDecodedSize:", d.o.maxDecodedSize)
-			}
-			return dst, ErrDecoderSizeExceeded
-		}
-		if debugDecoder {
-			println("maxSyncLen:", d.history.decoders.maxSyncLen)
-		}
-		if !d.o.limitToCap && uint64(cap(dst)) < d.history.decoders.maxSyncLen {
-			// Alloc for output
-			dst2 := make([]byte, len(dst), d.history.decoders.maxSyncLen+compressedBlockOverAlloc)
-			copy(dst2, dst)
-			dst = dst2
-		}
-	}
-	var err error
-	for {
-		err = dec.reset(d.rawInput, d.WindowSize)
-		if err != nil {
-			break
-		}
-		if debugDecoder {
-			println("next block:", dec)
-		}
-		err = dec.decodeBuf(&d.history)
-		if err != nil {
-			break
-		}
-		if uint64(len(d.history.b)-crcStart) > d.o.maxDecodedSize {
-			println("runDecoder: maxDecodedSize exceeded", uint64(len(d.history.b)-crcStart), ">", d.o.maxDecodedSize)
-			err = ErrDecoderSizeExceeded
-			break
-		}
-		if d.o.limitToCap && len(d.history.b) > cap(dst) {
-			println("runDecoder: cap exceeded", uint64(len(d.history.b)), ">", cap(dst))
-			err = ErrDecoderSizeExceeded
-			break
-		}
-		if uint64(len(d.history.b)-crcStart) > d.FrameContentSize {
-			println("runDecoder: FrameContentSize exceeded", uint64(len(d.history.b)-crcStart), ">", d.FrameContentSize)
-			err = ErrFrameSizeExceeded
-			break
-		}
-		if dec.Last {
-			break
-		}
-		if debugDecoder {
-			println("runDecoder: FrameContentSize", uint64(len(d.history.b)-crcStart), "<=", d.FrameContentSize)
-		}
-	}
-	dst = d.history.b
-	if err == nil {
-		if d.FrameContentSize != fcsUnknown && uint64(len(d.history.b)-crcStart) != d.FrameContentSize {
-			err = ErrFrameSizeMismatch
-		} else if d.HasCheckSum {
-			if d.o.ignoreChecksum {
-				err = d.consumeCRC()
-			} else {
-				d.crc.Write(dst[crcStart:])
-				err = d.checkCRC()
-			}
-		}
-	}
-	d.history.b = saved
-	return dst, err
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/frameenc.go b/vendor/github.com/klauspost/compress/zstd/frameenc.go
deleted file mode 100644
index 667ca0679..000000000
--- a/vendor/github.com/klauspost/compress/zstd/frameenc.go
+++ /dev/null
@@ -1,137 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"encoding/binary"
-	"fmt"
-	"io"
-	"math"
-	"math/bits"
-)
-
-type frameHeader struct {
-	ContentSize   uint64
-	WindowSize    uint32
-	SingleSegment bool
-	Checksum      bool
-	DictID        uint32
-}
-
-const maxHeaderSize = 14
-
-func (f frameHeader) appendTo(dst []byte) []byte {
-	dst = append(dst, frameMagic...)
-	var fhd uint8
-	if f.Checksum {
-		fhd |= 1 << 2
-	}
-	if f.SingleSegment {
-		fhd |= 1 << 5
-	}
-
-	var dictIDContent []byte
-	if f.DictID > 0 {
-		var tmp [4]byte
-		if f.DictID < 256 {
-			fhd |= 1
-			tmp[0] = uint8(f.DictID)
-			dictIDContent = tmp[:1]
-		} else if f.DictID < 1<<16 {
-			fhd |= 2
-			binary.LittleEndian.PutUint16(tmp[:2], uint16(f.DictID))
-			dictIDContent = tmp[:2]
-		} else {
-			fhd |= 3
-			binary.LittleEndian.PutUint32(tmp[:4], f.DictID)
-			dictIDContent = tmp[:4]
-		}
-	}
-	var fcs uint8
-	if f.ContentSize >= 256 {
-		fcs++
-	}
-	if f.ContentSize >= 65536+256 {
-		fcs++
-	}
-	if f.ContentSize >= 0xffffffff {
-		fcs++
-	}
-
-	fhd |= fcs << 6
-
-	dst = append(dst, fhd)
-	if !f.SingleSegment {
-		const winLogMin = 10
-		windowLog := (bits.Len32(f.WindowSize-1) - winLogMin) << 3
-		dst = append(dst, uint8(windowLog))
-	}
-	if f.DictID > 0 {
-		dst = append(dst, dictIDContent...)
-	}
-	switch fcs {
-	case 0:
-		if f.SingleSegment {
-			dst = append(dst, uint8(f.ContentSize))
-		}
-		// Unless SingleSegment is set, framessizes < 256 are not stored.
-	case 1:
-		f.ContentSize -= 256
-		dst = append(dst, uint8(f.ContentSize), uint8(f.ContentSize>>8))
-	case 2:
-		dst = append(dst, uint8(f.ContentSize), uint8(f.ContentSize>>8), uint8(f.ContentSize>>16), uint8(f.ContentSize>>24))
-	case 3:
-		dst = append(dst, uint8(f.ContentSize), uint8(f.ContentSize>>8), uint8(f.ContentSize>>16), uint8(f.ContentSize>>24),
-			uint8(f.ContentSize>>32), uint8(f.ContentSize>>40), uint8(f.ContentSize>>48), uint8(f.ContentSize>>56))
-	default:
-		panic("invalid fcs")
-	}
-	return dst
-}
-
-const skippableFrameHeader = 4 + 4
-
-// calcSkippableFrame will return a total size to be added for written
-// to be divisible by multiple.
-// The value will always be > skippableFrameHeader.
-// The function will panic if written < 0 or wantMultiple <= 0.
-func calcSkippableFrame(written, wantMultiple int64) int {
-	if wantMultiple <= 0 {
-		panic("wantMultiple <= 0")
-	}
-	if written < 0 {
-		panic("written < 0")
-	}
-	leftOver := written % wantMultiple
-	if leftOver == 0 {
-		return 0
-	}
-	toAdd := wantMultiple - leftOver
-	for toAdd < skippableFrameHeader {
-		toAdd += wantMultiple
-	}
-	return int(toAdd)
-}
-
-// skippableFrame will add a skippable frame with a total size of bytes.
-// total should be >= skippableFrameHeader and < math.MaxUint32.
-func skippableFrame(dst []byte, total int, r io.Reader) ([]byte, error) {
-	if total == 0 {
-		return dst, nil
-	}
-	if total < skippableFrameHeader {
-		return dst, fmt.Errorf("requested skippable frame (%d) < 8", total)
-	}
-	if int64(total) > math.MaxUint32 {
-		return dst, fmt.Errorf("requested skippable frame (%d) > max uint32", total)
-	}
-	dst = append(dst, 0x50, 0x2a, 0x4d, 0x18)
-	f := uint32(total - skippableFrameHeader)
-	dst = append(dst, uint8(f), uint8(f>>8), uint8(f>>16), uint8(f>>24))
-	start := len(dst)
-	dst = append(dst, make([]byte, f)...)
-	_, err := io.ReadFull(r, dst[start:])
-	return dst, err
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/fse_decoder.go b/vendor/github.com/klauspost/compress/zstd/fse_decoder.go
deleted file mode 100644
index 2f8860a72..000000000
--- a/vendor/github.com/klauspost/compress/zstd/fse_decoder.go
+++ /dev/null
@@ -1,307 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"encoding/binary"
-	"errors"
-	"fmt"
-	"io"
-)
-
-const (
-	tablelogAbsoluteMax = 9
-)
-
-const (
-	/*!MEMORY_USAGE :
-	 *  Memory usage formula : N->2^N Bytes (examples : 10 -> 1KB; 12 -> 4KB ; 16 -> 64KB; 20 -> 1MB; etc.)
-	 *  Increasing memory usage improves compression ratio
-	 *  Reduced memory usage can improve speed, due to cache effect
-	 *  Recommended max value is 14, for 16KB, which nicely fits into Intel x86 L1 cache */
-	maxMemoryUsage = tablelogAbsoluteMax + 2
-
-	maxTableLog    = maxMemoryUsage - 2
-	maxTablesize   = 1 << maxTableLog
-	maxTableMask   = (1 << maxTableLog) - 1
-	minTablelog    = 5
-	maxSymbolValue = 255
-)
-
-// fseDecoder provides temporary storage for compression and decompression.
-type fseDecoder struct {
-	dt             [maxTablesize]decSymbol // Decompression table.
-	symbolLen      uint16                  // Length of active part of the symbol table.
-	actualTableLog uint8                   // Selected tablelog.
-	maxBits        uint8                   // Maximum number of additional bits
-
-	// used for table creation to avoid allocations.
-	stateTable [256]uint16
-	norm       [maxSymbolValue + 1]int16
-	preDefined bool
-}
-
-// tableStep returns the next table index.
-func tableStep(tableSize uint32) uint32 {
-	return (tableSize >> 1) + (tableSize >> 3) + 3
-}
-
-// readNCount will read the symbol distribution so decoding tables can be constructed.
-func (s *fseDecoder) readNCount(b *byteReader, maxSymbol uint16) error {
-	var (
-		charnum   uint16
-		previous0 bool
-	)
-	if b.remain() < 4 {
-		return errors.New("input too small")
-	}
-	bitStream := b.Uint32NC()
-	nbBits := uint((bitStream & 0xF) + minTablelog) // extract tableLog
-	if nbBits > tablelogAbsoluteMax {
-		println("Invalid tablelog:", nbBits)
-		return errors.New("tableLog too large")
-	}
-	bitStream >>= 4
-	bitCount := uint(4)
-
-	s.actualTableLog = uint8(nbBits)
-	remaining := int32((1 << nbBits) + 1)
-	threshold := int32(1 << nbBits)
-	gotTotal := int32(0)
-	nbBits++
-
-	for remaining > 1 && charnum <= maxSymbol {
-		if previous0 {
-			//println("prev0")
-			n0 := charnum
-			for (bitStream & 0xFFFF) == 0xFFFF {
-				//println("24 x 0")
-				n0 += 24
-				if r := b.remain(); r > 5 {
-					b.advance(2)
-					// The check above should make sure we can read 32 bits
-					bitStream = b.Uint32NC() >> bitCount
-				} else {
-					// end of bit stream
-					bitStream >>= 16
-					bitCount += 16
-				}
-			}
-			//printf("bitstream: %d, 0b%b", bitStream&3, bitStream)
-			for (bitStream & 3) == 3 {
-				n0 += 3
-				bitStream >>= 2
-				bitCount += 2
-			}
-			n0 += uint16(bitStream & 3)
-			bitCount += 2
-
-			if n0 > maxSymbolValue {
-				return errors.New("maxSymbolValue too small")
-			}
-			//println("inserting ", n0-charnum, "zeroes from idx", charnum, "ending before", n0)
-			for charnum < n0 {
-				s.norm[uint8(charnum)] = 0
-				charnum++
-			}
-
-			if r := b.remain(); r >= 7 || r-int(bitCount>>3) >= 4 {
-				b.advance(bitCount >> 3)
-				bitCount &= 7
-				// The check above should make sure we can read 32 bits
-				bitStream = b.Uint32NC() >> bitCount
-			} else {
-				bitStream >>= 2
-			}
-		}
-
-		max := (2*threshold - 1) - remaining
-		var count int32
-
-		if int32(bitStream)&(threshold-1) < max {
-			count = int32(bitStream) & (threshold - 1)
-			if debugAsserts && nbBits < 1 {
-				panic("nbBits underflow")
-			}
-			bitCount += nbBits - 1
-		} else {
-			count = int32(bitStream) & (2*threshold - 1)
-			if count >= threshold {
-				count -= max
-			}
-			bitCount += nbBits
-		}
-
-		// extra accuracy
-		count--
-		if count < 0 {
-			// -1 means +1
-			remaining += count
-			gotTotal -= count
-		} else {
-			remaining -= count
-			gotTotal += count
-		}
-		s.norm[charnum&0xff] = int16(count)
-		charnum++
-		previous0 = count == 0
-		for remaining < threshold {
-			nbBits--
-			threshold >>= 1
-		}
-
-		if r := b.remain(); r >= 7 || r-int(bitCount>>3) >= 4 {
-			b.advance(bitCount >> 3)
-			bitCount &= 7
-			// The check above should make sure we can read 32 bits
-			bitStream = b.Uint32NC() >> (bitCount & 31)
-		} else {
-			bitCount -= (uint)(8 * (len(b.b) - 4 - b.off))
-			b.off = len(b.b) - 4
-			bitStream = b.Uint32() >> (bitCount & 31)
-		}
-	}
-	s.symbolLen = charnum
-	if s.symbolLen <= 1 {
-		return fmt.Errorf("symbolLen (%d) too small", s.symbolLen)
-	}
-	if s.symbolLen > maxSymbolValue+1 {
-		return fmt.Errorf("symbolLen (%d) too big", s.symbolLen)
-	}
-	if remaining != 1 {
-		return fmt.Errorf("corruption detected (remaining %d != 1)", remaining)
-	}
-	if bitCount > 32 {
-		return fmt.Errorf("corruption detected (bitCount %d > 32)", bitCount)
-	}
-	if gotTotal != 1<<s.actualTableLog {
-		return fmt.Errorf("corruption detected (total %d != %d)", gotTotal, 1<<s.actualTableLog)
-	}
-	b.advance((bitCount + 7) >> 3)
-	return s.buildDtable()
-}
-
-func (s *fseDecoder) mustReadFrom(r io.Reader) {
-	fatalErr := func(err error) {
-		if err != nil {
-			panic(err)
-		}
-	}
-	// 	dt             [maxTablesize]decSymbol // Decompression table.
-	//	symbolLen      uint16                  // Length of active part of the symbol table.
-	//	actualTableLog uint8                   // Selected tablelog.
-	//	maxBits        uint8                   // Maximum number of additional bits
-	//	// used for table creation to avoid allocations.
-	//	stateTable [256]uint16
-	//	norm       [maxSymbolValue + 1]int16
-	//	preDefined bool
-	fatalErr(binary.Read(r, binary.LittleEndian, &s.dt))
-	fatalErr(binary.Read(r, binary.LittleEndian, &s.symbolLen))
-	fatalErr(binary.Read(r, binary.LittleEndian, &s.actualTableLog))
-	fatalErr(binary.Read(r, binary.LittleEndian, &s.maxBits))
-	fatalErr(binary.Read(r, binary.LittleEndian, &s.stateTable))
-	fatalErr(binary.Read(r, binary.LittleEndian, &s.norm))
-	fatalErr(binary.Read(r, binary.LittleEndian, &s.preDefined))
-}
-
-// decSymbol contains information about a state entry,
-// Including the state offset base, the output symbol and
-// the number of bits to read for the low part of the destination state.
-// Using a composite uint64 is faster than a struct with separate members.
-type decSymbol uint64
-
-func newDecSymbol(nbits, addBits uint8, newState uint16, baseline uint32) decSymbol {
-	return decSymbol(nbits) | (decSymbol(addBits) << 8) | (decSymbol(newState) << 16) | (decSymbol(baseline) << 32)
-}
-
-func (d decSymbol) nbBits() uint8 {
-	return uint8(d)
-}
-
-func (d decSymbol) addBits() uint8 {
-	return uint8(d >> 8)
-}
-
-func (d decSymbol) newState() uint16 {
-	return uint16(d >> 16)
-}
-
-func (d decSymbol) baselineInt() int {
-	return int(d >> 32)
-}
-
-func (d *decSymbol) setNBits(nBits uint8) {
-	const mask = 0xffffffffffffff00
-	*d = (*d & mask) | decSymbol(nBits)
-}
-
-func (d *decSymbol) setAddBits(addBits uint8) {
-	const mask = 0xffffffffffff00ff
-	*d = (*d & mask) | (decSymbol(addBits) << 8)
-}
-
-func (d *decSymbol) setNewState(state uint16) {
-	const mask = 0xffffffff0000ffff
-	*d = (*d & mask) | decSymbol(state)<<16
-}
-
-func (d *decSymbol) setExt(addBits uint8, baseline uint32) {
-	const mask = 0xffff00ff
-	*d = (*d & mask) | (decSymbol(addBits) << 8) | (decSymbol(baseline) << 32)
-}
-
-// decSymbolValue returns the transformed decSymbol for the given symbol.
-func decSymbolValue(symb uint8, t []baseOffset) (decSymbol, error) {
-	if int(symb) >= len(t) {
-		return 0, fmt.Errorf("rle symbol %d >= max %d", symb, len(t))
-	}
-	lu := t[symb]
-	return newDecSymbol(0, lu.addBits, 0, lu.baseLine), nil
-}
-
-// setRLE will set the decoder til RLE mode.
-func (s *fseDecoder) setRLE(symbol decSymbol) {
-	s.actualTableLog = 0
-	s.maxBits = symbol.addBits()
-	s.dt[0] = symbol
-}
-
-// transform will transform the decoder table into a table usable for
-// decoding without having to apply the transformation while decoding.
-// The state will contain the base value and the number of bits to read.
-func (s *fseDecoder) transform(t []baseOffset) error {
-	tableSize := uint16(1 << s.actualTableLog)
-	s.maxBits = 0
-	for i, v := range s.dt[:tableSize] {
-		add := v.addBits()
-		if int(add) >= len(t) {
-			return fmt.Errorf("invalid decoding table entry %d, symbol %d >= max (%d)", i, v.addBits(), len(t))
-		}
-		lu := t[add]
-		if lu.addBits > s.maxBits {
-			s.maxBits = lu.addBits
-		}
-		v.setExt(lu.addBits, lu.baseLine)
-		s.dt[i] = v
-	}
-	return nil
-}
-
-type fseState struct {
-	dt    []decSymbol
-	state decSymbol
-}
-
-// Initialize and decodeAsync first state and symbol.
-func (s *fseState) init(br *bitReader, tableLog uint8, dt []decSymbol) {
-	s.dt = dt
-	br.fill()
-	s.state = dt[br.getBits(tableLog)]
-}
-
-// final returns the current state symbol without decoding the next.
-func (s decSymbol) final() (int, uint8) {
-	return s.baselineInt(), s.addBits()
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/fse_decoder_amd64.go b/vendor/github.com/klauspost/compress/zstd/fse_decoder_amd64.go
deleted file mode 100644
index d04a829b0..000000000
--- a/vendor/github.com/klauspost/compress/zstd/fse_decoder_amd64.go
+++ /dev/null
@@ -1,65 +0,0 @@
-//go:build amd64 && !appengine && !noasm && gc
-// +build amd64,!appengine,!noasm,gc
-
-package zstd
-
-import (
-	"fmt"
-)
-
-type buildDtableAsmContext struct {
-	// inputs
-	stateTable *uint16
-	norm       *int16
-	dt         *uint64
-
-	// outputs --- set by the procedure in the case of error;
-	// for interpretation please see the error handling part below
-	errParam1 uint64
-	errParam2 uint64
-}
-
-// buildDtable_asm is an x86 assembly implementation of fseDecoder.buildDtable.
-// Function returns non-zero exit code on error.
-//
-//go:noescape
-func buildDtable_asm(s *fseDecoder, ctx *buildDtableAsmContext) int
-
-// please keep in sync with _generate/gen_fse.go
-const (
-	errorCorruptedNormalizedCounter = 1
-	errorNewStateTooBig             = 2
-	errorNewStateNoBits             = 3
-)
-
-// buildDtable will build the decoding table.
-func (s *fseDecoder) buildDtable() error {
-	ctx := buildDtableAsmContext{
-		stateTable: &s.stateTable[0],
-		norm:       &s.norm[0],
-		dt:         (*uint64)(&s.dt[0]),
-	}
-	code := buildDtable_asm(s, &ctx)
-
-	if code != 0 {
-		switch code {
-		case errorCorruptedNormalizedCounter:
-			position := ctx.errParam1
-			return fmt.Errorf("corrupted input (position=%d, expected 0)", position)
-
-		case errorNewStateTooBig:
-			newState := decSymbol(ctx.errParam1)
-			size := ctx.errParam2
-			return fmt.Errorf("newState (%d) outside table size (%d)", newState, size)
-
-		case errorNewStateNoBits:
-			newState := decSymbol(ctx.errParam1)
-			oldState := decSymbol(ctx.errParam2)
-			return fmt.Errorf("newState (%d) == oldState (%d) and no bits", newState, oldState)
-
-		default:
-			return fmt.Errorf("buildDtable_asm returned unhandled nonzero code = %d", code)
-		}
-	}
-	return nil
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/fse_decoder_amd64.s b/vendor/github.com/klauspost/compress/zstd/fse_decoder_amd64.s
deleted file mode 100644
index bcde39869..000000000
--- a/vendor/github.com/klauspost/compress/zstd/fse_decoder_amd64.s
+++ /dev/null
@@ -1,126 +0,0 @@
-// Code generated by command: go run gen_fse.go -out ../fse_decoder_amd64.s -pkg=zstd. DO NOT EDIT.
-
-//go:build !appengine && !noasm && gc && !noasm
-
-// func buildDtable_asm(s *fseDecoder, ctx *buildDtableAsmContext) int
-TEXT ·buildDtable_asm(SB), $0-24
-	MOVQ ctx+8(FP), CX
-	MOVQ s+0(FP), DI
-
-	// Load values
-	MOVBQZX 4098(DI), DX
-	XORQ    AX, AX
-	BTSQ    DX, AX
-	MOVQ    (CX), BX
-	MOVQ    16(CX), SI
-	LEAQ    -1(AX), R8
-	MOVQ    8(CX), CX
-	MOVWQZX 4096(DI), DI
-
-	// End load values
-	// Init, lay down lowprob symbols
-	XORQ R9, R9
-	JMP  init_main_loop_condition
-
-init_main_loop:
-	MOVWQSX (CX)(R9*2), R10
-	CMPW    R10, $-1
-	JNE     do_not_update_high_threshold
-	MOVB    R9, 1(SI)(R8*8)
-	DECQ    R8
-	MOVQ    $0x0000000000000001, R10
-
-do_not_update_high_threshold:
-	MOVW R10, (BX)(R9*2)
-	INCQ R9
-
-init_main_loop_condition:
-	CMPQ R9, DI
-	JL   init_main_loop
-
-	// Spread symbols
-	// Calculate table step
-	MOVQ AX, R9
-	SHRQ $0x01, R9
-	MOVQ AX, R10
-	SHRQ $0x03, R10
-	LEAQ 3(R9)(R10*1), R9
-
-	// Fill add bits values
-	LEAQ -1(AX), R10
-	XORQ R11, R11
-	XORQ R12, R12
-	JMP  spread_main_loop_condition
-
-spread_main_loop:
-	XORQ    R13, R13
-	MOVWQSX (CX)(R12*2), R14
-	JMP     spread_inner_loop_condition
-
-spread_inner_loop:
-	MOVB R12, 1(SI)(R11*8)
-
-adjust_position:
-	ADDQ R9, R11
-	ANDQ R10, R11
-	CMPQ R11, R8
-	JG   adjust_position
-	INCQ R13
-
-spread_inner_loop_condition:
-	CMPQ R13, R14
-	JL   spread_inner_loop
-	INCQ R12
-
-spread_main_loop_condition:
-	CMPQ  R12, DI
-	JL    spread_main_loop
-	TESTQ R11, R11
-	JZ    spread_check_ok
-	MOVQ  ctx+8(FP), AX
-	MOVQ  R11, 24(AX)
-	MOVQ  $+1, ret+16(FP)
-	RET
-
-spread_check_ok:
-	// Build Decoding table
-	XORQ DI, DI
-
-build_table_main_table:
-	MOVBQZX 1(SI)(DI*8), CX
-	MOVWQZX (BX)(CX*2), R8
-	LEAQ    1(R8), R9
-	MOVW    R9, (BX)(CX*2)
-	MOVQ    R8, R9
-	BSRQ    R9, R9
-	MOVQ    DX, CX
-	SUBQ    R9, CX
-	SHLQ    CL, R8
-	SUBQ    AX, R8
-	MOVB    CL, (SI)(DI*8)
-	MOVW    R8, 2(SI)(DI*8)
-	CMPQ    R8, AX
-	JLE     build_table_check1_ok
-	MOVQ    ctx+8(FP), CX
-	MOVQ    R8, 24(CX)
-	MOVQ    AX, 32(CX)
-	MOVQ    $+2, ret+16(FP)
-	RET
-
-build_table_check1_ok:
-	TESTB CL, CL
-	JNZ   build_table_check2_ok
-	CMPW  R8, DI
-	JNE   build_table_check2_ok
-	MOVQ  ctx+8(FP), AX
-	MOVQ  R8, 24(AX)
-	MOVQ  DI, 32(AX)
-	MOVQ  $+3, ret+16(FP)
-	RET
-
-build_table_check2_ok:
-	INCQ DI
-	CMPQ DI, AX
-	JL   build_table_main_table
-	MOVQ $+0, ret+16(FP)
-	RET
diff --git a/vendor/github.com/klauspost/compress/zstd/fse_decoder_generic.go b/vendor/github.com/klauspost/compress/zstd/fse_decoder_generic.go
deleted file mode 100644
index 8adfebb02..000000000
--- a/vendor/github.com/klauspost/compress/zstd/fse_decoder_generic.go
+++ /dev/null
@@ -1,73 +0,0 @@
-//go:build !amd64 || appengine || !gc || noasm
-// +build !amd64 appengine !gc noasm
-
-package zstd
-
-import (
-	"errors"
-	"fmt"
-)
-
-// buildDtable will build the decoding table.
-func (s *fseDecoder) buildDtable() error {
-	tableSize := uint32(1 << s.actualTableLog)
-	highThreshold := tableSize - 1
-	symbolNext := s.stateTable[:256]
-
-	// Init, lay down lowprob symbols
-	{
-		for i, v := range s.norm[:s.symbolLen] {
-			if v == -1 {
-				s.dt[highThreshold].setAddBits(uint8(i))
-				highThreshold--
-				v = 1
-			}
-			symbolNext[i] = uint16(v)
-		}
-	}
-
-	// Spread symbols
-	{
-		tableMask := tableSize - 1
-		step := tableStep(tableSize)
-		position := uint32(0)
-		for ss, v := range s.norm[:s.symbolLen] {
-			for i := 0; i < int(v); i++ {
-				s.dt[position].setAddBits(uint8(ss))
-				for {
-					// lowprob area
-					position = (position + step) & tableMask
-					if position <= highThreshold {
-						break
-					}
-				}
-			}
-		}
-		if position != 0 {
-			// position must reach all cells once, otherwise normalizedCounter is incorrect
-			return errors.New("corrupted input (position != 0)")
-		}
-	}
-
-	// Build Decoding table
-	{
-		tableSize := uint16(1 << s.actualTableLog)
-		for u, v := range s.dt[:tableSize] {
-			symbol := v.addBits()
-			nextState := symbolNext[symbol]
-			symbolNext[symbol] = nextState + 1
-			nBits := s.actualTableLog - byte(highBits(uint32(nextState)))
-			s.dt[u&maxTableMask].setNBits(nBits)
-			newState := (nextState << nBits) - tableSize
-			if newState > tableSize {
-				return fmt.Errorf("newState (%d) outside table size (%d)", newState, tableSize)
-			}
-			if newState == uint16(u) && nBits == 0 {
-				// Seems weird that this is possible with nbits > 0.
-				return fmt.Errorf("newState (%d) == oldState (%d) and no bits", newState, u)
-			}
-			s.dt[u&maxTableMask].setNewState(newState)
-		}
-	}
-	return nil
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/fse_encoder.go b/vendor/github.com/klauspost/compress/zstd/fse_encoder.go
deleted file mode 100644
index ab26326a8..000000000
--- a/vendor/github.com/klauspost/compress/zstd/fse_encoder.go
+++ /dev/null
@@ -1,701 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"errors"
-	"fmt"
-	"math"
-)
-
-const (
-	// For encoding we only support up to
-	maxEncTableLog    = 8
-	maxEncTablesize   = 1 << maxTableLog
-	maxEncTableMask   = (1 << maxTableLog) - 1
-	minEncTablelog    = 5
-	maxEncSymbolValue = maxMatchLengthSymbol
-)
-
-// Scratch provides temporary storage for compression and decompression.
-type fseEncoder struct {
-	symbolLen      uint16 // Length of active part of the symbol table.
-	actualTableLog uint8  // Selected tablelog.
-	ct             cTable // Compression tables.
-	maxCount       int    // count of the most probable symbol
-	zeroBits       bool   // no bits has prob > 50%.
-	clearCount     bool   // clear count
-	useRLE         bool   // This encoder is for RLE
-	preDefined     bool   // This encoder is predefined.
-	reUsed         bool   // Set to know when the encoder has been reused.
-	rleVal         uint8  // RLE Symbol
-	maxBits        uint8  // Maximum output bits after transform.
-
-	// TODO: Technically zstd should be fine with 64 bytes.
-	count [256]uint32
-	norm  [256]int16
-}
-
-// cTable contains tables used for compression.
-type cTable struct {
-	tableSymbol []byte
-	stateTable  []uint16
-	symbolTT    []symbolTransform
-}
-
-// symbolTransform contains the state transform for a symbol.
-type symbolTransform struct {
-	deltaNbBits    uint32
-	deltaFindState int16
-	outBits        uint8
-}
-
-// String prints values as a human readable string.
-func (s symbolTransform) String() string {
-	return fmt.Sprintf("{deltabits: %08x, findstate:%d outbits:%d}", s.deltaNbBits, s.deltaFindState, s.outBits)
-}
-
-// Histogram allows to populate the histogram and skip that step in the compression,
-// It otherwise allows to inspect the histogram when compression is done.
-// To indicate that you have populated the histogram call HistogramFinished
-// with the value of the highest populated symbol, as well as the number of entries
-// in the most populated entry. These are accepted at face value.
-func (s *fseEncoder) Histogram() *[256]uint32 {
-	return &s.count
-}
-
-// HistogramFinished can be called to indicate that the histogram has been populated.
-// maxSymbol is the index of the highest set symbol of the next data segment.
-// maxCount is the number of entries in the most populated entry.
-// These are accepted at face value.
-func (s *fseEncoder) HistogramFinished(maxSymbol uint8, maxCount int) {
-	s.maxCount = maxCount
-	s.symbolLen = uint16(maxSymbol) + 1
-	s.clearCount = maxCount != 0
-}
-
-// allocCtable will allocate tables needed for compression.
-// If existing tables a re big enough, they are simply re-used.
-func (s *fseEncoder) allocCtable() {
-	tableSize := 1 << s.actualTableLog
-	// get tableSymbol that is big enough.
-	if cap(s.ct.tableSymbol) < tableSize {
-		s.ct.tableSymbol = make([]byte, tableSize)
-	}
-	s.ct.tableSymbol = s.ct.tableSymbol[:tableSize]
-
-	ctSize := tableSize
-	if cap(s.ct.stateTable) < ctSize {
-		s.ct.stateTable = make([]uint16, ctSize)
-	}
-	s.ct.stateTable = s.ct.stateTable[:ctSize]
-
-	if cap(s.ct.symbolTT) < 256 {
-		s.ct.symbolTT = make([]symbolTransform, 256)
-	}
-	s.ct.symbolTT = s.ct.symbolTT[:256]
-}
-
-// buildCTable will populate the compression table so it is ready to be used.
-func (s *fseEncoder) buildCTable() error {
-	tableSize := uint32(1 << s.actualTableLog)
-	highThreshold := tableSize - 1
-	var cumul [256]int16
-
-	s.allocCtable()
-	tableSymbol := s.ct.tableSymbol[:tableSize]
-	// symbol start positions
-	{
-		cumul[0] = 0
-		for ui, v := range s.norm[:s.symbolLen-1] {
-			u := byte(ui) // one less than reference
-			if v == -1 {
-				// Low proba symbol
-				cumul[u+1] = cumul[u] + 1
-				tableSymbol[highThreshold] = u
-				highThreshold--
-			} else {
-				cumul[u+1] = cumul[u] + v
-			}
-		}
-		// Encode last symbol separately to avoid overflowing u
-		u := int(s.symbolLen - 1)
-		v := s.norm[s.symbolLen-1]
-		if v == -1 {
-			// Low proba symbol
-			cumul[u+1] = cumul[u] + 1
-			tableSymbol[highThreshold] = byte(u)
-			highThreshold--
-		} else {
-			cumul[u+1] = cumul[u] + v
-		}
-		if uint32(cumul[s.symbolLen]) != tableSize {
-			return fmt.Errorf("internal error: expected cumul[s.symbolLen] (%d) == tableSize (%d)", cumul[s.symbolLen], tableSize)
-		}
-		cumul[s.symbolLen] = int16(tableSize) + 1
-	}
-	// Spread symbols
-	s.zeroBits = false
-	{
-		step := tableStep(tableSize)
-		tableMask := tableSize - 1
-		var position uint32
-		// if any symbol > largeLimit, we may have 0 bits output.
-		largeLimit := int16(1 << (s.actualTableLog - 1))
-		for ui, v := range s.norm[:s.symbolLen] {
-			symbol := byte(ui)
-			if v > largeLimit {
-				s.zeroBits = true
-			}
-			for nbOccurrences := int16(0); nbOccurrences < v; nbOccurrences++ {
-				tableSymbol[position] = symbol
-				position = (position + step) & tableMask
-				for position > highThreshold {
-					position = (position + step) & tableMask
-				} /* Low proba area */
-			}
-		}
-
-		// Check if we have gone through all positions
-		if position != 0 {
-			return errors.New("position!=0")
-		}
-	}
-
-	// Build table
-	table := s.ct.stateTable
-	{
-		tsi := int(tableSize)
-		for u, v := range tableSymbol {
-			// TableU16 : sorted by symbol order; gives next state value
-			table[cumul[v]] = uint16(tsi + u)
-			cumul[v]++
-		}
-	}
-
-	// Build Symbol Transformation Table
-	{
-		total := int16(0)
-		symbolTT := s.ct.symbolTT[:s.symbolLen]
-		tableLog := s.actualTableLog
-		tl := (uint32(tableLog) << 16) - (1 << tableLog)
-		for i, v := range s.norm[:s.symbolLen] {
-			switch v {
-			case 0:
-			case -1, 1:
-				symbolTT[i].deltaNbBits = tl
-				symbolTT[i].deltaFindState = total - 1
-				total++
-			default:
-				maxBitsOut := uint32(tableLog) - highBit(uint32(v-1))
-				minStatePlus := uint32(v) << maxBitsOut
-				symbolTT[i].deltaNbBits = (maxBitsOut << 16) - minStatePlus
-				symbolTT[i].deltaFindState = total - v
-				total += v
-			}
-		}
-		if total != int16(tableSize) {
-			return fmt.Errorf("total mismatch %d (got) != %d (want)", total, tableSize)
-		}
-	}
-	return nil
-}
-
-var rtbTable = [...]uint32{0, 473195, 504333, 520860, 550000, 700000, 750000, 830000}
-
-func (s *fseEncoder) setRLE(val byte) {
-	s.allocCtable()
-	s.actualTableLog = 0
-	s.ct.stateTable = s.ct.stateTable[:1]
-	s.ct.symbolTT[val] = symbolTransform{
-		deltaFindState: 0,
-		deltaNbBits:    0,
-	}
-	if debugEncoder {
-		println("setRLE: val", val, "symbolTT", s.ct.symbolTT[val])
-	}
-	s.rleVal = val
-	s.useRLE = true
-}
-
-// setBits will set output bits for the transform.
-// if nil is provided, the number of bits is equal to the index.
-func (s *fseEncoder) setBits(transform []byte) {
-	if s.reUsed || s.preDefined {
-		return
-	}
-	if s.useRLE {
-		if transform == nil {
-			s.ct.symbolTT[s.rleVal].outBits = s.rleVal
-			s.maxBits = s.rleVal
-			return
-		}
-		s.maxBits = transform[s.rleVal]
-		s.ct.symbolTT[s.rleVal].outBits = s.maxBits
-		return
-	}
-	if transform == nil {
-		for i := range s.ct.symbolTT[:s.symbolLen] {
-			s.ct.symbolTT[i].outBits = uint8(i)
-		}
-		s.maxBits = uint8(s.symbolLen - 1)
-		return
-	}
-	s.maxBits = 0
-	for i, v := range transform[:s.symbolLen] {
-		s.ct.symbolTT[i].outBits = v
-		if v > s.maxBits {
-			// We could assume bits always going up, but we play safe.
-			s.maxBits = v
-		}
-	}
-}
-
-// normalizeCount will normalize the count of the symbols so
-// the total is equal to the table size.
-// If successful, compression tables will also be made ready.
-func (s *fseEncoder) normalizeCount(length int) error {
-	if s.reUsed {
-		return nil
-	}
-	s.optimalTableLog(length)
-	var (
-		tableLog          = s.actualTableLog
-		scale             = 62 - uint64(tableLog)
-		step              = (1 << 62) / uint64(length)
-		vStep             = uint64(1) << (scale - 20)
-		stillToDistribute = int16(1 << tableLog)
-		largest           int
-		largestP          int16
-		lowThreshold      = (uint32)(length >> tableLog)
-	)
-	if s.maxCount == length {
-		s.useRLE = true
-		return nil
-	}
-	s.useRLE = false
-	for i, cnt := range s.count[:s.symbolLen] {
-		// already handled
-		// if (count[s] == s.length) return 0;   /* rle special case */
-
-		if cnt == 0 {
-			s.norm[i] = 0
-			continue
-		}
-		if cnt <= lowThreshold {
-			s.norm[i] = -1
-			stillToDistribute--
-		} else {
-			proba := (int16)((uint64(cnt) * step) >> scale)
-			if proba < 8 {
-				restToBeat := vStep * uint64(rtbTable[proba])
-				v := uint64(cnt)*step - (uint64(proba) << scale)
-				if v > restToBeat {
-					proba++
-				}
-			}
-			if proba > largestP {
-				largestP = proba
-				largest = i
-			}
-			s.norm[i] = proba
-			stillToDistribute -= proba
-		}
-	}
-
-	if -stillToDistribute >= (s.norm[largest] >> 1) {
-		// corner case, need another normalization method
-		err := s.normalizeCount2(length)
-		if err != nil {
-			return err
-		}
-		if debugAsserts {
-			err = s.validateNorm()
-			if err != nil {
-				return err
-			}
-		}
-		return s.buildCTable()
-	}
-	s.norm[largest] += stillToDistribute
-	if debugAsserts {
-		err := s.validateNorm()
-		if err != nil {
-			return err
-		}
-	}
-	return s.buildCTable()
-}
-
-// Secondary normalization method.
-// To be used when primary method fails.
-func (s *fseEncoder) normalizeCount2(length int) error {
-	const notYetAssigned = -2
-	var (
-		distributed  uint32
-		total        = uint32(length)
-		tableLog     = s.actualTableLog
-		lowThreshold = total >> tableLog
-		lowOne       = (total * 3) >> (tableLog + 1)
-	)
-	for i, cnt := range s.count[:s.symbolLen] {
-		if cnt == 0 {
-			s.norm[i] = 0
-			continue
-		}
-		if cnt <= lowThreshold {
-			s.norm[i] = -1
-			distributed++
-			total -= cnt
-			continue
-		}
-		if cnt <= lowOne {
-			s.norm[i] = 1
-			distributed++
-			total -= cnt
-			continue
-		}
-		s.norm[i] = notYetAssigned
-	}
-	toDistribute := (1 << tableLog) - distributed
-
-	if (total / toDistribute) > lowOne {
-		// risk of rounding to zero
-		lowOne = (total * 3) / (toDistribute * 2)
-		for i, cnt := range s.count[:s.symbolLen] {
-			if (s.norm[i] == notYetAssigned) && (cnt <= lowOne) {
-				s.norm[i] = 1
-				distributed++
-				total -= cnt
-				continue
-			}
-		}
-		toDistribute = (1 << tableLog) - distributed
-	}
-	if distributed == uint32(s.symbolLen)+1 {
-		// all values are pretty poor;
-		//   probably incompressible data (should have already been detected);
-		//   find max, then give all remaining points to max
-		var maxV int
-		var maxC uint32
-		for i, cnt := range s.count[:s.symbolLen] {
-			if cnt > maxC {
-				maxV = i
-				maxC = cnt
-			}
-		}
-		s.norm[maxV] += int16(toDistribute)
-		return nil
-	}
-
-	if total == 0 {
-		// all of the symbols were low enough for the lowOne or lowThreshold
-		for i := uint32(0); toDistribute > 0; i = (i + 1) % (uint32(s.symbolLen)) {
-			if s.norm[i] > 0 {
-				toDistribute--
-				s.norm[i]++
-			}
-		}
-		return nil
-	}
-
-	var (
-		vStepLog = 62 - uint64(tableLog)
-		mid      = uint64((1 << (vStepLog - 1)) - 1)
-		rStep    = (((1 << vStepLog) * uint64(toDistribute)) + mid) / uint64(total) // scale on remaining
-		tmpTotal = mid
-	)
-	for i, cnt := range s.count[:s.symbolLen] {
-		if s.norm[i] == notYetAssigned {
-			var (
-				end    = tmpTotal + uint64(cnt)*rStep
-				sStart = uint32(tmpTotal >> vStepLog)
-				sEnd   = uint32(end >> vStepLog)
-				weight = sEnd - sStart
-			)
-			if weight < 1 {
-				return errors.New("weight < 1")
-			}
-			s.norm[i] = int16(weight)
-			tmpTotal = end
-		}
-	}
-	return nil
-}
-
-// optimalTableLog calculates and sets the optimal tableLog in s.actualTableLog
-func (s *fseEncoder) optimalTableLog(length int) {
-	tableLog := uint8(maxEncTableLog)
-	minBitsSrc := highBit(uint32(length)) + 1
-	minBitsSymbols := highBit(uint32(s.symbolLen-1)) + 2
-	minBits := uint8(minBitsSymbols)
-	if minBitsSrc < minBitsSymbols {
-		minBits = uint8(minBitsSrc)
-	}
-
-	maxBitsSrc := uint8(highBit(uint32(length-1))) - 2
-	if maxBitsSrc < tableLog {
-		// Accuracy can be reduced
-		tableLog = maxBitsSrc
-	}
-	if minBits > tableLog {
-		tableLog = minBits
-	}
-	// Need a minimum to safely represent all symbol values
-	if tableLog < minEncTablelog {
-		tableLog = minEncTablelog
-	}
-	if tableLog > maxEncTableLog {
-		tableLog = maxEncTableLog
-	}
-	s.actualTableLog = tableLog
-}
-
-// validateNorm validates the normalized histogram table.
-func (s *fseEncoder) validateNorm() (err error) {
-	var total int
-	for _, v := range s.norm[:s.symbolLen] {
-		if v >= 0 {
-			total += int(v)
-		} else {
-			total -= int(v)
-		}
-	}
-	defer func() {
-		if err == nil {
-			return
-		}
-		fmt.Printf("selected TableLog: %d, Symbol length: %d\n", s.actualTableLog, s.symbolLen)
-		for i, v := range s.norm[:s.symbolLen] {
-			fmt.Printf("%3d: %5d -> %4d \n", i, s.count[i], v)
-		}
-	}()
-	if total != (1 << s.actualTableLog) {
-		return fmt.Errorf("warning: Total == %d != %d", total, 1<<s.actualTableLog)
-	}
-	for i, v := range s.count[s.symbolLen:] {
-		if v != 0 {
-			return fmt.Errorf("warning: Found symbol out of range, %d after cut", i)
-		}
-	}
-	return nil
-}
-
-// writeCount will write the normalized histogram count to header.
-// This is read back by readNCount.
-func (s *fseEncoder) writeCount(out []byte) ([]byte, error) {
-	if s.useRLE {
-		return append(out, s.rleVal), nil
-	}
-	if s.preDefined || s.reUsed {
-		// Never write predefined.
-		return out, nil
-	}
-
-	var (
-		tableLog  = s.actualTableLog
-		tableSize = 1 << tableLog
-		previous0 bool
-		charnum   uint16
-
-		// maximum header size plus 2 extra bytes for final output if bitCount == 0.
-		maxHeaderSize = ((int(s.symbolLen) * int(tableLog)) >> 3) + 3 + 2
-
-		// Write Table Size
-		bitStream = uint32(tableLog - minEncTablelog)
-		bitCount  = uint(4)
-		remaining = int16(tableSize + 1) /* +1 for extra accuracy */
-		threshold = int16(tableSize)
-		nbBits    = uint(tableLog + 1)
-		outP      = len(out)
-	)
-	if cap(out) < outP+maxHeaderSize {
-		out = append(out, make([]byte, maxHeaderSize*3)...)
-		out = out[:len(out)-maxHeaderSize*3]
-	}
-	out = out[:outP+maxHeaderSize]
-
-	// stops at 1
-	for remaining > 1 {
-		if previous0 {
-			start := charnum
-			for s.norm[charnum] == 0 {
-				charnum++
-			}
-			for charnum >= start+24 {
-				start += 24
-				bitStream += uint32(0xFFFF) << bitCount
-				out[outP] = byte(bitStream)
-				out[outP+1] = byte(bitStream >> 8)
-				outP += 2
-				bitStream >>= 16
-			}
-			for charnum >= start+3 {
-				start += 3
-				bitStream += 3 << bitCount
-				bitCount += 2
-			}
-			bitStream += uint32(charnum-start) << bitCount
-			bitCount += 2
-			if bitCount > 16 {
-				out[outP] = byte(bitStream)
-				out[outP+1] = byte(bitStream >> 8)
-				outP += 2
-				bitStream >>= 16
-				bitCount -= 16
-			}
-		}
-
-		count := s.norm[charnum]
-		charnum++
-		max := (2*threshold - 1) - remaining
-		if count < 0 {
-			remaining += count
-		} else {
-			remaining -= count
-		}
-		count++ // +1 for extra accuracy
-		if count >= threshold {
-			count += max // [0..max[ [max..threshold[ (...) [threshold+max 2*threshold[
-		}
-		bitStream += uint32(count) << bitCount
-		bitCount += nbBits
-		if count < max {
-			bitCount--
-		}
-
-		previous0 = count == 1
-		if remaining < 1 {
-			return nil, errors.New("internal error: remaining < 1")
-		}
-		for remaining < threshold {
-			nbBits--
-			threshold >>= 1
-		}
-
-		if bitCount > 16 {
-			out[outP] = byte(bitStream)
-			out[outP+1] = byte(bitStream >> 8)
-			outP += 2
-			bitStream >>= 16
-			bitCount -= 16
-		}
-	}
-
-	if outP+2 > len(out) {
-		return nil, fmt.Errorf("internal error: %d > %d, maxheader: %d, sl: %d, tl: %d, normcount: %v", outP+2, len(out), maxHeaderSize, s.symbolLen, int(tableLog), s.norm[:s.symbolLen])
-	}
-	out[outP] = byte(bitStream)
-	out[outP+1] = byte(bitStream >> 8)
-	outP += int((bitCount + 7) / 8)
-
-	if charnum > s.symbolLen {
-		return nil, errors.New("internal error: charnum > s.symbolLen")
-	}
-	return out[:outP], nil
-}
-
-// Approximate symbol cost, as fractional value, using fixed-point format (accuracyLog fractional bits)
-// note 1 : assume symbolValue is valid (<= maxSymbolValue)
-// note 2 : if freq[symbolValue]==0, @return a fake cost of tableLog+1 bits *
-func (s *fseEncoder) bitCost(symbolValue uint8, accuracyLog uint32) uint32 {
-	minNbBits := s.ct.symbolTT[symbolValue].deltaNbBits >> 16
-	threshold := (minNbBits + 1) << 16
-	if debugAsserts {
-		if !(s.actualTableLog < 16) {
-			panic("!s.actualTableLog < 16")
-		}
-		// ensure enough room for renormalization double shift
-		if !(uint8(accuracyLog) < 31-s.actualTableLog) {
-			panic("!uint8(accuracyLog) < 31-s.actualTableLog")
-		}
-	}
-	tableSize := uint32(1) << s.actualTableLog
-	deltaFromThreshold := threshold - (s.ct.symbolTT[symbolValue].deltaNbBits + tableSize)
-	// linear interpolation (very approximate)
-	normalizedDeltaFromThreshold := (deltaFromThreshold << accuracyLog) >> s.actualTableLog
-	bitMultiplier := uint32(1) << accuracyLog
-	if debugAsserts {
-		if s.ct.symbolTT[symbolValue].deltaNbBits+tableSize > threshold {
-			panic("s.ct.symbolTT[symbolValue].deltaNbBits+tableSize > threshold")
-		}
-		if normalizedDeltaFromThreshold > bitMultiplier {
-			panic("normalizedDeltaFromThreshold > bitMultiplier")
-		}
-	}
-	return (minNbBits+1)*bitMultiplier - normalizedDeltaFromThreshold
-}
-
-// Returns the cost in bits of encoding the distribution in count using ctable.
-// Histogram should only be up to the last non-zero symbol.
-// Returns an -1 if ctable cannot represent all the symbols in count.
-func (s *fseEncoder) approxSize(hist []uint32) uint32 {
-	if int(s.symbolLen) < len(hist) {
-		// More symbols than we have.
-		return math.MaxUint32
-	}
-	if s.useRLE {
-		// We will never reuse RLE encoders.
-		return math.MaxUint32
-	}
-	const kAccuracyLog = 8
-	badCost := (uint32(s.actualTableLog) + 1) << kAccuracyLog
-	var cost uint32
-	for i, v := range hist {
-		if v == 0 {
-			continue
-		}
-		if s.norm[i] == 0 {
-			return math.MaxUint32
-		}
-		bitCost := s.bitCost(uint8(i), kAccuracyLog)
-		if bitCost > badCost {
-			return math.MaxUint32
-		}
-		cost += v * bitCost
-	}
-	return cost >> kAccuracyLog
-}
-
-// maxHeaderSize returns the maximum header size in bits.
-// This is not exact size, but we want a penalty for new tables anyway.
-func (s *fseEncoder) maxHeaderSize() uint32 {
-	if s.preDefined {
-		return 0
-	}
-	if s.useRLE {
-		return 8
-	}
-	return (((uint32(s.symbolLen) * uint32(s.actualTableLog)) >> 3) + 3) * 8
-}
-
-// cState contains the compression state of a stream.
-type cState struct {
-	bw         *bitWriter
-	stateTable []uint16
-	state      uint16
-}
-
-// init will initialize the compression state to the first symbol of the stream.
-func (c *cState) init(bw *bitWriter, ct *cTable, first symbolTransform) {
-	c.bw = bw
-	c.stateTable = ct.stateTable
-	if len(c.stateTable) == 1 {
-		// RLE
-		c.stateTable[0] = uint16(0)
-		c.state = 0
-		return
-	}
-	nbBitsOut := (first.deltaNbBits + (1 << 15)) >> 16
-	im := int32((nbBitsOut << 16) - first.deltaNbBits)
-	lu := (im >> nbBitsOut) + int32(first.deltaFindState)
-	c.state = c.stateTable[lu]
-}
-
-// flush will write the tablelog to the output and flush the remaining full bytes.
-func (c *cState) flush(tableLog uint8) {
-	c.bw.flush32()
-	c.bw.addBits16NC(c.state, tableLog)
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/fse_predefined.go b/vendor/github.com/klauspost/compress/zstd/fse_predefined.go
deleted file mode 100644
index 474cb77d2..000000000
--- a/vendor/github.com/klauspost/compress/zstd/fse_predefined.go
+++ /dev/null
@@ -1,158 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"fmt"
-	"math"
-	"sync"
-)
-
-var (
-	// fsePredef are the predefined fse tables as defined here:
-	// https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#default-distributions
-	// These values are already transformed.
-	fsePredef [3]fseDecoder
-
-	// fsePredefEnc are the predefined encoder based on fse tables as defined here:
-	// https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#default-distributions
-	// These values are already transformed.
-	fsePredefEnc [3]fseEncoder
-
-	// symbolTableX contain the transformations needed for each type as defined in
-	// https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#the-codes-for-literals-lengths-match-lengths-and-offsets
-	symbolTableX [3][]baseOffset
-
-	// maxTableSymbol is the biggest supported symbol for each table type
-	// https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#the-codes-for-literals-lengths-match-lengths-and-offsets
-	maxTableSymbol = [3]uint8{tableLiteralLengths: maxLiteralLengthSymbol, tableOffsets: maxOffsetLengthSymbol, tableMatchLengths: maxMatchLengthSymbol}
-
-	// bitTables is the bits table for each table.
-	bitTables = [3][]byte{tableLiteralLengths: llBitsTable[:], tableOffsets: nil, tableMatchLengths: mlBitsTable[:]}
-)
-
-type tableIndex uint8
-
-const (
-	// indexes for fsePredef and symbolTableX
-	tableLiteralLengths tableIndex = 0
-	tableOffsets        tableIndex = 1
-	tableMatchLengths   tableIndex = 2
-
-	maxLiteralLengthSymbol = 35
-	maxOffsetLengthSymbol  = 30
-	maxMatchLengthSymbol   = 52
-)
-
-// baseOffset is used for calculating transformations.
-type baseOffset struct {
-	baseLine uint32
-	addBits  uint8
-}
-
-// fillBase will precalculate base offsets with the given bit distributions.
-func fillBase(dst []baseOffset, base uint32, bits ...uint8) {
-	if len(bits) != len(dst) {
-		panic(fmt.Sprintf("len(dst) (%d) != len(bits) (%d)", len(dst), len(bits)))
-	}
-	for i, bit := range bits {
-		if base > math.MaxInt32 {
-			panic("invalid decoding table, base overflows int32")
-		}
-
-		dst[i] = baseOffset{
-			baseLine: base,
-			addBits:  bit,
-		}
-		base += 1 << bit
-	}
-}
-
-var predef sync.Once
-
-func initPredefined() {
-	predef.Do(func() {
-		// Literals length codes
-		tmp := make([]baseOffset, 36)
-		for i := range tmp[:16] {
-			tmp[i] = baseOffset{
-				baseLine: uint32(i),
-				addBits:  0,
-			}
-		}
-		fillBase(tmp[16:], 16, 1, 1, 1, 1, 2, 2, 3, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
-		symbolTableX[tableLiteralLengths] = tmp
-
-		// Match length codes
-		tmp = make([]baseOffset, 53)
-		for i := range tmp[:32] {
-			tmp[i] = baseOffset{
-				// The transformation adds the 3 length.
-				baseLine: uint32(i) + 3,
-				addBits:  0,
-			}
-		}
-		fillBase(tmp[32:], 35, 1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
-		symbolTableX[tableMatchLengths] = tmp
-
-		// Offset codes
-		tmp = make([]baseOffset, maxOffsetBits+1)
-		tmp[1] = baseOffset{
-			baseLine: 1,
-			addBits:  1,
-		}
-		fillBase(tmp[2:], 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
-		symbolTableX[tableOffsets] = tmp
-
-		// Fill predefined tables and transform them.
-		// https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#default-distributions
-		for i := range fsePredef[:] {
-			f := &fsePredef[i]
-			switch tableIndex(i) {
-			case tableLiteralLengths:
-				// https://github.com/facebook/zstd/blob/ededcfca57366461021c922720878c81a5854a0a/lib/decompress/zstd_decompress_block.c#L243
-				f.actualTableLog = 6
-				copy(f.norm[:], []int16{4, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1,
-					2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 1, 1, 1, 1, 1,
-					-1, -1, -1, -1})
-				f.symbolLen = 36
-			case tableOffsets:
-				// https://github.com/facebook/zstd/blob/ededcfca57366461021c922720878c81a5854a0a/lib/decompress/zstd_decompress_block.c#L281
-				f.actualTableLog = 5
-				copy(f.norm[:], []int16{
-					1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
-					1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1})
-				f.symbolLen = 29
-			case tableMatchLengths:
-				//https://github.com/facebook/zstd/blob/ededcfca57366461021c922720878c81a5854a0a/lib/decompress/zstd_decompress_block.c#L304
-				f.actualTableLog = 6
-				copy(f.norm[:], []int16{
-					1, 4, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
-					1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
-					1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1,
-					-1, -1, -1, -1, -1})
-				f.symbolLen = 53
-			}
-			if err := f.buildDtable(); err != nil {
-				panic(fmt.Errorf("building table %v: %v", tableIndex(i), err))
-			}
-			if err := f.transform(symbolTableX[i]); err != nil {
-				panic(fmt.Errorf("building table %v: %v", tableIndex(i), err))
-			}
-			f.preDefined = true
-
-			// Create encoder as well
-			enc := &fsePredefEnc[i]
-			copy(enc.norm[:], f.norm[:])
-			enc.symbolLen = f.symbolLen
-			enc.actualTableLog = f.actualTableLog
-			if err := enc.buildCTable(); err != nil {
-				panic(fmt.Errorf("building encoding table %v: %v", tableIndex(i), err))
-			}
-			enc.setBits(bitTables[i])
-			enc.preDefined = true
-		}
-	})
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/hash.go b/vendor/github.com/klauspost/compress/zstd/hash.go
deleted file mode 100644
index 5d73c21eb..000000000
--- a/vendor/github.com/klauspost/compress/zstd/hash.go
+++ /dev/null
@@ -1,35 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-const (
-	prime3bytes = 506832829
-	prime4bytes = 2654435761
-	prime5bytes = 889523592379
-	prime6bytes = 227718039650203
-	prime7bytes = 58295818150454627
-	prime8bytes = 0xcf1bbcdcb7a56463
-)
-
-// hashLen returns a hash of the lowest mls bytes of with length output bits.
-// mls must be >=3 and <=8. Any other value will return hash for 4 bytes.
-// length should always be < 32.
-// Preferably length and mls should be a constant for inlining.
-func hashLen(u uint64, length, mls uint8) uint32 {
-	switch mls {
-	case 3:
-		return (uint32(u<<8) * prime3bytes) >> (32 - length)
-	case 5:
-		return uint32(((u << (64 - 40)) * prime5bytes) >> (64 - length))
-	case 6:
-		return uint32(((u << (64 - 48)) * prime6bytes) >> (64 - length))
-	case 7:
-		return uint32(((u << (64 - 56)) * prime7bytes) >> (64 - length))
-	case 8:
-		return uint32((u * prime8bytes) >> (64 - length))
-	default:
-		return (uint32(u) * prime4bytes) >> (32 - length)
-	}
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/history.go b/vendor/github.com/klauspost/compress/zstd/history.go
deleted file mode 100644
index 09164856d..000000000
--- a/vendor/github.com/klauspost/compress/zstd/history.go
+++ /dev/null
@@ -1,116 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"github.com/klauspost/compress/huff0"
-)
-
-// history contains the information transferred between blocks.
-type history struct {
-	// Literal decompression
-	huffTree *huff0.Scratch
-
-	// Sequence decompression
-	decoders      sequenceDecs
-	recentOffsets [3]int
-
-	// History buffer...
-	b []byte
-
-	// ignoreBuffer is meant to ignore a number of bytes
-	// when checking for matches in history
-	ignoreBuffer int
-
-	windowSize       int
-	allocFrameBuffer int // needed?
-	error            bool
-	dict             *dict
-}
-
-// reset will reset the history to initial state of a frame.
-// The history must already have been initialized to the desired size.
-func (h *history) reset() {
-	h.b = h.b[:0]
-	h.ignoreBuffer = 0
-	h.error = false
-	h.recentOffsets = [3]int{1, 4, 8}
-	h.decoders.freeDecoders()
-	h.decoders = sequenceDecs{br: h.decoders.br}
-	h.freeHuffDecoder()
-	h.huffTree = nil
-	h.dict = nil
-	//printf("history created: %+v (l: %d, c: %d)", *h, len(h.b), cap(h.b))
-}
-
-func (h *history) freeHuffDecoder() {
-	if h.huffTree != nil {
-		if h.dict == nil || h.dict.litEnc != h.huffTree {
-			huffDecoderPool.Put(h.huffTree)
-			h.huffTree = nil
-		}
-	}
-}
-
-func (h *history) setDict(dict *dict) {
-	if dict == nil {
-		return
-	}
-	h.dict = dict
-	h.decoders.litLengths = dict.llDec
-	h.decoders.offsets = dict.ofDec
-	h.decoders.matchLengths = dict.mlDec
-	h.decoders.dict = dict.content
-	h.recentOffsets = dict.offsets
-	h.huffTree = dict.litEnc
-}
-
-// append bytes to history.
-// This function will make sure there is space for it,
-// if the buffer has been allocated with enough extra space.
-func (h *history) append(b []byte) {
-	if len(b) >= h.windowSize {
-		// Discard all history by simply overwriting
-		h.b = h.b[:h.windowSize]
-		copy(h.b, b[len(b)-h.windowSize:])
-		return
-	}
-
-	// If there is space, append it.
-	if len(b) < cap(h.b)-len(h.b) {
-		h.b = append(h.b, b...)
-		return
-	}
-
-	// Move data down so we only have window size left.
-	// We know we have less than window size in b at this point.
-	discard := len(b) + len(h.b) - h.windowSize
-	copy(h.b, h.b[discard:])
-	h.b = h.b[:h.windowSize]
-	copy(h.b[h.windowSize-len(b):], b)
-}
-
-// ensureBlock will ensure there is space for at least one block...
-func (h *history) ensureBlock() {
-	if cap(h.b) < h.allocFrameBuffer {
-		h.b = make([]byte, 0, h.allocFrameBuffer)
-		return
-	}
-
-	avail := cap(h.b) - len(h.b)
-	if avail >= h.windowSize || avail > maxCompressedBlockSize {
-		return
-	}
-	// Move data down so we only have window size left.
-	// We know we have less than window size in b at this point.
-	discard := len(h.b) - h.windowSize
-	copy(h.b, h.b[discard:])
-	h.b = h.b[:h.windowSize]
-}
-
-// append bytes to history without ever discarding anything.
-func (h *history) appendKeep(b []byte) {
-	h.b = append(h.b, b...)
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/internal/xxhash/LICENSE.txt b/vendor/github.com/klauspost/compress/zstd/internal/xxhash/LICENSE.txt
deleted file mode 100644
index 24b53065f..000000000
--- a/vendor/github.com/klauspost/compress/zstd/internal/xxhash/LICENSE.txt
+++ /dev/null
@@ -1,22 +0,0 @@
-Copyright (c) 2016 Caleb Spare
-
-MIT License
-
-Permission is hereby granted, free of charge, to any person obtaining
-a copy of this software and associated documentation files (the
-"Software"), to deal in the Software without restriction, including
-without limitation the rights to use, copy, modify, merge, publish,
-distribute, sublicense, and/or sell copies of the Software, and to
-permit persons to whom the Software is furnished to do so, subject to
-the following conditions:
-
-The above copyright notice and this permission notice shall be
-included in all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
-LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
-OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
diff --git a/vendor/github.com/klauspost/compress/zstd/internal/xxhash/README.md b/vendor/github.com/klauspost/compress/zstd/internal/xxhash/README.md
deleted file mode 100644
index 777290d44..000000000
--- a/vendor/github.com/klauspost/compress/zstd/internal/xxhash/README.md
+++ /dev/null
@@ -1,71 +0,0 @@
-# xxhash
-
-VENDORED: Go to [github.com/cespare/xxhash](https://github.com/cespare/xxhash) for original package.
-
-xxhash is a Go implementation of the 64-bit [xxHash] algorithm, XXH64. This is a
-high-quality hashing algorithm that is much faster than anything in the Go
-standard library.
-
-This package provides a straightforward API:
-
-```
-func Sum64(b []byte) uint64
-func Sum64String(s string) uint64
-type Digest struct{ ... }
-    func New() *Digest
-```
-
-The `Digest` type implements hash.Hash64. Its key methods are:
-
-```
-func (*Digest) Write([]byte) (int, error)
-func (*Digest) WriteString(string) (int, error)
-func (*Digest) Sum64() uint64
-```
-
-The package is written with optimized pure Go and also contains even faster
-assembly implementations for amd64 and arm64. If desired, the `purego` build tag
-opts into using the Go code even on those architectures.
-
-[xxHash]: http://cyan4973.github.io/xxHash/
-
-## Compatibility
-
-This package is in a module and the latest code is in version 2 of the module.
-You need a version of Go with at least "minimal module compatibility" to use
-github.com/cespare/xxhash/v2:
-
-* 1.9.7+ for Go 1.9
-* 1.10.3+ for Go 1.10
-* Go 1.11 or later
-
-I recommend using the latest release of Go.
-
-## Benchmarks
-
-Here are some quick benchmarks comparing the pure-Go and assembly
-implementations of Sum64.
-
-| input size | purego    | asm       |
-| ---------- | --------- | --------- |
-| 4 B        |  1.3 GB/s |  1.2 GB/s |
-| 16 B       |  2.9 GB/s |  3.5 GB/s |
-| 100 B      |  6.9 GB/s |  8.1 GB/s |
-| 4 KB       | 11.7 GB/s | 16.7 GB/s |
-| 10 MB      | 12.0 GB/s | 17.3 GB/s |
-
-These numbers were generated on Ubuntu 20.04 with an Intel Xeon Platinum 8252C
-CPU using the following commands under Go 1.19.2:
-
-```
-benchstat <(go test -tags purego -benchtime 500ms -count 15 -bench 'Sum64$')
-benchstat <(go test -benchtime 500ms -count 15 -bench 'Sum64$')
-```
-
-## Projects using this package
-
-- [InfluxDB](https://github.com/influxdata/influxdb)
-- [Prometheus](https://github.com/prometheus/prometheus)
-- [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics)
-- [FreeCache](https://github.com/coocood/freecache)
-- [FastCache](https://github.com/VictoriaMetrics/fastcache)
diff --git a/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash.go b/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash.go
deleted file mode 100644
index fc40c8200..000000000
--- a/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash.go
+++ /dev/null
@@ -1,230 +0,0 @@
-// Package xxhash implements the 64-bit variant of xxHash (XXH64) as described
-// at http://cyan4973.github.io/xxHash/.
-// THIS IS VENDORED: Go to github.com/cespare/xxhash for original package.
-
-package xxhash
-
-import (
-	"encoding/binary"
-	"errors"
-	"math/bits"
-)
-
-const (
-	prime1 uint64 = 11400714785074694791
-	prime2 uint64 = 14029467366897019727
-	prime3 uint64 = 1609587929392839161
-	prime4 uint64 = 9650029242287828579
-	prime5 uint64 = 2870177450012600261
-)
-
-// Store the primes in an array as well.
-//
-// The consts are used when possible in Go code to avoid MOVs but we need a
-// contiguous array of the assembly code.
-var primes = [...]uint64{prime1, prime2, prime3, prime4, prime5}
-
-// Digest implements hash.Hash64.
-type Digest struct {
-	v1    uint64
-	v2    uint64
-	v3    uint64
-	v4    uint64
-	total uint64
-	mem   [32]byte
-	n     int // how much of mem is used
-}
-
-// New creates a new Digest that computes the 64-bit xxHash algorithm.
-func New() *Digest {
-	var d Digest
-	d.Reset()
-	return &d
-}
-
-// Reset clears the Digest's state so that it can be reused.
-func (d *Digest) Reset() {
-	d.v1 = primes[0] + prime2
-	d.v2 = prime2
-	d.v3 = 0
-	d.v4 = -primes[0]
-	d.total = 0
-	d.n = 0
-}
-
-// Size always returns 8 bytes.
-func (d *Digest) Size() int { return 8 }
-
-// BlockSize always returns 32 bytes.
-func (d *Digest) BlockSize() int { return 32 }
-
-// Write adds more data to d. It always returns len(b), nil.
-func (d *Digest) Write(b []byte) (n int, err error) {
-	n = len(b)
-	d.total += uint64(n)
-
-	memleft := d.mem[d.n&(len(d.mem)-1):]
-
-	if d.n+n < 32 {
-		// This new data doesn't even fill the current block.
-		copy(memleft, b)
-		d.n += n
-		return
-	}
-
-	if d.n > 0 {
-		// Finish off the partial block.
-		c := copy(memleft, b)
-		d.v1 = round(d.v1, u64(d.mem[0:8]))
-		d.v2 = round(d.v2, u64(d.mem[8:16]))
-		d.v3 = round(d.v3, u64(d.mem[16:24]))
-		d.v4 = round(d.v4, u64(d.mem[24:32]))
-		b = b[c:]
-		d.n = 0
-	}
-
-	if len(b) >= 32 {
-		// One or more full blocks left.
-		nw := writeBlocks(d, b)
-		b = b[nw:]
-	}
-
-	// Store any remaining partial block.
-	copy(d.mem[:], b)
-	d.n = len(b)
-
-	return
-}
-
-// Sum appends the current hash to b and returns the resulting slice.
-func (d *Digest) Sum(b []byte) []byte {
-	s := d.Sum64()
-	return append(
-		b,
-		byte(s>>56),
-		byte(s>>48),
-		byte(s>>40),
-		byte(s>>32),
-		byte(s>>24),
-		byte(s>>16),
-		byte(s>>8),
-		byte(s),
-	)
-}
-
-// Sum64 returns the current hash.
-func (d *Digest) Sum64() uint64 {
-	var h uint64
-
-	if d.total >= 32 {
-		v1, v2, v3, v4 := d.v1, d.v2, d.v3, d.v4
-		h = rol1(v1) + rol7(v2) + rol12(v3) + rol18(v4)
-		h = mergeRound(h, v1)
-		h = mergeRound(h, v2)
-		h = mergeRound(h, v3)
-		h = mergeRound(h, v4)
-	} else {
-		h = d.v3 + prime5
-	}
-
-	h += d.total
-
-	b := d.mem[:d.n&(len(d.mem)-1)]
-	for ; len(b) >= 8; b = b[8:] {
-		k1 := round(0, u64(b[:8]))
-		h ^= k1
-		h = rol27(h)*prime1 + prime4
-	}
-	if len(b) >= 4 {
-		h ^= uint64(u32(b[:4])) * prime1
-		h = rol23(h)*prime2 + prime3
-		b = b[4:]
-	}
-	for ; len(b) > 0; b = b[1:] {
-		h ^= uint64(b[0]) * prime5
-		h = rol11(h) * prime1
-	}
-
-	h ^= h >> 33
-	h *= prime2
-	h ^= h >> 29
-	h *= prime3
-	h ^= h >> 32
-
-	return h
-}
-
-const (
-	magic         = "xxh\x06"
-	marshaledSize = len(magic) + 8*5 + 32
-)
-
-// MarshalBinary implements the encoding.BinaryMarshaler interface.
-func (d *Digest) MarshalBinary() ([]byte, error) {
-	b := make([]byte, 0, marshaledSize)
-	b = append(b, magic...)
-	b = appendUint64(b, d.v1)
-	b = appendUint64(b, d.v2)
-	b = appendUint64(b, d.v3)
-	b = appendUint64(b, d.v4)
-	b = appendUint64(b, d.total)
-	b = append(b, d.mem[:d.n]...)
-	b = b[:len(b)+len(d.mem)-d.n]
-	return b, nil
-}
-
-// UnmarshalBinary implements the encoding.BinaryUnmarshaler interface.
-func (d *Digest) UnmarshalBinary(b []byte) error {
-	if len(b) < len(magic) || string(b[:len(magic)]) != magic {
-		return errors.New("xxhash: invalid hash state identifier")
-	}
-	if len(b) != marshaledSize {
-		return errors.New("xxhash: invalid hash state size")
-	}
-	b = b[len(magic):]
-	b, d.v1 = consumeUint64(b)
-	b, d.v2 = consumeUint64(b)
-	b, d.v3 = consumeUint64(b)
-	b, d.v4 = consumeUint64(b)
-	b, d.total = consumeUint64(b)
-	copy(d.mem[:], b)
-	d.n = int(d.total % uint64(len(d.mem)))
-	return nil
-}
-
-func appendUint64(b []byte, x uint64) []byte {
-	var a [8]byte
-	binary.LittleEndian.PutUint64(a[:], x)
-	return append(b, a[:]...)
-}
-
-func consumeUint64(b []byte) ([]byte, uint64) {
-	x := u64(b)
-	return b[8:], x
-}
-
-func u64(b []byte) uint64 { return binary.LittleEndian.Uint64(b) }
-func u32(b []byte) uint32 { return binary.LittleEndian.Uint32(b) }
-
-func round(acc, input uint64) uint64 {
-	acc += input * prime2
-	acc = rol31(acc)
-	acc *= prime1
-	return acc
-}
-
-func mergeRound(acc, val uint64) uint64 {
-	val = round(0, val)
-	acc ^= val
-	acc = acc*prime1 + prime4
-	return acc
-}
-
-func rol1(x uint64) uint64  { return bits.RotateLeft64(x, 1) }
-func rol7(x uint64) uint64  { return bits.RotateLeft64(x, 7) }
-func rol11(x uint64) uint64 { return bits.RotateLeft64(x, 11) }
-func rol12(x uint64) uint64 { return bits.RotateLeft64(x, 12) }
-func rol18(x uint64) uint64 { return bits.RotateLeft64(x, 18) }
-func rol23(x uint64) uint64 { return bits.RotateLeft64(x, 23) }
-func rol27(x uint64) uint64 { return bits.RotateLeft64(x, 27) }
-func rol31(x uint64) uint64 { return bits.RotateLeft64(x, 31) }
diff --git a/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash_amd64.s b/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash_amd64.s
deleted file mode 100644
index ddb63aa91..000000000
--- a/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash_amd64.s
+++ /dev/null
@@ -1,210 +0,0 @@
-//go:build !appengine && gc && !purego && !noasm
-// +build !appengine
-// +build gc
-// +build !purego
-// +build !noasm
-
-#include "textflag.h"
-
-// Registers:
-#define h      AX
-#define d      AX
-#define p      SI // pointer to advance through b
-#define n      DX
-#define end    BX // loop end
-#define v1     R8
-#define v2     R9
-#define v3     R10
-#define v4     R11
-#define x      R12
-#define prime1 R13
-#define prime2 R14
-#define prime4 DI
-
-#define round(acc, x) \
-	IMULQ prime2, x   \
-	ADDQ  x, acc      \
-	ROLQ  $31, acc    \
-	IMULQ prime1, acc
-
-// round0 performs the operation x = round(0, x).
-#define round0(x) \
-	IMULQ prime2, x \
-	ROLQ  $31, x    \
-	IMULQ prime1, x
-
-// mergeRound applies a merge round on the two registers acc and x.
-// It assumes that prime1, prime2, and prime4 have been loaded.
-#define mergeRound(acc, x) \
-	round0(x)         \
-	XORQ  x, acc      \
-	IMULQ prime1, acc \
-	ADDQ  prime4, acc
-
-// blockLoop processes as many 32-byte blocks as possible,
-// updating v1, v2, v3, and v4. It assumes that there is at least one block
-// to process.
-#define blockLoop() \
-loop:  \
-	MOVQ +0(p), x  \
-	round(v1, x)   \
-	MOVQ +8(p), x  \
-	round(v2, x)   \
-	MOVQ +16(p), x \
-	round(v3, x)   \
-	MOVQ +24(p), x \
-	round(v4, x)   \
-	ADDQ $32, p    \
-	CMPQ p, end    \
-	JLE  loop
-
-// func Sum64(b []byte) uint64
-TEXT ·Sum64(SB), NOSPLIT|NOFRAME, $0-32
-	// Load fixed primes.
-	MOVQ ·primes+0(SB), prime1
-	MOVQ ·primes+8(SB), prime2
-	MOVQ ·primes+24(SB), prime4
-
-	// Load slice.
-	MOVQ b_base+0(FP), p
-	MOVQ b_len+8(FP), n
-	LEAQ (p)(n*1), end
-
-	// The first loop limit will be len(b)-32.
-	SUBQ $32, end
-
-	// Check whether we have at least one block.
-	CMPQ n, $32
-	JLT  noBlocks
-
-	// Set up initial state (v1, v2, v3, v4).
-	MOVQ prime1, v1
-	ADDQ prime2, v1
-	MOVQ prime2, v2
-	XORQ v3, v3
-	XORQ v4, v4
-	SUBQ prime1, v4
-
-	blockLoop()
-
-	MOVQ v1, h
-	ROLQ $1, h
-	MOVQ v2, x
-	ROLQ $7, x
-	ADDQ x, h
-	MOVQ v3, x
-	ROLQ $12, x
-	ADDQ x, h
-	MOVQ v4, x
-	ROLQ $18, x
-	ADDQ x, h
-
-	mergeRound(h, v1)
-	mergeRound(h, v2)
-	mergeRound(h, v3)
-	mergeRound(h, v4)
-
-	JMP afterBlocks
-
-noBlocks:
-	MOVQ ·primes+32(SB), h
-
-afterBlocks:
-	ADDQ n, h
-
-	ADDQ $24, end
-	CMPQ p, end
-	JG   try4
-
-loop8:
-	MOVQ  (p), x
-	ADDQ  $8, p
-	round0(x)
-	XORQ  x, h
-	ROLQ  $27, h
-	IMULQ prime1, h
-	ADDQ  prime4, h
-
-	CMPQ p, end
-	JLE  loop8
-
-try4:
-	ADDQ $4, end
-	CMPQ p, end
-	JG   try1
-
-	MOVL  (p), x
-	ADDQ  $4, p
-	IMULQ prime1, x
-	XORQ  x, h
-
-	ROLQ  $23, h
-	IMULQ prime2, h
-	ADDQ  ·primes+16(SB), h
-
-try1:
-	ADDQ $4, end
-	CMPQ p, end
-	JGE  finalize
-
-loop1:
-	MOVBQZX (p), x
-	ADDQ    $1, p
-	IMULQ   ·primes+32(SB), x
-	XORQ    x, h
-	ROLQ    $11, h
-	IMULQ   prime1, h
-
-	CMPQ p, end
-	JL   loop1
-
-finalize:
-	MOVQ  h, x
-	SHRQ  $33, x
-	XORQ  x, h
-	IMULQ prime2, h
-	MOVQ  h, x
-	SHRQ  $29, x
-	XORQ  x, h
-	IMULQ ·primes+16(SB), h
-	MOVQ  h, x
-	SHRQ  $32, x
-	XORQ  x, h
-
-	MOVQ h, ret+24(FP)
-	RET
-
-// func writeBlocks(d *Digest, b []byte) int
-TEXT ·writeBlocks(SB), NOSPLIT|NOFRAME, $0-40
-	// Load fixed primes needed for round.
-	MOVQ ·primes+0(SB), prime1
-	MOVQ ·primes+8(SB), prime2
-
-	// Load slice.
-	MOVQ b_base+8(FP), p
-	MOVQ b_len+16(FP), n
-	LEAQ (p)(n*1), end
-	SUBQ $32, end
-
-	// Load vN from d.
-	MOVQ s+0(FP), d
-	MOVQ 0(d), v1
-	MOVQ 8(d), v2
-	MOVQ 16(d), v3
-	MOVQ 24(d), v4
-
-	// We don't need to check the loop condition here; this function is
-	// always called with at least one block of data to process.
-	blockLoop()
-
-	// Copy vN back to d.
-	MOVQ v1, 0(d)
-	MOVQ v2, 8(d)
-	MOVQ v3, 16(d)
-	MOVQ v4, 24(d)
-
-	// The number of bytes written is p minus the old base pointer.
-	SUBQ b_base+8(FP), p
-	MOVQ p, ret+32(FP)
-
-	RET
diff --git a/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash_arm64.s b/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash_arm64.s
deleted file mode 100644
index ae7d4d329..000000000
--- a/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash_arm64.s
+++ /dev/null
@@ -1,184 +0,0 @@
-//go:build !appengine && gc && !purego && !noasm
-// +build !appengine
-// +build gc
-// +build !purego
-// +build !noasm
-
-#include "textflag.h"
-
-// Registers:
-#define digest	R1
-#define h	R2 // return value
-#define p	R3 // input pointer
-#define n	R4 // input length
-#define nblocks	R5 // n / 32
-#define prime1	R7
-#define prime2	R8
-#define prime3	R9
-#define prime4	R10
-#define prime5	R11
-#define v1	R12
-#define v2	R13
-#define v3	R14
-#define v4	R15
-#define x1	R20
-#define x2	R21
-#define x3	R22
-#define x4	R23
-
-#define round(acc, x) \
-	MADD prime2, acc, x, acc \
-	ROR  $64-31, acc         \
-	MUL  prime1, acc
-
-// round0 performs the operation x = round(0, x).
-#define round0(x) \
-	MUL prime2, x \
-	ROR $64-31, x \
-	MUL prime1, x
-
-#define mergeRound(acc, x) \
-	round0(x)                     \
-	EOR  x, acc                   \
-	MADD acc, prime4, prime1, acc
-
-// blockLoop processes as many 32-byte blocks as possible,
-// updating v1, v2, v3, and v4. It assumes that n >= 32.
-#define blockLoop() \
-	LSR     $5, n, nblocks  \
-	PCALIGN $16             \
-	loop:                   \
-	LDP.P   16(p), (x1, x2) \
-	LDP.P   16(p), (x3, x4) \
-	round(v1, x1)           \
-	round(v2, x2)           \
-	round(v3, x3)           \
-	round(v4, x4)           \
-	SUB     $1, nblocks     \
-	CBNZ    nblocks, loop
-
-// func Sum64(b []byte) uint64
-TEXT ·Sum64(SB), NOSPLIT|NOFRAME, $0-32
-	LDP b_base+0(FP), (p, n)
-
-	LDP  ·primes+0(SB), (prime1, prime2)
-	LDP  ·primes+16(SB), (prime3, prime4)
-	MOVD ·primes+32(SB), prime5
-
-	CMP  $32, n
-	CSEL LT, prime5, ZR, h // if n < 32 { h = prime5 } else { h = 0 }
-	BLT  afterLoop
-
-	ADD  prime1, prime2, v1
-	MOVD prime2, v2
-	MOVD $0, v3
-	NEG  prime1, v4
-
-	blockLoop()
-
-	ROR $64-1, v1, x1
-	ROR $64-7, v2, x2
-	ADD x1, x2
-	ROR $64-12, v3, x3
-	ROR $64-18, v4, x4
-	ADD x3, x4
-	ADD x2, x4, h
-
-	mergeRound(h, v1)
-	mergeRound(h, v2)
-	mergeRound(h, v3)
-	mergeRound(h, v4)
-
-afterLoop:
-	ADD n, h
-
-	TBZ   $4, n, try8
-	LDP.P 16(p), (x1, x2)
-
-	round0(x1)
-
-	// NOTE: here and below, sequencing the EOR after the ROR (using a
-	// rotated register) is worth a small but measurable speedup for small
-	// inputs.
-	ROR  $64-27, h
-	EOR  x1 @> 64-27, h, h
-	MADD h, prime4, prime1, h
-
-	round0(x2)
-	ROR  $64-27, h
-	EOR  x2 @> 64-27, h, h
-	MADD h, prime4, prime1, h
-
-try8:
-	TBZ    $3, n, try4
-	MOVD.P 8(p), x1
-
-	round0(x1)
-	ROR  $64-27, h
-	EOR  x1 @> 64-27, h, h
-	MADD h, prime4, prime1, h
-
-try4:
-	TBZ     $2, n, try2
-	MOVWU.P 4(p), x2
-
-	MUL  prime1, x2
-	ROR  $64-23, h
-	EOR  x2 @> 64-23, h, h
-	MADD h, prime3, prime2, h
-
-try2:
-	TBZ     $1, n, try1
-	MOVHU.P 2(p), x3
-	AND     $255, x3, x1
-	LSR     $8, x3, x2
-
-	MUL prime5, x1
-	ROR $64-11, h
-	EOR x1 @> 64-11, h, h
-	MUL prime1, h
-
-	MUL prime5, x2
-	ROR $64-11, h
-	EOR x2 @> 64-11, h, h
-	MUL prime1, h
-
-try1:
-	TBZ   $0, n, finalize
-	MOVBU (p), x4
-
-	MUL prime5, x4
-	ROR $64-11, h
-	EOR x4 @> 64-11, h, h
-	MUL prime1, h
-
-finalize:
-	EOR h >> 33, h
-	MUL prime2, h
-	EOR h >> 29, h
-	MUL prime3, h
-	EOR h >> 32, h
-
-	MOVD h, ret+24(FP)
-	RET
-
-// func writeBlocks(s *Digest, b []byte) int
-TEXT ·writeBlocks(SB), NOSPLIT|NOFRAME, $0-40
-	LDP ·primes+0(SB), (prime1, prime2)
-
-	// Load state. Assume v[1-4] are stored contiguously.
-	MOVD s+0(FP), digest
-	LDP  0(digest), (v1, v2)
-	LDP  16(digest), (v3, v4)
-
-	LDP b_base+8(FP), (p, n)
-
-	blockLoop()
-
-	// Store updated state.
-	STP (v1, v2), 0(digest)
-	STP (v3, v4), 16(digest)
-
-	BIC  $31, n
-	MOVD n, ret+32(FP)
-	RET
diff --git a/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash_asm.go b/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash_asm.go
deleted file mode 100644
index d4221edf4..000000000
--- a/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash_asm.go
+++ /dev/null
@@ -1,16 +0,0 @@
-//go:build (amd64 || arm64) && !appengine && gc && !purego && !noasm
-// +build amd64 arm64
-// +build !appengine
-// +build gc
-// +build !purego
-// +build !noasm
-
-package xxhash
-
-// Sum64 computes the 64-bit xxHash digest of b.
-//
-//go:noescape
-func Sum64(b []byte) uint64
-
-//go:noescape
-func writeBlocks(s *Digest, b []byte) int
diff --git a/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash_other.go b/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash_other.go
deleted file mode 100644
index 0be16cefc..000000000
--- a/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash_other.go
+++ /dev/null
@@ -1,76 +0,0 @@
-//go:build (!amd64 && !arm64) || appengine || !gc || purego || noasm
-// +build !amd64,!arm64 appengine !gc purego noasm
-
-package xxhash
-
-// Sum64 computes the 64-bit xxHash digest of b.
-func Sum64(b []byte) uint64 {
-	// A simpler version would be
-	//   d := New()
-	//   d.Write(b)
-	//   return d.Sum64()
-	// but this is faster, particularly for small inputs.
-
-	n := len(b)
-	var h uint64
-
-	if n >= 32 {
-		v1 := primes[0] + prime2
-		v2 := prime2
-		v3 := uint64(0)
-		v4 := -primes[0]
-		for len(b) >= 32 {
-			v1 = round(v1, u64(b[0:8:len(b)]))
-			v2 = round(v2, u64(b[8:16:len(b)]))
-			v3 = round(v3, u64(b[16:24:len(b)]))
-			v4 = round(v4, u64(b[24:32:len(b)]))
-			b = b[32:len(b):len(b)]
-		}
-		h = rol1(v1) + rol7(v2) + rol12(v3) + rol18(v4)
-		h = mergeRound(h, v1)
-		h = mergeRound(h, v2)
-		h = mergeRound(h, v3)
-		h = mergeRound(h, v4)
-	} else {
-		h = prime5
-	}
-
-	h += uint64(n)
-
-	for ; len(b) >= 8; b = b[8:] {
-		k1 := round(0, u64(b[:8]))
-		h ^= k1
-		h = rol27(h)*prime1 + prime4
-	}
-	if len(b) >= 4 {
-		h ^= uint64(u32(b[:4])) * prime1
-		h = rol23(h)*prime2 + prime3
-		b = b[4:]
-	}
-	for ; len(b) > 0; b = b[1:] {
-		h ^= uint64(b[0]) * prime5
-		h = rol11(h) * prime1
-	}
-
-	h ^= h >> 33
-	h *= prime2
-	h ^= h >> 29
-	h *= prime3
-	h ^= h >> 32
-
-	return h
-}
-
-func writeBlocks(d *Digest, b []byte) int {
-	v1, v2, v3, v4 := d.v1, d.v2, d.v3, d.v4
-	n := len(b)
-	for len(b) >= 32 {
-		v1 = round(v1, u64(b[0:8:len(b)]))
-		v2 = round(v2, u64(b[8:16:len(b)]))
-		v3 = round(v3, u64(b[16:24:len(b)]))
-		v4 = round(v4, u64(b[24:32:len(b)]))
-		b = b[32:len(b):len(b)]
-	}
-	d.v1, d.v2, d.v3, d.v4 = v1, v2, v3, v4
-	return n - len(b)
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash_safe.go b/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash_safe.go
deleted file mode 100644
index 6f3b0cb10..000000000
--- a/vendor/github.com/klauspost/compress/zstd/internal/xxhash/xxhash_safe.go
+++ /dev/null
@@ -1,11 +0,0 @@
-package xxhash
-
-// Sum64String computes the 64-bit xxHash digest of s.
-func Sum64String(s string) uint64 {
-	return Sum64([]byte(s))
-}
-
-// WriteString adds more data to d. It always returns len(s), nil.
-func (d *Digest) WriteString(s string) (n int, err error) {
-	return d.Write([]byte(s))
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/matchlen_amd64.go b/vendor/github.com/klauspost/compress/zstd/matchlen_amd64.go
deleted file mode 100644
index f41932b7a..000000000
--- a/vendor/github.com/klauspost/compress/zstd/matchlen_amd64.go
+++ /dev/null
@@ -1,16 +0,0 @@
-//go:build amd64 && !appengine && !noasm && gc
-// +build amd64,!appengine,!noasm,gc
-
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-
-package zstd
-
-// matchLen returns how many bytes match in a and b
-//
-// It assumes that:
-//
-//	len(a) <= len(b) and len(a) > 0
-//
-//go:noescape
-func matchLen(a []byte, b []byte) int
diff --git a/vendor/github.com/klauspost/compress/zstd/matchlen_amd64.s b/vendor/github.com/klauspost/compress/zstd/matchlen_amd64.s
deleted file mode 100644
index 0782b86e3..000000000
--- a/vendor/github.com/klauspost/compress/zstd/matchlen_amd64.s
+++ /dev/null
@@ -1,66 +0,0 @@
-// Copied from S2 implementation.
-
-//go:build !appengine && !noasm && gc && !noasm
-
-#include "textflag.h"
-
-// func matchLen(a []byte, b []byte) int
-TEXT ·matchLen(SB), NOSPLIT, $0-56
-	MOVQ a_base+0(FP), AX
-	MOVQ b_base+24(FP), CX
-	MOVQ a_len+8(FP), DX
-
-	// matchLen
-	XORL SI, SI
-	CMPL DX, $0x08
-	JB   matchlen_match4_standalone
-
-matchlen_loopback_standalone:
-	MOVQ (AX)(SI*1), BX
-	XORQ (CX)(SI*1), BX
-	JZ   matchlen_loop_standalone
-
-#ifdef GOAMD64_v3
-	TZCNTQ BX, BX
-#else
-	BSFQ BX, BX
-#endif
-	SHRL $0x03, BX
-	LEAL (SI)(BX*1), SI
-	JMP  gen_match_len_end
-
-matchlen_loop_standalone:
-	LEAL -8(DX), DX
-	LEAL 8(SI), SI
-	CMPL DX, $0x08
-	JAE  matchlen_loopback_standalone
-
-matchlen_match4_standalone:
-	CMPL DX, $0x04
-	JB   matchlen_match2_standalone
-	MOVL (AX)(SI*1), BX
-	CMPL (CX)(SI*1), BX
-	JNE  matchlen_match2_standalone
-	LEAL -4(DX), DX
-	LEAL 4(SI), SI
-
-matchlen_match2_standalone:
-	CMPL DX, $0x02
-	JB   matchlen_match1_standalone
-	MOVW (AX)(SI*1), BX
-	CMPW (CX)(SI*1), BX
-	JNE  matchlen_match1_standalone
-	LEAL -2(DX), DX
-	LEAL 2(SI), SI
-
-matchlen_match1_standalone:
-	CMPL DX, $0x01
-	JB   gen_match_len_end
-	MOVB (AX)(SI*1), BL
-	CMPB (CX)(SI*1), BL
-	JNE  gen_match_len_end
-	INCL SI
-
-gen_match_len_end:
-	MOVQ SI, ret+48(FP)
-	RET
diff --git a/vendor/github.com/klauspost/compress/zstd/matchlen_generic.go b/vendor/github.com/klauspost/compress/zstd/matchlen_generic.go
deleted file mode 100644
index 57b9c31c0..000000000
--- a/vendor/github.com/klauspost/compress/zstd/matchlen_generic.go
+++ /dev/null
@@ -1,33 +0,0 @@
-//go:build !amd64 || appengine || !gc || noasm
-// +build !amd64 appengine !gc noasm
-
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-
-package zstd
-
-import (
-	"encoding/binary"
-	"math/bits"
-)
-
-// matchLen returns the maximum common prefix length of a and b.
-// a must be the shortest of the two.
-func matchLen(a, b []byte) (n int) {
-	for ; len(a) >= 8 && len(b) >= 8; a, b = a[8:], b[8:] {
-		diff := binary.LittleEndian.Uint64(a) ^ binary.LittleEndian.Uint64(b)
-		if diff != 0 {
-			return n + bits.TrailingZeros64(diff)>>3
-		}
-		n += 8
-	}
-
-	for i := range a {
-		if a[i] != b[i] {
-			break
-		}
-		n++
-	}
-	return n
-
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/seqdec.go b/vendor/github.com/klauspost/compress/zstd/seqdec.go
deleted file mode 100644
index d7fe6d82d..000000000
--- a/vendor/github.com/klauspost/compress/zstd/seqdec.go
+++ /dev/null
@@ -1,503 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"errors"
-	"fmt"
-	"io"
-)
-
-type seq struct {
-	litLen   uint32
-	matchLen uint32
-	offset   uint32
-
-	// Codes are stored here for the encoder
-	// so they only have to be looked up once.
-	llCode, mlCode, ofCode uint8
-}
-
-type seqVals struct {
-	ll, ml, mo int
-}
-
-func (s seq) String() string {
-	if s.offset <= 3 {
-		if s.offset == 0 {
-			return fmt.Sprint("litLen:", s.litLen, ", matchLen:", s.matchLen+zstdMinMatch, ", offset: INVALID (0)")
-		}
-		return fmt.Sprint("litLen:", s.litLen, ", matchLen:", s.matchLen+zstdMinMatch, ", offset:", s.offset, " (repeat)")
-	}
-	return fmt.Sprint("litLen:", s.litLen, ", matchLen:", s.matchLen+zstdMinMatch, ", offset:", s.offset-3, " (new)")
-}
-
-type seqCompMode uint8
-
-const (
-	compModePredefined seqCompMode = iota
-	compModeRLE
-	compModeFSE
-	compModeRepeat
-)
-
-type sequenceDec struct {
-	// decoder keeps track of the current state and updates it from the bitstream.
-	fse    *fseDecoder
-	state  fseState
-	repeat bool
-}
-
-// init the state of the decoder with input from stream.
-func (s *sequenceDec) init(br *bitReader) error {
-	if s.fse == nil {
-		return errors.New("sequence decoder not defined")
-	}
-	s.state.init(br, s.fse.actualTableLog, s.fse.dt[:1<<s.fse.actualTableLog])
-	return nil
-}
-
-// sequenceDecs contains all 3 sequence decoders and their state.
-type sequenceDecs struct {
-	litLengths   sequenceDec
-	offsets      sequenceDec
-	matchLengths sequenceDec
-	prevOffset   [3]int
-	dict         []byte
-	literals     []byte
-	out          []byte
-	nSeqs        int
-	br           *bitReader
-	seqSize      int
-	windowSize   int
-	maxBits      uint8
-	maxSyncLen   uint64
-}
-
-// initialize all 3 decoders from the stream input.
-func (s *sequenceDecs) initialize(br *bitReader, hist *history, out []byte) error {
-	if err := s.litLengths.init(br); err != nil {
-		return errors.New("litLengths:" + err.Error())
-	}
-	if err := s.offsets.init(br); err != nil {
-		return errors.New("offsets:" + err.Error())
-	}
-	if err := s.matchLengths.init(br); err != nil {
-		return errors.New("matchLengths:" + err.Error())
-	}
-	s.br = br
-	s.prevOffset = hist.recentOffsets
-	s.maxBits = s.litLengths.fse.maxBits + s.offsets.fse.maxBits + s.matchLengths.fse.maxBits
-	s.windowSize = hist.windowSize
-	s.out = out
-	s.dict = nil
-	if hist.dict != nil {
-		s.dict = hist.dict.content
-	}
-	return nil
-}
-
-func (s *sequenceDecs) freeDecoders() {
-	if f := s.litLengths.fse; f != nil && !f.preDefined {
-		fseDecoderPool.Put(f)
-		s.litLengths.fse = nil
-	}
-	if f := s.offsets.fse; f != nil && !f.preDefined {
-		fseDecoderPool.Put(f)
-		s.offsets.fse = nil
-	}
-	if f := s.matchLengths.fse; f != nil && !f.preDefined {
-		fseDecoderPool.Put(f)
-		s.matchLengths.fse = nil
-	}
-}
-
-// execute will execute the decoded sequence with the provided history.
-// The sequence must be evaluated before being sent.
-func (s *sequenceDecs) execute(seqs []seqVals, hist []byte) error {
-	if len(s.dict) == 0 {
-		return s.executeSimple(seqs, hist)
-	}
-
-	// Ensure we have enough output size...
-	if len(s.out)+s.seqSize > cap(s.out) {
-		addBytes := s.seqSize + len(s.out)
-		s.out = append(s.out, make([]byte, addBytes)...)
-		s.out = s.out[:len(s.out)-addBytes]
-	}
-
-	if debugDecoder {
-		printf("Execute %d seqs with hist %d, dict %d, literals: %d into %d bytes\n", len(seqs), len(hist), len(s.dict), len(s.literals), s.seqSize)
-	}
-
-	var t = len(s.out)
-	out := s.out[:t+s.seqSize]
-
-	for _, seq := range seqs {
-		// Add literals
-		copy(out[t:], s.literals[:seq.ll])
-		t += seq.ll
-		s.literals = s.literals[seq.ll:]
-
-		// Copy from dictionary...
-		if seq.mo > t+len(hist) || seq.mo > s.windowSize {
-			if len(s.dict) == 0 {
-				return fmt.Errorf("match offset (%d) bigger than current history (%d)", seq.mo, t+len(hist))
-			}
-
-			// we may be in dictionary.
-			dictO := len(s.dict) - (seq.mo - (t + len(hist)))
-			if dictO < 0 || dictO >= len(s.dict) {
-				return fmt.Errorf("match offset (%d) bigger than current history+dict (%d)", seq.mo, t+len(hist)+len(s.dict))
-			}
-			end := dictO + seq.ml
-			if end > len(s.dict) {
-				n := len(s.dict) - dictO
-				copy(out[t:], s.dict[dictO:])
-				t += n
-				seq.ml -= n
-			} else {
-				copy(out[t:], s.dict[dictO:end])
-				t += end - dictO
-				continue
-			}
-		}
-
-		// Copy from history.
-		if v := seq.mo - t; v > 0 {
-			// v is the start position in history from end.
-			start := len(hist) - v
-			if seq.ml > v {
-				// Some goes into current block.
-				// Copy remainder of history
-				copy(out[t:], hist[start:])
-				t += v
-				seq.ml -= v
-			} else {
-				copy(out[t:], hist[start:start+seq.ml])
-				t += seq.ml
-				continue
-			}
-		}
-		// We must be in current buffer now
-		if seq.ml > 0 {
-			start := t - seq.mo
-			if seq.ml <= t-start {
-				// No overlap
-				copy(out[t:], out[start:start+seq.ml])
-				t += seq.ml
-				continue
-			} else {
-				// Overlapping copy
-				// Extend destination slice and copy one byte at the time.
-				src := out[start : start+seq.ml]
-				dst := out[t:]
-				dst = dst[:len(src)]
-				t += len(src)
-				// Destination is the space we just added.
-				for i := range src {
-					dst[i] = src[i]
-				}
-			}
-		}
-	}
-
-	// Add final literals
-	copy(out[t:], s.literals)
-	if debugDecoder {
-		t += len(s.literals)
-		if t != len(out) {
-			panic(fmt.Errorf("length mismatch, want %d, got %d, ss: %d", len(out), t, s.seqSize))
-		}
-	}
-	s.out = out
-
-	return nil
-}
-
-// decode sequences from the stream with the provided history.
-func (s *sequenceDecs) decodeSync(hist []byte) error {
-	supported, err := s.decodeSyncSimple(hist)
-	if supported {
-		return err
-	}
-
-	br := s.br
-	seqs := s.nSeqs
-	startSize := len(s.out)
-	// Grab full sizes tables, to avoid bounds checks.
-	llTable, mlTable, ofTable := s.litLengths.fse.dt[:maxTablesize], s.matchLengths.fse.dt[:maxTablesize], s.offsets.fse.dt[:maxTablesize]
-	llState, mlState, ofState := s.litLengths.state.state, s.matchLengths.state.state, s.offsets.state.state
-	out := s.out
-	maxBlockSize := maxCompressedBlockSize
-	if s.windowSize < maxBlockSize {
-		maxBlockSize = s.windowSize
-	}
-
-	if debugDecoder {
-		println("decodeSync: decoding", seqs, "sequences", br.remain(), "bits remain on stream")
-	}
-	for i := seqs - 1; i >= 0; i-- {
-		if br.overread() {
-			printf("reading sequence %d, exceeded available data. Overread by %d\n", seqs-i, -br.remain())
-			return io.ErrUnexpectedEOF
-		}
-		var ll, mo, ml int
-		if len(br.in) > 4+((maxOffsetBits+16+16)>>3) {
-			// inlined function:
-			// ll, mo, ml = s.nextFast(br, llState, mlState, ofState)
-
-			// Final will not read from stream.
-			var llB, mlB, moB uint8
-			ll, llB = llState.final()
-			ml, mlB = mlState.final()
-			mo, moB = ofState.final()
-
-			// extra bits are stored in reverse order.
-			br.fillFast()
-			mo += br.getBits(moB)
-			if s.maxBits > 32 {
-				br.fillFast()
-			}
-			ml += br.getBits(mlB)
-			ll += br.getBits(llB)
-
-			if moB > 1 {
-				s.prevOffset[2] = s.prevOffset[1]
-				s.prevOffset[1] = s.prevOffset[0]
-				s.prevOffset[0] = mo
-			} else {
-				// mo = s.adjustOffset(mo, ll, moB)
-				// Inlined for rather big speedup
-				if ll == 0 {
-					// There is an exception though, when current sequence's literals_length = 0.
-					// In this case, repeated offsets are shifted by one, so an offset_value of 1 means Repeated_Offset2,
-					// an offset_value of 2 means Repeated_Offset3, and an offset_value of 3 means Repeated_Offset1 - 1_byte.
-					mo++
-				}
-
-				if mo == 0 {
-					mo = s.prevOffset[0]
-				} else {
-					var temp int
-					if mo == 3 {
-						temp = s.prevOffset[0] - 1
-					} else {
-						temp = s.prevOffset[mo]
-					}
-
-					if temp == 0 {
-						// 0 is not valid; input is corrupted; force offset to 1
-						println("WARNING: temp was 0")
-						temp = 1
-					}
-
-					if mo != 1 {
-						s.prevOffset[2] = s.prevOffset[1]
-					}
-					s.prevOffset[1] = s.prevOffset[0]
-					s.prevOffset[0] = temp
-					mo = temp
-				}
-			}
-			br.fillFast()
-		} else {
-			ll, mo, ml = s.next(br, llState, mlState, ofState)
-			br.fill()
-		}
-
-		if debugSequences {
-			println("Seq", seqs-i-1, "Litlen:", ll, "mo:", mo, "(abs) ml:", ml)
-		}
-
-		if ll > len(s.literals) {
-			return fmt.Errorf("unexpected literal count, want %d bytes, but only %d is available", ll, len(s.literals))
-		}
-		size := ll + ml + len(out)
-		if size-startSize > maxBlockSize {
-			return fmt.Errorf("output bigger than max block size (%d)", maxBlockSize)
-		}
-		if size > cap(out) {
-			// Not enough size, which can happen under high volume block streaming conditions
-			// but could be if destination slice is too small for sync operations.
-			// over-allocating here can create a large amount of GC pressure so we try to keep
-			// it as contained as possible
-			used := len(out) - startSize
-			addBytes := 256 + ll + ml + used>>2
-			// Clamp to max block size.
-			if used+addBytes > maxBlockSize {
-				addBytes = maxBlockSize - used
-			}
-			out = append(out, make([]byte, addBytes)...)
-			out = out[:len(out)-addBytes]
-		}
-		if ml > maxMatchLen {
-			return fmt.Errorf("match len (%d) bigger than max allowed length", ml)
-		}
-
-		// Add literals
-		out = append(out, s.literals[:ll]...)
-		s.literals = s.literals[ll:]
-
-		if mo == 0 && ml > 0 {
-			return fmt.Errorf("zero matchoff and matchlen (%d) > 0", ml)
-		}
-
-		if mo > len(out)+len(hist) || mo > s.windowSize {
-			if len(s.dict) == 0 {
-				return fmt.Errorf("match offset (%d) bigger than current history (%d)", mo, len(out)+len(hist)-startSize)
-			}
-
-			// we may be in dictionary.
-			dictO := len(s.dict) - (mo - (len(out) + len(hist)))
-			if dictO < 0 || dictO >= len(s.dict) {
-				return fmt.Errorf("match offset (%d) bigger than current history (%d)", mo, len(out)+len(hist)-startSize)
-			}
-			end := dictO + ml
-			if end > len(s.dict) {
-				out = append(out, s.dict[dictO:]...)
-				ml -= len(s.dict) - dictO
-			} else {
-				out = append(out, s.dict[dictO:end]...)
-				mo = 0
-				ml = 0
-			}
-		}
-
-		// Copy from history.
-		// TODO: Blocks without history could be made to ignore this completely.
-		if v := mo - len(out); v > 0 {
-			// v is the start position in history from end.
-			start := len(hist) - v
-			if ml > v {
-				// Some goes into current block.
-				// Copy remainder of history
-				out = append(out, hist[start:]...)
-				ml -= v
-			} else {
-				out = append(out, hist[start:start+ml]...)
-				ml = 0
-			}
-		}
-		// We must be in current buffer now
-		if ml > 0 {
-			start := len(out) - mo
-			if ml <= len(out)-start {
-				// No overlap
-				out = append(out, out[start:start+ml]...)
-			} else {
-				// Overlapping copy
-				// Extend destination slice and copy one byte at the time.
-				out = out[:len(out)+ml]
-				src := out[start : start+ml]
-				// Destination is the space we just added.
-				dst := out[len(out)-ml:]
-				dst = dst[:len(src)]
-				for i := range src {
-					dst[i] = src[i]
-				}
-			}
-		}
-		if i == 0 {
-			// This is the last sequence, so we shouldn't update state.
-			break
-		}
-
-		// Manually inlined, ~ 5-20% faster
-		// Update all 3 states at once. Approx 20% faster.
-		nBits := llState.nbBits() + mlState.nbBits() + ofState.nbBits()
-		if nBits == 0 {
-			llState = llTable[llState.newState()&maxTableMask]
-			mlState = mlTable[mlState.newState()&maxTableMask]
-			ofState = ofTable[ofState.newState()&maxTableMask]
-		} else {
-			bits := br.get32BitsFast(nBits)
-
-			lowBits := uint16(bits >> ((ofState.nbBits() + mlState.nbBits()) & 31))
-			llState = llTable[(llState.newState()+lowBits)&maxTableMask]
-
-			lowBits = uint16(bits >> (ofState.nbBits() & 31))
-			lowBits &= bitMask[mlState.nbBits()&15]
-			mlState = mlTable[(mlState.newState()+lowBits)&maxTableMask]
-
-			lowBits = uint16(bits) & bitMask[ofState.nbBits()&15]
-			ofState = ofTable[(ofState.newState()+lowBits)&maxTableMask]
-		}
-	}
-
-	if size := len(s.literals) + len(out) - startSize; size > maxBlockSize {
-		return fmt.Errorf("output bigger than max block size (%d)", maxBlockSize)
-	}
-
-	// Add final literals
-	s.out = append(out, s.literals...)
-	return br.close()
-}
-
-var bitMask [16]uint16
-
-func init() {
-	for i := range bitMask[:] {
-		bitMask[i] = uint16((1 << uint(i)) - 1)
-	}
-}
-
-func (s *sequenceDecs) next(br *bitReader, llState, mlState, ofState decSymbol) (ll, mo, ml int) {
-	// Final will not read from stream.
-	ll, llB := llState.final()
-	ml, mlB := mlState.final()
-	mo, moB := ofState.final()
-
-	// extra bits are stored in reverse order.
-	br.fill()
-	mo += br.getBits(moB)
-	if s.maxBits > 32 {
-		br.fill()
-	}
-	// matchlength+literal length, max 32 bits
-	ml += br.getBits(mlB)
-	ll += br.getBits(llB)
-	mo = s.adjustOffset(mo, ll, moB)
-	return
-}
-
-func (s *sequenceDecs) adjustOffset(offset, litLen int, offsetB uint8) int {
-	if offsetB > 1 {
-		s.prevOffset[2] = s.prevOffset[1]
-		s.prevOffset[1] = s.prevOffset[0]
-		s.prevOffset[0] = offset
-		return offset
-	}
-
-	if litLen == 0 {
-		// There is an exception though, when current sequence's literals_length = 0.
-		// In this case, repeated offsets are shifted by one, so an offset_value of 1 means Repeated_Offset2,
-		// an offset_value of 2 means Repeated_Offset3, and an offset_value of 3 means Repeated_Offset1 - 1_byte.
-		offset++
-	}
-
-	if offset == 0 {
-		return s.prevOffset[0]
-	}
-	var temp int
-	if offset == 3 {
-		temp = s.prevOffset[0] - 1
-	} else {
-		temp = s.prevOffset[offset]
-	}
-
-	if temp == 0 {
-		// 0 is not valid; input is corrupted; force offset to 1
-		println("temp was 0")
-		temp = 1
-	}
-
-	if offset != 1 {
-		s.prevOffset[2] = s.prevOffset[1]
-	}
-	s.prevOffset[1] = s.prevOffset[0]
-	s.prevOffset[0] = temp
-	return temp
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/seqdec_amd64.go b/vendor/github.com/klauspost/compress/zstd/seqdec_amd64.go
deleted file mode 100644
index c59f17e07..000000000
--- a/vendor/github.com/klauspost/compress/zstd/seqdec_amd64.go
+++ /dev/null
@@ -1,394 +0,0 @@
-//go:build amd64 && !appengine && !noasm && gc
-// +build amd64,!appengine,!noasm,gc
-
-package zstd
-
-import (
-	"fmt"
-	"io"
-
-	"github.com/klauspost/compress/internal/cpuinfo"
-)
-
-type decodeSyncAsmContext struct {
-	llTable     []decSymbol
-	mlTable     []decSymbol
-	ofTable     []decSymbol
-	llState     uint64
-	mlState     uint64
-	ofState     uint64
-	iteration   int
-	litRemain   int
-	out         []byte
-	outPosition int
-	literals    []byte
-	litPosition int
-	history     []byte
-	windowSize  int
-	ll          int // set on error (not for all errors, please refer to _generate/gen.go)
-	ml          int // set on error (not for all errors, please refer to _generate/gen.go)
-	mo          int // set on error (not for all errors, please refer to _generate/gen.go)
-}
-
-// sequenceDecs_decodeSync_amd64 implements the main loop of sequenceDecs.decodeSync in x86 asm.
-//
-// Please refer to seqdec_generic.go for the reference implementation.
-//
-//go:noescape
-func sequenceDecs_decodeSync_amd64(s *sequenceDecs, br *bitReader, ctx *decodeSyncAsmContext) int
-
-// sequenceDecs_decodeSync_bmi2 implements the main loop of sequenceDecs.decodeSync in x86 asm with BMI2 extensions.
-//
-//go:noescape
-func sequenceDecs_decodeSync_bmi2(s *sequenceDecs, br *bitReader, ctx *decodeSyncAsmContext) int
-
-// sequenceDecs_decodeSync_safe_amd64 does the same as above, but does not write more than output buffer.
-//
-//go:noescape
-func sequenceDecs_decodeSync_safe_amd64(s *sequenceDecs, br *bitReader, ctx *decodeSyncAsmContext) int
-
-// sequenceDecs_decodeSync_safe_bmi2 does the same as above, but does not write more than output buffer.
-//
-//go:noescape
-func sequenceDecs_decodeSync_safe_bmi2(s *sequenceDecs, br *bitReader, ctx *decodeSyncAsmContext) int
-
-// decode sequences from the stream with the provided history but without a dictionary.
-func (s *sequenceDecs) decodeSyncSimple(hist []byte) (bool, error) {
-	if len(s.dict) > 0 {
-		return false, nil
-	}
-	if s.maxSyncLen == 0 && cap(s.out)-len(s.out) < maxCompressedBlockSize {
-		return false, nil
-	}
-
-	// FIXME: Using unsafe memory copies leads to rare, random crashes
-	// with fuzz testing. It is therefore disabled for now.
-	const useSafe = true
-	/*
-		useSafe := false
-		if s.maxSyncLen == 0 && cap(s.out)-len(s.out) < maxCompressedBlockSizeAlloc {
-			useSafe = true
-		}
-		if s.maxSyncLen > 0 && cap(s.out)-len(s.out)-compressedBlockOverAlloc < int(s.maxSyncLen) {
-			useSafe = true
-		}
-		if cap(s.literals) < len(s.literals)+compressedBlockOverAlloc {
-			useSafe = true
-		}
-	*/
-
-	br := s.br
-
-	maxBlockSize := maxCompressedBlockSize
-	if s.windowSize < maxBlockSize {
-		maxBlockSize = s.windowSize
-	}
-
-	ctx := decodeSyncAsmContext{
-		llTable:     s.litLengths.fse.dt[:maxTablesize],
-		mlTable:     s.matchLengths.fse.dt[:maxTablesize],
-		ofTable:     s.offsets.fse.dt[:maxTablesize],
-		llState:     uint64(s.litLengths.state.state),
-		mlState:     uint64(s.matchLengths.state.state),
-		ofState:     uint64(s.offsets.state.state),
-		iteration:   s.nSeqs - 1,
-		litRemain:   len(s.literals),
-		out:         s.out,
-		outPosition: len(s.out),
-		literals:    s.literals,
-		windowSize:  s.windowSize,
-		history:     hist,
-	}
-
-	s.seqSize = 0
-	startSize := len(s.out)
-
-	var errCode int
-	if cpuinfo.HasBMI2() {
-		if useSafe {
-			errCode = sequenceDecs_decodeSync_safe_bmi2(s, br, &ctx)
-		} else {
-			errCode = sequenceDecs_decodeSync_bmi2(s, br, &ctx)
-		}
-	} else {
-		if useSafe {
-			errCode = sequenceDecs_decodeSync_safe_amd64(s, br, &ctx)
-		} else {
-			errCode = sequenceDecs_decodeSync_amd64(s, br, &ctx)
-		}
-	}
-	switch errCode {
-	case noError:
-		break
-
-	case errorMatchLenOfsMismatch:
-		return true, fmt.Errorf("zero matchoff and matchlen (%d) > 0", ctx.ml)
-
-	case errorMatchLenTooBig:
-		return true, fmt.Errorf("match len (%d) bigger than max allowed length", ctx.ml)
-
-	case errorMatchOffTooBig:
-		return true, fmt.Errorf("match offset (%d) bigger than current history (%d)",
-			ctx.mo, ctx.outPosition+len(hist)-startSize)
-
-	case errorNotEnoughLiterals:
-		return true, fmt.Errorf("unexpected literal count, want %d bytes, but only %d is available",
-			ctx.ll, ctx.litRemain+ctx.ll)
-
-	case errorOverread:
-		return true, io.ErrUnexpectedEOF
-
-	case errorNotEnoughSpace:
-		size := ctx.outPosition + ctx.ll + ctx.ml
-		if debugDecoder {
-			println("msl:", s.maxSyncLen, "cap", cap(s.out), "bef:", startSize, "sz:", size-startSize, "mbs:", maxBlockSize, "outsz:", cap(s.out)-startSize)
-		}
-		return true, fmt.Errorf("output bigger than max block size (%d)", maxBlockSize)
-
-	default:
-		return true, fmt.Errorf("sequenceDecs_decode returned erroneous code %d", errCode)
-	}
-
-	s.seqSize += ctx.litRemain
-	if s.seqSize > maxBlockSize {
-		return true, fmt.Errorf("output bigger than max block size (%d)", maxBlockSize)
-	}
-	err := br.close()
-	if err != nil {
-		printf("Closing sequences: %v, %+v\n", err, *br)
-		return true, err
-	}
-
-	s.literals = s.literals[ctx.litPosition:]
-	t := ctx.outPosition
-	s.out = s.out[:t]
-
-	// Add final literals
-	s.out = append(s.out, s.literals...)
-	if debugDecoder {
-		t += len(s.literals)
-		if t != len(s.out) {
-			panic(fmt.Errorf("length mismatch, want %d, got %d", len(s.out), t))
-		}
-	}
-
-	return true, nil
-}
-
-// --------------------------------------------------------------------------------
-
-type decodeAsmContext struct {
-	llTable   []decSymbol
-	mlTable   []decSymbol
-	ofTable   []decSymbol
-	llState   uint64
-	mlState   uint64
-	ofState   uint64
-	iteration int
-	seqs      []seqVals
-	litRemain int
-}
-
-const noError = 0
-
-// error reported when mo == 0 && ml > 0
-const errorMatchLenOfsMismatch = 1
-
-// error reported when ml > maxMatchLen
-const errorMatchLenTooBig = 2
-
-// error reported when mo > available history or mo > s.windowSize
-const errorMatchOffTooBig = 3
-
-// error reported when the sum of literal lengths exeeceds the literal buffer size
-const errorNotEnoughLiterals = 4
-
-// error reported when capacity of `out` is too small
-const errorNotEnoughSpace = 5
-
-// error reported when bits are overread.
-const errorOverread = 6
-
-// sequenceDecs_decode implements the main loop of sequenceDecs in x86 asm.
-//
-// Please refer to seqdec_generic.go for the reference implementation.
-//
-//go:noescape
-func sequenceDecs_decode_amd64(s *sequenceDecs, br *bitReader, ctx *decodeAsmContext) int
-
-// sequenceDecs_decode implements the main loop of sequenceDecs in x86 asm.
-//
-// Please refer to seqdec_generic.go for the reference implementation.
-//
-//go:noescape
-func sequenceDecs_decode_56_amd64(s *sequenceDecs, br *bitReader, ctx *decodeAsmContext) int
-
-// sequenceDecs_decode implements the main loop of sequenceDecs in x86 asm with BMI2 extensions.
-//
-//go:noescape
-func sequenceDecs_decode_bmi2(s *sequenceDecs, br *bitReader, ctx *decodeAsmContext) int
-
-// sequenceDecs_decode implements the main loop of sequenceDecs in x86 asm with BMI2 extensions.
-//
-//go:noescape
-func sequenceDecs_decode_56_bmi2(s *sequenceDecs, br *bitReader, ctx *decodeAsmContext) int
-
-// decode sequences from the stream without the provided history.
-func (s *sequenceDecs) decode(seqs []seqVals) error {
-	br := s.br
-
-	maxBlockSize := maxCompressedBlockSize
-	if s.windowSize < maxBlockSize {
-		maxBlockSize = s.windowSize
-	}
-
-	ctx := decodeAsmContext{
-		llTable:   s.litLengths.fse.dt[:maxTablesize],
-		mlTable:   s.matchLengths.fse.dt[:maxTablesize],
-		ofTable:   s.offsets.fse.dt[:maxTablesize],
-		llState:   uint64(s.litLengths.state.state),
-		mlState:   uint64(s.matchLengths.state.state),
-		ofState:   uint64(s.offsets.state.state),
-		seqs:      seqs,
-		iteration: len(seqs) - 1,
-		litRemain: len(s.literals),
-	}
-
-	if debugDecoder {
-		println("decode: decoding", len(seqs), "sequences", br.remain(), "bits remain on stream")
-	}
-
-	s.seqSize = 0
-	lte56bits := s.maxBits+s.offsets.fse.actualTableLog+s.matchLengths.fse.actualTableLog+s.litLengths.fse.actualTableLog <= 56
-	var errCode int
-	if cpuinfo.HasBMI2() {
-		if lte56bits {
-			errCode = sequenceDecs_decode_56_bmi2(s, br, &ctx)
-		} else {
-			errCode = sequenceDecs_decode_bmi2(s, br, &ctx)
-		}
-	} else {
-		if lte56bits {
-			errCode = sequenceDecs_decode_56_amd64(s, br, &ctx)
-		} else {
-			errCode = sequenceDecs_decode_amd64(s, br, &ctx)
-		}
-	}
-	if errCode != 0 {
-		i := len(seqs) - ctx.iteration - 1
-		switch errCode {
-		case errorMatchLenOfsMismatch:
-			ml := ctx.seqs[i].ml
-			return fmt.Errorf("zero matchoff and matchlen (%d) > 0", ml)
-
-		case errorMatchLenTooBig:
-			ml := ctx.seqs[i].ml
-			return fmt.Errorf("match len (%d) bigger than max allowed length", ml)
-
-		case errorNotEnoughLiterals:
-			ll := ctx.seqs[i].ll
-			return fmt.Errorf("unexpected literal count, want %d bytes, but only %d is available", ll, ctx.litRemain+ll)
-		case errorOverread:
-			return io.ErrUnexpectedEOF
-		}
-
-		return fmt.Errorf("sequenceDecs_decode_amd64 returned erroneous code %d", errCode)
-	}
-
-	if ctx.litRemain < 0 {
-		return fmt.Errorf("literal count is too big: total available %d, total requested %d",
-			len(s.literals), len(s.literals)-ctx.litRemain)
-	}
-
-	s.seqSize += ctx.litRemain
-	if s.seqSize > maxBlockSize {
-		return fmt.Errorf("output bigger than max block size (%d)", maxBlockSize)
-	}
-	if debugDecoder {
-		println("decode: ", br.remain(), "bits remain on stream. code:", errCode)
-	}
-	err := br.close()
-	if err != nil {
-		printf("Closing sequences: %v, %+v\n", err, *br)
-	}
-	return err
-}
-
-// --------------------------------------------------------------------------------
-
-type executeAsmContext struct {
-	seqs        []seqVals
-	seqIndex    int
-	out         []byte
-	history     []byte
-	literals    []byte
-	outPosition int
-	litPosition int
-	windowSize  int
-}
-
-// sequenceDecs_executeSimple_amd64 implements the main loop of sequenceDecs.executeSimple in x86 asm.
-//
-// Returns false if a match offset is too big.
-//
-// Please refer to seqdec_generic.go for the reference implementation.
-//
-//go:noescape
-func sequenceDecs_executeSimple_amd64(ctx *executeAsmContext) bool
-
-// Same as above, but with safe memcopies
-//
-//go:noescape
-func sequenceDecs_executeSimple_safe_amd64(ctx *executeAsmContext) bool
-
-// executeSimple handles cases when dictionary is not used.
-func (s *sequenceDecs) executeSimple(seqs []seqVals, hist []byte) error {
-	// Ensure we have enough output size...
-	if len(s.out)+s.seqSize+compressedBlockOverAlloc > cap(s.out) {
-		addBytes := s.seqSize + len(s.out) + compressedBlockOverAlloc
-		s.out = append(s.out, make([]byte, addBytes)...)
-		s.out = s.out[:len(s.out)-addBytes]
-	}
-
-	if debugDecoder {
-		printf("Execute %d seqs with literals: %d into %d bytes\n", len(seqs), len(s.literals), s.seqSize)
-	}
-
-	var t = len(s.out)
-	out := s.out[:t+s.seqSize]
-
-	ctx := executeAsmContext{
-		seqs:        seqs,
-		seqIndex:    0,
-		out:         out,
-		history:     hist,
-		outPosition: t,
-		litPosition: 0,
-		literals:    s.literals,
-		windowSize:  s.windowSize,
-	}
-	var ok bool
-	if cap(s.literals) < len(s.literals)+compressedBlockOverAlloc {
-		ok = sequenceDecs_executeSimple_safe_amd64(&ctx)
-	} else {
-		ok = sequenceDecs_executeSimple_amd64(&ctx)
-	}
-	if !ok {
-		return fmt.Errorf("match offset (%d) bigger than current history (%d)",
-			seqs[ctx.seqIndex].mo, ctx.outPosition+len(hist))
-	}
-	s.literals = s.literals[ctx.litPosition:]
-	t = ctx.outPosition
-
-	// Add final literals
-	copy(out[t:], s.literals)
-	if debugDecoder {
-		t += len(s.literals)
-		if t != len(out) {
-			panic(fmt.Errorf("length mismatch, want %d, got %d, ss: %d", len(out), t, s.seqSize))
-		}
-	}
-	s.out = out
-
-	return nil
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/seqdec_amd64.s b/vendor/github.com/klauspost/compress/zstd/seqdec_amd64.s
deleted file mode 100644
index f5591fa1e..000000000
--- a/vendor/github.com/klauspost/compress/zstd/seqdec_amd64.s
+++ /dev/null
@@ -1,4151 +0,0 @@
-// Code generated by command: go run gen.go -out ../seqdec_amd64.s -pkg=zstd. DO NOT EDIT.
-
-//go:build !appengine && !noasm && gc && !noasm
-
-// func sequenceDecs_decode_amd64(s *sequenceDecs, br *bitReader, ctx *decodeAsmContext) int
-// Requires: CMOV
-TEXT ·sequenceDecs_decode_amd64(SB), $8-32
-	MOVQ    br+8(FP), CX
-	MOVQ    24(CX), DX
-	MOVBQZX 32(CX), BX
-	MOVQ    (CX), AX
-	MOVQ    8(CX), SI
-	ADDQ    SI, AX
-	MOVQ    AX, (SP)
-	MOVQ    ctx+16(FP), AX
-	MOVQ    72(AX), DI
-	MOVQ    80(AX), R8
-	MOVQ    88(AX), R9
-	MOVQ    104(AX), R10
-	MOVQ    s+0(FP), AX
-	MOVQ    144(AX), R11
-	MOVQ    152(AX), R12
-	MOVQ    160(AX), R13
-
-sequenceDecs_decode_amd64_main_loop:
-	MOVQ (SP), R14
-
-	// Fill bitreader to have enough for the offset and match length.
-	CMPQ SI, $0x08
-	JL   sequenceDecs_decode_amd64_fill_byte_by_byte
-	MOVQ BX, AX
-	SHRQ $0x03, AX
-	SUBQ AX, R14
-	MOVQ (R14), DX
-	SUBQ AX, SI
-	ANDQ $0x07, BX
-	JMP  sequenceDecs_decode_amd64_fill_end
-
-sequenceDecs_decode_amd64_fill_byte_by_byte:
-	CMPQ    SI, $0x00
-	JLE     sequenceDecs_decode_amd64_fill_check_overread
-	CMPQ    BX, $0x07
-	JLE     sequenceDecs_decode_amd64_fill_end
-	SHLQ    $0x08, DX
-	SUBQ    $0x01, R14
-	SUBQ    $0x01, SI
-	SUBQ    $0x08, BX
-	MOVBQZX (R14), AX
-	ORQ     AX, DX
-	JMP     sequenceDecs_decode_amd64_fill_byte_by_byte
-
-sequenceDecs_decode_amd64_fill_check_overread:
-	CMPQ BX, $0x40
-	JA   error_overread
-
-sequenceDecs_decode_amd64_fill_end:
-	// Update offset
-	MOVQ  R9, AX
-	MOVQ  BX, CX
-	MOVQ  DX, R15
-	SHLQ  CL, R15
-	MOVB  AH, CL
-	SHRQ  $0x20, AX
-	TESTQ CX, CX
-	JZ    sequenceDecs_decode_amd64_of_update_zero
-	ADDQ  CX, BX
-	CMPQ  BX, $0x40
-	JA    sequenceDecs_decode_amd64_of_update_zero
-	CMPQ  CX, $0x40
-	JAE   sequenceDecs_decode_amd64_of_update_zero
-	NEGQ  CX
-	SHRQ  CL, R15
-	ADDQ  R15, AX
-
-sequenceDecs_decode_amd64_of_update_zero:
-	MOVQ AX, 16(R10)
-
-	// Update match length
-	MOVQ  R8, AX
-	MOVQ  BX, CX
-	MOVQ  DX, R15
-	SHLQ  CL, R15
-	MOVB  AH, CL
-	SHRQ  $0x20, AX
-	TESTQ CX, CX
-	JZ    sequenceDecs_decode_amd64_ml_update_zero
-	ADDQ  CX, BX
-	CMPQ  BX, $0x40
-	JA    sequenceDecs_decode_amd64_ml_update_zero
-	CMPQ  CX, $0x40
-	JAE   sequenceDecs_decode_amd64_ml_update_zero
-	NEGQ  CX
-	SHRQ  CL, R15
-	ADDQ  R15, AX
-
-sequenceDecs_decode_amd64_ml_update_zero:
-	MOVQ AX, 8(R10)
-
-	// Fill bitreader to have enough for the remaining
-	CMPQ SI, $0x08
-	JL   sequenceDecs_decode_amd64_fill_2_byte_by_byte
-	MOVQ BX, AX
-	SHRQ $0x03, AX
-	SUBQ AX, R14
-	MOVQ (R14), DX
-	SUBQ AX, SI
-	ANDQ $0x07, BX
-	JMP  sequenceDecs_decode_amd64_fill_2_end
-
-sequenceDecs_decode_amd64_fill_2_byte_by_byte:
-	CMPQ    SI, $0x00
-	JLE     sequenceDecs_decode_amd64_fill_2_check_overread
-	CMPQ    BX, $0x07
-	JLE     sequenceDecs_decode_amd64_fill_2_end
-	SHLQ    $0x08, DX
-	SUBQ    $0x01, R14
-	SUBQ    $0x01, SI
-	SUBQ    $0x08, BX
-	MOVBQZX (R14), AX
-	ORQ     AX, DX
-	JMP     sequenceDecs_decode_amd64_fill_2_byte_by_byte
-
-sequenceDecs_decode_amd64_fill_2_check_overread:
-	CMPQ BX, $0x40
-	JA   error_overread
-
-sequenceDecs_decode_amd64_fill_2_end:
-	// Update literal length
-	MOVQ  DI, AX
-	MOVQ  BX, CX
-	MOVQ  DX, R15
-	SHLQ  CL, R15
-	MOVB  AH, CL
-	SHRQ  $0x20, AX
-	TESTQ CX, CX
-	JZ    sequenceDecs_decode_amd64_ll_update_zero
-	ADDQ  CX, BX
-	CMPQ  BX, $0x40
-	JA    sequenceDecs_decode_amd64_ll_update_zero
-	CMPQ  CX, $0x40
-	JAE   sequenceDecs_decode_amd64_ll_update_zero
-	NEGQ  CX
-	SHRQ  CL, R15
-	ADDQ  R15, AX
-
-sequenceDecs_decode_amd64_ll_update_zero:
-	MOVQ AX, (R10)
-
-	// Fill bitreader for state updates
-	MOVQ    R14, (SP)
-	MOVQ    R9, AX
-	SHRQ    $0x08, AX
-	MOVBQZX AL, AX
-	MOVQ    ctx+16(FP), CX
-	CMPQ    96(CX), $0x00
-	JZ      sequenceDecs_decode_amd64_skip_update
-
-	// Update Literal Length State
-	MOVBQZX DI, R14
-	SHRL    $0x10, DI
-	LEAQ    (BX)(R14*1), CX
-	MOVQ    DX, R15
-	MOVQ    CX, BX
-	ROLQ    CL, R15
-	MOVL    $0x00000001, BP
-	MOVB    R14, CL
-	SHLL    CL, BP
-	DECL    BP
-	ANDQ    BP, R15
-	ADDQ    R15, DI
-
-	// Load ctx.llTable
-	MOVQ ctx+16(FP), CX
-	MOVQ (CX), CX
-	MOVQ (CX)(DI*8), DI
-
-	// Update Match Length State
-	MOVBQZX R8, R14
-	SHRL    $0x10, R8
-	LEAQ    (BX)(R14*1), CX
-	MOVQ    DX, R15
-	MOVQ    CX, BX
-	ROLQ    CL, R15
-	MOVL    $0x00000001, BP
-	MOVB    R14, CL
-	SHLL    CL, BP
-	DECL    BP
-	ANDQ    BP, R15
-	ADDQ    R15, R8
-
-	// Load ctx.mlTable
-	MOVQ ctx+16(FP), CX
-	MOVQ 24(CX), CX
-	MOVQ (CX)(R8*8), R8
-
-	// Update Offset State
-	MOVBQZX R9, R14
-	SHRL    $0x10, R9
-	LEAQ    (BX)(R14*1), CX
-	MOVQ    DX, R15
-	MOVQ    CX, BX
-	ROLQ    CL, R15
-	MOVL    $0x00000001, BP
-	MOVB    R14, CL
-	SHLL    CL, BP
-	DECL    BP
-	ANDQ    BP, R15
-	ADDQ    R15, R9
-
-	// Load ctx.ofTable
-	MOVQ ctx+16(FP), CX
-	MOVQ 48(CX), CX
-	MOVQ (CX)(R9*8), R9
-
-sequenceDecs_decode_amd64_skip_update:
-	// Adjust offset
-	MOVQ 16(R10), CX
-	CMPQ AX, $0x01
-	JBE  sequenceDecs_decode_amd64_adjust_offsetB_1_or_0
-	MOVQ R12, R13
-	MOVQ R11, R12
-	MOVQ CX, R11
-	JMP  sequenceDecs_decode_amd64_after_adjust
-
-sequenceDecs_decode_amd64_adjust_offsetB_1_or_0:
-	CMPQ (R10), $0x00000000
-	JNE  sequenceDecs_decode_amd64_adjust_offset_maybezero
-	INCQ CX
-	JMP  sequenceDecs_decode_amd64_adjust_offset_nonzero
-
-sequenceDecs_decode_amd64_adjust_offset_maybezero:
-	TESTQ CX, CX
-	JNZ   sequenceDecs_decode_amd64_adjust_offset_nonzero
-	MOVQ  R11, CX
-	JMP   sequenceDecs_decode_amd64_after_adjust
-
-sequenceDecs_decode_amd64_adjust_offset_nonzero:
-	CMPQ CX, $0x01
-	JB   sequenceDecs_decode_amd64_adjust_zero
-	JEQ  sequenceDecs_decode_amd64_adjust_one
-	CMPQ CX, $0x02
-	JA   sequenceDecs_decode_amd64_adjust_three
-	JMP  sequenceDecs_decode_amd64_adjust_two
-
-sequenceDecs_decode_amd64_adjust_zero:
-	MOVQ R11, AX
-	JMP  sequenceDecs_decode_amd64_adjust_test_temp_valid
-
-sequenceDecs_decode_amd64_adjust_one:
-	MOVQ R12, AX
-	JMP  sequenceDecs_decode_amd64_adjust_test_temp_valid
-
-sequenceDecs_decode_amd64_adjust_two:
-	MOVQ R13, AX
-	JMP  sequenceDecs_decode_amd64_adjust_test_temp_valid
-
-sequenceDecs_decode_amd64_adjust_three:
-	LEAQ -1(R11), AX
-
-sequenceDecs_decode_amd64_adjust_test_temp_valid:
-	TESTQ AX, AX
-	JNZ   sequenceDecs_decode_amd64_adjust_temp_valid
-	MOVQ  $0x00000001, AX
-
-sequenceDecs_decode_amd64_adjust_temp_valid:
-	CMPQ    CX, $0x01
-	CMOVQNE R12, R13
-	MOVQ    R11, R12
-	MOVQ    AX, R11
-	MOVQ    AX, CX
-
-sequenceDecs_decode_amd64_after_adjust:
-	MOVQ CX, 16(R10)
-
-	// Check values
-	MOVQ  8(R10), AX
-	MOVQ  (R10), R14
-	LEAQ  (AX)(R14*1), R15
-	MOVQ  s+0(FP), BP
-	ADDQ  R15, 256(BP)
-	MOVQ  ctx+16(FP), R15
-	SUBQ  R14, 128(R15)
-	JS    error_not_enough_literals
-	CMPQ  AX, $0x00020002
-	JA    sequenceDecs_decode_amd64_error_match_len_too_big
-	TESTQ CX, CX
-	JNZ   sequenceDecs_decode_amd64_match_len_ofs_ok
-	TESTQ AX, AX
-	JNZ   sequenceDecs_decode_amd64_error_match_len_ofs_mismatch
-
-sequenceDecs_decode_amd64_match_len_ofs_ok:
-	ADDQ $0x18, R10
-	MOVQ ctx+16(FP), AX
-	DECQ 96(AX)
-	JNS  sequenceDecs_decode_amd64_main_loop
-	MOVQ s+0(FP), AX
-	MOVQ R11, 144(AX)
-	MOVQ R12, 152(AX)
-	MOVQ R13, 160(AX)
-	MOVQ br+8(FP), AX
-	MOVQ DX, 24(AX)
-	MOVB BL, 32(AX)
-	MOVQ SI, 8(AX)
-
-	// Return success
-	MOVQ $0x00000000, ret+24(FP)
-	RET
-
-	// Return with match length error
-sequenceDecs_decode_amd64_error_match_len_ofs_mismatch:
-	MOVQ $0x00000001, ret+24(FP)
-	RET
-
-	// Return with match too long error
-sequenceDecs_decode_amd64_error_match_len_too_big:
-	MOVQ $0x00000002, ret+24(FP)
-	RET
-
-	// Return with match offset too long error
-	MOVQ $0x00000003, ret+24(FP)
-	RET
-
-	// Return with not enough literals error
-error_not_enough_literals:
-	MOVQ $0x00000004, ret+24(FP)
-	RET
-
-	// Return with overread error
-error_overread:
-	MOVQ $0x00000006, ret+24(FP)
-	RET
-
-// func sequenceDecs_decode_56_amd64(s *sequenceDecs, br *bitReader, ctx *decodeAsmContext) int
-// Requires: CMOV
-TEXT ·sequenceDecs_decode_56_amd64(SB), $8-32
-	MOVQ    br+8(FP), CX
-	MOVQ    24(CX), DX
-	MOVBQZX 32(CX), BX
-	MOVQ    (CX), AX
-	MOVQ    8(CX), SI
-	ADDQ    SI, AX
-	MOVQ    AX, (SP)
-	MOVQ    ctx+16(FP), AX
-	MOVQ    72(AX), DI
-	MOVQ    80(AX), R8
-	MOVQ    88(AX), R9
-	MOVQ    104(AX), R10
-	MOVQ    s+0(FP), AX
-	MOVQ    144(AX), R11
-	MOVQ    152(AX), R12
-	MOVQ    160(AX), R13
-
-sequenceDecs_decode_56_amd64_main_loop:
-	MOVQ (SP), R14
-
-	// Fill bitreader to have enough for the offset and match length.
-	CMPQ SI, $0x08
-	JL   sequenceDecs_decode_56_amd64_fill_byte_by_byte
-	MOVQ BX, AX
-	SHRQ $0x03, AX
-	SUBQ AX, R14
-	MOVQ (R14), DX
-	SUBQ AX, SI
-	ANDQ $0x07, BX
-	JMP  sequenceDecs_decode_56_amd64_fill_end
-
-sequenceDecs_decode_56_amd64_fill_byte_by_byte:
-	CMPQ    SI, $0x00
-	JLE     sequenceDecs_decode_56_amd64_fill_check_overread
-	CMPQ    BX, $0x07
-	JLE     sequenceDecs_decode_56_amd64_fill_end
-	SHLQ    $0x08, DX
-	SUBQ    $0x01, R14
-	SUBQ    $0x01, SI
-	SUBQ    $0x08, BX
-	MOVBQZX (R14), AX
-	ORQ     AX, DX
-	JMP     sequenceDecs_decode_56_amd64_fill_byte_by_byte
-
-sequenceDecs_decode_56_amd64_fill_check_overread:
-	CMPQ BX, $0x40
-	JA   error_overread
-
-sequenceDecs_decode_56_amd64_fill_end:
-	// Update offset
-	MOVQ  R9, AX
-	MOVQ  BX, CX
-	MOVQ  DX, R15
-	SHLQ  CL, R15
-	MOVB  AH, CL
-	SHRQ  $0x20, AX
-	TESTQ CX, CX
-	JZ    sequenceDecs_decode_56_amd64_of_update_zero
-	ADDQ  CX, BX
-	CMPQ  BX, $0x40
-	JA    sequenceDecs_decode_56_amd64_of_update_zero
-	CMPQ  CX, $0x40
-	JAE   sequenceDecs_decode_56_amd64_of_update_zero
-	NEGQ  CX
-	SHRQ  CL, R15
-	ADDQ  R15, AX
-
-sequenceDecs_decode_56_amd64_of_update_zero:
-	MOVQ AX, 16(R10)
-
-	// Update match length
-	MOVQ  R8, AX
-	MOVQ  BX, CX
-	MOVQ  DX, R15
-	SHLQ  CL, R15
-	MOVB  AH, CL
-	SHRQ  $0x20, AX
-	TESTQ CX, CX
-	JZ    sequenceDecs_decode_56_amd64_ml_update_zero
-	ADDQ  CX, BX
-	CMPQ  BX, $0x40
-	JA    sequenceDecs_decode_56_amd64_ml_update_zero
-	CMPQ  CX, $0x40
-	JAE   sequenceDecs_decode_56_amd64_ml_update_zero
-	NEGQ  CX
-	SHRQ  CL, R15
-	ADDQ  R15, AX
-
-sequenceDecs_decode_56_amd64_ml_update_zero:
-	MOVQ AX, 8(R10)
-
-	// Update literal length
-	MOVQ  DI, AX
-	MOVQ  BX, CX
-	MOVQ  DX, R15
-	SHLQ  CL, R15
-	MOVB  AH, CL
-	SHRQ  $0x20, AX
-	TESTQ CX, CX
-	JZ    sequenceDecs_decode_56_amd64_ll_update_zero
-	ADDQ  CX, BX
-	CMPQ  BX, $0x40
-	JA    sequenceDecs_decode_56_amd64_ll_update_zero
-	CMPQ  CX, $0x40
-	JAE   sequenceDecs_decode_56_amd64_ll_update_zero
-	NEGQ  CX
-	SHRQ  CL, R15
-	ADDQ  R15, AX
-
-sequenceDecs_decode_56_amd64_ll_update_zero:
-	MOVQ AX, (R10)
-
-	// Fill bitreader for state updates
-	MOVQ    R14, (SP)
-	MOVQ    R9, AX
-	SHRQ    $0x08, AX
-	MOVBQZX AL, AX
-	MOVQ    ctx+16(FP), CX
-	CMPQ    96(CX), $0x00
-	JZ      sequenceDecs_decode_56_amd64_skip_update
-
-	// Update Literal Length State
-	MOVBQZX DI, R14
-	SHRL    $0x10, DI
-	LEAQ    (BX)(R14*1), CX
-	MOVQ    DX, R15
-	MOVQ    CX, BX
-	ROLQ    CL, R15
-	MOVL    $0x00000001, BP
-	MOVB    R14, CL
-	SHLL    CL, BP
-	DECL    BP
-	ANDQ    BP, R15
-	ADDQ    R15, DI
-
-	// Load ctx.llTable
-	MOVQ ctx+16(FP), CX
-	MOVQ (CX), CX
-	MOVQ (CX)(DI*8), DI
-
-	// Update Match Length State
-	MOVBQZX R8, R14
-	SHRL    $0x10, R8
-	LEAQ    (BX)(R14*1), CX
-	MOVQ    DX, R15
-	MOVQ    CX, BX
-	ROLQ    CL, R15
-	MOVL    $0x00000001, BP
-	MOVB    R14, CL
-	SHLL    CL, BP
-	DECL    BP
-	ANDQ    BP, R15
-	ADDQ    R15, R8
-
-	// Load ctx.mlTable
-	MOVQ ctx+16(FP), CX
-	MOVQ 24(CX), CX
-	MOVQ (CX)(R8*8), R8
-
-	// Update Offset State
-	MOVBQZX R9, R14
-	SHRL    $0x10, R9
-	LEAQ    (BX)(R14*1), CX
-	MOVQ    DX, R15
-	MOVQ    CX, BX
-	ROLQ    CL, R15
-	MOVL    $0x00000001, BP
-	MOVB    R14, CL
-	SHLL    CL, BP
-	DECL    BP
-	ANDQ    BP, R15
-	ADDQ    R15, R9
-
-	// Load ctx.ofTable
-	MOVQ ctx+16(FP), CX
-	MOVQ 48(CX), CX
-	MOVQ (CX)(R9*8), R9
-
-sequenceDecs_decode_56_amd64_skip_update:
-	// Adjust offset
-	MOVQ 16(R10), CX
-	CMPQ AX, $0x01
-	JBE  sequenceDecs_decode_56_amd64_adjust_offsetB_1_or_0
-	MOVQ R12, R13
-	MOVQ R11, R12
-	MOVQ CX, R11
-	JMP  sequenceDecs_decode_56_amd64_after_adjust
-
-sequenceDecs_decode_56_amd64_adjust_offsetB_1_or_0:
-	CMPQ (R10), $0x00000000
-	JNE  sequenceDecs_decode_56_amd64_adjust_offset_maybezero
-	INCQ CX
-	JMP  sequenceDecs_decode_56_amd64_adjust_offset_nonzero
-
-sequenceDecs_decode_56_amd64_adjust_offset_maybezero:
-	TESTQ CX, CX
-	JNZ   sequenceDecs_decode_56_amd64_adjust_offset_nonzero
-	MOVQ  R11, CX
-	JMP   sequenceDecs_decode_56_amd64_after_adjust
-
-sequenceDecs_decode_56_amd64_adjust_offset_nonzero:
-	CMPQ CX, $0x01
-	JB   sequenceDecs_decode_56_amd64_adjust_zero
-	JEQ  sequenceDecs_decode_56_amd64_adjust_one
-	CMPQ CX, $0x02
-	JA   sequenceDecs_decode_56_amd64_adjust_three
-	JMP  sequenceDecs_decode_56_amd64_adjust_two
-
-sequenceDecs_decode_56_amd64_adjust_zero:
-	MOVQ R11, AX
-	JMP  sequenceDecs_decode_56_amd64_adjust_test_temp_valid
-
-sequenceDecs_decode_56_amd64_adjust_one:
-	MOVQ R12, AX
-	JMP  sequenceDecs_decode_56_amd64_adjust_test_temp_valid
-
-sequenceDecs_decode_56_amd64_adjust_two:
-	MOVQ R13, AX
-	JMP  sequenceDecs_decode_56_amd64_adjust_test_temp_valid
-
-sequenceDecs_decode_56_amd64_adjust_three:
-	LEAQ -1(R11), AX
-
-sequenceDecs_decode_56_amd64_adjust_test_temp_valid:
-	TESTQ AX, AX
-	JNZ   sequenceDecs_decode_56_amd64_adjust_temp_valid
-	MOVQ  $0x00000001, AX
-
-sequenceDecs_decode_56_amd64_adjust_temp_valid:
-	CMPQ    CX, $0x01
-	CMOVQNE R12, R13
-	MOVQ    R11, R12
-	MOVQ    AX, R11
-	MOVQ    AX, CX
-
-sequenceDecs_decode_56_amd64_after_adjust:
-	MOVQ CX, 16(R10)
-
-	// Check values
-	MOVQ  8(R10), AX
-	MOVQ  (R10), R14
-	LEAQ  (AX)(R14*1), R15
-	MOVQ  s+0(FP), BP
-	ADDQ  R15, 256(BP)
-	MOVQ  ctx+16(FP), R15
-	SUBQ  R14, 128(R15)
-	JS    error_not_enough_literals
-	CMPQ  AX, $0x00020002
-	JA    sequenceDecs_decode_56_amd64_error_match_len_too_big
-	TESTQ CX, CX
-	JNZ   sequenceDecs_decode_56_amd64_match_len_ofs_ok
-	TESTQ AX, AX
-	JNZ   sequenceDecs_decode_56_amd64_error_match_len_ofs_mismatch
-
-sequenceDecs_decode_56_amd64_match_len_ofs_ok:
-	ADDQ $0x18, R10
-	MOVQ ctx+16(FP), AX
-	DECQ 96(AX)
-	JNS  sequenceDecs_decode_56_amd64_main_loop
-	MOVQ s+0(FP), AX
-	MOVQ R11, 144(AX)
-	MOVQ R12, 152(AX)
-	MOVQ R13, 160(AX)
-	MOVQ br+8(FP), AX
-	MOVQ DX, 24(AX)
-	MOVB BL, 32(AX)
-	MOVQ SI, 8(AX)
-
-	// Return success
-	MOVQ $0x00000000, ret+24(FP)
-	RET
-
-	// Return with match length error
-sequenceDecs_decode_56_amd64_error_match_len_ofs_mismatch:
-	MOVQ $0x00000001, ret+24(FP)
-	RET
-
-	// Return with match too long error
-sequenceDecs_decode_56_amd64_error_match_len_too_big:
-	MOVQ $0x00000002, ret+24(FP)
-	RET
-
-	// Return with match offset too long error
-	MOVQ $0x00000003, ret+24(FP)
-	RET
-
-	// Return with not enough literals error
-error_not_enough_literals:
-	MOVQ $0x00000004, ret+24(FP)
-	RET
-
-	// Return with overread error
-error_overread:
-	MOVQ $0x00000006, ret+24(FP)
-	RET
-
-// func sequenceDecs_decode_bmi2(s *sequenceDecs, br *bitReader, ctx *decodeAsmContext) int
-// Requires: BMI, BMI2, CMOV
-TEXT ·sequenceDecs_decode_bmi2(SB), $8-32
-	MOVQ    br+8(FP), BX
-	MOVQ    24(BX), AX
-	MOVBQZX 32(BX), DX
-	MOVQ    (BX), CX
-	MOVQ    8(BX), BX
-	ADDQ    BX, CX
-	MOVQ    CX, (SP)
-	MOVQ    ctx+16(FP), CX
-	MOVQ    72(CX), SI
-	MOVQ    80(CX), DI
-	MOVQ    88(CX), R8
-	MOVQ    104(CX), R9
-	MOVQ    s+0(FP), CX
-	MOVQ    144(CX), R10
-	MOVQ    152(CX), R11
-	MOVQ    160(CX), R12
-
-sequenceDecs_decode_bmi2_main_loop:
-	MOVQ (SP), R13
-
-	// Fill bitreader to have enough for the offset and match length.
-	CMPQ BX, $0x08
-	JL   sequenceDecs_decode_bmi2_fill_byte_by_byte
-	MOVQ DX, CX
-	SHRQ $0x03, CX
-	SUBQ CX, R13
-	MOVQ (R13), AX
-	SUBQ CX, BX
-	ANDQ $0x07, DX
-	JMP  sequenceDecs_decode_bmi2_fill_end
-
-sequenceDecs_decode_bmi2_fill_byte_by_byte:
-	CMPQ    BX, $0x00
-	JLE     sequenceDecs_decode_bmi2_fill_check_overread
-	CMPQ    DX, $0x07
-	JLE     sequenceDecs_decode_bmi2_fill_end
-	SHLQ    $0x08, AX
-	SUBQ    $0x01, R13
-	SUBQ    $0x01, BX
-	SUBQ    $0x08, DX
-	MOVBQZX (R13), CX
-	ORQ     CX, AX
-	JMP     sequenceDecs_decode_bmi2_fill_byte_by_byte
-
-sequenceDecs_decode_bmi2_fill_check_overread:
-	CMPQ DX, $0x40
-	JA   error_overread
-
-sequenceDecs_decode_bmi2_fill_end:
-	// Update offset
-	MOVQ   $0x00000808, CX
-	BEXTRQ CX, R8, R14
-	MOVQ   AX, R15
-	LEAQ   (DX)(R14*1), CX
-	ROLQ   CL, R15
-	BZHIQ  R14, R15, R15
-	MOVQ   CX, DX
-	MOVQ   R8, CX
-	SHRQ   $0x20, CX
-	ADDQ   R15, CX
-	MOVQ   CX, 16(R9)
-
-	// Update match length
-	MOVQ   $0x00000808, CX
-	BEXTRQ CX, DI, R14
-	MOVQ   AX, R15
-	LEAQ   (DX)(R14*1), CX
-	ROLQ   CL, R15
-	BZHIQ  R14, R15, R15
-	MOVQ   CX, DX
-	MOVQ   DI, CX
-	SHRQ   $0x20, CX
-	ADDQ   R15, CX
-	MOVQ   CX, 8(R9)
-
-	// Fill bitreader to have enough for the remaining
-	CMPQ BX, $0x08
-	JL   sequenceDecs_decode_bmi2_fill_2_byte_by_byte
-	MOVQ DX, CX
-	SHRQ $0x03, CX
-	SUBQ CX, R13
-	MOVQ (R13), AX
-	SUBQ CX, BX
-	ANDQ $0x07, DX
-	JMP  sequenceDecs_decode_bmi2_fill_2_end
-
-sequenceDecs_decode_bmi2_fill_2_byte_by_byte:
-	CMPQ    BX, $0x00
-	JLE     sequenceDecs_decode_bmi2_fill_2_check_overread
-	CMPQ    DX, $0x07
-	JLE     sequenceDecs_decode_bmi2_fill_2_end
-	SHLQ    $0x08, AX
-	SUBQ    $0x01, R13
-	SUBQ    $0x01, BX
-	SUBQ    $0x08, DX
-	MOVBQZX (R13), CX
-	ORQ     CX, AX
-	JMP     sequenceDecs_decode_bmi2_fill_2_byte_by_byte
-
-sequenceDecs_decode_bmi2_fill_2_check_overread:
-	CMPQ DX, $0x40
-	JA   error_overread
-
-sequenceDecs_decode_bmi2_fill_2_end:
-	// Update literal length
-	MOVQ   $0x00000808, CX
-	BEXTRQ CX, SI, R14
-	MOVQ   AX, R15
-	LEAQ   (DX)(R14*1), CX
-	ROLQ   CL, R15
-	BZHIQ  R14, R15, R15
-	MOVQ   CX, DX
-	MOVQ   SI, CX
-	SHRQ   $0x20, CX
-	ADDQ   R15, CX
-	MOVQ   CX, (R9)
-
-	// Fill bitreader for state updates
-	MOVQ    R13, (SP)
-	MOVQ    $0x00000808, CX
-	BEXTRQ  CX, R8, R13
-	MOVQ    ctx+16(FP), CX
-	CMPQ    96(CX), $0x00
-	JZ      sequenceDecs_decode_bmi2_skip_update
-	LEAQ    (SI)(DI*1), R14
-	ADDQ    R8, R14
-	MOVBQZX R14, R14
-	LEAQ    (DX)(R14*1), CX
-	MOVQ    AX, R15
-	MOVQ    CX, DX
-	ROLQ    CL, R15
-	BZHIQ   R14, R15, R15
-
-	// Update Offset State
-	BZHIQ R8, R15, CX
-	SHRXQ R8, R15, R15
-	SHRL  $0x10, R8
-	ADDQ  CX, R8
-
-	// Load ctx.ofTable
-	MOVQ ctx+16(FP), CX
-	MOVQ 48(CX), CX
-	MOVQ (CX)(R8*8), R8
-
-	// Update Match Length State
-	BZHIQ DI, R15, CX
-	SHRXQ DI, R15, R15
-	SHRL  $0x10, DI
-	ADDQ  CX, DI
-
-	// Load ctx.mlTable
-	MOVQ ctx+16(FP), CX
-	MOVQ 24(CX), CX
-	MOVQ (CX)(DI*8), DI
-
-	// Update Literal Length State
-	BZHIQ SI, R15, CX
-	SHRL  $0x10, SI
-	ADDQ  CX, SI
-
-	// Load ctx.llTable
-	MOVQ ctx+16(FP), CX
-	MOVQ (CX), CX
-	MOVQ (CX)(SI*8), SI
-
-sequenceDecs_decode_bmi2_skip_update:
-	// Adjust offset
-	MOVQ 16(R9), CX
-	CMPQ R13, $0x01
-	JBE  sequenceDecs_decode_bmi2_adjust_offsetB_1_or_0
-	MOVQ R11, R12
-	MOVQ R10, R11
-	MOVQ CX, R10
-	JMP  sequenceDecs_decode_bmi2_after_adjust
-
-sequenceDecs_decode_bmi2_adjust_offsetB_1_or_0:
-	CMPQ (R9), $0x00000000
-	JNE  sequenceDecs_decode_bmi2_adjust_offset_maybezero
-	INCQ CX
-	JMP  sequenceDecs_decode_bmi2_adjust_offset_nonzero
-
-sequenceDecs_decode_bmi2_adjust_offset_maybezero:
-	TESTQ CX, CX
-	JNZ   sequenceDecs_decode_bmi2_adjust_offset_nonzero
-	MOVQ  R10, CX
-	JMP   sequenceDecs_decode_bmi2_after_adjust
-
-sequenceDecs_decode_bmi2_adjust_offset_nonzero:
-	CMPQ CX, $0x01
-	JB   sequenceDecs_decode_bmi2_adjust_zero
-	JEQ  sequenceDecs_decode_bmi2_adjust_one
-	CMPQ CX, $0x02
-	JA   sequenceDecs_decode_bmi2_adjust_three
-	JMP  sequenceDecs_decode_bmi2_adjust_two
-
-sequenceDecs_decode_bmi2_adjust_zero:
-	MOVQ R10, R13
-	JMP  sequenceDecs_decode_bmi2_adjust_test_temp_valid
-
-sequenceDecs_decode_bmi2_adjust_one:
-	MOVQ R11, R13
-	JMP  sequenceDecs_decode_bmi2_adjust_test_temp_valid
-
-sequenceDecs_decode_bmi2_adjust_two:
-	MOVQ R12, R13
-	JMP  sequenceDecs_decode_bmi2_adjust_test_temp_valid
-
-sequenceDecs_decode_bmi2_adjust_three:
-	LEAQ -1(R10), R13
-
-sequenceDecs_decode_bmi2_adjust_test_temp_valid:
-	TESTQ R13, R13
-	JNZ   sequenceDecs_decode_bmi2_adjust_temp_valid
-	MOVQ  $0x00000001, R13
-
-sequenceDecs_decode_bmi2_adjust_temp_valid:
-	CMPQ    CX, $0x01
-	CMOVQNE R11, R12
-	MOVQ    R10, R11
-	MOVQ    R13, R10
-	MOVQ    R13, CX
-
-sequenceDecs_decode_bmi2_after_adjust:
-	MOVQ CX, 16(R9)
-
-	// Check values
-	MOVQ  8(R9), R13
-	MOVQ  (R9), R14
-	LEAQ  (R13)(R14*1), R15
-	MOVQ  s+0(FP), BP
-	ADDQ  R15, 256(BP)
-	MOVQ  ctx+16(FP), R15
-	SUBQ  R14, 128(R15)
-	JS    error_not_enough_literals
-	CMPQ  R13, $0x00020002
-	JA    sequenceDecs_decode_bmi2_error_match_len_too_big
-	TESTQ CX, CX
-	JNZ   sequenceDecs_decode_bmi2_match_len_ofs_ok
-	TESTQ R13, R13
-	JNZ   sequenceDecs_decode_bmi2_error_match_len_ofs_mismatch
-
-sequenceDecs_decode_bmi2_match_len_ofs_ok:
-	ADDQ $0x18, R9
-	MOVQ ctx+16(FP), CX
-	DECQ 96(CX)
-	JNS  sequenceDecs_decode_bmi2_main_loop
-	MOVQ s+0(FP), CX
-	MOVQ R10, 144(CX)
-	MOVQ R11, 152(CX)
-	MOVQ R12, 160(CX)
-	MOVQ br+8(FP), CX
-	MOVQ AX, 24(CX)
-	MOVB DL, 32(CX)
-	MOVQ BX, 8(CX)
-
-	// Return success
-	MOVQ $0x00000000, ret+24(FP)
-	RET
-
-	// Return with match length error
-sequenceDecs_decode_bmi2_error_match_len_ofs_mismatch:
-	MOVQ $0x00000001, ret+24(FP)
-	RET
-
-	// Return with match too long error
-sequenceDecs_decode_bmi2_error_match_len_too_big:
-	MOVQ $0x00000002, ret+24(FP)
-	RET
-
-	// Return with match offset too long error
-	MOVQ $0x00000003, ret+24(FP)
-	RET
-
-	// Return with not enough literals error
-error_not_enough_literals:
-	MOVQ $0x00000004, ret+24(FP)
-	RET
-
-	// Return with overread error
-error_overread:
-	MOVQ $0x00000006, ret+24(FP)
-	RET
-
-// func sequenceDecs_decode_56_bmi2(s *sequenceDecs, br *bitReader, ctx *decodeAsmContext) int
-// Requires: BMI, BMI2, CMOV
-TEXT ·sequenceDecs_decode_56_bmi2(SB), $8-32
-	MOVQ    br+8(FP), BX
-	MOVQ    24(BX), AX
-	MOVBQZX 32(BX), DX
-	MOVQ    (BX), CX
-	MOVQ    8(BX), BX
-	ADDQ    BX, CX
-	MOVQ    CX, (SP)
-	MOVQ    ctx+16(FP), CX
-	MOVQ    72(CX), SI
-	MOVQ    80(CX), DI
-	MOVQ    88(CX), R8
-	MOVQ    104(CX), R9
-	MOVQ    s+0(FP), CX
-	MOVQ    144(CX), R10
-	MOVQ    152(CX), R11
-	MOVQ    160(CX), R12
-
-sequenceDecs_decode_56_bmi2_main_loop:
-	MOVQ (SP), R13
-
-	// Fill bitreader to have enough for the offset and match length.
-	CMPQ BX, $0x08
-	JL   sequenceDecs_decode_56_bmi2_fill_byte_by_byte
-	MOVQ DX, CX
-	SHRQ $0x03, CX
-	SUBQ CX, R13
-	MOVQ (R13), AX
-	SUBQ CX, BX
-	ANDQ $0x07, DX
-	JMP  sequenceDecs_decode_56_bmi2_fill_end
-
-sequenceDecs_decode_56_bmi2_fill_byte_by_byte:
-	CMPQ    BX, $0x00
-	JLE     sequenceDecs_decode_56_bmi2_fill_check_overread
-	CMPQ    DX, $0x07
-	JLE     sequenceDecs_decode_56_bmi2_fill_end
-	SHLQ    $0x08, AX
-	SUBQ    $0x01, R13
-	SUBQ    $0x01, BX
-	SUBQ    $0x08, DX
-	MOVBQZX (R13), CX
-	ORQ     CX, AX
-	JMP     sequenceDecs_decode_56_bmi2_fill_byte_by_byte
-
-sequenceDecs_decode_56_bmi2_fill_check_overread:
-	CMPQ DX, $0x40
-	JA   error_overread
-
-sequenceDecs_decode_56_bmi2_fill_end:
-	// Update offset
-	MOVQ   $0x00000808, CX
-	BEXTRQ CX, R8, R14
-	MOVQ   AX, R15
-	LEAQ   (DX)(R14*1), CX
-	ROLQ   CL, R15
-	BZHIQ  R14, R15, R15
-	MOVQ   CX, DX
-	MOVQ   R8, CX
-	SHRQ   $0x20, CX
-	ADDQ   R15, CX
-	MOVQ   CX, 16(R9)
-
-	// Update match length
-	MOVQ   $0x00000808, CX
-	BEXTRQ CX, DI, R14
-	MOVQ   AX, R15
-	LEAQ   (DX)(R14*1), CX
-	ROLQ   CL, R15
-	BZHIQ  R14, R15, R15
-	MOVQ   CX, DX
-	MOVQ   DI, CX
-	SHRQ   $0x20, CX
-	ADDQ   R15, CX
-	MOVQ   CX, 8(R9)
-
-	// Update literal length
-	MOVQ   $0x00000808, CX
-	BEXTRQ CX, SI, R14
-	MOVQ   AX, R15
-	LEAQ   (DX)(R14*1), CX
-	ROLQ   CL, R15
-	BZHIQ  R14, R15, R15
-	MOVQ   CX, DX
-	MOVQ   SI, CX
-	SHRQ   $0x20, CX
-	ADDQ   R15, CX
-	MOVQ   CX, (R9)
-
-	// Fill bitreader for state updates
-	MOVQ    R13, (SP)
-	MOVQ    $0x00000808, CX
-	BEXTRQ  CX, R8, R13
-	MOVQ    ctx+16(FP), CX
-	CMPQ    96(CX), $0x00
-	JZ      sequenceDecs_decode_56_bmi2_skip_update
-	LEAQ    (SI)(DI*1), R14
-	ADDQ    R8, R14
-	MOVBQZX R14, R14
-	LEAQ    (DX)(R14*1), CX
-	MOVQ    AX, R15
-	MOVQ    CX, DX
-	ROLQ    CL, R15
-	BZHIQ   R14, R15, R15
-
-	// Update Offset State
-	BZHIQ R8, R15, CX
-	SHRXQ R8, R15, R15
-	SHRL  $0x10, R8
-	ADDQ  CX, R8
-
-	// Load ctx.ofTable
-	MOVQ ctx+16(FP), CX
-	MOVQ 48(CX), CX
-	MOVQ (CX)(R8*8), R8
-
-	// Update Match Length State
-	BZHIQ DI, R15, CX
-	SHRXQ DI, R15, R15
-	SHRL  $0x10, DI
-	ADDQ  CX, DI
-
-	// Load ctx.mlTable
-	MOVQ ctx+16(FP), CX
-	MOVQ 24(CX), CX
-	MOVQ (CX)(DI*8), DI
-
-	// Update Literal Length State
-	BZHIQ SI, R15, CX
-	SHRL  $0x10, SI
-	ADDQ  CX, SI
-
-	// Load ctx.llTable
-	MOVQ ctx+16(FP), CX
-	MOVQ (CX), CX
-	MOVQ (CX)(SI*8), SI
-
-sequenceDecs_decode_56_bmi2_skip_update:
-	// Adjust offset
-	MOVQ 16(R9), CX
-	CMPQ R13, $0x01
-	JBE  sequenceDecs_decode_56_bmi2_adjust_offsetB_1_or_0
-	MOVQ R11, R12
-	MOVQ R10, R11
-	MOVQ CX, R10
-	JMP  sequenceDecs_decode_56_bmi2_after_adjust
-
-sequenceDecs_decode_56_bmi2_adjust_offsetB_1_or_0:
-	CMPQ (R9), $0x00000000
-	JNE  sequenceDecs_decode_56_bmi2_adjust_offset_maybezero
-	INCQ CX
-	JMP  sequenceDecs_decode_56_bmi2_adjust_offset_nonzero
-
-sequenceDecs_decode_56_bmi2_adjust_offset_maybezero:
-	TESTQ CX, CX
-	JNZ   sequenceDecs_decode_56_bmi2_adjust_offset_nonzero
-	MOVQ  R10, CX
-	JMP   sequenceDecs_decode_56_bmi2_after_adjust
-
-sequenceDecs_decode_56_bmi2_adjust_offset_nonzero:
-	CMPQ CX, $0x01
-	JB   sequenceDecs_decode_56_bmi2_adjust_zero
-	JEQ  sequenceDecs_decode_56_bmi2_adjust_one
-	CMPQ CX, $0x02
-	JA   sequenceDecs_decode_56_bmi2_adjust_three
-	JMP  sequenceDecs_decode_56_bmi2_adjust_two
-
-sequenceDecs_decode_56_bmi2_adjust_zero:
-	MOVQ R10, R13
-	JMP  sequenceDecs_decode_56_bmi2_adjust_test_temp_valid
-
-sequenceDecs_decode_56_bmi2_adjust_one:
-	MOVQ R11, R13
-	JMP  sequenceDecs_decode_56_bmi2_adjust_test_temp_valid
-
-sequenceDecs_decode_56_bmi2_adjust_two:
-	MOVQ R12, R13
-	JMP  sequenceDecs_decode_56_bmi2_adjust_test_temp_valid
-
-sequenceDecs_decode_56_bmi2_adjust_three:
-	LEAQ -1(R10), R13
-
-sequenceDecs_decode_56_bmi2_adjust_test_temp_valid:
-	TESTQ R13, R13
-	JNZ   sequenceDecs_decode_56_bmi2_adjust_temp_valid
-	MOVQ  $0x00000001, R13
-
-sequenceDecs_decode_56_bmi2_adjust_temp_valid:
-	CMPQ    CX, $0x01
-	CMOVQNE R11, R12
-	MOVQ    R10, R11
-	MOVQ    R13, R10
-	MOVQ    R13, CX
-
-sequenceDecs_decode_56_bmi2_after_adjust:
-	MOVQ CX, 16(R9)
-
-	// Check values
-	MOVQ  8(R9), R13
-	MOVQ  (R9), R14
-	LEAQ  (R13)(R14*1), R15
-	MOVQ  s+0(FP), BP
-	ADDQ  R15, 256(BP)
-	MOVQ  ctx+16(FP), R15
-	SUBQ  R14, 128(R15)
-	JS    error_not_enough_literals
-	CMPQ  R13, $0x00020002
-	JA    sequenceDecs_decode_56_bmi2_error_match_len_too_big
-	TESTQ CX, CX
-	JNZ   sequenceDecs_decode_56_bmi2_match_len_ofs_ok
-	TESTQ R13, R13
-	JNZ   sequenceDecs_decode_56_bmi2_error_match_len_ofs_mismatch
-
-sequenceDecs_decode_56_bmi2_match_len_ofs_ok:
-	ADDQ $0x18, R9
-	MOVQ ctx+16(FP), CX
-	DECQ 96(CX)
-	JNS  sequenceDecs_decode_56_bmi2_main_loop
-	MOVQ s+0(FP), CX
-	MOVQ R10, 144(CX)
-	MOVQ R11, 152(CX)
-	MOVQ R12, 160(CX)
-	MOVQ br+8(FP), CX
-	MOVQ AX, 24(CX)
-	MOVB DL, 32(CX)
-	MOVQ BX, 8(CX)
-
-	// Return success
-	MOVQ $0x00000000, ret+24(FP)
-	RET
-
-	// Return with match length error
-sequenceDecs_decode_56_bmi2_error_match_len_ofs_mismatch:
-	MOVQ $0x00000001, ret+24(FP)
-	RET
-
-	// Return with match too long error
-sequenceDecs_decode_56_bmi2_error_match_len_too_big:
-	MOVQ $0x00000002, ret+24(FP)
-	RET
-
-	// Return with match offset too long error
-	MOVQ $0x00000003, ret+24(FP)
-	RET
-
-	// Return with not enough literals error
-error_not_enough_literals:
-	MOVQ $0x00000004, ret+24(FP)
-	RET
-
-	// Return with overread error
-error_overread:
-	MOVQ $0x00000006, ret+24(FP)
-	RET
-
-// func sequenceDecs_executeSimple_amd64(ctx *executeAsmContext) bool
-// Requires: SSE
-TEXT ·sequenceDecs_executeSimple_amd64(SB), $8-9
-	MOVQ  ctx+0(FP), R10
-	MOVQ  8(R10), CX
-	TESTQ CX, CX
-	JZ    empty_seqs
-	MOVQ  (R10), AX
-	MOVQ  24(R10), DX
-	MOVQ  32(R10), BX
-	MOVQ  80(R10), SI
-	MOVQ  104(R10), DI
-	MOVQ  120(R10), R8
-	MOVQ  56(R10), R9
-	MOVQ  64(R10), R10
-	ADDQ  R10, R9
-
-	// seqsBase += 24 * seqIndex
-	LEAQ (DX)(DX*2), R11
-	SHLQ $0x03, R11
-	ADDQ R11, AX
-
-	// outBase += outPosition
-	ADDQ DI, BX
-
-main_loop:
-	MOVQ (AX), R11
-	MOVQ 16(AX), R12
-	MOVQ 8(AX), R13
-
-	// Copy literals
-	TESTQ R11, R11
-	JZ    check_offset
-	XORQ  R14, R14
-
-copy_1:
-	MOVUPS (SI)(R14*1), X0
-	MOVUPS X0, (BX)(R14*1)
-	ADDQ   $0x10, R14
-	CMPQ   R14, R11
-	JB     copy_1
-	ADDQ   R11, SI
-	ADDQ   R11, BX
-	ADDQ   R11, DI
-
-	// Malformed input if seq.mo > t+len(hist) || seq.mo > s.windowSize)
-check_offset:
-	LEAQ (DI)(R10*1), R11
-	CMPQ R12, R11
-	JG   error_match_off_too_big
-	CMPQ R12, R8
-	JG   error_match_off_too_big
-
-	// Copy match from history
-	MOVQ R12, R11
-	SUBQ DI, R11
-	JLS  copy_match
-	MOVQ R9, R14
-	SUBQ R11, R14
-	CMPQ R13, R11
-	JG   copy_all_from_history
-	MOVQ R13, R11
-	SUBQ $0x10, R11
-	JB   copy_4_small
-
-copy_4_loop:
-	MOVUPS (R14), X0
-	MOVUPS X0, (BX)
-	ADDQ   $0x10, R14
-	ADDQ   $0x10, BX
-	SUBQ   $0x10, R11
-	JAE    copy_4_loop
-	LEAQ   16(R14)(R11*1), R14
-	LEAQ   16(BX)(R11*1), BX
-	MOVUPS -16(R14), X0
-	MOVUPS X0, -16(BX)
-	JMP    copy_4_end
-
-copy_4_small:
-	CMPQ R13, $0x03
-	JE   copy_4_move_3
-	CMPQ R13, $0x08
-	JB   copy_4_move_4through7
-	JMP  copy_4_move_8through16
-
-copy_4_move_3:
-	MOVW (R14), R11
-	MOVB 2(R14), R12
-	MOVW R11, (BX)
-	MOVB R12, 2(BX)
-	ADDQ R13, R14
-	ADDQ R13, BX
-	JMP  copy_4_end
-
-copy_4_move_4through7:
-	MOVL (R14), R11
-	MOVL -4(R14)(R13*1), R12
-	MOVL R11, (BX)
-	MOVL R12, -4(BX)(R13*1)
-	ADDQ R13, R14
-	ADDQ R13, BX
-	JMP  copy_4_end
-
-copy_4_move_8through16:
-	MOVQ (R14), R11
-	MOVQ -8(R14)(R13*1), R12
-	MOVQ R11, (BX)
-	MOVQ R12, -8(BX)(R13*1)
-	ADDQ R13, R14
-	ADDQ R13, BX
-
-copy_4_end:
-	ADDQ R13, DI
-	ADDQ $0x18, AX
-	INCQ DX
-	CMPQ DX, CX
-	JB   main_loop
-	JMP  loop_finished
-
-copy_all_from_history:
-	MOVQ R11, R15
-	SUBQ $0x10, R15
-	JB   copy_5_small
-
-copy_5_loop:
-	MOVUPS (R14), X0
-	MOVUPS X0, (BX)
-	ADDQ   $0x10, R14
-	ADDQ   $0x10, BX
-	SUBQ   $0x10, R15
-	JAE    copy_5_loop
-	LEAQ   16(R14)(R15*1), R14
-	LEAQ   16(BX)(R15*1), BX
-	MOVUPS -16(R14), X0
-	MOVUPS X0, -16(BX)
-	JMP    copy_5_end
-
-copy_5_small:
-	CMPQ R11, $0x03
-	JE   copy_5_move_3
-	JB   copy_5_move_1or2
-	CMPQ R11, $0x08
-	JB   copy_5_move_4through7
-	JMP  copy_5_move_8through16
-
-copy_5_move_1or2:
-	MOVB (R14), R15
-	MOVB -1(R14)(R11*1), BP
-	MOVB R15, (BX)
-	MOVB BP, -1(BX)(R11*1)
-	ADDQ R11, R14
-	ADDQ R11, BX
-	JMP  copy_5_end
-
-copy_5_move_3:
-	MOVW (R14), R15
-	MOVB 2(R14), BP
-	MOVW R15, (BX)
-	MOVB BP, 2(BX)
-	ADDQ R11, R14
-	ADDQ R11, BX
-	JMP  copy_5_end
-
-copy_5_move_4through7:
-	MOVL (R14), R15
-	MOVL -4(R14)(R11*1), BP
-	MOVL R15, (BX)
-	MOVL BP, -4(BX)(R11*1)
-	ADDQ R11, R14
-	ADDQ R11, BX
-	JMP  copy_5_end
-
-copy_5_move_8through16:
-	MOVQ (R14), R15
-	MOVQ -8(R14)(R11*1), BP
-	MOVQ R15, (BX)
-	MOVQ BP, -8(BX)(R11*1)
-	ADDQ R11, R14
-	ADDQ R11, BX
-
-copy_5_end:
-	ADDQ R11, DI
-	SUBQ R11, R13
-
-	// Copy match from the current buffer
-copy_match:
-	MOVQ BX, R11
-	SUBQ R12, R11
-
-	// ml <= mo
-	CMPQ R13, R12
-	JA   copy_overlapping_match
-
-	// Copy non-overlapping match
-	ADDQ R13, DI
-	MOVQ BX, R12
-	ADDQ R13, BX
-
-copy_2:
-	MOVUPS (R11), X0
-	MOVUPS X0, (R12)
-	ADDQ   $0x10, R11
-	ADDQ   $0x10, R12
-	SUBQ   $0x10, R13
-	JHI    copy_2
-	JMP    handle_loop
-
-	// Copy overlapping match
-copy_overlapping_match:
-	ADDQ R13, DI
-
-copy_slow_3:
-	MOVB (R11), R12
-	MOVB R12, (BX)
-	INCQ R11
-	INCQ BX
-	DECQ R13
-	JNZ  copy_slow_3
-
-handle_loop:
-	ADDQ $0x18, AX
-	INCQ DX
-	CMPQ DX, CX
-	JB   main_loop
-
-loop_finished:
-	// Return value
-	MOVB $0x01, ret+8(FP)
-
-	// Update the context
-	MOVQ ctx+0(FP), AX
-	MOVQ DX, 24(AX)
-	MOVQ DI, 104(AX)
-	SUBQ 80(AX), SI
-	MOVQ SI, 112(AX)
-	RET
-
-error_match_off_too_big:
-	// Return value
-	MOVB $0x00, ret+8(FP)
-
-	// Update the context
-	MOVQ ctx+0(FP), AX
-	MOVQ DX, 24(AX)
-	MOVQ DI, 104(AX)
-	SUBQ 80(AX), SI
-	MOVQ SI, 112(AX)
-	RET
-
-empty_seqs:
-	// Return value
-	MOVB $0x01, ret+8(FP)
-	RET
-
-// func sequenceDecs_executeSimple_safe_amd64(ctx *executeAsmContext) bool
-// Requires: SSE
-TEXT ·sequenceDecs_executeSimple_safe_amd64(SB), $8-9
-	MOVQ  ctx+0(FP), R10
-	MOVQ  8(R10), CX
-	TESTQ CX, CX
-	JZ    empty_seqs
-	MOVQ  (R10), AX
-	MOVQ  24(R10), DX
-	MOVQ  32(R10), BX
-	MOVQ  80(R10), SI
-	MOVQ  104(R10), DI
-	MOVQ  120(R10), R8
-	MOVQ  56(R10), R9
-	MOVQ  64(R10), R10
-	ADDQ  R10, R9
-
-	// seqsBase += 24 * seqIndex
-	LEAQ (DX)(DX*2), R11
-	SHLQ $0x03, R11
-	ADDQ R11, AX
-
-	// outBase += outPosition
-	ADDQ DI, BX
-
-main_loop:
-	MOVQ (AX), R11
-	MOVQ 16(AX), R12
-	MOVQ 8(AX), R13
-
-	// Copy literals
-	TESTQ R11, R11
-	JZ    check_offset
-	MOVQ  R11, R14
-	SUBQ  $0x10, R14
-	JB    copy_1_small
-
-copy_1_loop:
-	MOVUPS (SI), X0
-	MOVUPS X0, (BX)
-	ADDQ   $0x10, SI
-	ADDQ   $0x10, BX
-	SUBQ   $0x10, R14
-	JAE    copy_1_loop
-	LEAQ   16(SI)(R14*1), SI
-	LEAQ   16(BX)(R14*1), BX
-	MOVUPS -16(SI), X0
-	MOVUPS X0, -16(BX)
-	JMP    copy_1_end
-
-copy_1_small:
-	CMPQ R11, $0x03
-	JE   copy_1_move_3
-	JB   copy_1_move_1or2
-	CMPQ R11, $0x08
-	JB   copy_1_move_4through7
-	JMP  copy_1_move_8through16
-
-copy_1_move_1or2:
-	MOVB (SI), R14
-	MOVB -1(SI)(R11*1), R15
-	MOVB R14, (BX)
-	MOVB R15, -1(BX)(R11*1)
-	ADDQ R11, SI
-	ADDQ R11, BX
-	JMP  copy_1_end
-
-copy_1_move_3:
-	MOVW (SI), R14
-	MOVB 2(SI), R15
-	MOVW R14, (BX)
-	MOVB R15, 2(BX)
-	ADDQ R11, SI
-	ADDQ R11, BX
-	JMP  copy_1_end
-
-copy_1_move_4through7:
-	MOVL (SI), R14
-	MOVL -4(SI)(R11*1), R15
-	MOVL R14, (BX)
-	MOVL R15, -4(BX)(R11*1)
-	ADDQ R11, SI
-	ADDQ R11, BX
-	JMP  copy_1_end
-
-copy_1_move_8through16:
-	MOVQ (SI), R14
-	MOVQ -8(SI)(R11*1), R15
-	MOVQ R14, (BX)
-	MOVQ R15, -8(BX)(R11*1)
-	ADDQ R11, SI
-	ADDQ R11, BX
-
-copy_1_end:
-	ADDQ R11, DI
-
-	// Malformed input if seq.mo > t+len(hist) || seq.mo > s.windowSize)
-check_offset:
-	LEAQ (DI)(R10*1), R11
-	CMPQ R12, R11
-	JG   error_match_off_too_big
-	CMPQ R12, R8
-	JG   error_match_off_too_big
-
-	// Copy match from history
-	MOVQ R12, R11
-	SUBQ DI, R11
-	JLS  copy_match
-	MOVQ R9, R14
-	SUBQ R11, R14
-	CMPQ R13, R11
-	JG   copy_all_from_history
-	MOVQ R13, R11
-	SUBQ $0x10, R11
-	JB   copy_4_small
-
-copy_4_loop:
-	MOVUPS (R14), X0
-	MOVUPS X0, (BX)
-	ADDQ   $0x10, R14
-	ADDQ   $0x10, BX
-	SUBQ   $0x10, R11
-	JAE    copy_4_loop
-	LEAQ   16(R14)(R11*1), R14
-	LEAQ   16(BX)(R11*1), BX
-	MOVUPS -16(R14), X0
-	MOVUPS X0, -16(BX)
-	JMP    copy_4_end
-
-copy_4_small:
-	CMPQ R13, $0x03
-	JE   copy_4_move_3
-	CMPQ R13, $0x08
-	JB   copy_4_move_4through7
-	JMP  copy_4_move_8through16
-
-copy_4_move_3:
-	MOVW (R14), R11
-	MOVB 2(R14), R12
-	MOVW R11, (BX)
-	MOVB R12, 2(BX)
-	ADDQ R13, R14
-	ADDQ R13, BX
-	JMP  copy_4_end
-
-copy_4_move_4through7:
-	MOVL (R14), R11
-	MOVL -4(R14)(R13*1), R12
-	MOVL R11, (BX)
-	MOVL R12, -4(BX)(R13*1)
-	ADDQ R13, R14
-	ADDQ R13, BX
-	JMP  copy_4_end
-
-copy_4_move_8through16:
-	MOVQ (R14), R11
-	MOVQ -8(R14)(R13*1), R12
-	MOVQ R11, (BX)
-	MOVQ R12, -8(BX)(R13*1)
-	ADDQ R13, R14
-	ADDQ R13, BX
-
-copy_4_end:
-	ADDQ R13, DI
-	ADDQ $0x18, AX
-	INCQ DX
-	CMPQ DX, CX
-	JB   main_loop
-	JMP  loop_finished
-
-copy_all_from_history:
-	MOVQ R11, R15
-	SUBQ $0x10, R15
-	JB   copy_5_small
-
-copy_5_loop:
-	MOVUPS (R14), X0
-	MOVUPS X0, (BX)
-	ADDQ   $0x10, R14
-	ADDQ   $0x10, BX
-	SUBQ   $0x10, R15
-	JAE    copy_5_loop
-	LEAQ   16(R14)(R15*1), R14
-	LEAQ   16(BX)(R15*1), BX
-	MOVUPS -16(R14), X0
-	MOVUPS X0, -16(BX)
-	JMP    copy_5_end
-
-copy_5_small:
-	CMPQ R11, $0x03
-	JE   copy_5_move_3
-	JB   copy_5_move_1or2
-	CMPQ R11, $0x08
-	JB   copy_5_move_4through7
-	JMP  copy_5_move_8through16
-
-copy_5_move_1or2:
-	MOVB (R14), R15
-	MOVB -1(R14)(R11*1), BP
-	MOVB R15, (BX)
-	MOVB BP, -1(BX)(R11*1)
-	ADDQ R11, R14
-	ADDQ R11, BX
-	JMP  copy_5_end
-
-copy_5_move_3:
-	MOVW (R14), R15
-	MOVB 2(R14), BP
-	MOVW R15, (BX)
-	MOVB BP, 2(BX)
-	ADDQ R11, R14
-	ADDQ R11, BX
-	JMP  copy_5_end
-
-copy_5_move_4through7:
-	MOVL (R14), R15
-	MOVL -4(R14)(R11*1), BP
-	MOVL R15, (BX)
-	MOVL BP, -4(BX)(R11*1)
-	ADDQ R11, R14
-	ADDQ R11, BX
-	JMP  copy_5_end
-
-copy_5_move_8through16:
-	MOVQ (R14), R15
-	MOVQ -8(R14)(R11*1), BP
-	MOVQ R15, (BX)
-	MOVQ BP, -8(BX)(R11*1)
-	ADDQ R11, R14
-	ADDQ R11, BX
-
-copy_5_end:
-	ADDQ R11, DI
-	SUBQ R11, R13
-
-	// Copy match from the current buffer
-copy_match:
-	MOVQ BX, R11
-	SUBQ R12, R11
-
-	// ml <= mo
-	CMPQ R13, R12
-	JA   copy_overlapping_match
-
-	// Copy non-overlapping match
-	ADDQ R13, DI
-	MOVQ R13, R12
-	SUBQ $0x10, R12
-	JB   copy_2_small
-
-copy_2_loop:
-	MOVUPS (R11), X0
-	MOVUPS X0, (BX)
-	ADDQ   $0x10, R11
-	ADDQ   $0x10, BX
-	SUBQ   $0x10, R12
-	JAE    copy_2_loop
-	LEAQ   16(R11)(R12*1), R11
-	LEAQ   16(BX)(R12*1), BX
-	MOVUPS -16(R11), X0
-	MOVUPS X0, -16(BX)
-	JMP    copy_2_end
-
-copy_2_small:
-	CMPQ R13, $0x03
-	JE   copy_2_move_3
-	JB   copy_2_move_1or2
-	CMPQ R13, $0x08
-	JB   copy_2_move_4through7
-	JMP  copy_2_move_8through16
-
-copy_2_move_1or2:
-	MOVB (R11), R12
-	MOVB -1(R11)(R13*1), R14
-	MOVB R12, (BX)
-	MOVB R14, -1(BX)(R13*1)
-	ADDQ R13, R11
-	ADDQ R13, BX
-	JMP  copy_2_end
-
-copy_2_move_3:
-	MOVW (R11), R12
-	MOVB 2(R11), R14
-	MOVW R12, (BX)
-	MOVB R14, 2(BX)
-	ADDQ R13, R11
-	ADDQ R13, BX
-	JMP  copy_2_end
-
-copy_2_move_4through7:
-	MOVL (R11), R12
-	MOVL -4(R11)(R13*1), R14
-	MOVL R12, (BX)
-	MOVL R14, -4(BX)(R13*1)
-	ADDQ R13, R11
-	ADDQ R13, BX
-	JMP  copy_2_end
-
-copy_2_move_8through16:
-	MOVQ (R11), R12
-	MOVQ -8(R11)(R13*1), R14
-	MOVQ R12, (BX)
-	MOVQ R14, -8(BX)(R13*1)
-	ADDQ R13, R11
-	ADDQ R13, BX
-
-copy_2_end:
-	JMP handle_loop
-
-	// Copy overlapping match
-copy_overlapping_match:
-	ADDQ R13, DI
-
-copy_slow_3:
-	MOVB (R11), R12
-	MOVB R12, (BX)
-	INCQ R11
-	INCQ BX
-	DECQ R13
-	JNZ  copy_slow_3
-
-handle_loop:
-	ADDQ $0x18, AX
-	INCQ DX
-	CMPQ DX, CX
-	JB   main_loop
-
-loop_finished:
-	// Return value
-	MOVB $0x01, ret+8(FP)
-
-	// Update the context
-	MOVQ ctx+0(FP), AX
-	MOVQ DX, 24(AX)
-	MOVQ DI, 104(AX)
-	SUBQ 80(AX), SI
-	MOVQ SI, 112(AX)
-	RET
-
-error_match_off_too_big:
-	// Return value
-	MOVB $0x00, ret+8(FP)
-
-	// Update the context
-	MOVQ ctx+0(FP), AX
-	MOVQ DX, 24(AX)
-	MOVQ DI, 104(AX)
-	SUBQ 80(AX), SI
-	MOVQ SI, 112(AX)
-	RET
-
-empty_seqs:
-	// Return value
-	MOVB $0x01, ret+8(FP)
-	RET
-
-// func sequenceDecs_decodeSync_amd64(s *sequenceDecs, br *bitReader, ctx *decodeSyncAsmContext) int
-// Requires: CMOV, SSE
-TEXT ·sequenceDecs_decodeSync_amd64(SB), $64-32
-	MOVQ    br+8(FP), CX
-	MOVQ    24(CX), DX
-	MOVBQZX 32(CX), BX
-	MOVQ    (CX), AX
-	MOVQ    8(CX), SI
-	ADDQ    SI, AX
-	MOVQ    AX, (SP)
-	MOVQ    ctx+16(FP), AX
-	MOVQ    72(AX), DI
-	MOVQ    80(AX), R8
-	MOVQ    88(AX), R9
-	XORQ    CX, CX
-	MOVQ    CX, 8(SP)
-	MOVQ    CX, 16(SP)
-	MOVQ    CX, 24(SP)
-	MOVQ    112(AX), R10
-	MOVQ    128(AX), CX
-	MOVQ    CX, 32(SP)
-	MOVQ    144(AX), R11
-	MOVQ    136(AX), R12
-	MOVQ    200(AX), CX
-	MOVQ    CX, 56(SP)
-	MOVQ    176(AX), CX
-	MOVQ    CX, 48(SP)
-	MOVQ    184(AX), AX
-	MOVQ    AX, 40(SP)
-	MOVQ    40(SP), AX
-	ADDQ    AX, 48(SP)
-
-	// Calculate pointer to s.out[cap(s.out)] (a past-end pointer)
-	ADDQ R10, 32(SP)
-
-	// outBase += outPosition
-	ADDQ R12, R10
-
-sequenceDecs_decodeSync_amd64_main_loop:
-	MOVQ (SP), R13
-
-	// Fill bitreader to have enough for the offset and match length.
-	CMPQ SI, $0x08
-	JL   sequenceDecs_decodeSync_amd64_fill_byte_by_byte
-	MOVQ BX, AX
-	SHRQ $0x03, AX
-	SUBQ AX, R13
-	MOVQ (R13), DX
-	SUBQ AX, SI
-	ANDQ $0x07, BX
-	JMP  sequenceDecs_decodeSync_amd64_fill_end
-
-sequenceDecs_decodeSync_amd64_fill_byte_by_byte:
-	CMPQ    SI, $0x00
-	JLE     sequenceDecs_decodeSync_amd64_fill_check_overread
-	CMPQ    BX, $0x07
-	JLE     sequenceDecs_decodeSync_amd64_fill_end
-	SHLQ    $0x08, DX
-	SUBQ    $0x01, R13
-	SUBQ    $0x01, SI
-	SUBQ    $0x08, BX
-	MOVBQZX (R13), AX
-	ORQ     AX, DX
-	JMP     sequenceDecs_decodeSync_amd64_fill_byte_by_byte
-
-sequenceDecs_decodeSync_amd64_fill_check_overread:
-	CMPQ BX, $0x40
-	JA   error_overread
-
-sequenceDecs_decodeSync_amd64_fill_end:
-	// Update offset
-	MOVQ  R9, AX
-	MOVQ  BX, CX
-	MOVQ  DX, R14
-	SHLQ  CL, R14
-	MOVB  AH, CL
-	SHRQ  $0x20, AX
-	TESTQ CX, CX
-	JZ    sequenceDecs_decodeSync_amd64_of_update_zero
-	ADDQ  CX, BX
-	CMPQ  BX, $0x40
-	JA    sequenceDecs_decodeSync_amd64_of_update_zero
-	CMPQ  CX, $0x40
-	JAE   sequenceDecs_decodeSync_amd64_of_update_zero
-	NEGQ  CX
-	SHRQ  CL, R14
-	ADDQ  R14, AX
-
-sequenceDecs_decodeSync_amd64_of_update_zero:
-	MOVQ AX, 8(SP)
-
-	// Update match length
-	MOVQ  R8, AX
-	MOVQ  BX, CX
-	MOVQ  DX, R14
-	SHLQ  CL, R14
-	MOVB  AH, CL
-	SHRQ  $0x20, AX
-	TESTQ CX, CX
-	JZ    sequenceDecs_decodeSync_amd64_ml_update_zero
-	ADDQ  CX, BX
-	CMPQ  BX, $0x40
-	JA    sequenceDecs_decodeSync_amd64_ml_update_zero
-	CMPQ  CX, $0x40
-	JAE   sequenceDecs_decodeSync_amd64_ml_update_zero
-	NEGQ  CX
-	SHRQ  CL, R14
-	ADDQ  R14, AX
-
-sequenceDecs_decodeSync_amd64_ml_update_zero:
-	MOVQ AX, 16(SP)
-
-	// Fill bitreader to have enough for the remaining
-	CMPQ SI, $0x08
-	JL   sequenceDecs_decodeSync_amd64_fill_2_byte_by_byte
-	MOVQ BX, AX
-	SHRQ $0x03, AX
-	SUBQ AX, R13
-	MOVQ (R13), DX
-	SUBQ AX, SI
-	ANDQ $0x07, BX
-	JMP  sequenceDecs_decodeSync_amd64_fill_2_end
-
-sequenceDecs_decodeSync_amd64_fill_2_byte_by_byte:
-	CMPQ    SI, $0x00
-	JLE     sequenceDecs_decodeSync_amd64_fill_2_check_overread
-	CMPQ    BX, $0x07
-	JLE     sequenceDecs_decodeSync_amd64_fill_2_end
-	SHLQ    $0x08, DX
-	SUBQ    $0x01, R13
-	SUBQ    $0x01, SI
-	SUBQ    $0x08, BX
-	MOVBQZX (R13), AX
-	ORQ     AX, DX
-	JMP     sequenceDecs_decodeSync_amd64_fill_2_byte_by_byte
-
-sequenceDecs_decodeSync_amd64_fill_2_check_overread:
-	CMPQ BX, $0x40
-	JA   error_overread
-
-sequenceDecs_decodeSync_amd64_fill_2_end:
-	// Update literal length
-	MOVQ  DI, AX
-	MOVQ  BX, CX
-	MOVQ  DX, R14
-	SHLQ  CL, R14
-	MOVB  AH, CL
-	SHRQ  $0x20, AX
-	TESTQ CX, CX
-	JZ    sequenceDecs_decodeSync_amd64_ll_update_zero
-	ADDQ  CX, BX
-	CMPQ  BX, $0x40
-	JA    sequenceDecs_decodeSync_amd64_ll_update_zero
-	CMPQ  CX, $0x40
-	JAE   sequenceDecs_decodeSync_amd64_ll_update_zero
-	NEGQ  CX
-	SHRQ  CL, R14
-	ADDQ  R14, AX
-
-sequenceDecs_decodeSync_amd64_ll_update_zero:
-	MOVQ AX, 24(SP)
-
-	// Fill bitreader for state updates
-	MOVQ    R13, (SP)
-	MOVQ    R9, AX
-	SHRQ    $0x08, AX
-	MOVBQZX AL, AX
-	MOVQ    ctx+16(FP), CX
-	CMPQ    96(CX), $0x00
-	JZ      sequenceDecs_decodeSync_amd64_skip_update
-
-	// Update Literal Length State
-	MOVBQZX DI, R13
-	SHRL    $0x10, DI
-	LEAQ    (BX)(R13*1), CX
-	MOVQ    DX, R14
-	MOVQ    CX, BX
-	ROLQ    CL, R14
-	MOVL    $0x00000001, R15
-	MOVB    R13, CL
-	SHLL    CL, R15
-	DECL    R15
-	ANDQ    R15, R14
-	ADDQ    R14, DI
-
-	// Load ctx.llTable
-	MOVQ ctx+16(FP), CX
-	MOVQ (CX), CX
-	MOVQ (CX)(DI*8), DI
-
-	// Update Match Length State
-	MOVBQZX R8, R13
-	SHRL    $0x10, R8
-	LEAQ    (BX)(R13*1), CX
-	MOVQ    DX, R14
-	MOVQ    CX, BX
-	ROLQ    CL, R14
-	MOVL    $0x00000001, R15
-	MOVB    R13, CL
-	SHLL    CL, R15
-	DECL    R15
-	ANDQ    R15, R14
-	ADDQ    R14, R8
-
-	// Load ctx.mlTable
-	MOVQ ctx+16(FP), CX
-	MOVQ 24(CX), CX
-	MOVQ (CX)(R8*8), R8
-
-	// Update Offset State
-	MOVBQZX R9, R13
-	SHRL    $0x10, R9
-	LEAQ    (BX)(R13*1), CX
-	MOVQ    DX, R14
-	MOVQ    CX, BX
-	ROLQ    CL, R14
-	MOVL    $0x00000001, R15
-	MOVB    R13, CL
-	SHLL    CL, R15
-	DECL    R15
-	ANDQ    R15, R14
-	ADDQ    R14, R9
-
-	// Load ctx.ofTable
-	MOVQ ctx+16(FP), CX
-	MOVQ 48(CX), CX
-	MOVQ (CX)(R9*8), R9
-
-sequenceDecs_decodeSync_amd64_skip_update:
-	// Adjust offset
-	MOVQ   s+0(FP), CX
-	MOVQ   8(SP), R13
-	CMPQ   AX, $0x01
-	JBE    sequenceDecs_decodeSync_amd64_adjust_offsetB_1_or_0
-	MOVUPS 144(CX), X0
-	MOVQ   R13, 144(CX)
-	MOVUPS X0, 152(CX)
-	JMP    sequenceDecs_decodeSync_amd64_after_adjust
-
-sequenceDecs_decodeSync_amd64_adjust_offsetB_1_or_0:
-	CMPQ 24(SP), $0x00000000
-	JNE  sequenceDecs_decodeSync_amd64_adjust_offset_maybezero
-	INCQ R13
-	JMP  sequenceDecs_decodeSync_amd64_adjust_offset_nonzero
-
-sequenceDecs_decodeSync_amd64_adjust_offset_maybezero:
-	TESTQ R13, R13
-	JNZ   sequenceDecs_decodeSync_amd64_adjust_offset_nonzero
-	MOVQ  144(CX), R13
-	JMP   sequenceDecs_decodeSync_amd64_after_adjust
-
-sequenceDecs_decodeSync_amd64_adjust_offset_nonzero:
-	MOVQ    R13, AX
-	XORQ    R14, R14
-	MOVQ    $-1, R15
-	CMPQ    R13, $0x03
-	CMOVQEQ R14, AX
-	CMOVQEQ R15, R14
-	ADDQ    144(CX)(AX*8), R14
-	JNZ     sequenceDecs_decodeSync_amd64_adjust_temp_valid
-	MOVQ    $0x00000001, R14
-
-sequenceDecs_decodeSync_amd64_adjust_temp_valid:
-	CMPQ R13, $0x01
-	JZ   sequenceDecs_decodeSync_amd64_adjust_skip
-	MOVQ 152(CX), AX
-	MOVQ AX, 160(CX)
-
-sequenceDecs_decodeSync_amd64_adjust_skip:
-	MOVQ 144(CX), AX
-	MOVQ AX, 152(CX)
-	MOVQ R14, 144(CX)
-	MOVQ R14, R13
-
-sequenceDecs_decodeSync_amd64_after_adjust:
-	MOVQ R13, 8(SP)
-
-	// Check values
-	MOVQ  16(SP), AX
-	MOVQ  24(SP), CX
-	LEAQ  (AX)(CX*1), R14
-	MOVQ  s+0(FP), R15
-	ADDQ  R14, 256(R15)
-	MOVQ  ctx+16(FP), R14
-	SUBQ  CX, 104(R14)
-	JS    error_not_enough_literals
-	CMPQ  AX, $0x00020002
-	JA    sequenceDecs_decodeSync_amd64_error_match_len_too_big
-	TESTQ R13, R13
-	JNZ   sequenceDecs_decodeSync_amd64_match_len_ofs_ok
-	TESTQ AX, AX
-	JNZ   sequenceDecs_decodeSync_amd64_error_match_len_ofs_mismatch
-
-sequenceDecs_decodeSync_amd64_match_len_ofs_ok:
-	MOVQ 24(SP), AX
-	MOVQ 8(SP), CX
-	MOVQ 16(SP), R13
-
-	// Check if we have enough space in s.out
-	LEAQ (AX)(R13*1), R14
-	ADDQ R10, R14
-	CMPQ R14, 32(SP)
-	JA   error_not_enough_space
-
-	// Copy literals
-	TESTQ AX, AX
-	JZ    check_offset
-	XORQ  R14, R14
-
-copy_1:
-	MOVUPS (R11)(R14*1), X0
-	MOVUPS X0, (R10)(R14*1)
-	ADDQ   $0x10, R14
-	CMPQ   R14, AX
-	JB     copy_1
-	ADDQ   AX, R11
-	ADDQ   AX, R10
-	ADDQ   AX, R12
-
-	// Malformed input if seq.mo > t+len(hist) || seq.mo > s.windowSize)
-check_offset:
-	MOVQ R12, AX
-	ADDQ 40(SP), AX
-	CMPQ CX, AX
-	JG   error_match_off_too_big
-	CMPQ CX, 56(SP)
-	JG   error_match_off_too_big
-
-	// Copy match from history
-	MOVQ CX, AX
-	SUBQ R12, AX
-	JLS  copy_match
-	MOVQ 48(SP), R14
-	SUBQ AX, R14
-	CMPQ R13, AX
-	JG   copy_all_from_history
-	MOVQ R13, AX
-	SUBQ $0x10, AX
-	JB   copy_4_small
-
-copy_4_loop:
-	MOVUPS (R14), X0
-	MOVUPS X0, (R10)
-	ADDQ   $0x10, R14
-	ADDQ   $0x10, R10
-	SUBQ   $0x10, AX
-	JAE    copy_4_loop
-	LEAQ   16(R14)(AX*1), R14
-	LEAQ   16(R10)(AX*1), R10
-	MOVUPS -16(R14), X0
-	MOVUPS X0, -16(R10)
-	JMP    copy_4_end
-
-copy_4_small:
-	CMPQ R13, $0x03
-	JE   copy_4_move_3
-	CMPQ R13, $0x08
-	JB   copy_4_move_4through7
-	JMP  copy_4_move_8through16
-
-copy_4_move_3:
-	MOVW (R14), AX
-	MOVB 2(R14), CL
-	MOVW AX, (R10)
-	MOVB CL, 2(R10)
-	ADDQ R13, R14
-	ADDQ R13, R10
-	JMP  copy_4_end
-
-copy_4_move_4through7:
-	MOVL (R14), AX
-	MOVL -4(R14)(R13*1), CX
-	MOVL AX, (R10)
-	MOVL CX, -4(R10)(R13*1)
-	ADDQ R13, R14
-	ADDQ R13, R10
-	JMP  copy_4_end
-
-copy_4_move_8through16:
-	MOVQ (R14), AX
-	MOVQ -8(R14)(R13*1), CX
-	MOVQ AX, (R10)
-	MOVQ CX, -8(R10)(R13*1)
-	ADDQ R13, R14
-	ADDQ R13, R10
-
-copy_4_end:
-	ADDQ R13, R12
-	JMP  handle_loop
-	JMP loop_finished
-
-copy_all_from_history:
-	MOVQ AX, R15
-	SUBQ $0x10, R15
-	JB   copy_5_small
-
-copy_5_loop:
-	MOVUPS (R14), X0
-	MOVUPS X0, (R10)
-	ADDQ   $0x10, R14
-	ADDQ   $0x10, R10
-	SUBQ   $0x10, R15
-	JAE    copy_5_loop
-	LEAQ   16(R14)(R15*1), R14
-	LEAQ   16(R10)(R15*1), R10
-	MOVUPS -16(R14), X0
-	MOVUPS X0, -16(R10)
-	JMP    copy_5_end
-
-copy_5_small:
-	CMPQ AX, $0x03
-	JE   copy_5_move_3
-	JB   copy_5_move_1or2
-	CMPQ AX, $0x08
-	JB   copy_5_move_4through7
-	JMP  copy_5_move_8through16
-
-copy_5_move_1or2:
-	MOVB (R14), R15
-	MOVB -1(R14)(AX*1), BP
-	MOVB R15, (R10)
-	MOVB BP, -1(R10)(AX*1)
-	ADDQ AX, R14
-	ADDQ AX, R10
-	JMP  copy_5_end
-
-copy_5_move_3:
-	MOVW (R14), R15
-	MOVB 2(R14), BP
-	MOVW R15, (R10)
-	MOVB BP, 2(R10)
-	ADDQ AX, R14
-	ADDQ AX, R10
-	JMP  copy_5_end
-
-copy_5_move_4through7:
-	MOVL (R14), R15
-	MOVL -4(R14)(AX*1), BP
-	MOVL R15, (R10)
-	MOVL BP, -4(R10)(AX*1)
-	ADDQ AX, R14
-	ADDQ AX, R10
-	JMP  copy_5_end
-
-copy_5_move_8through16:
-	MOVQ (R14), R15
-	MOVQ -8(R14)(AX*1), BP
-	MOVQ R15, (R10)
-	MOVQ BP, -8(R10)(AX*1)
-	ADDQ AX, R14
-	ADDQ AX, R10
-
-copy_5_end:
-	ADDQ AX, R12
-	SUBQ AX, R13
-
-	// Copy match from the current buffer
-copy_match:
-	MOVQ R10, AX
-	SUBQ CX, AX
-
-	// ml <= mo
-	CMPQ R13, CX
-	JA   copy_overlapping_match
-
-	// Copy non-overlapping match
-	ADDQ R13, R12
-	MOVQ R10, CX
-	ADDQ R13, R10
-
-copy_2:
-	MOVUPS (AX), X0
-	MOVUPS X0, (CX)
-	ADDQ   $0x10, AX
-	ADDQ   $0x10, CX
-	SUBQ   $0x10, R13
-	JHI    copy_2
-	JMP    handle_loop
-
-	// Copy overlapping match
-copy_overlapping_match:
-	ADDQ R13, R12
-
-copy_slow_3:
-	MOVB (AX), CL
-	MOVB CL, (R10)
-	INCQ AX
-	INCQ R10
-	DECQ R13
-	JNZ  copy_slow_3
-
-handle_loop:
-	MOVQ ctx+16(FP), AX
-	DECQ 96(AX)
-	JNS  sequenceDecs_decodeSync_amd64_main_loop
-
-loop_finished:
-	MOVQ br+8(FP), AX
-	MOVQ DX, 24(AX)
-	MOVB BL, 32(AX)
-	MOVQ SI, 8(AX)
-
-	// Update the context
-	MOVQ ctx+16(FP), AX
-	MOVQ R12, 136(AX)
-	MOVQ 144(AX), CX
-	SUBQ CX, R11
-	MOVQ R11, 168(AX)
-
-	// Return success
-	MOVQ $0x00000000, ret+24(FP)
-	RET
-
-	// Return with match length error
-sequenceDecs_decodeSync_amd64_error_match_len_ofs_mismatch:
-	MOVQ 16(SP), AX
-	MOVQ ctx+16(FP), CX
-	MOVQ AX, 216(CX)
-	MOVQ $0x00000001, ret+24(FP)
-	RET
-
-	// Return with match too long error
-sequenceDecs_decodeSync_amd64_error_match_len_too_big:
-	MOVQ ctx+16(FP), AX
-	MOVQ 16(SP), CX
-	MOVQ CX, 216(AX)
-	MOVQ $0x00000002, ret+24(FP)
-	RET
-
-	// Return with match offset too long error
-error_match_off_too_big:
-	MOVQ ctx+16(FP), AX
-	MOVQ 8(SP), CX
-	MOVQ CX, 224(AX)
-	MOVQ R12, 136(AX)
-	MOVQ $0x00000003, ret+24(FP)
-	RET
-
-	// Return with not enough literals error
-error_not_enough_literals:
-	MOVQ ctx+16(FP), AX
-	MOVQ 24(SP), CX
-	MOVQ CX, 208(AX)
-	MOVQ $0x00000004, ret+24(FP)
-	RET
-
-	// Return with overread error
-error_overread:
-	MOVQ $0x00000006, ret+24(FP)
-	RET
-
-	// Return with not enough output space error
-error_not_enough_space:
-	MOVQ ctx+16(FP), AX
-	MOVQ 24(SP), CX
-	MOVQ CX, 208(AX)
-	MOVQ 16(SP), CX
-	MOVQ CX, 216(AX)
-	MOVQ R12, 136(AX)
-	MOVQ $0x00000005, ret+24(FP)
-	RET
-
-// func sequenceDecs_decodeSync_bmi2(s *sequenceDecs, br *bitReader, ctx *decodeSyncAsmContext) int
-// Requires: BMI, BMI2, CMOV, SSE
-TEXT ·sequenceDecs_decodeSync_bmi2(SB), $64-32
-	MOVQ    br+8(FP), BX
-	MOVQ    24(BX), AX
-	MOVBQZX 32(BX), DX
-	MOVQ    (BX), CX
-	MOVQ    8(BX), BX
-	ADDQ    BX, CX
-	MOVQ    CX, (SP)
-	MOVQ    ctx+16(FP), CX
-	MOVQ    72(CX), SI
-	MOVQ    80(CX), DI
-	MOVQ    88(CX), R8
-	XORQ    R9, R9
-	MOVQ    R9, 8(SP)
-	MOVQ    R9, 16(SP)
-	MOVQ    R9, 24(SP)
-	MOVQ    112(CX), R9
-	MOVQ    128(CX), R10
-	MOVQ    R10, 32(SP)
-	MOVQ    144(CX), R10
-	MOVQ    136(CX), R11
-	MOVQ    200(CX), R12
-	MOVQ    R12, 56(SP)
-	MOVQ    176(CX), R12
-	MOVQ    R12, 48(SP)
-	MOVQ    184(CX), CX
-	MOVQ    CX, 40(SP)
-	MOVQ    40(SP), CX
-	ADDQ    CX, 48(SP)
-
-	// Calculate pointer to s.out[cap(s.out)] (a past-end pointer)
-	ADDQ R9, 32(SP)
-
-	// outBase += outPosition
-	ADDQ R11, R9
-
-sequenceDecs_decodeSync_bmi2_main_loop:
-	MOVQ (SP), R12
-
-	// Fill bitreader to have enough for the offset and match length.
-	CMPQ BX, $0x08
-	JL   sequenceDecs_decodeSync_bmi2_fill_byte_by_byte
-	MOVQ DX, CX
-	SHRQ $0x03, CX
-	SUBQ CX, R12
-	MOVQ (R12), AX
-	SUBQ CX, BX
-	ANDQ $0x07, DX
-	JMP  sequenceDecs_decodeSync_bmi2_fill_end
-
-sequenceDecs_decodeSync_bmi2_fill_byte_by_byte:
-	CMPQ    BX, $0x00
-	JLE     sequenceDecs_decodeSync_bmi2_fill_check_overread
-	CMPQ    DX, $0x07
-	JLE     sequenceDecs_decodeSync_bmi2_fill_end
-	SHLQ    $0x08, AX
-	SUBQ    $0x01, R12
-	SUBQ    $0x01, BX
-	SUBQ    $0x08, DX
-	MOVBQZX (R12), CX
-	ORQ     CX, AX
-	JMP     sequenceDecs_decodeSync_bmi2_fill_byte_by_byte
-
-sequenceDecs_decodeSync_bmi2_fill_check_overread:
-	CMPQ DX, $0x40
-	JA   error_overread
-
-sequenceDecs_decodeSync_bmi2_fill_end:
-	// Update offset
-	MOVQ   $0x00000808, CX
-	BEXTRQ CX, R8, R13
-	MOVQ   AX, R14
-	LEAQ   (DX)(R13*1), CX
-	ROLQ   CL, R14
-	BZHIQ  R13, R14, R14
-	MOVQ   CX, DX
-	MOVQ   R8, CX
-	SHRQ   $0x20, CX
-	ADDQ   R14, CX
-	MOVQ   CX, 8(SP)
-
-	// Update match length
-	MOVQ   $0x00000808, CX
-	BEXTRQ CX, DI, R13
-	MOVQ   AX, R14
-	LEAQ   (DX)(R13*1), CX
-	ROLQ   CL, R14
-	BZHIQ  R13, R14, R14
-	MOVQ   CX, DX
-	MOVQ   DI, CX
-	SHRQ   $0x20, CX
-	ADDQ   R14, CX
-	MOVQ   CX, 16(SP)
-
-	// Fill bitreader to have enough for the remaining
-	CMPQ BX, $0x08
-	JL   sequenceDecs_decodeSync_bmi2_fill_2_byte_by_byte
-	MOVQ DX, CX
-	SHRQ $0x03, CX
-	SUBQ CX, R12
-	MOVQ (R12), AX
-	SUBQ CX, BX
-	ANDQ $0x07, DX
-	JMP  sequenceDecs_decodeSync_bmi2_fill_2_end
-
-sequenceDecs_decodeSync_bmi2_fill_2_byte_by_byte:
-	CMPQ    BX, $0x00
-	JLE     sequenceDecs_decodeSync_bmi2_fill_2_check_overread
-	CMPQ    DX, $0x07
-	JLE     sequenceDecs_decodeSync_bmi2_fill_2_end
-	SHLQ    $0x08, AX
-	SUBQ    $0x01, R12
-	SUBQ    $0x01, BX
-	SUBQ    $0x08, DX
-	MOVBQZX (R12), CX
-	ORQ     CX, AX
-	JMP     sequenceDecs_decodeSync_bmi2_fill_2_byte_by_byte
-
-sequenceDecs_decodeSync_bmi2_fill_2_check_overread:
-	CMPQ DX, $0x40
-	JA   error_overread
-
-sequenceDecs_decodeSync_bmi2_fill_2_end:
-	// Update literal length
-	MOVQ   $0x00000808, CX
-	BEXTRQ CX, SI, R13
-	MOVQ   AX, R14
-	LEAQ   (DX)(R13*1), CX
-	ROLQ   CL, R14
-	BZHIQ  R13, R14, R14
-	MOVQ   CX, DX
-	MOVQ   SI, CX
-	SHRQ   $0x20, CX
-	ADDQ   R14, CX
-	MOVQ   CX, 24(SP)
-
-	// Fill bitreader for state updates
-	MOVQ    R12, (SP)
-	MOVQ    $0x00000808, CX
-	BEXTRQ  CX, R8, R12
-	MOVQ    ctx+16(FP), CX
-	CMPQ    96(CX), $0x00
-	JZ      sequenceDecs_decodeSync_bmi2_skip_update
-	LEAQ    (SI)(DI*1), R13
-	ADDQ    R8, R13
-	MOVBQZX R13, R13
-	LEAQ    (DX)(R13*1), CX
-	MOVQ    AX, R14
-	MOVQ    CX, DX
-	ROLQ    CL, R14
-	BZHIQ   R13, R14, R14
-
-	// Update Offset State
-	BZHIQ R8, R14, CX
-	SHRXQ R8, R14, R14
-	SHRL  $0x10, R8
-	ADDQ  CX, R8
-
-	// Load ctx.ofTable
-	MOVQ ctx+16(FP), CX
-	MOVQ 48(CX), CX
-	MOVQ (CX)(R8*8), R8
-
-	// Update Match Length State
-	BZHIQ DI, R14, CX
-	SHRXQ DI, R14, R14
-	SHRL  $0x10, DI
-	ADDQ  CX, DI
-
-	// Load ctx.mlTable
-	MOVQ ctx+16(FP), CX
-	MOVQ 24(CX), CX
-	MOVQ (CX)(DI*8), DI
-
-	// Update Literal Length State
-	BZHIQ SI, R14, CX
-	SHRL  $0x10, SI
-	ADDQ  CX, SI
-
-	// Load ctx.llTable
-	MOVQ ctx+16(FP), CX
-	MOVQ (CX), CX
-	MOVQ (CX)(SI*8), SI
-
-sequenceDecs_decodeSync_bmi2_skip_update:
-	// Adjust offset
-	MOVQ   s+0(FP), CX
-	MOVQ   8(SP), R13
-	CMPQ   R12, $0x01
-	JBE    sequenceDecs_decodeSync_bmi2_adjust_offsetB_1_or_0
-	MOVUPS 144(CX), X0
-	MOVQ   R13, 144(CX)
-	MOVUPS X0, 152(CX)
-	JMP    sequenceDecs_decodeSync_bmi2_after_adjust
-
-sequenceDecs_decodeSync_bmi2_adjust_offsetB_1_or_0:
-	CMPQ 24(SP), $0x00000000
-	JNE  sequenceDecs_decodeSync_bmi2_adjust_offset_maybezero
-	INCQ R13
-	JMP  sequenceDecs_decodeSync_bmi2_adjust_offset_nonzero
-
-sequenceDecs_decodeSync_bmi2_adjust_offset_maybezero:
-	TESTQ R13, R13
-	JNZ   sequenceDecs_decodeSync_bmi2_adjust_offset_nonzero
-	MOVQ  144(CX), R13
-	JMP   sequenceDecs_decodeSync_bmi2_after_adjust
-
-sequenceDecs_decodeSync_bmi2_adjust_offset_nonzero:
-	MOVQ    R13, R12
-	XORQ    R14, R14
-	MOVQ    $-1, R15
-	CMPQ    R13, $0x03
-	CMOVQEQ R14, R12
-	CMOVQEQ R15, R14
-	ADDQ    144(CX)(R12*8), R14
-	JNZ     sequenceDecs_decodeSync_bmi2_adjust_temp_valid
-	MOVQ    $0x00000001, R14
-
-sequenceDecs_decodeSync_bmi2_adjust_temp_valid:
-	CMPQ R13, $0x01
-	JZ   sequenceDecs_decodeSync_bmi2_adjust_skip
-	MOVQ 152(CX), R12
-	MOVQ R12, 160(CX)
-
-sequenceDecs_decodeSync_bmi2_adjust_skip:
-	MOVQ 144(CX), R12
-	MOVQ R12, 152(CX)
-	MOVQ R14, 144(CX)
-	MOVQ R14, R13
-
-sequenceDecs_decodeSync_bmi2_after_adjust:
-	MOVQ R13, 8(SP)
-
-	// Check values
-	MOVQ  16(SP), CX
-	MOVQ  24(SP), R12
-	LEAQ  (CX)(R12*1), R14
-	MOVQ  s+0(FP), R15
-	ADDQ  R14, 256(R15)
-	MOVQ  ctx+16(FP), R14
-	SUBQ  R12, 104(R14)
-	JS    error_not_enough_literals
-	CMPQ  CX, $0x00020002
-	JA    sequenceDecs_decodeSync_bmi2_error_match_len_too_big
-	TESTQ R13, R13
-	JNZ   sequenceDecs_decodeSync_bmi2_match_len_ofs_ok
-	TESTQ CX, CX
-	JNZ   sequenceDecs_decodeSync_bmi2_error_match_len_ofs_mismatch
-
-sequenceDecs_decodeSync_bmi2_match_len_ofs_ok:
-	MOVQ 24(SP), CX
-	MOVQ 8(SP), R12
-	MOVQ 16(SP), R13
-
-	// Check if we have enough space in s.out
-	LEAQ (CX)(R13*1), R14
-	ADDQ R9, R14
-	CMPQ R14, 32(SP)
-	JA   error_not_enough_space
-
-	// Copy literals
-	TESTQ CX, CX
-	JZ    check_offset
-	XORQ  R14, R14
-
-copy_1:
-	MOVUPS (R10)(R14*1), X0
-	MOVUPS X0, (R9)(R14*1)
-	ADDQ   $0x10, R14
-	CMPQ   R14, CX
-	JB     copy_1
-	ADDQ   CX, R10
-	ADDQ   CX, R9
-	ADDQ   CX, R11
-
-	// Malformed input if seq.mo > t+len(hist) || seq.mo > s.windowSize)
-check_offset:
-	MOVQ R11, CX
-	ADDQ 40(SP), CX
-	CMPQ R12, CX
-	JG   error_match_off_too_big
-	CMPQ R12, 56(SP)
-	JG   error_match_off_too_big
-
-	// Copy match from history
-	MOVQ R12, CX
-	SUBQ R11, CX
-	JLS  copy_match
-	MOVQ 48(SP), R14
-	SUBQ CX, R14
-	CMPQ R13, CX
-	JG   copy_all_from_history
-	MOVQ R13, CX
-	SUBQ $0x10, CX
-	JB   copy_4_small
-
-copy_4_loop:
-	MOVUPS (R14), X0
-	MOVUPS X0, (R9)
-	ADDQ   $0x10, R14
-	ADDQ   $0x10, R9
-	SUBQ   $0x10, CX
-	JAE    copy_4_loop
-	LEAQ   16(R14)(CX*1), R14
-	LEAQ   16(R9)(CX*1), R9
-	MOVUPS -16(R14), X0
-	MOVUPS X0, -16(R9)
-	JMP    copy_4_end
-
-copy_4_small:
-	CMPQ R13, $0x03
-	JE   copy_4_move_3
-	CMPQ R13, $0x08
-	JB   copy_4_move_4through7
-	JMP  copy_4_move_8through16
-
-copy_4_move_3:
-	MOVW (R14), CX
-	MOVB 2(R14), R12
-	MOVW CX, (R9)
-	MOVB R12, 2(R9)
-	ADDQ R13, R14
-	ADDQ R13, R9
-	JMP  copy_4_end
-
-copy_4_move_4through7:
-	MOVL (R14), CX
-	MOVL -4(R14)(R13*1), R12
-	MOVL CX, (R9)
-	MOVL R12, -4(R9)(R13*1)
-	ADDQ R13, R14
-	ADDQ R13, R9
-	JMP  copy_4_end
-
-copy_4_move_8through16:
-	MOVQ (R14), CX
-	MOVQ -8(R14)(R13*1), R12
-	MOVQ CX, (R9)
-	MOVQ R12, -8(R9)(R13*1)
-	ADDQ R13, R14
-	ADDQ R13, R9
-
-copy_4_end:
-	ADDQ R13, R11
-	JMP  handle_loop
-	JMP loop_finished
-
-copy_all_from_history:
-	MOVQ CX, R15
-	SUBQ $0x10, R15
-	JB   copy_5_small
-
-copy_5_loop:
-	MOVUPS (R14), X0
-	MOVUPS X0, (R9)
-	ADDQ   $0x10, R14
-	ADDQ   $0x10, R9
-	SUBQ   $0x10, R15
-	JAE    copy_5_loop
-	LEAQ   16(R14)(R15*1), R14
-	LEAQ   16(R9)(R15*1), R9
-	MOVUPS -16(R14), X0
-	MOVUPS X0, -16(R9)
-	JMP    copy_5_end
-
-copy_5_small:
-	CMPQ CX, $0x03
-	JE   copy_5_move_3
-	JB   copy_5_move_1or2
-	CMPQ CX, $0x08
-	JB   copy_5_move_4through7
-	JMP  copy_5_move_8through16
-
-copy_5_move_1or2:
-	MOVB (R14), R15
-	MOVB -1(R14)(CX*1), BP
-	MOVB R15, (R9)
-	MOVB BP, -1(R9)(CX*1)
-	ADDQ CX, R14
-	ADDQ CX, R9
-	JMP  copy_5_end
-
-copy_5_move_3:
-	MOVW (R14), R15
-	MOVB 2(R14), BP
-	MOVW R15, (R9)
-	MOVB BP, 2(R9)
-	ADDQ CX, R14
-	ADDQ CX, R9
-	JMP  copy_5_end
-
-copy_5_move_4through7:
-	MOVL (R14), R15
-	MOVL -4(R14)(CX*1), BP
-	MOVL R15, (R9)
-	MOVL BP, -4(R9)(CX*1)
-	ADDQ CX, R14
-	ADDQ CX, R9
-	JMP  copy_5_end
-
-copy_5_move_8through16:
-	MOVQ (R14), R15
-	MOVQ -8(R14)(CX*1), BP
-	MOVQ R15, (R9)
-	MOVQ BP, -8(R9)(CX*1)
-	ADDQ CX, R14
-	ADDQ CX, R9
-
-copy_5_end:
-	ADDQ CX, R11
-	SUBQ CX, R13
-
-	// Copy match from the current buffer
-copy_match:
-	MOVQ R9, CX
-	SUBQ R12, CX
-
-	// ml <= mo
-	CMPQ R13, R12
-	JA   copy_overlapping_match
-
-	// Copy non-overlapping match
-	ADDQ R13, R11
-	MOVQ R9, R12
-	ADDQ R13, R9
-
-copy_2:
-	MOVUPS (CX), X0
-	MOVUPS X0, (R12)
-	ADDQ   $0x10, CX
-	ADDQ   $0x10, R12
-	SUBQ   $0x10, R13
-	JHI    copy_2
-	JMP    handle_loop
-
-	// Copy overlapping match
-copy_overlapping_match:
-	ADDQ R13, R11
-
-copy_slow_3:
-	MOVB (CX), R12
-	MOVB R12, (R9)
-	INCQ CX
-	INCQ R9
-	DECQ R13
-	JNZ  copy_slow_3
-
-handle_loop:
-	MOVQ ctx+16(FP), CX
-	DECQ 96(CX)
-	JNS  sequenceDecs_decodeSync_bmi2_main_loop
-
-loop_finished:
-	MOVQ br+8(FP), CX
-	MOVQ AX, 24(CX)
-	MOVB DL, 32(CX)
-	MOVQ BX, 8(CX)
-
-	// Update the context
-	MOVQ ctx+16(FP), AX
-	MOVQ R11, 136(AX)
-	MOVQ 144(AX), CX
-	SUBQ CX, R10
-	MOVQ R10, 168(AX)
-
-	// Return success
-	MOVQ $0x00000000, ret+24(FP)
-	RET
-
-	// Return with match length error
-sequenceDecs_decodeSync_bmi2_error_match_len_ofs_mismatch:
-	MOVQ 16(SP), AX
-	MOVQ ctx+16(FP), CX
-	MOVQ AX, 216(CX)
-	MOVQ $0x00000001, ret+24(FP)
-	RET
-
-	// Return with match too long error
-sequenceDecs_decodeSync_bmi2_error_match_len_too_big:
-	MOVQ ctx+16(FP), AX
-	MOVQ 16(SP), CX
-	MOVQ CX, 216(AX)
-	MOVQ $0x00000002, ret+24(FP)
-	RET
-
-	// Return with match offset too long error
-error_match_off_too_big:
-	MOVQ ctx+16(FP), AX
-	MOVQ 8(SP), CX
-	MOVQ CX, 224(AX)
-	MOVQ R11, 136(AX)
-	MOVQ $0x00000003, ret+24(FP)
-	RET
-
-	// Return with not enough literals error
-error_not_enough_literals:
-	MOVQ ctx+16(FP), AX
-	MOVQ 24(SP), CX
-	MOVQ CX, 208(AX)
-	MOVQ $0x00000004, ret+24(FP)
-	RET
-
-	// Return with overread error
-error_overread:
-	MOVQ $0x00000006, ret+24(FP)
-	RET
-
-	// Return with not enough output space error
-error_not_enough_space:
-	MOVQ ctx+16(FP), AX
-	MOVQ 24(SP), CX
-	MOVQ CX, 208(AX)
-	MOVQ 16(SP), CX
-	MOVQ CX, 216(AX)
-	MOVQ R11, 136(AX)
-	MOVQ $0x00000005, ret+24(FP)
-	RET
-
-// func sequenceDecs_decodeSync_safe_amd64(s *sequenceDecs, br *bitReader, ctx *decodeSyncAsmContext) int
-// Requires: CMOV, SSE
-TEXT ·sequenceDecs_decodeSync_safe_amd64(SB), $64-32
-	MOVQ    br+8(FP), CX
-	MOVQ    24(CX), DX
-	MOVBQZX 32(CX), BX
-	MOVQ    (CX), AX
-	MOVQ    8(CX), SI
-	ADDQ    SI, AX
-	MOVQ    AX, (SP)
-	MOVQ    ctx+16(FP), AX
-	MOVQ    72(AX), DI
-	MOVQ    80(AX), R8
-	MOVQ    88(AX), R9
-	XORQ    CX, CX
-	MOVQ    CX, 8(SP)
-	MOVQ    CX, 16(SP)
-	MOVQ    CX, 24(SP)
-	MOVQ    112(AX), R10
-	MOVQ    128(AX), CX
-	MOVQ    CX, 32(SP)
-	MOVQ    144(AX), R11
-	MOVQ    136(AX), R12
-	MOVQ    200(AX), CX
-	MOVQ    CX, 56(SP)
-	MOVQ    176(AX), CX
-	MOVQ    CX, 48(SP)
-	MOVQ    184(AX), AX
-	MOVQ    AX, 40(SP)
-	MOVQ    40(SP), AX
-	ADDQ    AX, 48(SP)
-
-	// Calculate pointer to s.out[cap(s.out)] (a past-end pointer)
-	ADDQ R10, 32(SP)
-
-	// outBase += outPosition
-	ADDQ R12, R10
-
-sequenceDecs_decodeSync_safe_amd64_main_loop:
-	MOVQ (SP), R13
-
-	// Fill bitreader to have enough for the offset and match length.
-	CMPQ SI, $0x08
-	JL   sequenceDecs_decodeSync_safe_amd64_fill_byte_by_byte
-	MOVQ BX, AX
-	SHRQ $0x03, AX
-	SUBQ AX, R13
-	MOVQ (R13), DX
-	SUBQ AX, SI
-	ANDQ $0x07, BX
-	JMP  sequenceDecs_decodeSync_safe_amd64_fill_end
-
-sequenceDecs_decodeSync_safe_amd64_fill_byte_by_byte:
-	CMPQ    SI, $0x00
-	JLE     sequenceDecs_decodeSync_safe_amd64_fill_check_overread
-	CMPQ    BX, $0x07
-	JLE     sequenceDecs_decodeSync_safe_amd64_fill_end
-	SHLQ    $0x08, DX
-	SUBQ    $0x01, R13
-	SUBQ    $0x01, SI
-	SUBQ    $0x08, BX
-	MOVBQZX (R13), AX
-	ORQ     AX, DX
-	JMP     sequenceDecs_decodeSync_safe_amd64_fill_byte_by_byte
-
-sequenceDecs_decodeSync_safe_amd64_fill_check_overread:
-	CMPQ BX, $0x40
-	JA   error_overread
-
-sequenceDecs_decodeSync_safe_amd64_fill_end:
-	// Update offset
-	MOVQ  R9, AX
-	MOVQ  BX, CX
-	MOVQ  DX, R14
-	SHLQ  CL, R14
-	MOVB  AH, CL
-	SHRQ  $0x20, AX
-	TESTQ CX, CX
-	JZ    sequenceDecs_decodeSync_safe_amd64_of_update_zero
-	ADDQ  CX, BX
-	CMPQ  BX, $0x40
-	JA    sequenceDecs_decodeSync_safe_amd64_of_update_zero
-	CMPQ  CX, $0x40
-	JAE   sequenceDecs_decodeSync_safe_amd64_of_update_zero
-	NEGQ  CX
-	SHRQ  CL, R14
-	ADDQ  R14, AX
-
-sequenceDecs_decodeSync_safe_amd64_of_update_zero:
-	MOVQ AX, 8(SP)
-
-	// Update match length
-	MOVQ  R8, AX
-	MOVQ  BX, CX
-	MOVQ  DX, R14
-	SHLQ  CL, R14
-	MOVB  AH, CL
-	SHRQ  $0x20, AX
-	TESTQ CX, CX
-	JZ    sequenceDecs_decodeSync_safe_amd64_ml_update_zero
-	ADDQ  CX, BX
-	CMPQ  BX, $0x40
-	JA    sequenceDecs_decodeSync_safe_amd64_ml_update_zero
-	CMPQ  CX, $0x40
-	JAE   sequenceDecs_decodeSync_safe_amd64_ml_update_zero
-	NEGQ  CX
-	SHRQ  CL, R14
-	ADDQ  R14, AX
-
-sequenceDecs_decodeSync_safe_amd64_ml_update_zero:
-	MOVQ AX, 16(SP)
-
-	// Fill bitreader to have enough for the remaining
-	CMPQ SI, $0x08
-	JL   sequenceDecs_decodeSync_safe_amd64_fill_2_byte_by_byte
-	MOVQ BX, AX
-	SHRQ $0x03, AX
-	SUBQ AX, R13
-	MOVQ (R13), DX
-	SUBQ AX, SI
-	ANDQ $0x07, BX
-	JMP  sequenceDecs_decodeSync_safe_amd64_fill_2_end
-
-sequenceDecs_decodeSync_safe_amd64_fill_2_byte_by_byte:
-	CMPQ    SI, $0x00
-	JLE     sequenceDecs_decodeSync_safe_amd64_fill_2_check_overread
-	CMPQ    BX, $0x07
-	JLE     sequenceDecs_decodeSync_safe_amd64_fill_2_end
-	SHLQ    $0x08, DX
-	SUBQ    $0x01, R13
-	SUBQ    $0x01, SI
-	SUBQ    $0x08, BX
-	MOVBQZX (R13), AX
-	ORQ     AX, DX
-	JMP     sequenceDecs_decodeSync_safe_amd64_fill_2_byte_by_byte
-
-sequenceDecs_decodeSync_safe_amd64_fill_2_check_overread:
-	CMPQ BX, $0x40
-	JA   error_overread
-
-sequenceDecs_decodeSync_safe_amd64_fill_2_end:
-	// Update literal length
-	MOVQ  DI, AX
-	MOVQ  BX, CX
-	MOVQ  DX, R14
-	SHLQ  CL, R14
-	MOVB  AH, CL
-	SHRQ  $0x20, AX
-	TESTQ CX, CX
-	JZ    sequenceDecs_decodeSync_safe_amd64_ll_update_zero
-	ADDQ  CX, BX
-	CMPQ  BX, $0x40
-	JA    sequenceDecs_decodeSync_safe_amd64_ll_update_zero
-	CMPQ  CX, $0x40
-	JAE   sequenceDecs_decodeSync_safe_amd64_ll_update_zero
-	NEGQ  CX
-	SHRQ  CL, R14
-	ADDQ  R14, AX
-
-sequenceDecs_decodeSync_safe_amd64_ll_update_zero:
-	MOVQ AX, 24(SP)
-
-	// Fill bitreader for state updates
-	MOVQ    R13, (SP)
-	MOVQ    R9, AX
-	SHRQ    $0x08, AX
-	MOVBQZX AL, AX
-	MOVQ    ctx+16(FP), CX
-	CMPQ    96(CX), $0x00
-	JZ      sequenceDecs_decodeSync_safe_amd64_skip_update
-
-	// Update Literal Length State
-	MOVBQZX DI, R13
-	SHRL    $0x10, DI
-	LEAQ    (BX)(R13*1), CX
-	MOVQ    DX, R14
-	MOVQ    CX, BX
-	ROLQ    CL, R14
-	MOVL    $0x00000001, R15
-	MOVB    R13, CL
-	SHLL    CL, R15
-	DECL    R15
-	ANDQ    R15, R14
-	ADDQ    R14, DI
-
-	// Load ctx.llTable
-	MOVQ ctx+16(FP), CX
-	MOVQ (CX), CX
-	MOVQ (CX)(DI*8), DI
-
-	// Update Match Length State
-	MOVBQZX R8, R13
-	SHRL    $0x10, R8
-	LEAQ    (BX)(R13*1), CX
-	MOVQ    DX, R14
-	MOVQ    CX, BX
-	ROLQ    CL, R14
-	MOVL    $0x00000001, R15
-	MOVB    R13, CL
-	SHLL    CL, R15
-	DECL    R15
-	ANDQ    R15, R14
-	ADDQ    R14, R8
-
-	// Load ctx.mlTable
-	MOVQ ctx+16(FP), CX
-	MOVQ 24(CX), CX
-	MOVQ (CX)(R8*8), R8
-
-	// Update Offset State
-	MOVBQZX R9, R13
-	SHRL    $0x10, R9
-	LEAQ    (BX)(R13*1), CX
-	MOVQ    DX, R14
-	MOVQ    CX, BX
-	ROLQ    CL, R14
-	MOVL    $0x00000001, R15
-	MOVB    R13, CL
-	SHLL    CL, R15
-	DECL    R15
-	ANDQ    R15, R14
-	ADDQ    R14, R9
-
-	// Load ctx.ofTable
-	MOVQ ctx+16(FP), CX
-	MOVQ 48(CX), CX
-	MOVQ (CX)(R9*8), R9
-
-sequenceDecs_decodeSync_safe_amd64_skip_update:
-	// Adjust offset
-	MOVQ   s+0(FP), CX
-	MOVQ   8(SP), R13
-	CMPQ   AX, $0x01
-	JBE    sequenceDecs_decodeSync_safe_amd64_adjust_offsetB_1_or_0
-	MOVUPS 144(CX), X0
-	MOVQ   R13, 144(CX)
-	MOVUPS X0, 152(CX)
-	JMP    sequenceDecs_decodeSync_safe_amd64_after_adjust
-
-sequenceDecs_decodeSync_safe_amd64_adjust_offsetB_1_or_0:
-	CMPQ 24(SP), $0x00000000
-	JNE  sequenceDecs_decodeSync_safe_amd64_adjust_offset_maybezero
-	INCQ R13
-	JMP  sequenceDecs_decodeSync_safe_amd64_adjust_offset_nonzero
-
-sequenceDecs_decodeSync_safe_amd64_adjust_offset_maybezero:
-	TESTQ R13, R13
-	JNZ   sequenceDecs_decodeSync_safe_amd64_adjust_offset_nonzero
-	MOVQ  144(CX), R13
-	JMP   sequenceDecs_decodeSync_safe_amd64_after_adjust
-
-sequenceDecs_decodeSync_safe_amd64_adjust_offset_nonzero:
-	MOVQ    R13, AX
-	XORQ    R14, R14
-	MOVQ    $-1, R15
-	CMPQ    R13, $0x03
-	CMOVQEQ R14, AX
-	CMOVQEQ R15, R14
-	ADDQ    144(CX)(AX*8), R14
-	JNZ     sequenceDecs_decodeSync_safe_amd64_adjust_temp_valid
-	MOVQ    $0x00000001, R14
-
-sequenceDecs_decodeSync_safe_amd64_adjust_temp_valid:
-	CMPQ R13, $0x01
-	JZ   sequenceDecs_decodeSync_safe_amd64_adjust_skip
-	MOVQ 152(CX), AX
-	MOVQ AX, 160(CX)
-
-sequenceDecs_decodeSync_safe_amd64_adjust_skip:
-	MOVQ 144(CX), AX
-	MOVQ AX, 152(CX)
-	MOVQ R14, 144(CX)
-	MOVQ R14, R13
-
-sequenceDecs_decodeSync_safe_amd64_after_adjust:
-	MOVQ R13, 8(SP)
-
-	// Check values
-	MOVQ  16(SP), AX
-	MOVQ  24(SP), CX
-	LEAQ  (AX)(CX*1), R14
-	MOVQ  s+0(FP), R15
-	ADDQ  R14, 256(R15)
-	MOVQ  ctx+16(FP), R14
-	SUBQ  CX, 104(R14)
-	JS    error_not_enough_literals
-	CMPQ  AX, $0x00020002
-	JA    sequenceDecs_decodeSync_safe_amd64_error_match_len_too_big
-	TESTQ R13, R13
-	JNZ   sequenceDecs_decodeSync_safe_amd64_match_len_ofs_ok
-	TESTQ AX, AX
-	JNZ   sequenceDecs_decodeSync_safe_amd64_error_match_len_ofs_mismatch
-
-sequenceDecs_decodeSync_safe_amd64_match_len_ofs_ok:
-	MOVQ 24(SP), AX
-	MOVQ 8(SP), CX
-	MOVQ 16(SP), R13
-
-	// Check if we have enough space in s.out
-	LEAQ (AX)(R13*1), R14
-	ADDQ R10, R14
-	CMPQ R14, 32(SP)
-	JA   error_not_enough_space
-
-	// Copy literals
-	TESTQ AX, AX
-	JZ    check_offset
-	MOVQ  AX, R14
-	SUBQ  $0x10, R14
-	JB    copy_1_small
-
-copy_1_loop:
-	MOVUPS (R11), X0
-	MOVUPS X0, (R10)
-	ADDQ   $0x10, R11
-	ADDQ   $0x10, R10
-	SUBQ   $0x10, R14
-	JAE    copy_1_loop
-	LEAQ   16(R11)(R14*1), R11
-	LEAQ   16(R10)(R14*1), R10
-	MOVUPS -16(R11), X0
-	MOVUPS X0, -16(R10)
-	JMP    copy_1_end
-
-copy_1_small:
-	CMPQ AX, $0x03
-	JE   copy_1_move_3
-	JB   copy_1_move_1or2
-	CMPQ AX, $0x08
-	JB   copy_1_move_4through7
-	JMP  copy_1_move_8through16
-
-copy_1_move_1or2:
-	MOVB (R11), R14
-	MOVB -1(R11)(AX*1), R15
-	MOVB R14, (R10)
-	MOVB R15, -1(R10)(AX*1)
-	ADDQ AX, R11
-	ADDQ AX, R10
-	JMP  copy_1_end
-
-copy_1_move_3:
-	MOVW (R11), R14
-	MOVB 2(R11), R15
-	MOVW R14, (R10)
-	MOVB R15, 2(R10)
-	ADDQ AX, R11
-	ADDQ AX, R10
-	JMP  copy_1_end
-
-copy_1_move_4through7:
-	MOVL (R11), R14
-	MOVL -4(R11)(AX*1), R15
-	MOVL R14, (R10)
-	MOVL R15, -4(R10)(AX*1)
-	ADDQ AX, R11
-	ADDQ AX, R10
-	JMP  copy_1_end
-
-copy_1_move_8through16:
-	MOVQ (R11), R14
-	MOVQ -8(R11)(AX*1), R15
-	MOVQ R14, (R10)
-	MOVQ R15, -8(R10)(AX*1)
-	ADDQ AX, R11
-	ADDQ AX, R10
-
-copy_1_end:
-	ADDQ AX, R12
-
-	// Malformed input if seq.mo > t+len(hist) || seq.mo > s.windowSize)
-check_offset:
-	MOVQ R12, AX
-	ADDQ 40(SP), AX
-	CMPQ CX, AX
-	JG   error_match_off_too_big
-	CMPQ CX, 56(SP)
-	JG   error_match_off_too_big
-
-	// Copy match from history
-	MOVQ CX, AX
-	SUBQ R12, AX
-	JLS  copy_match
-	MOVQ 48(SP), R14
-	SUBQ AX, R14
-	CMPQ R13, AX
-	JG   copy_all_from_history
-	MOVQ R13, AX
-	SUBQ $0x10, AX
-	JB   copy_4_small
-
-copy_4_loop:
-	MOVUPS (R14), X0
-	MOVUPS X0, (R10)
-	ADDQ   $0x10, R14
-	ADDQ   $0x10, R10
-	SUBQ   $0x10, AX
-	JAE    copy_4_loop
-	LEAQ   16(R14)(AX*1), R14
-	LEAQ   16(R10)(AX*1), R10
-	MOVUPS -16(R14), X0
-	MOVUPS X0, -16(R10)
-	JMP    copy_4_end
-
-copy_4_small:
-	CMPQ R13, $0x03
-	JE   copy_4_move_3
-	CMPQ R13, $0x08
-	JB   copy_4_move_4through7
-	JMP  copy_4_move_8through16
-
-copy_4_move_3:
-	MOVW (R14), AX
-	MOVB 2(R14), CL
-	MOVW AX, (R10)
-	MOVB CL, 2(R10)
-	ADDQ R13, R14
-	ADDQ R13, R10
-	JMP  copy_4_end
-
-copy_4_move_4through7:
-	MOVL (R14), AX
-	MOVL -4(R14)(R13*1), CX
-	MOVL AX, (R10)
-	MOVL CX, -4(R10)(R13*1)
-	ADDQ R13, R14
-	ADDQ R13, R10
-	JMP  copy_4_end
-
-copy_4_move_8through16:
-	MOVQ (R14), AX
-	MOVQ -8(R14)(R13*1), CX
-	MOVQ AX, (R10)
-	MOVQ CX, -8(R10)(R13*1)
-	ADDQ R13, R14
-	ADDQ R13, R10
-
-copy_4_end:
-	ADDQ R13, R12
-	JMP  handle_loop
-	JMP loop_finished
-
-copy_all_from_history:
-	MOVQ AX, R15
-	SUBQ $0x10, R15
-	JB   copy_5_small
-
-copy_5_loop:
-	MOVUPS (R14), X0
-	MOVUPS X0, (R10)
-	ADDQ   $0x10, R14
-	ADDQ   $0x10, R10
-	SUBQ   $0x10, R15
-	JAE    copy_5_loop
-	LEAQ   16(R14)(R15*1), R14
-	LEAQ   16(R10)(R15*1), R10
-	MOVUPS -16(R14), X0
-	MOVUPS X0, -16(R10)
-	JMP    copy_5_end
-
-copy_5_small:
-	CMPQ AX, $0x03
-	JE   copy_5_move_3
-	JB   copy_5_move_1or2
-	CMPQ AX, $0x08
-	JB   copy_5_move_4through7
-	JMP  copy_5_move_8through16
-
-copy_5_move_1or2:
-	MOVB (R14), R15
-	MOVB -1(R14)(AX*1), BP
-	MOVB R15, (R10)
-	MOVB BP, -1(R10)(AX*1)
-	ADDQ AX, R14
-	ADDQ AX, R10
-	JMP  copy_5_end
-
-copy_5_move_3:
-	MOVW (R14), R15
-	MOVB 2(R14), BP
-	MOVW R15, (R10)
-	MOVB BP, 2(R10)
-	ADDQ AX, R14
-	ADDQ AX, R10
-	JMP  copy_5_end
-
-copy_5_move_4through7:
-	MOVL (R14), R15
-	MOVL -4(R14)(AX*1), BP
-	MOVL R15, (R10)
-	MOVL BP, -4(R10)(AX*1)
-	ADDQ AX, R14
-	ADDQ AX, R10
-	JMP  copy_5_end
-
-copy_5_move_8through16:
-	MOVQ (R14), R15
-	MOVQ -8(R14)(AX*1), BP
-	MOVQ R15, (R10)
-	MOVQ BP, -8(R10)(AX*1)
-	ADDQ AX, R14
-	ADDQ AX, R10
-
-copy_5_end:
-	ADDQ AX, R12
-	SUBQ AX, R13
-
-	// Copy match from the current buffer
-copy_match:
-	MOVQ R10, AX
-	SUBQ CX, AX
-
-	// ml <= mo
-	CMPQ R13, CX
-	JA   copy_overlapping_match
-
-	// Copy non-overlapping match
-	ADDQ R13, R12
-	MOVQ R13, CX
-	SUBQ $0x10, CX
-	JB   copy_2_small
-
-copy_2_loop:
-	MOVUPS (AX), X0
-	MOVUPS X0, (R10)
-	ADDQ   $0x10, AX
-	ADDQ   $0x10, R10
-	SUBQ   $0x10, CX
-	JAE    copy_2_loop
-	LEAQ   16(AX)(CX*1), AX
-	LEAQ   16(R10)(CX*1), R10
-	MOVUPS -16(AX), X0
-	MOVUPS X0, -16(R10)
-	JMP    copy_2_end
-
-copy_2_small:
-	CMPQ R13, $0x03
-	JE   copy_2_move_3
-	JB   copy_2_move_1or2
-	CMPQ R13, $0x08
-	JB   copy_2_move_4through7
-	JMP  copy_2_move_8through16
-
-copy_2_move_1or2:
-	MOVB (AX), CL
-	MOVB -1(AX)(R13*1), R14
-	MOVB CL, (R10)
-	MOVB R14, -1(R10)(R13*1)
-	ADDQ R13, AX
-	ADDQ R13, R10
-	JMP  copy_2_end
-
-copy_2_move_3:
-	MOVW (AX), CX
-	MOVB 2(AX), R14
-	MOVW CX, (R10)
-	MOVB R14, 2(R10)
-	ADDQ R13, AX
-	ADDQ R13, R10
-	JMP  copy_2_end
-
-copy_2_move_4through7:
-	MOVL (AX), CX
-	MOVL -4(AX)(R13*1), R14
-	MOVL CX, (R10)
-	MOVL R14, -4(R10)(R13*1)
-	ADDQ R13, AX
-	ADDQ R13, R10
-	JMP  copy_2_end
-
-copy_2_move_8through16:
-	MOVQ (AX), CX
-	MOVQ -8(AX)(R13*1), R14
-	MOVQ CX, (R10)
-	MOVQ R14, -8(R10)(R13*1)
-	ADDQ R13, AX
-	ADDQ R13, R10
-
-copy_2_end:
-	JMP handle_loop
-
-	// Copy overlapping match
-copy_overlapping_match:
-	ADDQ R13, R12
-
-copy_slow_3:
-	MOVB (AX), CL
-	MOVB CL, (R10)
-	INCQ AX
-	INCQ R10
-	DECQ R13
-	JNZ  copy_slow_3
-
-handle_loop:
-	MOVQ ctx+16(FP), AX
-	DECQ 96(AX)
-	JNS  sequenceDecs_decodeSync_safe_amd64_main_loop
-
-loop_finished:
-	MOVQ br+8(FP), AX
-	MOVQ DX, 24(AX)
-	MOVB BL, 32(AX)
-	MOVQ SI, 8(AX)
-
-	// Update the context
-	MOVQ ctx+16(FP), AX
-	MOVQ R12, 136(AX)
-	MOVQ 144(AX), CX
-	SUBQ CX, R11
-	MOVQ R11, 168(AX)
-
-	// Return success
-	MOVQ $0x00000000, ret+24(FP)
-	RET
-
-	// Return with match length error
-sequenceDecs_decodeSync_safe_amd64_error_match_len_ofs_mismatch:
-	MOVQ 16(SP), AX
-	MOVQ ctx+16(FP), CX
-	MOVQ AX, 216(CX)
-	MOVQ $0x00000001, ret+24(FP)
-	RET
-
-	// Return with match too long error
-sequenceDecs_decodeSync_safe_amd64_error_match_len_too_big:
-	MOVQ ctx+16(FP), AX
-	MOVQ 16(SP), CX
-	MOVQ CX, 216(AX)
-	MOVQ $0x00000002, ret+24(FP)
-	RET
-
-	// Return with match offset too long error
-error_match_off_too_big:
-	MOVQ ctx+16(FP), AX
-	MOVQ 8(SP), CX
-	MOVQ CX, 224(AX)
-	MOVQ R12, 136(AX)
-	MOVQ $0x00000003, ret+24(FP)
-	RET
-
-	// Return with not enough literals error
-error_not_enough_literals:
-	MOVQ ctx+16(FP), AX
-	MOVQ 24(SP), CX
-	MOVQ CX, 208(AX)
-	MOVQ $0x00000004, ret+24(FP)
-	RET
-
-	// Return with overread error
-error_overread:
-	MOVQ $0x00000006, ret+24(FP)
-	RET
-
-	// Return with not enough output space error
-error_not_enough_space:
-	MOVQ ctx+16(FP), AX
-	MOVQ 24(SP), CX
-	MOVQ CX, 208(AX)
-	MOVQ 16(SP), CX
-	MOVQ CX, 216(AX)
-	MOVQ R12, 136(AX)
-	MOVQ $0x00000005, ret+24(FP)
-	RET
-
-// func sequenceDecs_decodeSync_safe_bmi2(s *sequenceDecs, br *bitReader, ctx *decodeSyncAsmContext) int
-// Requires: BMI, BMI2, CMOV, SSE
-TEXT ·sequenceDecs_decodeSync_safe_bmi2(SB), $64-32
-	MOVQ    br+8(FP), BX
-	MOVQ    24(BX), AX
-	MOVBQZX 32(BX), DX
-	MOVQ    (BX), CX
-	MOVQ    8(BX), BX
-	ADDQ    BX, CX
-	MOVQ    CX, (SP)
-	MOVQ    ctx+16(FP), CX
-	MOVQ    72(CX), SI
-	MOVQ    80(CX), DI
-	MOVQ    88(CX), R8
-	XORQ    R9, R9
-	MOVQ    R9, 8(SP)
-	MOVQ    R9, 16(SP)
-	MOVQ    R9, 24(SP)
-	MOVQ    112(CX), R9
-	MOVQ    128(CX), R10
-	MOVQ    R10, 32(SP)
-	MOVQ    144(CX), R10
-	MOVQ    136(CX), R11
-	MOVQ    200(CX), R12
-	MOVQ    R12, 56(SP)
-	MOVQ    176(CX), R12
-	MOVQ    R12, 48(SP)
-	MOVQ    184(CX), CX
-	MOVQ    CX, 40(SP)
-	MOVQ    40(SP), CX
-	ADDQ    CX, 48(SP)
-
-	// Calculate pointer to s.out[cap(s.out)] (a past-end pointer)
-	ADDQ R9, 32(SP)
-
-	// outBase += outPosition
-	ADDQ R11, R9
-
-sequenceDecs_decodeSync_safe_bmi2_main_loop:
-	MOVQ (SP), R12
-
-	// Fill bitreader to have enough for the offset and match length.
-	CMPQ BX, $0x08
-	JL   sequenceDecs_decodeSync_safe_bmi2_fill_byte_by_byte
-	MOVQ DX, CX
-	SHRQ $0x03, CX
-	SUBQ CX, R12
-	MOVQ (R12), AX
-	SUBQ CX, BX
-	ANDQ $0x07, DX
-	JMP  sequenceDecs_decodeSync_safe_bmi2_fill_end
-
-sequenceDecs_decodeSync_safe_bmi2_fill_byte_by_byte:
-	CMPQ    BX, $0x00
-	JLE     sequenceDecs_decodeSync_safe_bmi2_fill_check_overread
-	CMPQ    DX, $0x07
-	JLE     sequenceDecs_decodeSync_safe_bmi2_fill_end
-	SHLQ    $0x08, AX
-	SUBQ    $0x01, R12
-	SUBQ    $0x01, BX
-	SUBQ    $0x08, DX
-	MOVBQZX (R12), CX
-	ORQ     CX, AX
-	JMP     sequenceDecs_decodeSync_safe_bmi2_fill_byte_by_byte
-
-sequenceDecs_decodeSync_safe_bmi2_fill_check_overread:
-	CMPQ DX, $0x40
-	JA   error_overread
-
-sequenceDecs_decodeSync_safe_bmi2_fill_end:
-	// Update offset
-	MOVQ   $0x00000808, CX
-	BEXTRQ CX, R8, R13
-	MOVQ   AX, R14
-	LEAQ   (DX)(R13*1), CX
-	ROLQ   CL, R14
-	BZHIQ  R13, R14, R14
-	MOVQ   CX, DX
-	MOVQ   R8, CX
-	SHRQ   $0x20, CX
-	ADDQ   R14, CX
-	MOVQ   CX, 8(SP)
-
-	// Update match length
-	MOVQ   $0x00000808, CX
-	BEXTRQ CX, DI, R13
-	MOVQ   AX, R14
-	LEAQ   (DX)(R13*1), CX
-	ROLQ   CL, R14
-	BZHIQ  R13, R14, R14
-	MOVQ   CX, DX
-	MOVQ   DI, CX
-	SHRQ   $0x20, CX
-	ADDQ   R14, CX
-	MOVQ   CX, 16(SP)
-
-	// Fill bitreader to have enough for the remaining
-	CMPQ BX, $0x08
-	JL   sequenceDecs_decodeSync_safe_bmi2_fill_2_byte_by_byte
-	MOVQ DX, CX
-	SHRQ $0x03, CX
-	SUBQ CX, R12
-	MOVQ (R12), AX
-	SUBQ CX, BX
-	ANDQ $0x07, DX
-	JMP  sequenceDecs_decodeSync_safe_bmi2_fill_2_end
-
-sequenceDecs_decodeSync_safe_bmi2_fill_2_byte_by_byte:
-	CMPQ    BX, $0x00
-	JLE     sequenceDecs_decodeSync_safe_bmi2_fill_2_check_overread
-	CMPQ    DX, $0x07
-	JLE     sequenceDecs_decodeSync_safe_bmi2_fill_2_end
-	SHLQ    $0x08, AX
-	SUBQ    $0x01, R12
-	SUBQ    $0x01, BX
-	SUBQ    $0x08, DX
-	MOVBQZX (R12), CX
-	ORQ     CX, AX
-	JMP     sequenceDecs_decodeSync_safe_bmi2_fill_2_byte_by_byte
-
-sequenceDecs_decodeSync_safe_bmi2_fill_2_check_overread:
-	CMPQ DX, $0x40
-	JA   error_overread
-
-sequenceDecs_decodeSync_safe_bmi2_fill_2_end:
-	// Update literal length
-	MOVQ   $0x00000808, CX
-	BEXTRQ CX, SI, R13
-	MOVQ   AX, R14
-	LEAQ   (DX)(R13*1), CX
-	ROLQ   CL, R14
-	BZHIQ  R13, R14, R14
-	MOVQ   CX, DX
-	MOVQ   SI, CX
-	SHRQ   $0x20, CX
-	ADDQ   R14, CX
-	MOVQ   CX, 24(SP)
-
-	// Fill bitreader for state updates
-	MOVQ    R12, (SP)
-	MOVQ    $0x00000808, CX
-	BEXTRQ  CX, R8, R12
-	MOVQ    ctx+16(FP), CX
-	CMPQ    96(CX), $0x00
-	JZ      sequenceDecs_decodeSync_safe_bmi2_skip_update
-	LEAQ    (SI)(DI*1), R13
-	ADDQ    R8, R13
-	MOVBQZX R13, R13
-	LEAQ    (DX)(R13*1), CX
-	MOVQ    AX, R14
-	MOVQ    CX, DX
-	ROLQ    CL, R14
-	BZHIQ   R13, R14, R14
-
-	// Update Offset State
-	BZHIQ R8, R14, CX
-	SHRXQ R8, R14, R14
-	SHRL  $0x10, R8
-	ADDQ  CX, R8
-
-	// Load ctx.ofTable
-	MOVQ ctx+16(FP), CX
-	MOVQ 48(CX), CX
-	MOVQ (CX)(R8*8), R8
-
-	// Update Match Length State
-	BZHIQ DI, R14, CX
-	SHRXQ DI, R14, R14
-	SHRL  $0x10, DI
-	ADDQ  CX, DI
-
-	// Load ctx.mlTable
-	MOVQ ctx+16(FP), CX
-	MOVQ 24(CX), CX
-	MOVQ (CX)(DI*8), DI
-
-	// Update Literal Length State
-	BZHIQ SI, R14, CX
-	SHRL  $0x10, SI
-	ADDQ  CX, SI
-
-	// Load ctx.llTable
-	MOVQ ctx+16(FP), CX
-	MOVQ (CX), CX
-	MOVQ (CX)(SI*8), SI
-
-sequenceDecs_decodeSync_safe_bmi2_skip_update:
-	// Adjust offset
-	MOVQ   s+0(FP), CX
-	MOVQ   8(SP), R13
-	CMPQ   R12, $0x01
-	JBE    sequenceDecs_decodeSync_safe_bmi2_adjust_offsetB_1_or_0
-	MOVUPS 144(CX), X0
-	MOVQ   R13, 144(CX)
-	MOVUPS X0, 152(CX)
-	JMP    sequenceDecs_decodeSync_safe_bmi2_after_adjust
-
-sequenceDecs_decodeSync_safe_bmi2_adjust_offsetB_1_or_0:
-	CMPQ 24(SP), $0x00000000
-	JNE  sequenceDecs_decodeSync_safe_bmi2_adjust_offset_maybezero
-	INCQ R13
-	JMP  sequenceDecs_decodeSync_safe_bmi2_adjust_offset_nonzero
-
-sequenceDecs_decodeSync_safe_bmi2_adjust_offset_maybezero:
-	TESTQ R13, R13
-	JNZ   sequenceDecs_decodeSync_safe_bmi2_adjust_offset_nonzero
-	MOVQ  144(CX), R13
-	JMP   sequenceDecs_decodeSync_safe_bmi2_after_adjust
-
-sequenceDecs_decodeSync_safe_bmi2_adjust_offset_nonzero:
-	MOVQ    R13, R12
-	XORQ    R14, R14
-	MOVQ    $-1, R15
-	CMPQ    R13, $0x03
-	CMOVQEQ R14, R12
-	CMOVQEQ R15, R14
-	ADDQ    144(CX)(R12*8), R14
-	JNZ     sequenceDecs_decodeSync_safe_bmi2_adjust_temp_valid
-	MOVQ    $0x00000001, R14
-
-sequenceDecs_decodeSync_safe_bmi2_adjust_temp_valid:
-	CMPQ R13, $0x01
-	JZ   sequenceDecs_decodeSync_safe_bmi2_adjust_skip
-	MOVQ 152(CX), R12
-	MOVQ R12, 160(CX)
-
-sequenceDecs_decodeSync_safe_bmi2_adjust_skip:
-	MOVQ 144(CX), R12
-	MOVQ R12, 152(CX)
-	MOVQ R14, 144(CX)
-	MOVQ R14, R13
-
-sequenceDecs_decodeSync_safe_bmi2_after_adjust:
-	MOVQ R13, 8(SP)
-
-	// Check values
-	MOVQ  16(SP), CX
-	MOVQ  24(SP), R12
-	LEAQ  (CX)(R12*1), R14
-	MOVQ  s+0(FP), R15
-	ADDQ  R14, 256(R15)
-	MOVQ  ctx+16(FP), R14
-	SUBQ  R12, 104(R14)
-	JS    error_not_enough_literals
-	CMPQ  CX, $0x00020002
-	JA    sequenceDecs_decodeSync_safe_bmi2_error_match_len_too_big
-	TESTQ R13, R13
-	JNZ   sequenceDecs_decodeSync_safe_bmi2_match_len_ofs_ok
-	TESTQ CX, CX
-	JNZ   sequenceDecs_decodeSync_safe_bmi2_error_match_len_ofs_mismatch
-
-sequenceDecs_decodeSync_safe_bmi2_match_len_ofs_ok:
-	MOVQ 24(SP), CX
-	MOVQ 8(SP), R12
-	MOVQ 16(SP), R13
-
-	// Check if we have enough space in s.out
-	LEAQ (CX)(R13*1), R14
-	ADDQ R9, R14
-	CMPQ R14, 32(SP)
-	JA   error_not_enough_space
-
-	// Copy literals
-	TESTQ CX, CX
-	JZ    check_offset
-	MOVQ  CX, R14
-	SUBQ  $0x10, R14
-	JB    copy_1_small
-
-copy_1_loop:
-	MOVUPS (R10), X0
-	MOVUPS X0, (R9)
-	ADDQ   $0x10, R10
-	ADDQ   $0x10, R9
-	SUBQ   $0x10, R14
-	JAE    copy_1_loop
-	LEAQ   16(R10)(R14*1), R10
-	LEAQ   16(R9)(R14*1), R9
-	MOVUPS -16(R10), X0
-	MOVUPS X0, -16(R9)
-	JMP    copy_1_end
-
-copy_1_small:
-	CMPQ CX, $0x03
-	JE   copy_1_move_3
-	JB   copy_1_move_1or2
-	CMPQ CX, $0x08
-	JB   copy_1_move_4through7
-	JMP  copy_1_move_8through16
-
-copy_1_move_1or2:
-	MOVB (R10), R14
-	MOVB -1(R10)(CX*1), R15
-	MOVB R14, (R9)
-	MOVB R15, -1(R9)(CX*1)
-	ADDQ CX, R10
-	ADDQ CX, R9
-	JMP  copy_1_end
-
-copy_1_move_3:
-	MOVW (R10), R14
-	MOVB 2(R10), R15
-	MOVW R14, (R9)
-	MOVB R15, 2(R9)
-	ADDQ CX, R10
-	ADDQ CX, R9
-	JMP  copy_1_end
-
-copy_1_move_4through7:
-	MOVL (R10), R14
-	MOVL -4(R10)(CX*1), R15
-	MOVL R14, (R9)
-	MOVL R15, -4(R9)(CX*1)
-	ADDQ CX, R10
-	ADDQ CX, R9
-	JMP  copy_1_end
-
-copy_1_move_8through16:
-	MOVQ (R10), R14
-	MOVQ -8(R10)(CX*1), R15
-	MOVQ R14, (R9)
-	MOVQ R15, -8(R9)(CX*1)
-	ADDQ CX, R10
-	ADDQ CX, R9
-
-copy_1_end:
-	ADDQ CX, R11
-
-	// Malformed input if seq.mo > t+len(hist) || seq.mo > s.windowSize)
-check_offset:
-	MOVQ R11, CX
-	ADDQ 40(SP), CX
-	CMPQ R12, CX
-	JG   error_match_off_too_big
-	CMPQ R12, 56(SP)
-	JG   error_match_off_too_big
-
-	// Copy match from history
-	MOVQ R12, CX
-	SUBQ R11, CX
-	JLS  copy_match
-	MOVQ 48(SP), R14
-	SUBQ CX, R14
-	CMPQ R13, CX
-	JG   copy_all_from_history
-	MOVQ R13, CX
-	SUBQ $0x10, CX
-	JB   copy_4_small
-
-copy_4_loop:
-	MOVUPS (R14), X0
-	MOVUPS X0, (R9)
-	ADDQ   $0x10, R14
-	ADDQ   $0x10, R9
-	SUBQ   $0x10, CX
-	JAE    copy_4_loop
-	LEAQ   16(R14)(CX*1), R14
-	LEAQ   16(R9)(CX*1), R9
-	MOVUPS -16(R14), X0
-	MOVUPS X0, -16(R9)
-	JMP    copy_4_end
-
-copy_4_small:
-	CMPQ R13, $0x03
-	JE   copy_4_move_3
-	CMPQ R13, $0x08
-	JB   copy_4_move_4through7
-	JMP  copy_4_move_8through16
-
-copy_4_move_3:
-	MOVW (R14), CX
-	MOVB 2(R14), R12
-	MOVW CX, (R9)
-	MOVB R12, 2(R9)
-	ADDQ R13, R14
-	ADDQ R13, R9
-	JMP  copy_4_end
-
-copy_4_move_4through7:
-	MOVL (R14), CX
-	MOVL -4(R14)(R13*1), R12
-	MOVL CX, (R9)
-	MOVL R12, -4(R9)(R13*1)
-	ADDQ R13, R14
-	ADDQ R13, R9
-	JMP  copy_4_end
-
-copy_4_move_8through16:
-	MOVQ (R14), CX
-	MOVQ -8(R14)(R13*1), R12
-	MOVQ CX, (R9)
-	MOVQ R12, -8(R9)(R13*1)
-	ADDQ R13, R14
-	ADDQ R13, R9
-
-copy_4_end:
-	ADDQ R13, R11
-	JMP  handle_loop
-	JMP loop_finished
-
-copy_all_from_history:
-	MOVQ CX, R15
-	SUBQ $0x10, R15
-	JB   copy_5_small
-
-copy_5_loop:
-	MOVUPS (R14), X0
-	MOVUPS X0, (R9)
-	ADDQ   $0x10, R14
-	ADDQ   $0x10, R9
-	SUBQ   $0x10, R15
-	JAE    copy_5_loop
-	LEAQ   16(R14)(R15*1), R14
-	LEAQ   16(R9)(R15*1), R9
-	MOVUPS -16(R14), X0
-	MOVUPS X0, -16(R9)
-	JMP    copy_5_end
-
-copy_5_small:
-	CMPQ CX, $0x03
-	JE   copy_5_move_3
-	JB   copy_5_move_1or2
-	CMPQ CX, $0x08
-	JB   copy_5_move_4through7
-	JMP  copy_5_move_8through16
-
-copy_5_move_1or2:
-	MOVB (R14), R15
-	MOVB -1(R14)(CX*1), BP
-	MOVB R15, (R9)
-	MOVB BP, -1(R9)(CX*1)
-	ADDQ CX, R14
-	ADDQ CX, R9
-	JMP  copy_5_end
-
-copy_5_move_3:
-	MOVW (R14), R15
-	MOVB 2(R14), BP
-	MOVW R15, (R9)
-	MOVB BP, 2(R9)
-	ADDQ CX, R14
-	ADDQ CX, R9
-	JMP  copy_5_end
-
-copy_5_move_4through7:
-	MOVL (R14), R15
-	MOVL -4(R14)(CX*1), BP
-	MOVL R15, (R9)
-	MOVL BP, -4(R9)(CX*1)
-	ADDQ CX, R14
-	ADDQ CX, R9
-	JMP  copy_5_end
-
-copy_5_move_8through16:
-	MOVQ (R14), R15
-	MOVQ -8(R14)(CX*1), BP
-	MOVQ R15, (R9)
-	MOVQ BP, -8(R9)(CX*1)
-	ADDQ CX, R14
-	ADDQ CX, R9
-
-copy_5_end:
-	ADDQ CX, R11
-	SUBQ CX, R13
-
-	// Copy match from the current buffer
-copy_match:
-	MOVQ R9, CX
-	SUBQ R12, CX
-
-	// ml <= mo
-	CMPQ R13, R12
-	JA   copy_overlapping_match
-
-	// Copy non-overlapping match
-	ADDQ R13, R11
-	MOVQ R13, R12
-	SUBQ $0x10, R12
-	JB   copy_2_small
-
-copy_2_loop:
-	MOVUPS (CX), X0
-	MOVUPS X0, (R9)
-	ADDQ   $0x10, CX
-	ADDQ   $0x10, R9
-	SUBQ   $0x10, R12
-	JAE    copy_2_loop
-	LEAQ   16(CX)(R12*1), CX
-	LEAQ   16(R9)(R12*1), R9
-	MOVUPS -16(CX), X0
-	MOVUPS X0, -16(R9)
-	JMP    copy_2_end
-
-copy_2_small:
-	CMPQ R13, $0x03
-	JE   copy_2_move_3
-	JB   copy_2_move_1or2
-	CMPQ R13, $0x08
-	JB   copy_2_move_4through7
-	JMP  copy_2_move_8through16
-
-copy_2_move_1or2:
-	MOVB (CX), R12
-	MOVB -1(CX)(R13*1), R14
-	MOVB R12, (R9)
-	MOVB R14, -1(R9)(R13*1)
-	ADDQ R13, CX
-	ADDQ R13, R9
-	JMP  copy_2_end
-
-copy_2_move_3:
-	MOVW (CX), R12
-	MOVB 2(CX), R14
-	MOVW R12, (R9)
-	MOVB R14, 2(R9)
-	ADDQ R13, CX
-	ADDQ R13, R9
-	JMP  copy_2_end
-
-copy_2_move_4through7:
-	MOVL (CX), R12
-	MOVL -4(CX)(R13*1), R14
-	MOVL R12, (R9)
-	MOVL R14, -4(R9)(R13*1)
-	ADDQ R13, CX
-	ADDQ R13, R9
-	JMP  copy_2_end
-
-copy_2_move_8through16:
-	MOVQ (CX), R12
-	MOVQ -8(CX)(R13*1), R14
-	MOVQ R12, (R9)
-	MOVQ R14, -8(R9)(R13*1)
-	ADDQ R13, CX
-	ADDQ R13, R9
-
-copy_2_end:
-	JMP handle_loop
-
-	// Copy overlapping match
-copy_overlapping_match:
-	ADDQ R13, R11
-
-copy_slow_3:
-	MOVB (CX), R12
-	MOVB R12, (R9)
-	INCQ CX
-	INCQ R9
-	DECQ R13
-	JNZ  copy_slow_3
-
-handle_loop:
-	MOVQ ctx+16(FP), CX
-	DECQ 96(CX)
-	JNS  sequenceDecs_decodeSync_safe_bmi2_main_loop
-
-loop_finished:
-	MOVQ br+8(FP), CX
-	MOVQ AX, 24(CX)
-	MOVB DL, 32(CX)
-	MOVQ BX, 8(CX)
-
-	// Update the context
-	MOVQ ctx+16(FP), AX
-	MOVQ R11, 136(AX)
-	MOVQ 144(AX), CX
-	SUBQ CX, R10
-	MOVQ R10, 168(AX)
-
-	// Return success
-	MOVQ $0x00000000, ret+24(FP)
-	RET
-
-	// Return with match length error
-sequenceDecs_decodeSync_safe_bmi2_error_match_len_ofs_mismatch:
-	MOVQ 16(SP), AX
-	MOVQ ctx+16(FP), CX
-	MOVQ AX, 216(CX)
-	MOVQ $0x00000001, ret+24(FP)
-	RET
-
-	// Return with match too long error
-sequenceDecs_decodeSync_safe_bmi2_error_match_len_too_big:
-	MOVQ ctx+16(FP), AX
-	MOVQ 16(SP), CX
-	MOVQ CX, 216(AX)
-	MOVQ $0x00000002, ret+24(FP)
-	RET
-
-	// Return with match offset too long error
-error_match_off_too_big:
-	MOVQ ctx+16(FP), AX
-	MOVQ 8(SP), CX
-	MOVQ CX, 224(AX)
-	MOVQ R11, 136(AX)
-	MOVQ $0x00000003, ret+24(FP)
-	RET
-
-	// Return with not enough literals error
-error_not_enough_literals:
-	MOVQ ctx+16(FP), AX
-	MOVQ 24(SP), CX
-	MOVQ CX, 208(AX)
-	MOVQ $0x00000004, ret+24(FP)
-	RET
-
-	// Return with overread error
-error_overread:
-	MOVQ $0x00000006, ret+24(FP)
-	RET
-
-	// Return with not enough output space error
-error_not_enough_space:
-	MOVQ ctx+16(FP), AX
-	MOVQ 24(SP), CX
-	MOVQ CX, 208(AX)
-	MOVQ 16(SP), CX
-	MOVQ CX, 216(AX)
-	MOVQ R11, 136(AX)
-	MOVQ $0x00000005, ret+24(FP)
-	RET
diff --git a/vendor/github.com/klauspost/compress/zstd/seqdec_generic.go b/vendor/github.com/klauspost/compress/zstd/seqdec_generic.go
deleted file mode 100644
index 2fb35b788..000000000
--- a/vendor/github.com/klauspost/compress/zstd/seqdec_generic.go
+++ /dev/null
@@ -1,237 +0,0 @@
-//go:build !amd64 || appengine || !gc || noasm
-// +build !amd64 appengine !gc noasm
-
-package zstd
-
-import (
-	"fmt"
-	"io"
-)
-
-// decode sequences from the stream with the provided history but without dictionary.
-func (s *sequenceDecs) decodeSyncSimple(hist []byte) (bool, error) {
-	return false, nil
-}
-
-// decode sequences from the stream without the provided history.
-func (s *sequenceDecs) decode(seqs []seqVals) error {
-	br := s.br
-
-	// Grab full sizes tables, to avoid bounds checks.
-	llTable, mlTable, ofTable := s.litLengths.fse.dt[:maxTablesize], s.matchLengths.fse.dt[:maxTablesize], s.offsets.fse.dt[:maxTablesize]
-	llState, mlState, ofState := s.litLengths.state.state, s.matchLengths.state.state, s.offsets.state.state
-	s.seqSize = 0
-	litRemain := len(s.literals)
-
-	maxBlockSize := maxCompressedBlockSize
-	if s.windowSize < maxBlockSize {
-		maxBlockSize = s.windowSize
-	}
-	for i := range seqs {
-		var ll, mo, ml int
-		if len(br.in) > 4+((maxOffsetBits+16+16)>>3) {
-			// inlined function:
-			// ll, mo, ml = s.nextFast(br, llState, mlState, ofState)
-
-			// Final will not read from stream.
-			var llB, mlB, moB uint8
-			ll, llB = llState.final()
-			ml, mlB = mlState.final()
-			mo, moB = ofState.final()
-
-			// extra bits are stored in reverse order.
-			br.fillFast()
-			mo += br.getBits(moB)
-			if s.maxBits > 32 {
-				br.fillFast()
-			}
-			ml += br.getBits(mlB)
-			ll += br.getBits(llB)
-
-			if moB > 1 {
-				s.prevOffset[2] = s.prevOffset[1]
-				s.prevOffset[1] = s.prevOffset[0]
-				s.prevOffset[0] = mo
-			} else {
-				// mo = s.adjustOffset(mo, ll, moB)
-				// Inlined for rather big speedup
-				if ll == 0 {
-					// There is an exception though, when current sequence's literals_length = 0.
-					// In this case, repeated offsets are shifted by one, so an offset_value of 1 means Repeated_Offset2,
-					// an offset_value of 2 means Repeated_Offset3, and an offset_value of 3 means Repeated_Offset1 - 1_byte.
-					mo++
-				}
-
-				if mo == 0 {
-					mo = s.prevOffset[0]
-				} else {
-					var temp int
-					if mo == 3 {
-						temp = s.prevOffset[0] - 1
-					} else {
-						temp = s.prevOffset[mo]
-					}
-
-					if temp == 0 {
-						// 0 is not valid; input is corrupted; force offset to 1
-						println("WARNING: temp was 0")
-						temp = 1
-					}
-
-					if mo != 1 {
-						s.prevOffset[2] = s.prevOffset[1]
-					}
-					s.prevOffset[1] = s.prevOffset[0]
-					s.prevOffset[0] = temp
-					mo = temp
-				}
-			}
-			br.fillFast()
-		} else {
-			if br.overread() {
-				if debugDecoder {
-					printf("reading sequence %d, exceeded available data\n", i)
-				}
-				return io.ErrUnexpectedEOF
-			}
-			ll, mo, ml = s.next(br, llState, mlState, ofState)
-			br.fill()
-		}
-
-		if debugSequences {
-			println("Seq", i, "Litlen:", ll, "mo:", mo, "(abs) ml:", ml)
-		}
-		// Evaluate.
-		// We might be doing this async, so do it early.
-		if mo == 0 && ml > 0 {
-			return fmt.Errorf("zero matchoff and matchlen (%d) > 0", ml)
-		}
-		if ml > maxMatchLen {
-			return fmt.Errorf("match len (%d) bigger than max allowed length", ml)
-		}
-		s.seqSize += ll + ml
-		if s.seqSize > maxBlockSize {
-			return fmt.Errorf("output bigger than max block size (%d)", maxBlockSize)
-		}
-		litRemain -= ll
-		if litRemain < 0 {
-			return fmt.Errorf("unexpected literal count, want %d bytes, but only %d is available", ll, litRemain+ll)
-		}
-		seqs[i] = seqVals{
-			ll: ll,
-			ml: ml,
-			mo: mo,
-		}
-		if i == len(seqs)-1 {
-			// This is the last sequence, so we shouldn't update state.
-			break
-		}
-
-		// Manually inlined, ~ 5-20% faster
-		// Update all 3 states at once. Approx 20% faster.
-		nBits := llState.nbBits() + mlState.nbBits() + ofState.nbBits()
-		if nBits == 0 {
-			llState = llTable[llState.newState()&maxTableMask]
-			mlState = mlTable[mlState.newState()&maxTableMask]
-			ofState = ofTable[ofState.newState()&maxTableMask]
-		} else {
-			bits := br.get32BitsFast(nBits)
-			lowBits := uint16(bits >> ((ofState.nbBits() + mlState.nbBits()) & 31))
-			llState = llTable[(llState.newState()+lowBits)&maxTableMask]
-
-			lowBits = uint16(bits >> (ofState.nbBits() & 31))
-			lowBits &= bitMask[mlState.nbBits()&15]
-			mlState = mlTable[(mlState.newState()+lowBits)&maxTableMask]
-
-			lowBits = uint16(bits) & bitMask[ofState.nbBits()&15]
-			ofState = ofTable[(ofState.newState()+lowBits)&maxTableMask]
-		}
-	}
-	s.seqSize += litRemain
-	if s.seqSize > maxBlockSize {
-		return fmt.Errorf("output bigger than max block size (%d)", maxBlockSize)
-	}
-	err := br.close()
-	if err != nil {
-		printf("Closing sequences: %v, %+v\n", err, *br)
-	}
-	return err
-}
-
-// executeSimple handles cases when a dictionary is not used.
-func (s *sequenceDecs) executeSimple(seqs []seqVals, hist []byte) error {
-	// Ensure we have enough output size...
-	if len(s.out)+s.seqSize > cap(s.out) {
-		addBytes := s.seqSize + len(s.out)
-		s.out = append(s.out, make([]byte, addBytes)...)
-		s.out = s.out[:len(s.out)-addBytes]
-	}
-
-	if debugDecoder {
-		printf("Execute %d seqs with literals: %d into %d bytes\n", len(seqs), len(s.literals), s.seqSize)
-	}
-
-	var t = len(s.out)
-	out := s.out[:t+s.seqSize]
-
-	for _, seq := range seqs {
-		// Add literals
-		copy(out[t:], s.literals[:seq.ll])
-		t += seq.ll
-		s.literals = s.literals[seq.ll:]
-
-		// Malformed input
-		if seq.mo > t+len(hist) || seq.mo > s.windowSize {
-			return fmt.Errorf("match offset (%d) bigger than current history (%d)", seq.mo, t+len(hist))
-		}
-
-		// Copy from history.
-		if v := seq.mo - t; v > 0 {
-			// v is the start position in history from end.
-			start := len(hist) - v
-			if seq.ml > v {
-				// Some goes into the current block.
-				// Copy remainder of history
-				copy(out[t:], hist[start:])
-				t += v
-				seq.ml -= v
-			} else {
-				copy(out[t:], hist[start:start+seq.ml])
-				t += seq.ml
-				continue
-			}
-		}
-
-		// We must be in the current buffer now
-		if seq.ml > 0 {
-			start := t - seq.mo
-			if seq.ml <= t-start {
-				// No overlap
-				copy(out[t:], out[start:start+seq.ml])
-				t += seq.ml
-			} else {
-				// Overlapping copy
-				// Extend destination slice and copy one byte at the time.
-				src := out[start : start+seq.ml]
-				dst := out[t:]
-				dst = dst[:len(src)]
-				t += len(src)
-				// Destination is the space we just added.
-				for i := range src {
-					dst[i] = src[i]
-				}
-			}
-		}
-	}
-	// Add final literals
-	copy(out[t:], s.literals)
-	if debugDecoder {
-		t += len(s.literals)
-		if t != len(out) {
-			panic(fmt.Errorf("length mismatch, want %d, got %d, ss: %d", len(out), t, s.seqSize))
-		}
-	}
-	s.out = out
-
-	return nil
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/seqenc.go b/vendor/github.com/klauspost/compress/zstd/seqenc.go
deleted file mode 100644
index 8014174a7..000000000
--- a/vendor/github.com/klauspost/compress/zstd/seqenc.go
+++ /dev/null
@@ -1,114 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import "math/bits"
-
-type seqCoders struct {
-	llEnc, ofEnc, mlEnc    *fseEncoder
-	llPrev, ofPrev, mlPrev *fseEncoder
-}
-
-// swap coders with another (block).
-func (s *seqCoders) swap(other *seqCoders) {
-	*s, *other = *other, *s
-}
-
-// setPrev will update the previous encoders to the actually used ones
-// and make sure a fresh one is in the main slot.
-func (s *seqCoders) setPrev(ll, ml, of *fseEncoder) {
-	compareSwap := func(used *fseEncoder, current, prev **fseEncoder) {
-		// We used the new one, more current to history and reuse the previous history
-		if *current == used {
-			*prev, *current = *current, *prev
-			c := *current
-			p := *prev
-			c.reUsed = false
-			p.reUsed = true
-			return
-		}
-		if used == *prev {
-			return
-		}
-		// Ensure we cannot reuse by accident
-		prevEnc := *prev
-		prevEnc.symbolLen = 0
-	}
-	compareSwap(ll, &s.llEnc, &s.llPrev)
-	compareSwap(ml, &s.mlEnc, &s.mlPrev)
-	compareSwap(of, &s.ofEnc, &s.ofPrev)
-}
-
-func highBit(val uint32) (n uint32) {
-	return uint32(bits.Len32(val) - 1)
-}
-
-var llCodeTable = [64]byte{0, 1, 2, 3, 4, 5, 6, 7,
-	8, 9, 10, 11, 12, 13, 14, 15,
-	16, 16, 17, 17, 18, 18, 19, 19,
-	20, 20, 20, 20, 21, 21, 21, 21,
-	22, 22, 22, 22, 22, 22, 22, 22,
-	23, 23, 23, 23, 23, 23, 23, 23,
-	24, 24, 24, 24, 24, 24, 24, 24,
-	24, 24, 24, 24, 24, 24, 24, 24}
-
-// Up to 6 bits
-const maxLLCode = 35
-
-// llBitsTable translates from ll code to number of bits.
-var llBitsTable = [maxLLCode + 1]byte{
-	0, 0, 0, 0, 0, 0, 0, 0,
-	0, 0, 0, 0, 0, 0, 0, 0,
-	1, 1, 1, 1, 2, 2, 3, 3,
-	4, 6, 7, 8, 9, 10, 11, 12,
-	13, 14, 15, 16}
-
-// llCode returns the code that represents the literal length requested.
-func llCode(litLength uint32) uint8 {
-	const llDeltaCode = 19
-	if litLength <= 63 {
-		// Compiler insists on bounds check (Go 1.12)
-		return llCodeTable[litLength&63]
-	}
-	return uint8(highBit(litLength)) + llDeltaCode
-}
-
-var mlCodeTable = [128]byte{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
-	16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
-	32, 32, 33, 33, 34, 34, 35, 35, 36, 36, 36, 36, 37, 37, 37, 37,
-	38, 38, 38, 38, 38, 38, 38, 38, 39, 39, 39, 39, 39, 39, 39, 39,
-	40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
-	41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41,
-	42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42,
-	42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42}
-
-// Up to 6 bits
-const maxMLCode = 52
-
-// mlBitsTable translates from ml code to number of bits.
-var mlBitsTable = [maxMLCode + 1]byte{
-	0, 0, 0, 0, 0, 0, 0, 0,
-	0, 0, 0, 0, 0, 0, 0, 0,
-	0, 0, 0, 0, 0, 0, 0, 0,
-	0, 0, 0, 0, 0, 0, 0, 0,
-	1, 1, 1, 1, 2, 2, 3, 3,
-	4, 4, 5, 7, 8, 9, 10, 11,
-	12, 13, 14, 15, 16}
-
-// note : mlBase = matchLength - MINMATCH;
-// because it's the format it's stored in seqStore->sequences
-func mlCode(mlBase uint32) uint8 {
-	const mlDeltaCode = 36
-	if mlBase <= 127 {
-		// Compiler insists on bounds check (Go 1.12)
-		return mlCodeTable[mlBase&127]
-	}
-	return uint8(highBit(mlBase)) + mlDeltaCode
-}
-
-func ofCode(offset uint32) uint8 {
-	// A valid offset will always be > 0.
-	return uint8(bits.Len32(offset) - 1)
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/snappy.go b/vendor/github.com/klauspost/compress/zstd/snappy.go
deleted file mode 100644
index ec13594e8..000000000
--- a/vendor/github.com/klauspost/compress/zstd/snappy.go
+++ /dev/null
@@ -1,434 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-// Based on work by Yann Collet, released under BSD License.
-
-package zstd
-
-import (
-	"encoding/binary"
-	"errors"
-	"hash/crc32"
-	"io"
-
-	"github.com/klauspost/compress/huff0"
-	snappy "github.com/klauspost/compress/internal/snapref"
-)
-
-const (
-	snappyTagLiteral = 0x00
-	snappyTagCopy1   = 0x01
-	snappyTagCopy2   = 0x02
-	snappyTagCopy4   = 0x03
-)
-
-const (
-	snappyChecksumSize = 4
-	snappyMagicBody    = "sNaPpY"
-
-	// snappyMaxBlockSize is the maximum size of the input to encodeBlock. It is not
-	// part of the wire format per se, but some parts of the encoder assume
-	// that an offset fits into a uint16.
-	//
-	// Also, for the framing format (Writer type instead of Encode function),
-	// https://github.com/google/snappy/blob/master/framing_format.txt says
-	// that "the uncompressed data in a chunk must be no longer than 65536
-	// bytes".
-	snappyMaxBlockSize = 65536
-
-	// snappyMaxEncodedLenOfMaxBlockSize equals MaxEncodedLen(snappyMaxBlockSize), but is
-	// hard coded to be a const instead of a variable, so that obufLen can also
-	// be a const. Their equivalence is confirmed by
-	// TestMaxEncodedLenOfMaxBlockSize.
-	snappyMaxEncodedLenOfMaxBlockSize = 76490
-)
-
-const (
-	chunkTypeCompressedData   = 0x00
-	chunkTypeUncompressedData = 0x01
-	chunkTypePadding          = 0xfe
-	chunkTypeStreamIdentifier = 0xff
-)
-
-var (
-	// ErrSnappyCorrupt reports that the input is invalid.
-	ErrSnappyCorrupt = errors.New("snappy: corrupt input")
-	// ErrSnappyTooLarge reports that the uncompressed length is too large.
-	ErrSnappyTooLarge = errors.New("snappy: decoded block is too large")
-	// ErrSnappyUnsupported reports that the input isn't supported.
-	ErrSnappyUnsupported = errors.New("snappy: unsupported input")
-
-	errUnsupportedLiteralLength = errors.New("snappy: unsupported literal length")
-)
-
-// SnappyConverter can read SnappyConverter-compressed streams and convert them to zstd.
-// Conversion is done by converting the stream directly from Snappy without intermediate
-// full decoding.
-// Therefore the compression ratio is much less than what can be done by a full decompression
-// and compression, and a faulty Snappy stream may lead to a faulty Zstandard stream without
-// any errors being generated.
-// No CRC value is being generated and not all CRC values of the Snappy stream are checked.
-// However, it provides really fast recompression of Snappy streams.
-// The converter can be reused to avoid allocations, even after errors.
-type SnappyConverter struct {
-	r     io.Reader
-	err   error
-	buf   []byte
-	block *blockEnc
-}
-
-// Convert the Snappy stream supplied in 'in' and write the zStandard stream to 'w'.
-// If any error is detected on the Snappy stream it is returned.
-// The number of bytes written is returned.
-func (r *SnappyConverter) Convert(in io.Reader, w io.Writer) (int64, error) {
-	initPredefined()
-	r.err = nil
-	r.r = in
-	if r.block == nil {
-		r.block = &blockEnc{}
-		r.block.init()
-	}
-	r.block.initNewEncode()
-	if len(r.buf) != snappyMaxEncodedLenOfMaxBlockSize+snappyChecksumSize {
-		r.buf = make([]byte, snappyMaxEncodedLenOfMaxBlockSize+snappyChecksumSize)
-	}
-	r.block.litEnc.Reuse = huff0.ReusePolicyNone
-	var written int64
-	var readHeader bool
-	{
-		header := frameHeader{WindowSize: snappyMaxBlockSize}.appendTo(r.buf[:0])
-
-		var n int
-		n, r.err = w.Write(header)
-		if r.err != nil {
-			return written, r.err
-		}
-		written += int64(n)
-	}
-
-	for {
-		if !r.readFull(r.buf[:4], true) {
-			// Add empty last block
-			r.block.reset(nil)
-			r.block.last = true
-			err := r.block.encodeLits(r.block.literals, false)
-			if err != nil {
-				return written, err
-			}
-			n, err := w.Write(r.block.output)
-			if err != nil {
-				return written, err
-			}
-			written += int64(n)
-
-			return written, r.err
-		}
-		chunkType := r.buf[0]
-		if !readHeader {
-			if chunkType != chunkTypeStreamIdentifier {
-				println("chunkType != chunkTypeStreamIdentifier", chunkType)
-				r.err = ErrSnappyCorrupt
-				return written, r.err
-			}
-			readHeader = true
-		}
-		chunkLen := int(r.buf[1]) | int(r.buf[2])<<8 | int(r.buf[3])<<16
-		if chunkLen > len(r.buf) {
-			println("chunkLen > len(r.buf)", chunkType)
-			r.err = ErrSnappyUnsupported
-			return written, r.err
-		}
-
-		// The chunk types are specified at
-		// https://github.com/google/snappy/blob/master/framing_format.txt
-		switch chunkType {
-		case chunkTypeCompressedData:
-			// Section 4.2. Compressed data (chunk type 0x00).
-			if chunkLen < snappyChecksumSize {
-				println("chunkLen < snappyChecksumSize", chunkLen, snappyChecksumSize)
-				r.err = ErrSnappyCorrupt
-				return written, r.err
-			}
-			buf := r.buf[:chunkLen]
-			if !r.readFull(buf, false) {
-				return written, r.err
-			}
-			//checksum := uint32(buf[0]) | uint32(buf[1])<<8 | uint32(buf[2])<<16 | uint32(buf[3])<<24
-			buf = buf[snappyChecksumSize:]
-
-			n, hdr, err := snappyDecodedLen(buf)
-			if err != nil {
-				r.err = err
-				return written, r.err
-			}
-			buf = buf[hdr:]
-			if n > snappyMaxBlockSize {
-				println("n > snappyMaxBlockSize", n, snappyMaxBlockSize)
-				r.err = ErrSnappyCorrupt
-				return written, r.err
-			}
-			r.block.reset(nil)
-			r.block.pushOffsets()
-			if err := decodeSnappy(r.block, buf); err != nil {
-				r.err = err
-				return written, r.err
-			}
-			if r.block.size+r.block.extraLits != n {
-				printf("invalid size, want %d, got %d\n", n, r.block.size+r.block.extraLits)
-				r.err = ErrSnappyCorrupt
-				return written, r.err
-			}
-			err = r.block.encode(nil, false, false)
-			switch err {
-			case errIncompressible:
-				r.block.popOffsets()
-				r.block.reset(nil)
-				r.block.literals, err = snappy.Decode(r.block.literals[:n], r.buf[snappyChecksumSize:chunkLen])
-				if err != nil {
-					return written, err
-				}
-				err = r.block.encodeLits(r.block.literals, false)
-				if err != nil {
-					return written, err
-				}
-			case nil:
-			default:
-				return written, err
-			}
-
-			n, r.err = w.Write(r.block.output)
-			if r.err != nil {
-				return written, err
-			}
-			written += int64(n)
-			continue
-		case chunkTypeUncompressedData:
-			if debugEncoder {
-				println("Uncompressed, chunklen", chunkLen)
-			}
-			// Section 4.3. Uncompressed data (chunk type 0x01).
-			if chunkLen < snappyChecksumSize {
-				println("chunkLen < snappyChecksumSize", chunkLen, snappyChecksumSize)
-				r.err = ErrSnappyCorrupt
-				return written, r.err
-			}
-			r.block.reset(nil)
-			buf := r.buf[:snappyChecksumSize]
-			if !r.readFull(buf, false) {
-				return written, r.err
-			}
-			checksum := uint32(buf[0]) | uint32(buf[1])<<8 | uint32(buf[2])<<16 | uint32(buf[3])<<24
-			// Read directly into r.decoded instead of via r.buf.
-			n := chunkLen - snappyChecksumSize
-			if n > snappyMaxBlockSize {
-				println("n > snappyMaxBlockSize", n, snappyMaxBlockSize)
-				r.err = ErrSnappyCorrupt
-				return written, r.err
-			}
-			r.block.literals = r.block.literals[:n]
-			if !r.readFull(r.block.literals, false) {
-				return written, r.err
-			}
-			if snappyCRC(r.block.literals) != checksum {
-				println("literals crc mismatch")
-				r.err = ErrSnappyCorrupt
-				return written, r.err
-			}
-			err := r.block.encodeLits(r.block.literals, false)
-			if err != nil {
-				return written, err
-			}
-			n, r.err = w.Write(r.block.output)
-			if r.err != nil {
-				return written, err
-			}
-			written += int64(n)
-			continue
-
-		case chunkTypeStreamIdentifier:
-			if debugEncoder {
-				println("stream id", chunkLen, len(snappyMagicBody))
-			}
-			// Section 4.1. Stream identifier (chunk type 0xff).
-			if chunkLen != len(snappyMagicBody) {
-				println("chunkLen != len(snappyMagicBody)", chunkLen, len(snappyMagicBody))
-				r.err = ErrSnappyCorrupt
-				return written, r.err
-			}
-			if !r.readFull(r.buf[:len(snappyMagicBody)], false) {
-				return written, r.err
-			}
-			for i := 0; i < len(snappyMagicBody); i++ {
-				if r.buf[i] != snappyMagicBody[i] {
-					println("r.buf[i] != snappyMagicBody[i]", r.buf[i], snappyMagicBody[i], i)
-					r.err = ErrSnappyCorrupt
-					return written, r.err
-				}
-			}
-			continue
-		}
-
-		if chunkType <= 0x7f {
-			// Section 4.5. Reserved unskippable chunks (chunk types 0x02-0x7f).
-			println("chunkType <= 0x7f")
-			r.err = ErrSnappyUnsupported
-			return written, r.err
-		}
-		// Section 4.4 Padding (chunk type 0xfe).
-		// Section 4.6. Reserved skippable chunks (chunk types 0x80-0xfd).
-		if !r.readFull(r.buf[:chunkLen], false) {
-			return written, r.err
-		}
-	}
-}
-
-// decodeSnappy writes the decoding of src to dst. It assumes that the varint-encoded
-// length of the decompressed bytes has already been read.
-func decodeSnappy(blk *blockEnc, src []byte) error {
-	//decodeRef(make([]byte, snappyMaxBlockSize), src)
-	var s, length int
-	lits := blk.extraLits
-	var offset uint32
-	for s < len(src) {
-		switch src[s] & 0x03 {
-		case snappyTagLiteral:
-			x := uint32(src[s] >> 2)
-			switch {
-			case x < 60:
-				s++
-			case x == 60:
-				s += 2
-				if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line.
-					println("uint(s) > uint(len(src)", s, src)
-					return ErrSnappyCorrupt
-				}
-				x = uint32(src[s-1])
-			case x == 61:
-				s += 3
-				if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line.
-					println("uint(s) > uint(len(src)", s, src)
-					return ErrSnappyCorrupt
-				}
-				x = uint32(src[s-2]) | uint32(src[s-1])<<8
-			case x == 62:
-				s += 4
-				if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line.
-					println("uint(s) > uint(len(src)", s, src)
-					return ErrSnappyCorrupt
-				}
-				x = uint32(src[s-3]) | uint32(src[s-2])<<8 | uint32(src[s-1])<<16
-			case x == 63:
-				s += 5
-				if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line.
-					println("uint(s) > uint(len(src)", s, src)
-					return ErrSnappyCorrupt
-				}
-				x = uint32(src[s-4]) | uint32(src[s-3])<<8 | uint32(src[s-2])<<16 | uint32(src[s-1])<<24
-			}
-			if x > snappyMaxBlockSize {
-				println("x > snappyMaxBlockSize", x, snappyMaxBlockSize)
-				return ErrSnappyCorrupt
-			}
-			length = int(x) + 1
-			if length <= 0 {
-				println("length <= 0 ", length)
-
-				return errUnsupportedLiteralLength
-			}
-			//if length > snappyMaxBlockSize-d || uint32(length) > len(src)-s {
-			//	return ErrSnappyCorrupt
-			//}
-
-			blk.literals = append(blk.literals, src[s:s+length]...)
-			//println(length, "litLen")
-			lits += length
-			s += length
-			continue
-
-		case snappyTagCopy1:
-			s += 2
-			if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line.
-				println("uint(s) > uint(len(src)", s, len(src))
-				return ErrSnappyCorrupt
-			}
-			length = 4 + int(src[s-2])>>2&0x7
-			offset = uint32(src[s-2])&0xe0<<3 | uint32(src[s-1])
-
-		case snappyTagCopy2:
-			s += 3
-			if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line.
-				println("uint(s) > uint(len(src)", s, len(src))
-				return ErrSnappyCorrupt
-			}
-			length = 1 + int(src[s-3])>>2
-			offset = uint32(src[s-2]) | uint32(src[s-1])<<8
-
-		case snappyTagCopy4:
-			s += 5
-			if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line.
-				println("uint(s) > uint(len(src)", s, len(src))
-				return ErrSnappyCorrupt
-			}
-			length = 1 + int(src[s-5])>>2
-			offset = uint32(src[s-4]) | uint32(src[s-3])<<8 | uint32(src[s-2])<<16 | uint32(src[s-1])<<24
-		}
-
-		if offset <= 0 || blk.size+lits < int(offset) /*|| length > len(blk)-d */ {
-			println("offset <= 0 || blk.size+lits < int(offset)", offset, blk.size+lits, int(offset), blk.size, lits)
-
-			return ErrSnappyCorrupt
-		}
-
-		// Check if offset is one of the recent offsets.
-		// Adjusts the output offset accordingly.
-		// Gives a tiny bit of compression, typically around 1%.
-		if false {
-			offset = blk.matchOffset(offset, uint32(lits))
-		} else {
-			offset += 3
-		}
-
-		blk.sequences = append(blk.sequences, seq{
-			litLen:   uint32(lits),
-			offset:   offset,
-			matchLen: uint32(length) - zstdMinMatch,
-		})
-		blk.size += length + lits
-		lits = 0
-	}
-	blk.extraLits = lits
-	return nil
-}
-
-func (r *SnappyConverter) readFull(p []byte, allowEOF bool) (ok bool) {
-	if _, r.err = io.ReadFull(r.r, p); r.err != nil {
-		if r.err == io.ErrUnexpectedEOF || (r.err == io.EOF && !allowEOF) {
-			r.err = ErrSnappyCorrupt
-		}
-		return false
-	}
-	return true
-}
-
-var crcTable = crc32.MakeTable(crc32.Castagnoli)
-
-// crc implements the checksum specified in section 3 of
-// https://github.com/google/snappy/blob/master/framing_format.txt
-func snappyCRC(b []byte) uint32 {
-	c := crc32.Update(0, crcTable, b)
-	return c>>15 | c<<17 + 0xa282ead8
-}
-
-// snappyDecodedLen returns the length of the decoded block and the number of bytes
-// that the length header occupied.
-func snappyDecodedLen(src []byte) (blockLen, headerLen int, err error) {
-	v, n := binary.Uvarint(src)
-	if n <= 0 || v > 0xffffffff {
-		return 0, 0, ErrSnappyCorrupt
-	}
-
-	const wordSize = 32 << (^uint(0) >> 32 & 1)
-	if wordSize == 32 && v > 0x7fffffff {
-		return 0, 0, ErrSnappyTooLarge
-	}
-	return int(v), n, nil
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/zip.go b/vendor/github.com/klauspost/compress/zstd/zip.go
deleted file mode 100644
index 29c15c8c4..000000000
--- a/vendor/github.com/klauspost/compress/zstd/zip.go
+++ /dev/null
@@ -1,141 +0,0 @@
-// Copyright 2019+ Klaus Post. All rights reserved.
-// License information can be found in the LICENSE file.
-
-package zstd
-
-import (
-	"errors"
-	"io"
-	"sync"
-)
-
-// ZipMethodWinZip is the method for Zstandard compressed data inside Zip files for WinZip.
-// See https://www.winzip.com/win/en/comp_info.html
-const ZipMethodWinZip = 93
-
-// ZipMethodPKWare is the original method number used by PKWARE to indicate Zstandard compression.
-// Deprecated: This has been deprecated by PKWARE, use ZipMethodWinZip instead for compression.
-// See https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.3.9.TXT
-const ZipMethodPKWare = 20
-
-// zipReaderPool is the default reader pool.
-var zipReaderPool = sync.Pool{New: func() interface{} {
-	z, err := NewReader(nil, WithDecoderLowmem(true), WithDecoderMaxWindow(128<<20), WithDecoderConcurrency(1))
-	if err != nil {
-		panic(err)
-	}
-	return z
-}}
-
-// newZipReader creates a pooled zip decompressor.
-func newZipReader(opts ...DOption) func(r io.Reader) io.ReadCloser {
-	pool := &zipReaderPool
-	if len(opts) > 0 {
-		opts = append([]DOption{WithDecoderLowmem(true), WithDecoderMaxWindow(128 << 20)}, opts...)
-		// Force concurrency 1
-		opts = append(opts, WithDecoderConcurrency(1))
-		// Create our own pool
-		pool = &sync.Pool{}
-	}
-	return func(r io.Reader) io.ReadCloser {
-		dec, ok := pool.Get().(*Decoder)
-		if ok {
-			dec.Reset(r)
-		} else {
-			d, err := NewReader(r, opts...)
-			if err != nil {
-				panic(err)
-			}
-			dec = d
-		}
-		return &pooledZipReader{dec: dec, pool: pool}
-	}
-}
-
-type pooledZipReader struct {
-	mu   sync.Mutex // guards Close and Read
-	pool *sync.Pool
-	dec  *Decoder
-}
-
-func (r *pooledZipReader) Read(p []byte) (n int, err error) {
-	r.mu.Lock()
-	defer r.mu.Unlock()
-	if r.dec == nil {
-		return 0, errors.New("read after close or EOF")
-	}
-	dec, err := r.dec.Read(p)
-	if err == io.EOF {
-		r.dec.Reset(nil)
-		r.pool.Put(r.dec)
-		r.dec = nil
-	}
-	return dec, err
-}
-
-func (r *pooledZipReader) Close() error {
-	r.mu.Lock()
-	defer r.mu.Unlock()
-	var err error
-	if r.dec != nil {
-		err = r.dec.Reset(nil)
-		r.pool.Put(r.dec)
-		r.dec = nil
-	}
-	return err
-}
-
-type pooledZipWriter struct {
-	mu   sync.Mutex // guards Close and Read
-	enc  *Encoder
-	pool *sync.Pool
-}
-
-func (w *pooledZipWriter) Write(p []byte) (n int, err error) {
-	w.mu.Lock()
-	defer w.mu.Unlock()
-	if w.enc == nil {
-		return 0, errors.New("Write after Close")
-	}
-	return w.enc.Write(p)
-}
-
-func (w *pooledZipWriter) Close() error {
-	w.mu.Lock()
-	defer w.mu.Unlock()
-	var err error
-	if w.enc != nil {
-		err = w.enc.Close()
-		w.pool.Put(w.enc)
-		w.enc = nil
-	}
-	return err
-}
-
-// ZipCompressor returns a compressor that can be registered with zip libraries.
-// The provided encoder options will be used on all encodes.
-func ZipCompressor(opts ...EOption) func(w io.Writer) (io.WriteCloser, error) {
-	var pool sync.Pool
-	return func(w io.Writer) (io.WriteCloser, error) {
-		enc, ok := pool.Get().(*Encoder)
-		if ok {
-			enc.Reset(w)
-		} else {
-			var err error
-			enc, err = NewWriter(w, opts...)
-			if err != nil {
-				return nil, err
-			}
-		}
-		return &pooledZipWriter{enc: enc, pool: &pool}, nil
-	}
-}
-
-// ZipDecompressor returns a decompressor that can be registered with zip libraries.
-// See ZipCompressor for example.
-// Options can be specified. WithDecoderConcurrency(1) is forced,
-// and by default a 128MB maximum decompression window is specified.
-// The window size can be overridden if required.
-func ZipDecompressor(opts ...DOption) func(r io.Reader) io.ReadCloser {
-	return newZipReader(opts...)
-}
diff --git a/vendor/github.com/klauspost/compress/zstd/zstd.go b/vendor/github.com/klauspost/compress/zstd/zstd.go
deleted file mode 100644
index 066bef2a4..000000000
--- a/vendor/github.com/klauspost/compress/zstd/zstd.go
+++ /dev/null
@@ -1,125 +0,0 @@
-// Package zstd provides decompression of zstandard files.
-//
-// For advanced usage and examples, go to the README: https://github.com/klauspost/compress/tree/master/zstd#zstd
-package zstd
-
-import (
-	"bytes"
-	"encoding/binary"
-	"errors"
-	"log"
-	"math"
-)
-
-// enable debug printing
-const debug = false
-
-// enable encoding debug printing
-const debugEncoder = debug
-
-// enable decoding debug printing
-const debugDecoder = debug
-
-// Enable extra assertions.
-const debugAsserts = debug || false
-
-// print sequence details
-const debugSequences = false
-
-// print detailed matching information
-const debugMatches = false
-
-// force encoder to use predefined tables.
-const forcePreDef = false
-
-// zstdMinMatch is the minimum zstd match length.
-const zstdMinMatch = 3
-
-// fcsUnknown is used for unknown frame content size.
-const fcsUnknown = math.MaxUint64
-
-var (
-	// ErrReservedBlockType is returned when a reserved block type is found.
-	// Typically this indicates wrong or corrupted input.
-	ErrReservedBlockType = errors.New("invalid input: reserved block type encountered")
-
-	// ErrCompressedSizeTooBig is returned when a block is bigger than allowed.
-	// Typically this indicates wrong or corrupted input.
-	ErrCompressedSizeTooBig = errors.New("invalid input: compressed size too big")
-
-	// ErrBlockTooSmall is returned when a block is too small to be decoded.
-	// Typically returned on invalid input.
-	ErrBlockTooSmall = errors.New("block too small")
-
-	// ErrUnexpectedBlockSize is returned when a block has unexpected size.
-	// Typically returned on invalid input.
-	ErrUnexpectedBlockSize = errors.New("unexpected block size")
-
-	// ErrMagicMismatch is returned when a "magic" number isn't what is expected.
-	// Typically this indicates wrong or corrupted input.
-	ErrMagicMismatch = errors.New("invalid input: magic number mismatch")
-
-	// ErrWindowSizeExceeded is returned when a reference exceeds the valid window size.
-	// Typically this indicates wrong or corrupted input.
-	ErrWindowSizeExceeded = errors.New("window size exceeded")
-
-	// ErrWindowSizeTooSmall is returned when no window size is specified.
-	// Typically this indicates wrong or corrupted input.
-	ErrWindowSizeTooSmall = errors.New("invalid input: window size was too small")
-
-	// ErrDecoderSizeExceeded is returned if decompressed size exceeds the configured limit.
-	ErrDecoderSizeExceeded = errors.New("decompressed size exceeds configured limit")
-
-	// ErrUnknownDictionary is returned if the dictionary ID is unknown.
-	ErrUnknownDictionary = errors.New("unknown dictionary")
-
-	// ErrFrameSizeExceeded is returned if the stated frame size is exceeded.
-	// This is only returned if SingleSegment is specified on the frame.
-	ErrFrameSizeExceeded = errors.New("frame size exceeded")
-
-	// ErrFrameSizeMismatch is returned if the stated frame size does not match the expected size.
-	// This is only returned if SingleSegment is specified on the frame.
-	ErrFrameSizeMismatch = errors.New("frame size does not match size on stream")
-
-	// ErrCRCMismatch is returned if CRC mismatches.
-	ErrCRCMismatch = errors.New("CRC check failed")
-
-	// ErrDecoderClosed will be returned if the Decoder was used after
-	// Close has been called.
-	ErrDecoderClosed = errors.New("decoder used after Close")
-
-	// ErrEncoderClosed will be returned if the Encoder was used after
-	// Close has been called.
-	ErrEncoderClosed = errors.New("encoder used after Close")
-
-	// ErrDecoderNilInput is returned when a nil Reader was provided
-	// and an operation other than Reset/DecodeAll/Close was attempted.
-	ErrDecoderNilInput = errors.New("nil input provided as reader")
-)
-
-func println(a ...interface{}) {
-	if debug || debugDecoder || debugEncoder {
-		log.Println(a...)
-	}
-}
-
-func printf(format string, a ...interface{}) {
-	if debug || debugDecoder || debugEncoder {
-		log.Printf(format, a...)
-	}
-}
-
-func load3232(b []byte, i int32) uint32 {
-	return binary.LittleEndian.Uint32(b[:len(b):len(b)][i:])
-}
-
-func load6432(b []byte, i int32) uint64 {
-	return binary.LittleEndian.Uint64(b[:len(b):len(b)][i:])
-}
-
-type byter interface {
-	Bytes() []byte
-	Len() int
-}
-
-var _ byter = &bytes.Buffer{}