| //! A `Source` for registry-based packages. |
| //! |
| //! # What's a Registry? |
| //! |
| //! [Registries] are central locations where packages can be uploaded to, |
| //! discovered, and searched for. The purpose of a registry is to have a |
| //! location that serves as permanent storage for versions of a crate over time. |
| //! |
| //! Compared to git sources (see [`GitSource`]), a registry provides many |
| //! packages as well as many versions simultaneously. Git sources can also |
| //! have commits deleted through rebasing, whereas registries cannot have |
| //! their versions deleted. |
| //! |
| //! In Cargo, [`RegistryData`] is an abstraction over each kind of actual |
| //! registry, and [`RegistrySource`] connects those implementations to the |
| //! [`Source`] trait. Two prominent features these abstractions provide are: |
| //! |
| //! * A way to query the metadata of a package from a registry. The metadata |
| //! comes from the index. |
| //! * A way to download package contents (a.k.a source files) that are required |
| //! when building the package itself. |
| //! |
| //! We'll cover each functionality later. |
| //! |
| //! [Registries]: https://doc.rust-lang.org/nightly/cargo/reference/registries.html |
| //! [`GitSource`]: super::GitSource |
| //! |
| //! # Different Kinds of Registries |
| //! |
| //! Cargo provides multiple kinds of registries. Each of them serves the index |
| //! and package contents in a slightly different way. Namely, |
| //! |
| //! * [`LocalRegistry`] --- Serves the index and package contents entirely on |
| //! a local filesystem. |
| //! * [`RemoteRegistry`] --- Serves the index ahead of time from a Git |
| //! repository, and package contents are downloaded as needed. |
| //! * [`HttpRegistry`] --- Serves both the index and package contents on demand |
| //! over an HTTP-based registry API. This is the default starting from 1.70.0. |
| //! |
| //! Each registry has its own [`RegistryData`] implementation, and can be |
| //! created from either [`RegistrySource::local`] or [`RegistrySource::remote`]. |
| //! |
| //! [`LocalRegistry`]: local::LocalRegistry |
| //! [`RemoteRegistry`]: remote::RemoteRegistry |
| //! [`HttpRegistry`]: http_remote::HttpRegistry |
| //! |
| //! # The Index of a Registry |
| //! |
| //! One of the major difficulties with a registry is that hosting so many |
| //! packages may quickly run into performance problems when dealing with |
| //! dependency graphs. It's infeasible for cargo to download the entire contents |
| //! of the registry just to resolve one package's dependencies, for example. As |
| //! a result, cargo needs some efficient method of querying what packages are |
| //! available on a registry, what versions are available, and what the |
| //! dependencies for each version are. |
| //! |
| //! To solve the problem, a registry must provide an index of package metadata. |
| //! The index of a registry is essentially an easily query-able version of the |
| //! registry's database for a list of versions of a package as well as a list |
| //! of dependencies for each version. The exact format of the index is |
| //! described later. |
| //! |
| //! See the [`index`] module for topics about the management, parsing, caching, |
| //! and versioning for the on-disk index. |
| //! |
| //! ## The Format of The Index |
| //! |
| //! The index is a store for the list of versions for all packages known, so its |
| //! format on disk is optimized slightly to ensure that `ls registry` doesn't |
| //! produce a list of all packages ever known. The index also wants to ensure |
| //! that there aren't millions of files, which may end up hitting |
| //! filesystem limits at some point. To this end, a few decisions were made |
| //! about the format of the registry: |
| //! |
| //! 1. Each crate will have one file corresponding to it. Each version for a |
| //! crate will just be a line in this file (see [`IndexPackage`] for its |
| //! representation). |
| //! 2. There will be two tiers of directories for crate names, under which |
| //! crates corresponding to those tiers will be located. |
| //! (See [`cargo_util::registry::make_dep_path`] for the implementation of |
| //! this layout hierarchy.) |
| //! |
| //! As an example, the hierarchy of an index may look like this: |
| //! |
| //! ```notrust |
| //! . |
| //! ├── 3 |
| //! │   └── u |
| //! │       └── url |
| //! ├── bz |
| //! │   └── ip |
| //! │       └── bzip2 |
| //! ├── config.json |
| //! ├── en |
| //! │   └── co |
| //! │       └── encoding |
| //! └── li |
| //!     ├── bg |
| //!     │   └── libgit2 |
| //!     └── nk |
| //!         └── link-config |
| //! ``` |
| //! |
| //! The root of the index contains a `config.json` file with a few entries |
| //! corresponding to the registry (see [`RegistryConfig`] below). |
| //! |
| //! Otherwise, there are three numbered directories (1, 2, 3) for crates with |
| //! names 1, 2, and 3 characters in length. The 1/2 directories simply have the |
| //! crate files underneath them, while the 3 directory is sharded by the first |
| //! letter of the crate name. |
| //! |
| //! Otherwise the top-level directory contains many two-letter directory names, |
| //! each of which has many sub-folders with two letters. At the end of all these |
| //! are the actual crate files themselves. |
| //! |
| //! The purpose of this layout is to cut down on `ls` sizes and to allow |
| //! efficient lookup based on the crate name itself. |
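| //! |
| //! A minimal sketch of how a crate name could map to its index file under |
| //! this layout. The helper `index_path_for` is hypothetical (it is not |
| //! cargo's API), and it assumes a non-empty ASCII name that is lowercased |
| //! before lookup: |
| //! |
| //! ```rust |
| //! /// Hypothetical helper: relative index path for a crate name. |
| //! fn index_path_for(name: &str) -> String { |
| //!     // Assumption: names are non-empty ASCII, so byte slicing is safe. |
| //!     assert!(!name.is_empty() && name.is_ascii()); |
| //!     // Assumption: the on-disk file name is the lowercased crate name. |
| //!     let name = name.to_ascii_lowercase(); |
| //!     match name.len() { |
| //!         // One- and two-character names live directly under `1/` and `2/`. |
| //!         1 => format!("1/{name}"), |
| //!         2 => format!("2/{name}"), |
| //!         // Three-character names are sharded by their first letter. |
| //!         3 => format!("3/{}/{name}", &name[..1]), |
| //!         // Longer names use two tiers of two letters each. |
| //!         _ => format!("{}/{}/{name}", &name[..2], &name[2..4]), |
| //!     } |
| //! } |
| //! ``` |
| //! |
| //! Under these assumptions, `index_path_for("bzip2")` yields `bz/ip/bzip2`, |
| //! matching the example hierarchy above. |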
| //! |
| //! See [The Cargo Book: Registry Index][registry-index] for the public |
| //! interface on the index format. |
| //! |
| //! [registry-index]: https://doc.rust-lang.org/nightly/cargo/reference/registry-index.html |
| //! |
| //! ## The Index Files |
| //! |
| //! Each file in the index is the history of one crate over time. Each line in |
| //! the file corresponds to one version of a crate, stored in JSON format (see |
| //! the [`IndexPackage`] structure). |
| //! |
| //! As new versions are published, new lines are appended to this file. **The |
| //! only modifications to this file that should happen over time are yanks of a |
| //! particular version.** |
| //! |
| //! # Downloading Packages |
| //! |
| //! The purpose of the index was to provide an efficient method to resolve the |
| //! dependency graph for a package. After resolution has been performed, we need |
| //! to download the contents of packages so we can read the full manifest and |
| //! build the source code. |
| //! |
| //! To accomplish this, [`RegistryData::download`] prepares one HTTP request |
| //! per requested package, and the tarballs are downloaded into a local cache. |
| //! These tarballs will then be unpacked into a destination folder. |
| //! |
| //! Note that because versions uploaded to the registry are frozen forever, |
| //! the HTTP download and unpacking can both be skipped if the version has |
| //! already been downloaded and unpacked. This caching allows us to only |
| //! download a package when absolutely necessary. |
| //! |
| //! # Filesystem Hierarchy |
| //! |
| //! Overall, `$HOME/.cargo` looks like this when talking about the registry |
| //! (remote registries, specifically): |
| //! |
| //! ```notrust |
| //! # A folder under which all registry metadata is hosted (similar to |
| //! # $HOME/.cargo/git) |
| //! $HOME/.cargo/registry/ |
| //! |
| //!     # For each registry that cargo knows about (keyed by hostname + hash) |
| //!     # there is a folder which is the checked out version of the index for |
| //!     # the registry in this location. Note that this is done so cargo can |
| //!     # support multiple registries simultaneously |
| //!     index/ |
| //!         registry1-<hash>/ |
| //!         registry2-<hash>/ |
| //!         ... |
| //! |
| //!     # This folder is a cache for all downloaded tarballs (`.crate` file) |
| //!     # from a registry. Once downloaded and verified, a tarball never changes. |
| //!     cache/ |
| //!         registry1-<hash>/<pkg>-<version>.crate |
| //!         ... |
| //! |
| //!     # Location in which all tarballs are unpacked. Each tarball is known to |
| //!     # be frozen after downloading, so transitively this folder is also |
| //!     # frozen once it's unpacked (it's never unpacked again) |
| //!     # CAVEAT: They are not read-only. See rust-lang/cargo#9455. |
| //!     src/ |
| //!         registry1-<hash>/<pkg>-<version>/... |
| //!         ... |
| //! ``` |
| //! |
| //! [`IndexPackage`]: index::IndexPackage |
| |
| use std::collections::HashSet; |
| use std::fs; |
| use std::fs::{File, OpenOptions}; |
| use std::io; |
| use std::io::Read; |
| use std::io::Write; |
| use std::path::{Path, PathBuf}; |
| use std::task::{ready, Poll}; |
| |
| use anyhow::Context as _; |
| use cargo_util::paths::{self, exclude_from_backups_and_indexing}; |
| use flate2::read::GzDecoder; |
| use serde::Deserialize; |
| use serde::Serialize; |
| use tar::Archive; |
| use tracing::debug; |
| |
| use crate::core::dependency::Dependency; |
| use crate::core::global_cache_tracker; |
| use crate::core::{Package, PackageId, SourceId}; |
| use crate::sources::source::MaybePackage; |
| use crate::sources::source::QueryKind; |
| use crate::sources::source::Source; |
| use crate::sources::PathSource; |
| use crate::util::cache_lock::CacheLockMode; |
| use crate::util::interning::InternedString; |
| use crate::util::network::PollExt; |
| use crate::util::{hex, VersionExt}; |
| use crate::util::{restricted_names, CargoResult, Filesystem, GlobalContext, LimitErrorReader}; |
| |
| /// The `.cargo-ok` file is used to track if the source is already unpacked. |
| /// See [`RegistrySource::unpack_package`] for more. |
| /// |
| /// Not to be confused with `.cargo-ok` file in git sources. |
| const PACKAGE_SOURCE_LOCK: &str = ".cargo-ok"; |
| |
| pub const CRATES_IO_INDEX: &str = "https://github.com/rust-lang/crates.io-index"; |
| pub const CRATES_IO_HTTP_INDEX: &str = "sparse+https://index.crates.io/"; |
| pub const CRATES_IO_REGISTRY: &str = "crates-io"; |
| pub const CRATES_IO_DOMAIN: &str = "crates.io"; |
| |
| /// The content inside `.cargo-ok`. |
| /// See [`RegistrySource::unpack_package`] for more. |
| #[derive(Deserialize, Serialize)] |
| #[serde(rename_all = "kebab-case")] |
| struct LockMetadata { |
| /// The version of `.cargo-ok` file |
| v: u32, |
| } |
| |
| /// A [`Source`] implementation for a local or a remote registry. |
| /// |
| /// This contains common functionality that is shared between each registry |
| /// kind, with the registry-specific logic implemented as part of the |
| /// [`RegistryData`] trait referenced via the `ops` field. |
| /// |
| /// For general concepts of registries, see the [module-level documentation](crate::sources::registry). |
| pub struct RegistrySource<'gctx> { |
| /// A unique name of the source (typically used as the directory name |
| /// where its cached content is stored). |
| name: InternedString, |
| /// The unique identifier of this source. |
| source_id: SourceId, |
| /// The path where crate files are extracted (`$CARGO_HOME/registry/src/$REG-HASH`). |
| src_path: Filesystem, |
| /// Local reference to [`GlobalContext`] for convenience. |
| gctx: &'gctx GlobalContext, |
| /// Abstraction for interfacing to the different registry kinds. |
| ops: Box<dyn RegistryData + 'gctx>, |
| /// Interface for managing the on-disk index. |
| index: index::RegistryIndex<'gctx>, |
| /// A set of packages that should be allowed to be used, even if they are |
| /// yanked. |
| /// |
| /// This is populated from the entries in `Cargo.lock` to ensure that |
| /// `cargo update somepkg` won't unlock yanked entries in `Cargo.lock`. |
| /// Otherwise, the resolver would think that those entries no longer |
| /// exist, and it would trigger updates to unrelated packages. |
| yanked_whitelist: HashSet<PackageId>, |
| /// Yanked versions that have already been selected during queries. |
| /// |
| /// As of this writing, this is for not emitting the `--precise <yanked>` |
| /// warning twice, with the assumption of (`dep.package_name()` + `--precise` |
| /// version) being sufficient to uniquely identify the same query result. |
| selected_precise_yanked: HashSet<(InternedString, semver::Version)>, |
| } |
| |
| /// The [`config.json`] file stored in the index. |
| /// |
| /// The config file may look like: |
| /// |
| /// ```json |
| /// { |
| /// "dl": "https://example.com/api/{crate}/{version}/download", |
| /// "api": "https://example.com/api", |
| /// "auth-required": false # unstable feature (RFC 3139) |
| /// } |
| /// ``` |
| /// |
| /// [`config.json`]: https://doc.rust-lang.org/nightly/cargo/reference/registry-index.html#index-configuration |
| #[derive(Deserialize, Debug, Clone)] |
| #[serde(rename_all = "kebab-case")] |
| pub struct RegistryConfig { |
| /// Download endpoint for all crates. |
| /// |
| /// The string is a template which will generate the download URL for the |
| /// tarball of a specific version of a crate. The substrings `{crate}` and |
| /// `{version}` will be replaced with the crate's name and version |
| /// respectively. The substring `{prefix}` will be replaced with the |
| /// crate's prefix directory name, and the substring `{lowerprefix}` will |
| /// be replaced with the crate's prefix directory name converted to |
| /// lowercase. The substring `{sha256-checksum}` will be replaced with the |
| /// crate's sha256 checksum. |
| /// |
| /// For backwards compatibility, if the string does not contain any |
| /// markers (`{crate}`, `{version}`, `{prefix}`, or `{lowerprefix}`), it |
| /// will be extended with `/{crate}/{version}/download` to |
| /// support registries like crates.io which were created before the |
| /// templating setup was created. |
| /// |
| /// For more on the template of the download URL, see [Index Configuration]( |
| /// https://doc.rust-lang.org/nightly/cargo/reference/registry-index.html#index-configuration). |
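| /// |
| /// As a rough sketch of the substitution rules described above. The function |
| /// `expand_dl` and its parameters are hypothetical, for illustration only, |
| /// and are not cargo's API: |
| /// |
| /// ```rust |
| /// /// Hypothetical helper: expands a `dl` template into a download URL. |
| /// fn expand_dl(dl: &str, name: &str, version: &str, prefix: &str, cksum: &str) -> String { |
| ///     // Markers whose absence triggers the backwards-compatible suffix. |
| ///     const MARKERS: [&str; 4] = ["{crate}", "{version}", "{prefix}", "{lowerprefix}"]; |
| ///     let mut url = dl.to_string(); |
| ///     if !MARKERS.iter().any(|m| url.contains(*m)) { |
| ///         url.push_str("/{crate}/{version}/download"); |
| ///     } |
| ///     // Substitute every supported marker with its value. |
| ///     url.replace("{crate}", name) |
| ///         .replace("{version}", version) |
| ///         .replace("{prefix}", prefix) |
| ///         .replace("{lowerprefix}", &prefix.to_lowercase()) |
| ///         .replace("{sha256-checksum}", cksum) |
| /// } |
| /// ``` |
| /// |
| /// With this sketch, a marker-free template such as `https://example.com/api` |
| /// is first extended with `/{crate}/{version}/download` and then substituted. |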
| pub dl: String, |
| |
| /// API endpoint for the registry. This is what's actually hit to perform |
| /// operations like yanks, owner modifications, publishing new crates, etc. |
| /// If this is `None`, the registry does not support API commands. |
| pub api: Option<String>, |
| |
| /// Whether all operations require authentication. See [RFC 3139]. |
| /// |
| /// [RFC 3139]: https://rust-lang.github.io/rfcs/3139-cargo-alternative-registry-auth.html |
| #[serde(default)] |
| pub auth_required: bool, |
| } |
| |
| /// Result from loading data from a registry. |
| pub enum LoadResponse { |
| /// The cache is valid. The cached data should be used. |
| CacheValid, |
| |
| /// The cache is out of date. Returned data should be used. |
| Data { |
| raw_data: Vec<u8>, |
| /// Version of this data to determine whether it is out of date. |
| index_version: Option<String>, |
| }, |
| |
| /// The requested crate was not found. |
| NotFound, |
| } |
| |
| /// An abstract interface to handle both a local and remote registry. |
| /// |
| /// This allows [`RegistrySource`] to abstractly handle each registry kind. |
| /// |
| /// For general concepts of registries, see the [module-level documentation](crate::sources::registry). |
| pub trait RegistryData { |
| /// Performs initialization for the registry. |
| /// |
| /// This should be safe to call multiple times; the implementation is |
| /// expected to do no work if it is already prepared. |
| fn prepare(&self) -> CargoResult<()>; |
| |
| /// Returns the path to the index. |
| /// |
| /// Note that different registries store the index in different formats |
| /// (remote = git, http & local = files). |
| fn index_path(&self) -> &Filesystem; |
| |
| /// Loads the JSON for a specific named package from the index. |
| /// |
| /// * `root` is the root path to the index. |
| /// * `path` is the relative path to the package to load (like `ca/rg/cargo`). |
| /// * `index_version` is the version of the requested crate data currently |
| /// in cache. This is useful for checking if a local cache is outdated. |
| fn load( |
| &mut self, |
| root: &Path, |
| path: &Path, |
| index_version: Option<&str>, |
| ) -> Poll<CargoResult<LoadResponse>>; |
| |
| /// Loads the `config.json` file and returns it. |
| /// |
| /// Local registries don't have a config, and return `None`. |
| fn config(&mut self) -> Poll<CargoResult<Option<RegistryConfig>>>; |
| |
| /// Invalidates locally cached data. |
| fn invalidate_cache(&mut self); |
| |
| /// If quiet, the source should not display any progress or status messages. |
| fn set_quiet(&mut self, quiet: bool); |
| |
| /// Is the local cached data up-to-date? |
| fn is_updated(&self) -> bool; |
| |
| /// Prepare to start downloading a `.crate` file. |
| /// |
| /// Despite the name, this doesn't actually download anything. If the |
| /// `.crate` is already downloaded, then it returns [`MaybeLock::Ready`]. |
| /// If it hasn't been downloaded, then it returns [`MaybeLock::Download`] |
| /// which contains the URL to download. The [`crate::core::package::Downloads`] |
| /// system handles the actual download process. After downloading, it |
| /// calls [`Self::finish_download`] to save the downloaded file. |
| /// |
| /// `checksum` is currently only used by local registries to verify the |
| /// file contents (because local registries never actually download |
| /// anything). Remote registries will validate the checksum in |
| /// `finish_download`. For already downloaded `.crate` files, it does not |
| /// validate the checksum, assuming the filesystem does not suffer from |
| /// corruption or manipulation. |
| fn download(&mut self, pkg: PackageId, checksum: &str) -> CargoResult<MaybeLock>; |
| |
| /// Finish a download by saving a `.crate` file to disk. |
| /// |
| /// After [`crate::core::package::Downloads`] has finished a download, |
| /// it will call this to save the `.crate` file. This is only relevant |
| /// for remote registries. This should validate the checksum and save |
| /// the given data to the on-disk cache. |
| /// |
| /// Returns a [`File`] handle to the `.crate` file, positioned at the start. |
| fn finish_download(&mut self, pkg: PackageId, checksum: &str, data: &[u8]) |
| -> CargoResult<File>; |
| |
| /// Returns whether or not the `.crate` file is already downloaded. |
| fn is_crate_downloaded(&self, _pkg: PackageId) -> bool { |
| true |
| } |
| |
| /// Validates that the global package cache lock is held. |
| /// |
| /// Given the [`Filesystem`], this will make sure that the package cache |
| /// lock is held. If not, it will panic. See |
| /// [`GlobalContext::acquire_package_cache_lock`] for acquiring the global lock. |
| /// |
| /// Returns the [`Path`] to the [`Filesystem`]. |
| fn assert_index_locked<'a>(&self, path: &'a Filesystem) -> &'a Path; |
| |
| /// Block until all outstanding Poll::Pending requests are Poll::Ready. |
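| /// |
| /// A minimal sketch of how a caller might drive a `Poll`-returning method |
| /// to completion with this. `FakeRegistry` and `load_blocking` are |
| /// hypothetical stand-ins for illustration, not part of cargo: |
| /// |
| /// ```rust |
| /// use std::task::Poll; |
| /// |
| /// /// Hypothetical stand-in for a registry backend. |
| /// struct FakeRegistry { |
| ///     ready: bool, |
| /// } |
| /// |
| /// impl FakeRegistry { |
| ///     /// Returns `Pending` until outstanding work has completed. |
| ///     fn load(&mut self) -> Poll<Result<&'static str, String>> { |
| ///         if self.ready { |
| ///             Poll::Ready(Ok("data")) |
| ///         } else { |
| ///             Poll::Pending |
| ///         } |
| ///     } |
| /// |
| ///     /// Pretends to finish all outstanding requests. |
| ///     fn block_until_ready(&mut self) -> Result<(), String> { |
| ///         self.ready = true; |
| ///         Ok(()) |
| ///     } |
| /// } |
| /// |
| /// /// Retries `load` after blocking whenever it reports `Pending`. |
| /// fn load_blocking(reg: &mut FakeRegistry) -> Result<&'static str, String> { |
| ///     loop { |
| ///         match reg.load() { |
| ///             Poll::Ready(res) => return res, |
| ///             Poll::Pending => reg.block_until_ready()?, |
| ///         } |
| ///     } |
| /// } |
| /// ``` |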
| fn block_until_ready(&mut self) -> CargoResult<()>; |
| } |
| |
| /// The status of [`RegistryData::download`] which indicates if a `.crate` |
| /// file has already been downloaded, or if not then the URL to download. |
| pub enum MaybeLock { |
| /// The `.crate` file is already downloaded. [`File`] is a handle to the |
| /// opened `.crate` file on the filesystem. |
| Ready(File), |
| /// The `.crate` file is not downloaded, here's the URL to download it from. |
| /// |
| /// `descriptor` is just a text string to display to the user of what is |
| /// being downloaded. |
| Download { |
| url: String, |
| descriptor: String, |
| authorization: Option<String>, |
| }, |
| } |
| |
| mod download; |
| mod http_remote; |
| mod index; |
| pub use index::IndexSummary; |
| mod local; |
| mod remote; |
| |
| /// Generates a unique name for a [`SourceId`], giving it a unique path in |
| /// which to store its index files. |
| fn short_name(id: SourceId, is_shallow: bool) -> String { |
| // CAUTION: This should not change between versions. If you change how |
| // this is computed, it will orphan previously cached data, forcing the |
| // cache to be rebuilt and potentially wasting significant disk space. If |
| // you change it, be cautious of the impact. See `test_cratesio_hash` for |
| // a similar discussion. |
| let hash = hex::short_hash(&id); |
| let ident = id.url().host_str().unwrap_or("").to_string(); |
| let mut name = format!("{}-{}", ident, hash); |
| if is_shallow { |
| name.push_str("-shallow"); |
| } |
| name |
| } |
| |
| impl<'gctx> RegistrySource<'gctx> { |
| /// Creates a [`Source`] of a "remote" registry. |
| /// It could be either an HTTP-based [`http_remote::HttpRegistry`] or |
| /// a Git-based [`remote::RemoteRegistry`]. |
| /// |
| /// * `yanked_whitelist` --- Packages allowed to be used, even if they are yanked. |
| pub fn remote( |
| source_id: SourceId, |
| yanked_whitelist: &HashSet<PackageId>, |
| gctx: &'gctx GlobalContext, |
| ) -> CargoResult<RegistrySource<'gctx>> { |
| assert!(source_id.is_remote_registry()); |
| let name = short_name( |
| source_id, |
| gctx.cli_unstable() |
| .git |
| .map_or(false, |features| features.shallow_index) |
| && !source_id.is_sparse(), |
| ); |
| let ops = if source_id.is_sparse() { |
| Box::new(http_remote::HttpRegistry::new(source_id, gctx, &name)?) as Box<_> |
| } else { |
| Box::new(remote::RemoteRegistry::new(source_id, gctx, &name)) as Box<_> |
| }; |
| |
| Ok(RegistrySource::new( |
| source_id, |
| gctx, |
| &name, |
| ops, |
| yanked_whitelist, |
| )) |
| } |
| |
| /// Creates a [`Source`] of a local registry, with [`local::LocalRegistry`] under the hood. |
| /// |
| /// * `path` --- The root path of a local registry on the file system. |
| /// * `yanked_whitelist` --- Packages allowed to be used, even if they are yanked. |
| pub fn local( |
| source_id: SourceId, |
| path: &Path, |
| yanked_whitelist: &HashSet<PackageId>, |
| gctx: &'gctx GlobalContext, |
| ) -> RegistrySource<'gctx> { |
| let name = short_name(source_id, false); |
| let ops = local::LocalRegistry::new(path, gctx, &name); |
| RegistrySource::new(source_id, gctx, &name, Box::new(ops), yanked_whitelist) |
| } |
| |
| /// Creates a source of a registry. This is an inner helper function. |
| /// |
| /// * `name` --- Name of a path segment which may affect where `.crate` |
| /// tarballs, the registry index and cache are stored. Expected to be unique. |
| /// * `ops` --- The underlying [`RegistryData`] type. |
| /// * `yanked_whitelist` --- Packages allowed to be used, even if they are yanked. |
| fn new( |
| source_id: SourceId, |
| gctx: &'gctx GlobalContext, |
| name: &str, |
| ops: Box<dyn RegistryData + 'gctx>, |
| yanked_whitelist: &HashSet<PackageId>, |
| ) -> RegistrySource<'gctx> { |
| RegistrySource { |
| name: name.into(), |
| src_path: gctx.registry_source_path().join(name), |
| gctx, |
| source_id, |
| index: index::RegistryIndex::new(source_id, ops.index_path(), gctx), |
| yanked_whitelist: yanked_whitelist.clone(), |
| ops, |
| selected_precise_yanked: HashSet::new(), |
| } |
| } |
| |
| /// Decode the [configuration](RegistryConfig) stored within the registry. |
| /// |
| /// This requires that the index has been at least checked out. |
| pub fn config(&mut self) -> Poll<CargoResult<Option<RegistryConfig>>> { |
| self.ops.config() |
| } |
| |
| /// Unpacks a downloaded package into a location where it's ready to be |
| /// compiled. |
| /// |
| /// No action is taken if the source looks like it's already unpacked. |
| /// |
| /// # History of interruption detection with `.cargo-ok` file |
| /// |
| /// Cargo has always included a `.cargo-ok` file ([`PACKAGE_SOURCE_LOCK`]) |
| /// to detect if extraction was interrupted, but it was originally empty. |
| /// |
| /// In 1.34, Cargo was changed to create the `.cargo-ok` file before it |
| /// started extraction to implement fine-grained locking. After it was |
| /// finished extracting, it wrote two bytes to indicate it was complete. |
| /// It would use the length check to detect if it was possibly interrupted. |
| /// |
| /// In 1.36, Cargo changed to not use fine-grained locking, and instead used |
| /// a global lock. The use of `.cargo-ok` was no longer needed for locking |
| /// purposes, but was kept to detect when extraction was interrupted. |
| /// |
| /// In 1.49, Cargo changed to not create the `.cargo-ok` file before it |
| /// started extraction to deal with `.crate` files that inexplicably had |
| /// a `.cargo-ok` file in them. |
| /// |
| /// In 1.64, Cargo changed to detect `.crate` files with `.cargo-ok` files |
| /// in them in response to [CVE-2022-36113], which dealt with malicious |
| /// `.crate` files making `.cargo-ok` a symlink causing cargo to write "ok" |
| /// to any arbitrary file on the filesystem it has permission to. |
| /// |
| /// In 1.71, `.cargo-ok` changed to contain a JSON `{ v: 1 }` to indicate |
| /// the version of it. A failure of parsing will result in a heavy-hammer |
| /// approach that unpacks the `.crate` file again. This is in response to a |
| /// security issue that the unpacking didn't respect umask on Unix systems. |
| /// |
| /// This is all a long-winded way of explaining the circumstances that might |
| /// cause a directory to contain a `.cargo-ok` file that is empty or |
| /// otherwise corrupted. An empty file may have been left by a version of |
| /// Rust before 1.34, in which case everything should be fine. However, an |
| /// empty file created by versions 1.36 to 1.49 indicates that the extraction |
| /// was interrupted and that we need to start again. |
| /// |
| /// Another possibility is that the filesystem is simply corrupted, in |
| /// which case deleting the directory might be the safe thing to do. That |
| /// is probably unlikely, though. |
| /// |
| /// To be safe, we delete the directory and start over again if an empty or |
| /// otherwise unrecognized `.cargo-ok` file is found. |
| /// |
| /// [CVE-2022-36113]: https://blog.rust-lang.org/2022/09/14/cargo-cves.html#arbitrary-file-corruption-cve-2022-36113 |
| fn unpack_package(&self, pkg: PackageId, tarball: &File) -> CargoResult<PathBuf> { |
| let package_dir = format!("{}-{}", pkg.name(), pkg.version()); |
| let dst = self.src_path.join(&package_dir); |
| let path = dst.join(PACKAGE_SOURCE_LOCK); |
| let path = self |
| .gctx |
| .assert_package_cache_locked(CacheLockMode::DownloadExclusive, &path); |
| let unpack_dir = path.parent().unwrap(); |
| match fs::read_to_string(path) { |
| Ok(ok) => match serde_json::from_str::<LockMetadata>(&ok) { |
| Ok(lock_meta) if lock_meta.v == 1 => { |
| self.gctx |
| .deferred_global_last_use()? |
| .mark_registry_src_used(global_cache_tracker::RegistrySrc { |
| encoded_registry_name: self.name, |
| package_dir: package_dir.into(), |
| size: None, |
| }); |
| return Ok(unpack_dir.to_path_buf()); |
| } |
| _ => { |
| if ok == "ok" { |
| tracing::debug!("old `ok` content found, clearing cache"); |
| } else { |
| tracing::warn!("unrecognized .cargo-ok content, clearing cache: {ok}"); |
| } |
| // See comment of `unpack_package` about why removing all stuff. |
| paths::remove_dir_all(dst.as_path_unlocked())?; |
| } |
| }, |
| Err(e) if e.kind() == io::ErrorKind::NotFound => {} |
| Err(e) => anyhow::bail!("unable to read .cargo-ok file at {path:?}: {e}"), |
| } |
| dst.create_dir()?; |
| let mut tar = { |
| let size_limit = max_unpack_size(self.gctx, tarball.metadata()?.len()); |
| let gz = GzDecoder::new(tarball); |
| let gz = LimitErrorReader::new(gz, size_limit); |
| let mut tar = Archive::new(gz); |
| set_mask(&mut tar); |
| tar |
| }; |
| let mut bytes_written = 0; |
| let prefix = unpack_dir.file_name().unwrap(); |
| let parent = unpack_dir.parent().unwrap(); |
| for entry in tar.entries()? { |
| let mut entry = entry.with_context(|| "failed to iterate over archive")?; |
| let entry_path = entry |
| .path() |
| .with_context(|| "failed to read entry path")? |
| .into_owned(); |
| |
| // We're going to unpack this tarball into the global source |
| // directory, but we want to make sure that it doesn't accidentally |
| // (or maliciously) overwrite source code from other crates. Cargo |
| // itself should never generate a tarball that hits this error, and |
| // crates.io should also block uploads with these sorts of tarballs, |
| // but be extra sure by adding a check here as well. |
| if !entry_path.starts_with(prefix) { |
| anyhow::bail!( |
| "invalid tarball downloaded, contains \ |
| a file at {:?} which isn't under {:?}", |
| entry_path, |
| prefix |
| ) |
| } |
| // Prevent unpacking the lockfile from the crate itself. |
| if entry_path |
| .file_name() |
| .map_or(false, |p| p == PACKAGE_SOURCE_LOCK) |
| { |
| continue; |
| } |
| // Record the entry size, then unpack it, attaching context on failure. |
| bytes_written += entry.size(); |
| let mut result = entry.unpack_in(parent).map_err(anyhow::Error::from); |
| if cfg!(windows) && restricted_names::is_windows_reserved_path(&entry_path) { |
| result = result.with_context(|| { |
| format!( |
| "`{}` appears to contain a reserved Windows path, \ |
| it cannot be extracted on Windows", |
| entry_path.display() |
| ) |
| }); |
| } |
| result |
| .with_context(|| format!("failed to unpack entry at `{}`", entry_path.display()))?; |
| } |
| |
| // Now that we've finished unpacking, create and write to the lock file to indicate that |
| // unpacking was successful. |
| let mut ok = OpenOptions::new() |
| .create_new(true) |
| .read(true) |
| .write(true) |
| .open(&path) |
| .with_context(|| format!("failed to open `{}`", path.display()))?; |
| |
| let lock_meta = LockMetadata { v: 1 }; |
| write!(ok, "{}", serde_json::to_string(&lock_meta).unwrap())?; |
| |
| self.gctx |
| .deferred_global_last_use()? |
| .mark_registry_src_used(global_cache_tracker::RegistrySrc { |
| encoded_registry_name: self.name, |
| package_dir: package_dir.into(), |
| size: Some(bytes_written), |
| }); |
| |
| Ok(unpack_dir.to_path_buf()) |
| } |
| |
| /// Turns the downloaded `.crate` tarball file into a [`Package`]. |
| /// |
| /// This unconditionally sets the checksum for the returned package, so it |
| /// should only be called after an integrity check. That is to say, |
| /// you need to call either [`RegistryData::download`] or |
| /// [`RegistryData::finish_download`] before calling this method. |
| fn get_pkg(&mut self, package: PackageId, path: &File) -> CargoResult<Package> { |
| let path = self |
| .unpack_package(package, path) |
| .with_context(|| format!("failed to unpack package `{}`", package))?; |
| let mut src = PathSource::new(&path, self.source_id, self.gctx); |
| src.load()?; |
| let mut pkg = match src.download(package)? { |
| MaybePackage::Ready(pkg) => pkg, |
| MaybePackage::Download { .. } => unreachable!(), |
| }; |
| |
| // After we've loaded the package configure its summary's `checksum` |
| // field with the checksum we know for this `PackageId`. |
| let cksum = self |
| .index |
| .hash(package, &mut *self.ops) |
| .expect("a downloaded dep now pending!?") |
| .expect("summary not found"); |
| pkg.manifest_mut() |
| .summary_mut() |
| .set_checksum(cksum.to_string()); |
| |
| Ok(pkg) |
| } |
| } |
| |
| impl<'gctx> Source for RegistrySource<'gctx> { |
| fn query( |
| &mut self, |
| dep: &Dependency, |
| kind: QueryKind, |
| f: &mut dyn FnMut(IndexSummary), |
| ) -> Poll<CargoResult<()>> { |
| let mut req = dep.version_req().clone(); |
| |
| // Handle `cargo update --precise` here. |
| if let Some((_, requested)) = self |
| .source_id |
| .precise_registry_version(dep.package_name().as_str()) |
| .filter(|(c, to)| { |
| if to.is_prerelease() && self.gctx.cli_unstable().unstable_options { |
| req.matches_prerelease(c) |
| } else { |
| req.matches(c) |
| } |
| }) |
| { |
| req.precise_to(&requested); |
| } |
| |
| let mut called = false; |
| let callback = &mut |s| { |
| called = true; |
| f(s); |
| }; |
| |
| // If this is a locked dependency, then it came from a lock file and in |
| // theory the registry is known to contain this version. If, however, we |
| // come back with no summaries, then our registry may need to be |
| // updated, so we fall back to performing a lazy update. |
| if kind == QueryKind::Exact && req.is_locked() && !self.ops.is_updated() { |
| debug!("attempting query without update"); |
| ready!(self |
| .index |
| .query_inner(dep.package_name(), &req, &mut *self.ops, &mut |s| { |
| if dep.matches(s.as_summary()) { |
| // We are looking for a package from a lock file, so we do not care whether it is yanked. |
| callback(s) |
| } |
| },))?; |
| if called { |
| Poll::Ready(Ok(())) |
| } else { |
| debug!("falling back to an update"); |
| self.invalidate_cache(); |
| Poll::Pending |
| } |
| } else { |
| let mut precise_yanked_in_use = false; |
| ready!(self |
| .index |
| .query_inner(dep.package_name(), &req, &mut *self.ops, &mut |s| { |
| let matched = match kind { |
| QueryKind::Exact => { |
| if req.is_precise() && self.gctx.cli_unstable().unstable_options { |
| dep.matches_prerelease(s.as_summary()) |
| } else { |
| dep.matches(s.as_summary()) |
| } |
| } |
| QueryKind::Alternatives => true, |
| QueryKind::Normalized => true, |
| }; |
| if !matched { |
| return; |
| } |
| // Next, filter out all yanked packages. Some yanked packages may |
| // leak through if they're in a whitelist (i.e., if they were |
| // previously in `Cargo.lock`). |
| if !s.is_yanked() { |
| callback(s); |
| } else if self.yanked_whitelist.contains(&s.package_id()) { |
| callback(s); |
| } else if req.is_precise() { |
| precise_yanked_in_use = true; |
| callback(s); |
| } |
| }))?; |
| if precise_yanked_in_use { |
| let name = dep.package_name(); |
| let version = req |
| .precise_version() |
| .expect("--precise <yanked-version> in use"); |
| if self.selected_precise_yanked.insert((name, version.clone())) { |
| let mut shell = self.gctx.shell(); |
| shell.warn(format_args!( |
| "selected package `{name}@{version}` was yanked by the author" |
| ))?; |
| shell.note("if possible, try a compatible non-yanked version")?; |
| } |
| } |
| if called { |
| return Poll::Ready(Ok(())); |
| } |
| let mut any_pending = false; |
| if kind == QueryKind::Alternatives || kind == QueryKind::Normalized { |
| // Attempt to handle misspellings by searching for a chain of related |
| // names to the original name. The resolver will later reject any |
| // candidates that have the wrong name, and along the way this lets it |
| // produce helpful "did you mean?" suggestions. |
| // For now we only try canonicalizing `-` to `_` and vice versa. |
| // More advanced fuzzy searching may come in the future. |
| for name_permutation in [ |
| dep.package_name().replace('-', "_"), |
| dep.package_name().replace('_', "-"), |
| ] { |
| let name_permutation = InternedString::new(&name_permutation); |
| if name_permutation == dep.package_name() { |
| continue; |
| } |
| any_pending |= self |
| .index |
| .query_inner(name_permutation, &req, &mut *self.ops, f)? |
| .is_pending(); |
| } |
| } |
| if any_pending { |
| Poll::Pending |
| } else { |
| Poll::Ready(Ok(())) |
| } |
| } |
| } |
| |
| fn supports_checksums(&self) -> bool { |
| true |
| } |
| |
| fn requires_precise(&self) -> bool { |
| false |
| } |
| |
| fn source_id(&self) -> SourceId { |
| self.source_id |
| } |
| |
| fn invalidate_cache(&mut self) { |
| self.index.clear_summaries_cache(); |
| self.ops.invalidate_cache(); |
| } |
| |
| fn set_quiet(&mut self, quiet: bool) { |
| self.ops.set_quiet(quiet); |
| } |
| |
| fn download(&mut self, package: PackageId) -> CargoResult<MaybePackage> { |
| let hash = loop { |
| match self.index.hash(package, &mut *self.ops)? { |
| Poll::Pending => self.block_until_ready()?, |
| Poll::Ready(hash) => break hash, |
| } |
| }; |
| match self.ops.download(package, hash)? { |
| MaybeLock::Ready(file) => self.get_pkg(package, &file).map(MaybePackage::Ready), |
| MaybeLock::Download { |
| url, |
| descriptor, |
| authorization, |
| } => Ok(MaybePackage::Download { |
| url, |
| descriptor, |
| authorization, |
| }), |
| } |
| } |
| |
| fn finish_download(&mut self, package: PackageId, data: Vec<u8>) -> CargoResult<Package> { |
| let hash = loop { |
| match self.index.hash(package, &mut *self.ops)? { |
| Poll::Pending => self.block_until_ready()?, |
| Poll::Ready(hash) => break hash, |
| } |
| }; |
| let file = self.ops.finish_download(package, hash, &data)?; |
| self.get_pkg(package, &file) |
| } |
| |
| fn fingerprint(&self, pkg: &Package) -> CargoResult<String> { |
| Ok(pkg.package_id().version().to_string()) |
| } |
| |
| fn describe(&self) -> String { |
| self.source_id.display_index() |
| } |
| |
| fn add_to_yanked_whitelist(&mut self, pkgs: &[PackageId]) { |
| self.yanked_whitelist.extend(pkgs); |
| } |
| |
| fn is_yanked(&mut self, pkg: PackageId) -> Poll<CargoResult<bool>> { |
| self.index.is_yanked(pkg, &mut *self.ops) |
| } |
| |
| fn block_until_ready(&mut self) -> CargoResult<()> { |
| // Before starting to work on the registry, make sure that |
| // `<cargo_home>/registry` is marked as excluded from indexing and |
| // backups. Older versions of Cargo didn't do this, so we do it here |
| // regardless of whether `<cargo_home>` exists. |
| // |
| // This does not use `create_dir_all_excluded_from_backups_atomic` for |
| // the same reason: we want to exclude it even if the directory already |
| // exists. |
| // |
| // IO errors in creating and marking it are ignored, e.g. in case we're on a |
| // read-only filesystem. |
| let registry_base = self.gctx.registry_base_path(); |
| let _ = registry_base.create_dir(); |
| exclude_from_backups_and_indexing(&registry_base.into_path_unlocked()); |
| |
| self.ops.block_until_ready() |
| } |
| } |
| |
| impl RegistryConfig { |
| /// File name of [`RegistryConfig`]. |
| const NAME: &'static str = "config.json"; |
| } |
| |
| /// Gets the maximum unpack size that Cargo permits, |
| /// based on the given `size` of the compressed file. |
| /// |
| /// Returns the larger of `size * max compression ratio` |
| /// and a fixed maximum unpacked size. |
| /// |
| /// In practice, the compression ratio usually falls in the range of 2:1 to 10:1. |
| /// We choose 20:1 in the hope of covering almost all realistic cases. |
| /// Any ratio higher than this is considered a zip bomb. |
| /// |
| /// In the future we might want to make this size configurable. |
| /// |
| /// Some of the real world data from common compression algorithms: |
| /// |
| /// * <https://www.zlib.net/zlib_tech.html> |
| /// * <https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf> |
| /// * <https://blog.cloudflare.com/results-experimenting-brotli/> |
| /// * <https://tukaani.org/lzma/benchmarks.html> |
| fn max_unpack_size(gctx: &GlobalContext, size: u64) -> u64 { |
| const SIZE_VAR: &str = "__CARGO_TEST_MAX_UNPACK_SIZE"; |
| const RATIO_VAR: &str = "__CARGO_TEST_MAX_UNPACK_RATIO"; |
| const MAX_UNPACK_SIZE: u64 = 512 * 1024 * 1024; // 512 MiB |
| const MAX_COMPRESSION_RATIO: usize = 20; // 20:1 |
| |
| let max_unpack_size = if cfg!(debug_assertions) && gctx.get_env(SIZE_VAR).is_ok() { |
| // For integration test only. |
| gctx.get_env(SIZE_VAR) |
| .unwrap() |
| .parse() |
| .expect("a max unpack size in bytes") |
| } else { |
| MAX_UNPACK_SIZE |
| }; |
| let max_compression_ratio = if cfg!(debug_assertions) && gctx.get_env(RATIO_VAR).is_ok() { |
| // For integration test only. |
| gctx.get_env(RATIO_VAR) |
| .unwrap() |
| .parse() |
| .expect("a max compression ratio") |
| } else { |
| MAX_COMPRESSION_RATIO |
| }; |
| |
| u64::max(max_unpack_size, size * max_compression_ratio as u64) |
| } |
| |
| /// Set the current [`umask`] value for the given tarball. No-op on non-Unix |
| /// platforms. |
| /// |
| /// On Windows, tar only looks at user permissions and tries to set the "read |
| /// only" attribute, so no-op as well. |
| /// |
| /// [`umask`]: https://man7.org/linux/man-pages/man2/umask.2.html |
| #[allow(unused_variables)] |
| fn set_mask<R: Read>(tar: &mut Archive<R>) { |
| #[cfg(unix)] |
| tar.set_mask(crate::util::get_umask()); |
| } |