| .\" ************************************************************************** |
| .\" * _ _ ____ _ |
| .\" * Project ___| | | | _ \| | |
| .\" * / __| | | | |_) | | |
| .\" * | (__| |_| | _ <| |___ |
| .\" * \___|\___/|_| \_\_____| |
| .\" * |
| .\" * Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al. |
| .\" * |
| .\" * This software is licensed as described in the file COPYING, which |
| .\" * you should have received as part of this distribution. The terms |
| .\" * are also available at https://curl.se/docs/copyright.html. |
| .\" * |
| .\" * You may opt to use, copy, modify, merge, publish, distribute and/or sell |
| .\" * copies of the Software, and permit persons to whom the Software is |
| .\" * furnished to do so, under the terms of the COPYING file. |
| .\" * |
| .\" * This software is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY |
| .\" * KIND, either express or implied. |
| .\" * |
| .\" * SPDX-License-Identifier: curl |
| .\" * |
| .\" ************************************************************************** |
| .TH curl_url_get 3 "6 Aug 2018" "libcurl" "libcurl" |
| .SH NAME |
| curl_url_get - extract a part from a URL |
| .SH SYNOPSIS |
| .nf |
| #include <curl/curl.h> |
| |
| CURLUcode curl_url_get(CURLU *url, |
| CURLUPart part, |
| char **content, |
| unsigned int flags); |
| .fi |
| .SH DESCRIPTION |
| Given a \fIurl\fP handle of a URL object, this function extracts an individual |
| piece or the full URL from it. |
| |
| The \fIpart\fP argument specifies which part to extract (see list below) and |
| \fIcontent\fP points to a 'char *' that is updated to point to a newly |
| allocated string with the contents. |
| |
| The \fIflags\fP argument is a bitmask with individual features. |
| |
| The returned content pointer must be freed with \fIcurl_free(3)\fP after use. |
| .SH FLAGS |
| The flags argument is zero, one or more bits set in a bitmask. |
| .IP CURLU_DEFAULT_PORT |
| If the handle has no port stored, this option makes \fIcurl_url_get(3)\fP |
| return the default port for the used scheme. |
| .IP CURLU_DEFAULT_SCHEME |
| If the handle has no scheme stored, this option makes \fIcurl_url_get(3)\fP |
| return the default scheme instead of an error. |
| .IP CURLU_NO_DEFAULT_PORT |
| Instructs \fIcurl_url_get(3)\fP to not return a port number if it matches the |
| default port for the scheme. |
| .IP CURLU_URLDECODE |
| Asks \fIcurl_url_get(3)\fP to URL decode the contents before returning them. It |
| does not decode the scheme, the port number or the full URL. |
| |
| The query component also gets plus-to-space conversion when this bit is |
| set. |
| |
| Note that this URL decoding is charset unaware and you get a zero terminated |
| string back with data that could be intended for a particular encoding. |
| |
| If there are byte values lower than 32 in the decoded string, the get |
| operation returns an error instead. |
| .IP CURLU_URLENCODE |
| If set, \fIcurl_url_get(3)\fP URL encodes the host name part when a full URL |
| is retrieved. If not set (default), libcurl returns the URL with the host name |
| "raw" so that IDN names appear as-is. IDN host names typically use non-ASCII |
| bytes that would otherwise get percent-encoded. |
| |
| Note that even when not asking for URL encoding, the '%' (byte 37) is URL |
| encoded to make sure the host name remains valid. |
| .IP CURLU_PUNYCODE |
| If set and \fICURLU_URLENCODE\fP is not set, and asked to retrieve the |
| \fBCURLUPART_HOST\fP or \fBCURLUPART_URL\fP parts, libcurl returns the host |
| name in its punycode version if it contains any non-ASCII octets (and is an |
| IDN name). |
| |
| If libcurl is built without IDN capabilities, using this bit makes |
| \fIcurl_url_get(3)\fP return \fICURLUE_LACKS_IDN\fP if the host name contains |
| anything outside the ASCII range. |
| |
| (Added in curl 7.88.0) |
| .IP CURLU_PUNY2IDN |
| If set and asked to retrieve the \fBCURLUPART_HOST\fP or \fBCURLUPART_URL\fP |
| parts, libcurl returns the host name in its IDN (International Domain Name) |
| UTF-8 version if the name is otherwise in its punycode version. If the punycode name |
| cannot be converted to IDN correctly, libcurl returns |
| \fICURLUE_BAD_HOSTNAME\fP. |
| |
| If libcurl is built without IDN capabilities, using this bit makes |
| \fIcurl_url_get(3)\fP return \fICURLUE_LACKS_IDN\fP if the host name is using |
| punycode. |
| |
| (Added in curl 8.3.0) |
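| .PP |
| As an illustration of the \fICURLU_URLDECODE\fP rules described above, here is |
| a minimal standalone sketch in C. It is not libcurl's implementation: the |
| function name \fIsketch_urldecode\fP and its \fIis_query\fP parameter are made |
| up for this example, and copying a stray '%' through unchanged is an |
| assumption that the text above does not specify. |
| .nf |
```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libcurl: percent-decode 'in' into a newly
   malloc'ed zero terminated string. If 'is_query' is non-zero, each '+' is
   turned into a space. Returns NULL when out of memory or when the result
   would hold a byte value lower than 32. */
char *sketch_urldecode(const char *in, int is_query)
{
  size_t len = strlen(in);
  char *out = malloc(len + 1); /* decoding never makes the string longer */
  size_t o = 0;
  size_t i;

  if(!out)
    return NULL;
  for(i = 0; i < len; i++) {
    unsigned char c = (unsigned char)in[i];
    if(c == '+' && is_query)
      c = ' ';
    else if(c == '%' && isxdigit((unsigned char)in[i + 1]) &&
            isxdigit((unsigned char)in[i + 2])) {
      /* two hex digits follow: turn them into a single byte */
      char hex[3];
      hex[0] = in[i + 1];
      hex[1] = in[i + 2];
      hex[2] = 0;
      c = (unsigned char)strtoul(hex, NULL, 16);
      i += 2;
    }
    /* assumption: a '%' without two hex digits is copied as-is */
    if(c < 32) {
      /* control codes make the whole operation fail */
      free(out);
      return NULL;
    }
    out[o++] = (char)c;
  }
  out[o] = 0;
  return out;
}
```
| .fi |
| With this sketch, "a+b%20c" decodes to "a b c" when treated as a query and to |
| "a+b c" otherwise, while "bad%0Aline" is rejected because it decodes to a byte |
| value lower than 32. |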
| .SH PARTS |
| .IP CURLUPART_URL |
| When asked to return the full URL, \fIcurl_url_get(3)\fP returns a normalized |
| and possibly cleaned up version using all available URL parts. |
| |
| We advise using the \fICURLU_PUNYCODE\fP option to get the URL as "normalized" |
| as possible since IDN allows host names to be written in many different ways |
| that still end up as the same punycode version. |
| .IP CURLUPART_SCHEME |
| Scheme cannot be URL decoded on get. |
| .IP CURLUPART_USER |
| .IP CURLUPART_PASSWORD |
| .IP CURLUPART_OPTIONS |
| The options field is an optional field that might follow the password in the |
| userinfo part. It is only recognized/used when parsing URLs for the following |
| schemes: pop3, smtp and imap. The URL API still allows users to set and get |
| this field independently of scheme when not parsing full URLs. |
| .IP CURLUPART_HOST |
| The host name. If it is an IPv6 numeric address, the zone id is not part of it |
| but is provided separately in \fICURLUPART_ZONEID\fP. IPv6 numerical addresses |
| are returned within brackets ([]). |
| |
| IPv6 names are normalized when set, which should make them as short as |
| possible while maintaining correct syntax. |
| .IP CURLUPART_ZONEID |
| If the host name is a numeric IPv6 address, this field might also be set. |
| .IP CURLUPART_PORT |
| A port cannot be URL decoded on get. This number is returned in a string just |
| like all other parts. That string is guaranteed to hold a valid port number in |
| ASCII using base 10. |
| .IP CURLUPART_PATH |
| The returned path is always at least a slash ('/') even if no path was supplied |
| in the URL. A URL path always starts with a slash. |
| .IP CURLUPART_QUERY |
| The initial question mark that denotes the beginning of the query part is a |
| delimiter only. It is not part of the query contents. |
| |
| A not-present query returns \fIcontent\fP set to NULL. |
| A zero-length query returns \fIcontent\fP as a zero-length string. |
| |
| The query part gets pluses converted to spaces when asked to URL decode on get |
| with the \fICURLU_URLDECODE\fP bit. |
| .IP CURLUPART_FRAGMENT |
| The initial hash sign that denotes the beginning of the fragment is a |
| delimiter only. It is not part of the fragment contents. |
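| .PP |
| Since \fICURLUPART_PORT\fP is returned as a base 10 ASCII string like every |
| other part, callers that need a number have to convert it themselves. The |
| following is a minimal sketch of one way to do that. The helper name |
| \fIsketch_port_to_int\fP is made up for this example and is not part of |
| libcurl. |
| .nf |
```c
#include <stdlib.h>

/* Hypothetical helper, not part of libcurl: convert a port string, such as
   one returned for CURLUPART_PORT, to an integer. Returns -1 if the string
   is not a plain base 10 number in the 0 to 65535 range. */
int sketch_port_to_int(const char *port)
{
  char *end;
  long value;

  /* require a leading digit so strtol cannot skip blanks or accept a sign */
  if(!port || port[0] < '0' || port[0] > '9')
    return -1;
  value = strtol(port, &end, 10);
  /* reject trailing junk and out-of-range numbers */
  if(*end || value < 0 || value > 65535)
    return -1;
  return (int)value;
}
```
| .fi |
| The range check is redundant for strings that come straight from |
| \fIcurl_url_get(3)\fP, but it keeps the helper safe for other input. |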
| .SH EXAMPLE |
| .nf |
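| #include <stdio.h> |
| #include <curl/curl.h> |
| |
| int main(void) |
| { |
|   CURLUcode rc; |
|   CURLU *url = curl_url(); |
|   if(!url) |
|     return 1; /* out of memory */ |
|   rc = curl_url_set(url, CURLUPART_URL, "https://example.com", 0); |
|   if(!rc) { |
|     char *scheme; |
|     rc = curl_url_get(url, CURLUPART_SCHEME, &scheme, 0); |
|     if(!rc) { |
|       printf("the scheme is %s\\n", scheme); |
|       curl_free(scheme); |
|     } |
|   } |
|   /* free the handle whether or not the calls above succeeded */ |
|   curl_url_cleanup(url); |
|   return rc ? 1 : 0; |
| } |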
| .fi |
| .SH AVAILABILITY |
| Added in 7.62.0. CURLUPART_ZONEID was added in 7.65.0. |
| .SH RETURN VALUE |
| Returns a CURLUcode error value, which is CURLUE_OK (0) if everything went |
| fine. See the \fIlibcurl-errors(3)\fP man page for the full list with |
| descriptions. |
| |
| If this function returns an error, no URL part is returned. |
| .SH "SEE ALSO" |
| .BR curl_url_cleanup "(3), " curl_url "(3), " curl_url_set "(3), " |
| .BR curl_url_dup "(3), " curl_url_strerror "(3), " CURLOPT_CURLU "(3)" |