Path traversal in spotipy

2 years ago

### Summary
If a malicious URI is passed to the library, the library can be tricked into performing an operation on a different API endpoint than intended.

### Details
The [code Spotipy uses to parse URIs and URLs ](https://github.com/spotipy-dev/spotipy/blob/master/spotipy/client.py#L1942) accepts user data too liberally which allows a malicious user to insert arbitrary characters into the path that is used for API requests. Because it is possible to include `..`, an attacker can redirect for example a track lookup via `spotifyApi.track()` to an arbitrary API endpoint like playlists, but this is possible for other endpoints as well.

Before the security advisory feature was enabled on GitHub, I was already in contact with StÃ©phane Bruckert via e-mail, and he asked me to look into a potential fix.

My recommendation is to perform stricter parsing of URLs and URIs, which I implemented in the patch included at the end of the report. If you prefer, I can also invite you to a private fork of the repository.

### PoC
The POC expects `SPOTIFY_CLIENT_ID` and `SPOTIFY_CLIENT_SECRET` environment variables to be set to authenticate against the API.
“`python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

def main():
spotifyApi = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials())

# This URL contains the example playlist ID from the spotify docs, a malicious
# playlist could instead contain a XSS payload in their title. A playlist with
# such a title was also included in the initial report via mail to maintainer.
malicious_spotify_url = ‘spotify:track:../playlists/3cEYpjA9oz9GiPac4AsH4n’

# Usage of the track function, expecting to get a non-user-controllable track name
# e.g. for displaying in a website.
# Our modified track uri however makes the library return the name of a playlist which
# may be created by anyone containing anything.
track = spotifyApi.track(malicious_spotify_url)

# Prints:
# ‘Name of the track: Spotify Web API Testing playlist’
# A malicious playlist could also have an XSS payload as title, which would result in:
# ‘Name of the track: ‘
print(f”Name of the track: {track[‘name’]}”)

if __name__ == ‘__main__’:
main()
“`

### Impact
The impact of this vulnerability depends heavily on what operations a client application performs when it handles a URI from a user and how it uses the responses it receives from the API.

### Possible Patch

#### Caviats of this patch
* The ID parsing functionality now newly raises `ValueError` if it cannot parse an ID, instead of logging a warning or silently passing back whatever it received as input.
* **WARNING** I only adjusted unit tests to expect `ValueError` that didn’t require a valid user session, other tests may also need adjustment
* Unfortunately, I could not find conclusive documentation on what constitutes a valid Spotify username, but apparently some exist that contain alphanumeric characters, mine just contains numbers and the ones of newly created accounts seem to follow the base-62 scheme. You as developers probably have deeper insight into this, otherwise it probably will have to be discovered via bug reports if additional characters are valid as well.

“`
From 30cf29b16e893dcac974dbd7481fb58a073b853c Mon Sep 17 00:00:00 2001
From: Shaderbug
Date: Tue, 10 Jan 2023 19:26:18 +0100
Subject: [PATCH] Improve URL and URI handling

—
spotipy/client.py | 61 +++++++++++++++—–
tests/integration/non_user_endpoints/test.py | 6 +-
2 files changed, 49 insertions(+), 18 deletions(-)

diff –git a/spotipy/client.py b/spotipy/client.py
index d7025a9..b094947 100644
— a/spotipy/client.py
+++ b/spotipy/client.py
@@ -6,6 +6,7 @@ __all__ = [“Spotify”, “SpotifyException”]

import json
import logging
+import re
import warnings

import requests
@@ -96,6 +97,29 @@ class Spotify(object):
“US”,
“UY”]

+ # Spotify URI scheme defined in [1], and the ID format as base-62 in [2].
+ #
+ # Unfortunately the IANA specification is out of date and doesn’t include the new types
+ # show and episode. Additionally, for the user URI, it does not specify which characters
+ # are valid for usernames, so the assumption is alphanumeric which coincidentially are also
+ # the same ones base-62 uses.
+ # In limited manual exploration this seems to hold true, as newly accounts are assigned an
+ # identifier that looks like the base-62 of all other IDs, but some older accounts only have
+ # numbers and even older ones seemed to have been allowed to freely pick this name.
+ #
+ # [1] https://www.iana.org/assignments/uri-schemes/prov/spotify
+ # [2] https://developer.spotify.com/documentation/web-api/#spotify-uris-and-ids
+ _regex_spotify_uri = r’^spotify:(?Ptrack|artist|album|playlist|show|episode|user):(?P[0-9A-Za-z]+)$’
+
+ # Spotify URLs are defined at [1]. The assumption is made that they are all
+ # pointing to open.spotify.com, so a regex is used to parse them as well,
+ # instead of a more complex URL parsing function.
+ #
+ # [1] https://developer.spotify.com/documentation/web-api/#spotify-uris-and-ids
+ _regex_spotify_url = r’^(http[s]?://)?open.spotify.com/(?Ptrack|artist|album|playlist|show|episode|user)/(?P[0-9A-Za-z]+)(?.*)?$’
+
+ _regex_base62 = r’^[0-9A-Za-z]+$’
+
def __init__(
self,
auth=None,
@@ -1940,20 +1964,27 @@ class Spotify(object):
return path

def _get_id(self, type, id):
– fields = id.split(“:”)
– if len(fields) >= 3:
– if type != fields[-2]:
– logger.warning(‘Expected id of type %s but found type %s %s’,
– type, fields[-2], id)
– return fields[-1].split(“?”)[0]
– fields = id.split(“/”)
– if len(fields) >= 3:
– itype = fields[-2]
– if type != itype:
– logger.warning(‘Expected id of type %s but found type %s %s’,
– type, itype, id)
– return fields[-1].split(“?”)[0]
– return id
+ uri_match = re.search(Spotify._regex_spotify_uri, id)
+ if uri_match is not None:
+ uri_match_groups = uri_match.groupdict()
+ if uri_match_groups[‘type’] != type:
+ raise ValueError(“Unexpected Spotify URI type.”)
+ else:
+ return uri_match_groups[‘id’]
+
+ url_match = re.search(Spotify._regex_spotify_url, id)
+ if url_match is not None:
+ url_match_groups = url_match.groupdict()
+ if url_match_groups[‘type’] != type:
+ raise ValueError(“Unexpected Spotify URL type.”)
+ else:
+ return url_match_groups[‘id’]
+
+ # Raw identifiers might be passed, ensure they are also base-62
+ if re.search(Spotify._regex_base62, id) is not None:
+ return id
+
+ raise ValueError(“Unsupported URL / URI”)

def _get_uri(self, type, id):
if self._is_uri(id):
@@ -1962,7 +1993,7 @@ class Spotify(object):
return “spotify:” + type + “:” + self._get_id(type, id)

def _is_uri(self, uri):
– return uri.startswith(“spotify:”) and len(uri.split(‘:’)) == 3
+ return re.search(Spotify._regex_spotify_uri, uri) is not None

def _search_multiple_markets(self, q, limit, offset, type, markets, total):
if total and limit > total:
diff –git a/tests/integration/non_user_endpoints/test.py b/tests/integration/non_user_endpoints/test.py
index 96ee4da..116e1d9 100644
— a/tests/integration/non_user_endpoints/test.py
+++ b/tests/integration/non_user_endpoints/test.py
@@ -280,7 +280,7 @@ class AuthTestSpotipy(unittest.TestCase):
try:
self.spotify.track(self.bad_id)
self.assertTrue(False)
– except SpotifyException:
+ except ValueError:
self.assertTrue(True)

def test_show_urn(self):
@@ -296,7 +296,7 @@ class AuthTestSpotipy(unittest.TestCase):
self.assertTrue(show[‘name’] == ‘Heavyweight’)

def test_show_bad_urn(self):
– with self.assertRaises(SpotifyException):
+ with self.assertRaises(ValueError):
self.spotify.show(“bogus_urn”, market=”US”)

def test_shows(self):
@@ -333,7 +333,7 @@ class AuthTestSpotipy(unittest.TestCase):
self.assertTrue(episode[‘name’] == ‘#1 Buzz’)

def test_episode_bad_urn(self):
– with self.assertRaises(SpotifyException):
+ with self.assertRaises(ValueError):
self.spotify.episode(“bogus_urn”, market=”US”)

def test_episodes(self):
—
2.34.1

“`Read More