
Selenium Infrastructure

Utilities for managing WebDriver connections, Dagster resources, and browser interactions specific to the Scraper module.

Lifecycle Management

Tools to create, validate, and destroy WebDriver sessions connecting to the Selenium Grid.

Perhaps Selenium belongs in the utils folder rather than the sources folder. It is currently used only by the sources code, but it could also be useful for automated tests in the future. I'll leave that decision to my future self.

drugslm.utils.selenium

Selenium WebDriver Utilities

This module provides utilities for managing Selenium WebDriver connections, specifically for remote drivers connected to a Selenium Hub.

It includes:

- A context manager 'webdriver_manager' for safe setup/teardown.
- A 'validate_driver_connection' health check function.
- Helper functions for UI interaction and configuration loading.

get_firefox_options()

Loads Firefox configurations from the config object and returns the options.

Returns:

FirefoxOptions (Options): A configured options object ready for GeckoDriver, populated with the arguments and preferences from the YAML/JSON file.

Raises:

FileNotFoundError: If the config object is None or the underlying file is missing.
ValueError: If the file format is unsupported (raised by .load()).

Source code in drugslm/utils/selenium.py
def get_firefox_options() -> FirefoxOptions:
    """
    Loads Firefox configurations from the config object and returns the options.

    Returns:
        FirefoxOptions: A configured options object ready for the GeckoDriver,
            populated with arguments and preferences from the YAML/JSON file.

    Raises:
        FileNotFoundError: If the config object is None or the underlying file is missing.
        ValueError: If the file format is unsupported (raised by .load()).
    """

    if default_configs.browser.firefox is None:
        raise FileNotFoundError("Configuration tree is None")

    config = default_configs.browser.firefox.load()
    options = FirefoxOptions()

    arguments = config.get("arguments", [])
    if arguments:
        logger.info(f"Adding {len(arguments)} arguments to FirefoxOptions.")
        for arg in arguments:
            options.add_argument(arg)

    preferences = config.get("preferences", {})
    if preferences:
        logger.info(f"Setting {len(preferences)} preferences for FirefoxOptions.")
        for name, value in preferences.items():
            options.set_preference(name, value)

    return options
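To make the expected config shape concrete, here is a minimal sketch of how a parsed config maps onto the options object. It assumes the file parses to a mapping with an optional `arguments` list and an optional `preferences` mapping, which are the two keys `get_firefox_options` reads. The config values and the `RecordingOptions` class (a stand-in for Selenium's `FirefoxOptions`, so the example runs without Selenium) are hypothetical.

```python
from typing import Any

# Hypothetical parsed config; the real values come from
# default_configs.browser.firefox.load() (a YAML/JSON file on disk).
EXAMPLE_CONFIG: dict[str, Any] = {
    "arguments": ["-headless"],
    "preferences": {"pdfjs.disabled": True},
}


class RecordingOptions:
    """Hypothetical stand-in for FirefoxOptions that only records calls."""

    def __init__(self) -> None:
        self.arguments: list[str] = []
        self.preferences: dict[str, Any] = {}

    def add_argument(self, arg: str) -> None:
        self.arguments.append(arg)

    def set_preference(self, name: str, value: Any) -> None:
        self.preferences[name] = value


def apply_config(config: dict[str, Any], options: RecordingOptions) -> RecordingOptions:
    # Mirrors get_firefox_options: missing keys fall back to empty defaults.
    for arg in config.get("arguments", []):
        options.add_argument(arg)
    for name, value in config.get("preferences", {}).items():
        options.set_preference(name, value)
    return options


opts = apply_config(EXAMPLE_CONFIG, RecordingOptions())
print(opts.arguments)    # ['-headless']
print(opts.preferences)  # {'pdfjs.disabled': True}
```

With real Selenium, the same two calls (`add_argument` and `set_preference`) are made on a `FirefoxOptions` instance, so an empty or partial config simply yields default options.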

highlight(driver, element, color='green')

Highlights a specific WebElement by drawing a border around it using JS.

Parameters:

driver (WebDriver, required): The active Selenium WebDriver instance.
element (WebElement, required): The target Selenium WebElement instance.
color (str, default 'green'): The border color.

Source code in drugslm/utils/selenium.py
def highlight(driver: WebDriver, element: WebElement, color: str = "green") -> None:
    """
    Highlights a specific WebElement by drawing a border around it using JS.

    Args:
        driver (WebDriver): The active Selenium WebDriver instance.
        element (WebElement): The target Selenium WebElement instance.
        color (str, optional): The border color. Defaults to "green".
    """
    try:
        driver.execute_script(f"arguments[0].style.border='3px solid {color}';", element)
        # sleep(1)
    except JavascriptException as e:
        # Fail silently if highlighting isn't critical, but log trace for debug
        logger.info(f"Could not highlight element: {str(e).splitlines()[0]}")
        pass
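Both `highlight` and `scroll` rely on Selenium's `execute_script(script, *args)` convention: extra arguments are exposed to the script through the JavaScript `arguments` array, so the WebElement arrives as `arguments[0]`. The sketch below uses a hypothetical `FakeDriver` that records calls instead of driving a browser, to show exactly which script `highlight` sends for a given color.

```python
from typing import Any


class FakeDriver:
    """Hypothetical driver stand-in: records execute_script calls."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, tuple[Any, ...]]] = []

    def execute_script(self, script: str, *args: Any) -> None:
        self.calls.append((script, args))


def highlight_script(color: str) -> str:
    # Same f-string as highlight(): the color is interpolated into the JS
    # source, so any CSS color value ("red", "#ff0000") is accepted.
    return f"arguments[0].style.border='3px solid {color}';"


driver = FakeDriver()
element = object()  # placeholder for a WebElement
driver.execute_script(highlight_script("red"), element)

script, args = driver.calls[0]
print(script)              # arguments[0].style.border='3px solid red';
print(args[0] is element)  # True
```

Because `color` is spliced into the script text rather than passed as a second argument, it should come from trusted code, not from scraped or user-supplied input.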

scroll(driver, element)

Scrolls the page to bring the specified WebElement into view using JS.

Parameters:

driver (WebDriver, required): The active Selenium WebDriver instance.
element (WebElement, required): The target Selenium WebElement instance.
Source code in drugslm/utils/selenium.py
def scroll(driver: WebDriver, element: WebElement) -> None:
    """
    Scrolls the page to bring the specified WebElement into view using JS.

    Args:
        driver (WebDriver): The active Selenium WebDriver instance.
        element (WebElement): The target Selenium WebElement instance.
    """
    try:
        driver.execute_script(
            "arguments[0].scrollIntoView({behavior: 'smooth', block: 'center'});", element
        )
        # sleep(1)
    except JavascriptException as e:
        # Fail silently if scrolling isn't critical, but log trace for debug
        logger.info(f"Could not scroll to element: {str(e).splitlines()[0]}")
        pass
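`highlight` and `scroll` share a fail-soft pattern: a JavaScript error is caught, only the first line of its message is logged, and execution continues, since neither step is essential to scraping. Below is a minimal sketch of that pattern as a reusable helper. The names `run_js_quietly`, `FakeJavascriptException`, and `BrokenDriver` are hypothetical; the fake exception stands in for Selenium's `JavascriptException` so the example runs without Selenium.

```python
import logging
from typing import Any

logger = logging.getLogger("selenium_helpers_sketch")


class FakeJavascriptException(Exception):
    """Hypothetical stand-in for selenium's JavascriptException."""


class BrokenDriver:
    """Hypothetical driver whose scripts always fail."""

    def execute_script(self, script: str, *args: Any) -> None:
        raise FakeJavascriptException("javascript error: element detached\nStacktrace: ...")


def run_js_quietly(driver: Any, script: str, element: Any) -> bool:
    """Run a non-critical script; return False instead of raising on JS errors."""
    try:
        driver.execute_script(script, element)
        return True
    except FakeJavascriptException as e:
        # Log only the first line: driver error messages often carry long traces.
        logger.info(f"Could not run script: {str(e).splitlines()[0]}")
        return False


ok = run_js_quietly(
    BrokenDriver(),
    "arguments[0].scrollIntoView({behavior: 'smooth', block: 'center'});",
    object(),
)
print(ok)  # False
```

Catching only the JavaScript exception type, as the original helpers do, keeps other failures (for example a lost session) visible to the caller.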

validate_driver_connection(driver)

Performs a basic health check on an active WebDriver instance.

It loads a known page (https://www.google.com), checks that the page title contains "Google", and waits up to 10 seconds for the search box (name="q") to appear, confirming that navigation and element lookup work.

Parameters:

driver (WebDriver, required): The WebDriver instance to test.

Returns:

bool: True if the validation is successful.

Raises:

Exception: If any assertion or step in the validation fails.

Source code in drugslm/utils/selenium.py
def validate_driver_connection(driver: WebDriver) -> bool:
    """
    Performs a basic health check on an active WebDriver instance.

    It loads a known page (google.com), checks that its title contains
    "Google", and waits for the search box to confirm element lookup works.

    Args:
        driver (WebDriver): The WebDriver instance to test.

    Returns:
        bool: True if the validation is successful.

    Raises:
        Exception: If any assertion or step in the validation fails.
    """
    try:
        logger.info("Validating driver connection...")
        driver.get("https://www.google.com")

        assert "Google" in driver.title, "Google.com did not load correctly."

        WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.NAME, "q")))

        logger.info("Driver connection validated successfully.")
        return True
    except Exception as e:
        logger.error(f"Driver health check failed: {str(e).splitlines()[0]}")
        raise
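The element check relies on `WebDriverWait(driver, 10).until(...)`, which repeatedly evaluates a condition until it returns a truthy value or the timeout elapses. In Selenium the default poll interval is 0.5 s, `NoSuchElementException` is ignored between polls by default, and a timeout raises `TimeoutException`. The sketch below is a simplified pure-Python model of that polling loop, not Selenium's implementation; `poll_until` and the condition are hypothetical, `LookupError` plays the role of the ignored "not found yet" exception, and `TimeoutError` stands in for `TimeoutException`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    condition: Callable[[], T],
    timeout: float = 10.0,
    poll: float = 0.5,
    ignored: tuple[type[BaseException], ...] = (LookupError,),
) -> T:
    """Simplified model of WebDriverWait.until: poll until condition() is truthy."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass  # "not there yet": keep polling instead of failing
        if time.monotonic() > deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Hypothetical condition: the "search box" appears on the third check.
attempts = {"n": 0}


def search_box_present() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise LookupError("element not found yet")
    return "<input name='q'>"


result = poll_until(search_box_present, timeout=1.0, poll=0.01)
print(result)         # <input name='q'>
print(attempts["n"])  # 3
```

This is why a slow page does not fail the health check immediately: the check only fails once the full 10-second budget is spent.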

webdriver_manager(hub_url=DEFAULT_HUB_URL, browser='firefox')

Manages the life cycle of a remote WebDriver as a context manager.

Parameters:

hub_url (str, default DEFAULT_HUB_URL): The URL of the Selenium Hub.
browser (Literal['firefox', 'chrome'], default 'firefox'): The browser to use. Only "firefox" is currently implemented; "chrome" raises NotImplementedError.

Yields:

WebDriver: A live, validated Selenium WebDriver instance.

Raises:

Exception: Propagates any exception raised during connection or validation, or after the 'retry' attempts fail.

Source code in drugslm/utils/selenium.py
@contextmanager
def webdriver_manager(
    hub_url: str = DEFAULT_HUB_URL,
    browser: Literal["firefox", "chrome"] = "firefox",
) -> Iterator[WebDriver]:
    """
    Manages the life cycle of a remote WebDriver as a context manager.

    Args:
        hub_url (str): The URL of the Selenium Hub. Defaults to DEFAULT_HUB_URL.
        browser (Literal["firefox", "chrome"]): The browser name to be used. Defaults to "firefox".

    Yields:
        WebDriver: A live, validated Selenium WebDriver instance.

    Raises:
        Exception: Propagates any exception during connection, validation,
            or if the 'retry' attempts fail.
    """
    driver = None

    if browser == "firefox":
        browser_options = get_firefox_options()
    elif browser == "chrome":
        raise NotImplementedError("Chrome not yet implemented")
    else:
        raise ValueError(f"Unsupported browser {browser!r}: must be 'firefox' or 'chrome'")

    try:
        driver = _create_driver(hub_url, browser_options)
        yield driver

    except Exception as e:
        logger.error(f"Exception occurred during driver usage: {str(e).splitlines()[0]}")
        raise
    finally:
        if driver:
            logger.info(f"Closing Selenium session: {driver.session_id}")
            driver.quit()
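The key property of `webdriver_manager` is the `try`/`finally` around `yield`: the session is quit even when the code inside the `with` block raises, and the exception is logged and re-raised rather than swallowed. The sketch below reproduces that control flow with a hypothetical `FakeDriver` and `managed_driver` in place of `_create_driver` and a real Grid session, so it runs without Selenium.

```python
import logging
from contextlib import contextmanager
from typing import Iterator, Optional

logger = logging.getLogger("webdriver_manager_sketch")


class FakeDriver:
    """Hypothetical stand-in for a remote WebDriver session."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.closed = False

    def quit(self) -> None:
        self.closed = True


@contextmanager
def managed_driver(session_id: str = "demo-session") -> Iterator[FakeDriver]:
    driver: Optional[FakeDriver] = None
    try:
        driver = FakeDriver(session_id)  # stands in for _create_driver(hub_url, options)
        yield driver
    except Exception as e:
        logger.error(f"Exception occurred during driver usage: {str(e).splitlines()[0]}")
        raise  # re-raise so the caller still sees the failure
    finally:
        if driver:
            driver.quit()  # runs on both normal exit and failure


# Normal exit: the session is closed when the block ends.
with managed_driver() as d1:
    pass
print(d1.closed)  # True

# Failing block: the error propagates, but quit() has still run.
try:
    with managed_driver() as d2:
        raise RuntimeError("page failed to load")
except RuntimeError:
    pass
print(d2.closed)  # True
```

Note that in the real manager the browser options are built before the `try`, so an unsupported browser or a missing config fails before any remote session is opened, leaving nothing to clean up.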