Keynotes

Day 1 (08/13/2024)

Updates from Posit

Tuesday, Aug 13 9:00 AM - 10:00 AM PDT

Quarto Updates

Keynote Info
  • Presenter: Hadley Wickham
  • Quarto dashboards
    • Cards are the fundamental unit
      • Columns / Rows
      • format: dashboard in YAML
      • Markdown headings to specify display
      • Add dashboard components
        • Value boxes (highlighting numbers)
        • Input panels (way to group together interactive input into sidebars/toolbars)
        • Add interactivity
          • Native support for Observable JS
          • Use frameworks like leaflet, plotly, threejs, etc. via htmlwidgets (R) and Jupyter Widgets (Python)
          • Add Shiny & Shiny for Python inputs and outputs directly into code cells
      • quarto.org > Guide > Dashboards
  • The future of PDF
  • Community contributions
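
As a minimal sketch of the dashboard layout points above (not code shown in the keynote), the Python snippet below writes a small .qmd file that uses format: dashboard, a value box, and cards arranged in rows and columns. The file name, titles, icon, and the number 120 are placeholders chosen for illustration.

```python
from pathlib import Path
from textwrap import dedent

# Placeholder content: the titles, icon, and the number 120 are made up.
DASHBOARD_QMD = dedent("""\
    ---
    title: "Conference notes"
    format: dashboard
    ---

    ## Row {height=30%}

    ::: {.valuebox icon="people" color="primary"}
    Attendees

    120
    :::

    ## Row

    ### Column

    ::: {.card title="Left card"}
    Each card holds one plot, table, or block of text.
    :::

    ### Column

    ::: {.card title="Right card"}
    Level-2 headings start rows; level-3 headings start columns.
    :::
    """)


def write_dashboard(path: str = "dashboard.qmd") -> Path:
    """Write the dashboard source to disk and return its path."""
    target = Path(path)
    target.write_text(DASHBOARD_QMD, encoding="utf-8")
    return target
```

Calling write_dashboard() and then running `quarto render dashboard.qmd` should produce the dashboard, assuming Quarto 1.4 or later is installed.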

webR

Keynote Info
  • Presenter: George Stagg
  • webR
  • R for WebAssembly; execute R code in your web browser
  • Example: https://webr.r-wasm.org/latest/
  • WebAssembly
    • Portable binary code format
    • High-performance web applications
    • Near-native execution speed
    • Supported by most modern browsers
    • Interactive through JavaScript
  • Shinylive
  • Quarto Live | GitHub
    • Interactive documents and code exercises in Quarto powered by WebAssembly
    • format: live-html in YAML; switch code blocks to {webr}
    • Also works on mobile devices/tablets
    • Grading code: Custom feedback algorithms
    • Dynamic Documents: WebAssembly + Observable JS
    • Works in Python

Partnerships with Databricks and Snowflake

Keynote Info
  • Presenter: James Blair
  • Infrastructure Management
    • Posit Workbench (Databricks)
  • Data Access
    • Posit Workbench (odbc to connect to Databricks/Snowflake)
  • Application Deployment (authentication; data security)
    • Posit Connect

Positron

Keynote Info
  • Presenter: Hadley Wickham
  • New multilingual IDE from Posit, built on Code OSS (the open-source base of VS Code)

Practical Tips for using Generative AI in Data Science Workflows

Tuesday, Aug 13 4:15 PM - 5:15 PM PDT
Keynote Info
  • GPT-4o has great image processing capabilities
  • Math
    • Example: Processing a math equation written on paper into Quarto code
      • Prompt: Convert the text in the image into a Quarto document, using format: html and keep the same text formatting (bold, italic, underline, etc.)
      • Prompt: Convert the text in the image into a Quarto document, using format: html, and use styling to make sure that the font styles (e.g., bold, italic) and colors match the image, where possible. Align the math around the equals signs.
        • Also asking it to add styling to match pen colors

  • Data entry
    • Prompt: Convert the table in the image into a downloadable CSV.
      • Shows you an interactive preview of the dataset; can also modify it on the fly (e.g., highlight different columns/cells, provide data cleaning/formatting instructions)
    • Prompt: Simulate additional observations so that there are 1000 rows. Create two additional variables, “Years of Experience” and “Education level.” Ensure that the generated information and salaries are realistic.
    • Prompt: Provide a short overview of the dataset, including the data types for each variable.
    • Prompt: Provide summary statistics.
    • Prompt: Analyze the dataset and extract interesting facts. Use data storytelling techniques to weave together a short (75 word) paragraph that pulls together key insights.
  • Analysis
    • Prompt: Create a data visualization using the salaries dataset. Put average salary on the y-axis and job role on the x-axis. Use a different color for each job role. Create a bar chart that’s faceted by the industry variable. Add a horizontal black line showing the average salary for each industry. Format the y-axis as dollar amounts. Use a modern color palette and a minimal theme. Display a legend on the right.
      • Write prompt to be consistent with the grammar of graphics
      • Will show you code and graph (you can now run Python code directly within chat)
  • Creating slides using Quarto themes
  • Creating a logo for a hex sticker
    • Transparent background, scaled easily without becoming pixelated
    • Text to svg generators
      • Example: Recraft AI
      • Text prompt and specify height, width, color palette, how simple/detailed, art styles
      • Generate mock-ups
  • Two tools that use Generative AI to make teaching and communicating about data easier
    • Scribe: Automatically create step-by-step tutorials without copy-pasting screenshots or recording videos
      • Scribe Chrome extension to create automatic screenshots/instructions (e.g., where you clicked will be highlighted)
      • Can share it for free (direct link or by email)
      • Can embed as an iframe, for example, on a Quarto website
      • Pro subscription for the desktop version of Scribe; can export in a variety of formats (including Markdown)
    • Descript: Edit videos and audio/podcasts as if you’re editing a Word document
      • Upload a video/audio file and it will be transcribed (available in over 20 different languages)
      • Once the transcript is ready, you can edit the transcript in order to edit the video itself
      • Can remove awkward silences (shorten word gaps) and remove filler words
      • AI speech generation capabilities
        • Regenerate: You can use your voice clone to regenerate content
        • Overdub: You can use it to replace words
      • Other AI features
        • Eye contact
        • Remove retakes
        • YouTube descriptions: from your transcript, Descript will generate a YouTube-style description and provide timestamps for sections
  • Guidelines for responsible use

Day 2 (08/14/2024)

A future of data science

Wednesday, Aug 14 9:00 AM - 10:00 AM PDT
Keynote Info
  • Presenter: Allen Downey
  • [Slides]
  • Argues that data science exists because statistics, as a field, missed the boat on computation
    • Computation was the technology trigger for data science
  • Hype of data science
    • Technology trigger
    • Peak of inflated expectations (2012)
    • Trough of disillusionment (2016-)
    • Slope of enlightenment
    • Plateau of productivity
  • Slope of enlightenment
    • Two things that make me optimistic
      • Improving data literacy
      • More and more available data
    • One that makes me worry
      • Excessive consumption of relentlessly negative media

Data Wrangling [for Python or R] Like a Boss With DuckDB

Wednesday, Aug 14 4:15 PM - 5:15 PM PDT
Keynote Info
  • Presenter: Hannes Mühleisen
  • DuckDB package
    • Advanced CSV reader
    • Native support for Parquet files and Arrow structures
    • An efficient parallel vectorized query processing engine
    • Support for efficient atomic updates to tables
  • Advantages
    • Zero-dependency package available in multiple languages
    • Batteries included: Fully featured data management system
    • Fast
    • Free!
  • Simple queries
    • Read dataset (CREATE TABLE ... AS FROM 'Data8277.csv')
    • Compute summary statistics (SUMMARIZE FROM 'Data8277.csv')
  • duckplyr R package: dplyr + DuckDB
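
The two simple queries above can be sketched from Python with the duckdb package (a third-party install, `pip install duckdb`). This is an illustration under assumptions rather than the code from the keynote: the table name `census` and both function names are hypothetical, and the CSV file is assumed to sit in the working directory.

```python
def build_queries(csv_path: str, table: str = "census") -> list[str]:
    """Return the SQL for loading a CSV into a table and summarizing it.

    The table name is a hypothetical placeholder; the notes elide it.
    """
    return [
        # FROM-first syntax: DuckDB infers column names and types from the CSV.
        f"CREATE TABLE {table} AS FROM '{csv_path}'",
        # SUMMARIZE reports per-column statistics (min, max, avg, nulls, ...).
        f"SUMMARIZE {table}",
    ]


def summarize_csv(csv_path: str) -> list[tuple]:
    """Load the CSV into an in-memory DuckDB database and summarize it."""
    import duckdb  # imported here so build_queries works without the package

    con = duckdb.connect()  # no path given: in-memory database
    create_sql, summarize_sql = build_queries(csv_path)
    con.execute(create_sql)
    return con.execute(summarize_sql).fetchall()
```

With the file in place, `summarize_csv("Data8277.csv")` should return one row of statistics per column of the dataset.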