Prior to starting a historical data migration, ensure you do the following:
- Create a project on our US or EU Cloud.
- Sign up to a paid product analytics plan on the billing page (historic imports are free but this unlocks the necessary features).
- Raise an in-app support request (Target Area: Data Management) detailing where you are sending events from, how, the total volume, and the speed. For example, "we are migrating 30M events from a self-hosted instance to EU Cloud using the migration scripts at 10k events per minute."
- Wait for the OK from our team before starting the migration process to ensure that it completes successfully and is not rate limited.
- Set the
historical_migration
option totrue
when capturing events in the migration.
Migrating from Amplitude is a two step process:
Export your data from Amplitude using the Amplitude Export API.
Import data into PostHog using PostHog's Python SDK or
batch
API with thehistorical_migration
option set totrue
.
Capture the event
According to Amplitude's Export API documentation, their event structure is:
{"event_time": UTC ISO 8601 formatted timestamp,"event_name": string,"device_id": string,"user_id": string | null,"event_properties": dict,"group_properties": dict,"user_properties": dict,// A bunch of other fields that include things like// device_carrier, city, etc....other_fields}
When capturing events into PostHog, we need to convert this schema to PostHog's:
distinct_id = user_id if user_id else device_idevent_message = {"properties": {**event_properties,**other_fields,"$set": {**user_properties},"$geoip_disable": True,},"event": event_type,"distinct_id": distinct_id,"timestamp": event_time,}
In short:
- We track the event with the same name and timestamp
- For the distinct ID, we either use the User ID if present, or the Device ID (a UUID string) if not
- We track user properties using
properties: {$set}
- We track event properties using
properties
Aliasing device IDs to user IDs
In addition to tracking the events, we want to tie users' both before and after login. For Amplitude, events before and after login look a bit like this:
Event | User ID | Device ID |
---|---|---|
Application installed | null | 551dc114-7604-430c-a42f-cf81a3059d2b |
Login | 123 | 551dc114-7604-430c-a42f-cf81a3059d2b |
Purchase | 123 | 551dc114-7604-430c-a42f-cf81a3059d2b |
We want to attribute "Application installed" to the user with ID 123, so we need to also call alias:
posthog = Posthog('<ph_project_api_key>',host='https://us.i.posthog.com',debug=True,historical_migration=True)posthog.alias(previous_id=device_id, distinct_id=user_id)
Since you only need to do this once per user, ideally you'd store a record (e.g. a SQL table) of which users you'd already sent to PostHog, so that you don't end up sending the same events multiple times.